Estimation of Linear Regression Parameters

The regression equation is always supplemented with an indicator of the closeness of the connection. In the case of linear regression, this indicator is the linear correlation coefficient r_xy. There are various modifications of the formula for the linear correlation coefficient.

It should be borne in mind that the linear correlation coefficient assesses the closeness of the connection between the characteristics under consideration in its linear form only. Therefore, an absolute value of the linear correlation coefficient close to zero does not mean that there is no connection between the characteristics at all; the connection may simply be nonlinear.

To assess the quality of the fit of the linear function, the square of the linear correlation coefficient, r²_xy, called the coefficient of determination, is calculated. The coefficient of determination characterizes the proportion of the variance of the effective characteristic y explained by the regression in the total variance of the effective characteristic.
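In symbols, this is the standard variance decomposition (a notational sketch; ŷᵢ denote the fitted values):

```latex
r_{xy}^{2} = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}
           = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}.
```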

The nonlinear regression equation, just like the linear one, is supplemented by an indicator of correlation, namely the correlation index R.

A parabola of the second order, like a polynomial of any higher order, takes the form of a multiple regression equation when linearized. If a regression equation that is nonlinear with respect to the explanatory variable takes the form of a linear paired regression equation upon linearization, then a linear correlation coefficient can be used to assess the closeness of the relationship; its value in this case coincides with the correlation index.

The situation is different when the transformation to linear form involves the dependent variable. In this case, a linear correlation coefficient based on the transformed values of the characteristic gives only an approximate estimate of the closeness of the relationship and does not numerically coincide with the correlation index. Thus, for the power function y = a·x^b,

after passing to the log-linear equation

ln y = ln a + b·ln x

a linear correlation coefficient can be found not for the actual values of the variables x and y, but for their logarithms, that is, r_(ln y, ln x). Accordingly, the square of its value characterizes the ratio of the factor sum of squared deviations to the total sum, calculated not for y but for its logarithms:

Meanwhile, when the correlation index is calculated, the sums of squared deviations of the characteristic y itself are used, not of its logarithms. For this purpose the theoretical values of the resulting characteristic are determined as ŷ = the antilogarithm of the value calculated from the log-linear equation, and the residual sum of squares as Σ(y − ŷ)².
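The two measures can thus be written side by side (standard notation, with ŷᵢ obtained by taking antilogs of the fitted log-linear values):

```latex
R_{yx}^{2} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},
\qquad
r_{\ln y\,\ln x}^{2} = 1 - \frac{\sum_i (\ln y_i - \widehat{\ln y_i})^2}{\sum_i (\ln y_i - \overline{\ln y})^2}.
```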

The calculation of R²_yx uses in the denominator the total sum of squared deviations of the actual values y from their mean, while the calculation of r²_(ln y, ln x) uses the corresponding sum for the logarithms. The numerators and denominators of the two indicators differ accordingly:

  • Σ(y − ŷ)² and Σ(y − ȳ)² in the correlation index, and
  • Σ(ln y − ln ŷ)² and Σ(ln y − mean(ln y))² in the correlation coefficient.

Due to the similarity of the results and the simplicity of the calculations in computer programs, the linear correlation coefficient is widely used to characterize the closeness of the connection for nonlinear functions as well.

Despite the closeness of the values of R and r in nonlinear functions involving a transformation of the characteristic y, one should remember the following. With a linear dependence, one and the same correlation coefficient characterizes both the regression y = f(x) and the regression x = f(y), since r_xy = r_yx. With a curvilinear dependence, however, R_yx for the function y = f(x) is not equal to R_xy for the regression x = f(y).

Since the calculation of the correlation index uses the ratio of the factor and total sums of squared deviations, R² has the same meaning as the coefficient of determination. In special studies, the value R² for nonlinear relationships is called the index of determination.

The assessment of the significance of the correlation index is carried out in the same way as the assessment of the reliability of the correlation coefficient.

The correlation index is used to test the significance of the overall nonlinear regression equation using the Fisher F test.

The value m characterizes the number of degrees of freedom for the factor sum of squares, and (n − m − 1) the number of degrees of freedom for the residual sum of squares.

For a power function, m = 1, and the formula of the F-criterion takes the same form as for a linear dependence:
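In standard notation this criterion reads (a notational sketch; n is the sample size):

```latex
F = \frac{R^{2}/m}{(1 - R^{2})/(n - m - 1)},
\qquad
m = 1:\quad F = \frac{r_{xy}^{2}}{1 - r_{xy}^{2}}\,(n - 2).
```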

For a parabola of the second degree

y = a₀ + a₁·x + a₂·x² + ε,

m = 2.

The F-criterion can also be calculated in the analysis-of-variance table of the regression results, as was shown for the linear function.

The index of determination can be compared with the coefficient of determination to justify the possibility of using a linear function. The greater the curvature of the regression line, the further the coefficient of determination falls below the index of determination. Similar values of these indicators mean that there is no need to complicate the form of the regression equation and that a linear function can be used.

In practice, if the difference between the determination index and the coefficient of determination does not exceed 0.1, then the assumption of a linear form of the relationship is considered justified.

If t_fact > t_table, the differences between the correlation indicators under consideration are significant, and replacing the nonlinear regression with a linear equation is impossible. In practice, if t < 2, the differences between R_yx and r_yx are insignificant, so linear regression can be used even if there is reason to suspect some nonlinearity in the relationship between the factor and the result.

Correlation analysis.

Paired Regression Equation.

Using the graphical method.

This method is used to depict visually the form of the connection between the economic indicators under study. To do this, a graph is drawn in a rectangular coordinate system: the individual values of the resultant characteristic Y are plotted along the ordinate axis, and the individual values of the factor characteristic X along the abscissa axis.

The set of points of the resultant and factor characteristics is called the correlation field.

Based on the correlation field, a hypothesis can be put forward (for the general population) that the relationship between all possible values of X and Y is linear.

The linear regression equation is y = bx + a + ε

Here ε is a random error (deviation, disturbance).

Reasons for the existence of a random error:

1. Failure to include significant explanatory variables in the regression model;

2. Aggregation of variables. For example, the total consumption function is an attempt to express in general form an aggregate of individual spending decisions. It is only an approximation of individual relations, which have different parameters.

3. Incorrect description of the model structure;

4. Incorrect functional specification;

5. Measurement errors.

Since the deviations ε_i for each specific observation i are random and their values in the sample are unknown:

1) from the observations x_i and y_i only estimates of the parameters α and β can be obtained;

2) the estimates a and b of the parameters α and β of the regression model are themselves random, because they correspond to a random sample.

Then the estimated regression equation (constructed from the sample data) has the form y = bx + a + e, where e_i are the observed values (estimates) of the errors ε_i, and a and b are estimates of the parameters α and β of the regression model that are to be found.

To estimate the parameters α and β, the method of least squares (OLS) is used. The least squares method gives the best (consistent, efficient and unbiased) estimates of the parameters of the regression equation.

But only if certain premises regarding the random term ε and the independent variable x are met.

Formally, the OLS criterion can be written as follows:

S = ∑(yᵢ − y*ᵢ)² → min

System of normal equations.

a·n + b·∑x = ∑y

a·∑x + b·∑x² = ∑x·y

For our data, the system of equations has the form

15a + 186.4 b = 17.01

186.4 a + 2360.9 b = 208.25

From the first equation we express a and substitute it into the second equation:

We obtain empirical regression coefficients: b = -0.07024, a = 2.0069
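The same 2×2 system can be checked numerically, for instance with NumPy (a sketch of the arithmetic, not part of the original workflow):

```python
import numpy as np

# System of normal equations from the text (n = 15 observations):
#   15·a + 186.4·b = 17.01
#   186.4·a + 2360.9·b = 208.25
A = np.array([[15.0, 186.4],
              [186.4, 2360.9]])
rhs = np.array([17.01, 208.25])

a, b = np.linalg.solve(A, rhs)
print(f"a = {a:.4f}, b = {b:.5f}")  # approximately a = 2.01, b = -0.0702
```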

Regression equation (empirical regression equation):

y = -0.07024 x + 2.0069

The empirical regression coefficients a and b are only estimates of the theoretical coefficients α and β; the equation itself reflects only the general trend in the behavior of the variables under consideration.

To calculate the regression parameters, we will build a calculation table (Table 1)

1. Regression equation parameters.

Sample means.

Sample variances:

Standard deviation

1.1. Correlation coefficient

Covariance.

We calculate the indicator of the closeness of the connection. This indicator is the sample linear correlation coefficient, which is calculated by the formula:

r_xy = cov(x, y) / (S_x·S_y)

The linear correlation coefficient takes values from −1 to +1.

Connections between characteristics can be weak or strong (close). Their criteria are assessed on the Chaddock scale:

0.1 < r_xy < 0.3: weak;

0.3 < r_xy < 0.5: moderate;

0.5 < r_xy < 0.7: noticeable;

0.7 < r_xy < 0.9: high;

0.9 < r_xy < 1: very high.

In our example, the relationship between trait Y and factor X is high and inverse.
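The scale is easy to encode; the small helper below (our own sketch, not from the source) maps |r| to its Chaddock category:

```python
def chaddock(r: float) -> str:
    """Classify the closeness of a connection by the Chaddock scale."""
    s = abs(r)
    if s < 0.1:
        return "practically absent"
    for bound, label in [(0.3, "weak"), (0.5, "moderate"),
                         (0.7, "noticeable"), (0.9, "high"), (1.0, "very high")]:
        if s <= bound:
            return label
    return "functional"

print(chaddock(-0.82))  # 'high' -- inverse sign, but a high closeness
```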

In addition, the linear pair correlation coefficient can be determined through the regression coefficient b:

r_xy = b·(S_x / S_y)

1.2. Regression equation(estimation of regression equation).

The linear regression equation is y = -0.0702 x + 2.01

The coefficients of a linear regression equation can be given economic meaning.

The regression coefficient b = −0.0702 shows the average change in the effective indicator (in units of measurement of y) per unit increase or decrease in the factor x. In this example, as x increases by 1 unit, y decreases by 0.0702 on average.

The coefficient a = 2.01 formally shows the predicted level of y, but only if x = 0 is close to the sample values.

If x = 0 is far from the sample values of x, a literal interpretation may lead to incorrect results; even if the regression line describes the observed sample values fairly accurately, there is no guarantee that this will remain the case when extrapolating to the left or to the right.

By substituting the appropriate values of x into the regression equation, we can determine the aligned (predicted) values of the resultant indicator ŷ(x) for each observation.

The relationship between y and x determines the sign of the regression coefficient b (if b > 0, the relationship is direct; otherwise it is inverse). In our example the connection is inverse.

1.3. Elasticity coefficient.

It is not advisable to use regression coefficients (here b) to assess directly the influence of factors on the resultant characteristic when the units of measurement of the resultant indicator y and of the factor characteristic x differ.

For these purposes, elasticity coefficients and beta coefficients are calculated.

The average elasticity coefficient E shows by what percentage, on average, the result y changes from its average value when the factor x changes by 1% from its average value.

The elasticity coefficient is found by the formula:

E = b·(x̄ / ȳ)

The elasticity coefficient is less than 1 in absolute value. Therefore, if X changes by 1%, Y will change by less than 1%. In other words, the influence of X on Y is not significant.

Beta coefficient

The beta coefficient shows by what part of its standard deviation the average value of the resulting characteristic changes when the factor characteristic changes by one of its standard deviations, the remaining independent variables being held constant:

β = b·(S_x / S_y)

That is, an increase in x by one standard deviation S_x leads to a decrease in the average value of Y by 0.82 of its standard deviation S_y.

1.4. Approximation error.

Let us evaluate the quality of the regression equation using the average approximation error, the average deviation of the calculated values from the actual ones:

Ā = (1/n)·∑ |(yᵢ − ŷᵢ)/yᵢ| · 100%

An approximation error within 5-7% indicates a good fit of the regression equation to the original data.

Since the error is less than 7%, this equation can be used as the regression.

Linear regression comes down to finding an equation of the form ŷ = a + b·x (or y = a + b·x + ε). The first expression allows one, for given values of the factor x, to calculate the theoretical values of the resulting characteristic by substituting the actual values of the factor into it. On the graph, the theoretical values lie on a straight line, which constitutes the regression line.

The construction of linear regression comes down to estimating its parameters, a and b. The classical approach to estimating linear regression parameters is based on the method of least squares (OLS).

To find the minimum, we calculate the partial derivatives of the sum (4) with respect to each of the parameters, a and b, and set them equal to zero:

∂S/∂a = −2·∑(y − a − b·x) = 0,
∂S/∂b = −2·∑ x·(y − a − b·x) = 0. (5)

Transforming, we obtain the system of normal equations:

n·a + b·∑x = ∑y,
a·∑x + b·∑x² = ∑x·y. (6)

In this system n is the sample size, and the sums are easily calculated from the original data. Solving the system for a and b, we get:

b = (n·∑x·y − ∑x·∑y) / (n·∑x² − (∑x)²), (7)

a = ȳ − b·x̄. (8)

Expression (7) can be written in another form:

b = cov(x, y) / S²_x, (9)

where cov(x, y) = mean(x·y) − x̄·ȳ is the covariance of the characteristics and S²_x is the variance of the factor x.

The parameter b is called the regression coefficient. Its value shows the average change in the result when the factor changes by one unit. The possibility of a clear economic interpretation of the regression coefficient has made the linear regression equation quite common in econometric studies.

Formally, a is the value of y at x = 0. If the factor x does not and cannot have a zero value, this interpretation of the free term a makes no sense. The parameter a may have no economic content; attempts to interpret it economically can lead to absurdity, especially when a < 0. Only the sign of the parameter a can be interpreted. If a > 0, the relative change in the result proceeds more slowly than the change in the factor. Comparing these relative changes:

Δy/y < Δx/x  for a > 0, x > 0

Sometimes a linear pairwise regression equation is written in deviations from the means:

y′ = b·x′, (10)

where y′ = y − ȳ and x′ = x − x̄. In this case the free term is zero, which is reflected in expression (10). This fact follows from geometric considerations: the same straight line (3) corresponds to the regression equation, but when the regression is estimated in deviations, the origin of coordinates moves to the point with coordinates (x̄, ȳ). Then in expression (8) both sums become zero, which entails a zero free term.

Let us consider, as an example, the cost function for a group of enterprises producing one type of product:

y = a + b·x,

where y is production costs and x is output.

Table 1.

Output, thousand units (x)    Production costs, million rub. (y)    ŷ
1                             30                                    31.1
2                             70                                    67.9
4                             150                                   141.6
3                             100                                   104.7
5                             170                                   178.4
3                             100                                   104.7
4                             150                                   141.6
Total: 22                     770                                   770.0

The system of normal equations will look like:

7a + 22b = 770,
22a + 80b = 2820.

Solving it, we get a = −5.79, b = 36.84.

The regression equation is:

ŷ = −5.79 + 36.84·x

Substituting the values of x into the equation, we find the theoretical values of ŷ (the last column of the table).
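A quick numeric check of this step (a sketch; the x values follow Table 1 above):

```python
# Theoretical costs from the fitted equation y-hat = -5.79 + 36.84·x
x = [1, 2, 4, 3, 5, 3, 4]          # output, thousand units (from Table 1)
a, b = -5.79, 36.84

y_hat = [a + b * xi for xi in x]
print([f"{v:.1f}" for v in y_hat])  # ≈ 31.1, 67.9, 141.6, 104.7, 178.4, 104.7, 141.6
```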

The value of a makes no economic sense here. If the variables x and y are expressed in deviations from their average levels, the regression line on the graph passes through the origin of coordinates, and the estimate of the regression coefficient does not change:

y′ = b·x′, where y′ = y − ȳ, x′ = x − x̄.

As another example, consider a consumption function of the form:

C = K + L·y,

where C is consumption, y is income, and K, L are parameters. This linear regression equation is usually used in conjunction with the balance equality:

y = C + I + r,

where I is the size of investment and r is savings.

For simplicity, assume that income is spent on consumption and investment. Thus, the following system of equations is considered:

C = K + L·y,
y = C + I.

The presence of the balance equality imposes a restriction on the value of the regression coefficient, which cannot be greater than one: L ≤ 1.

Let us assume that the regression coefficient in the consumption function is 0.65:

C = K + 0.65·y.

The regression coefficient characterizes the propensity to consume. It shows that out of every thousand rubles of income, an average of 650 rubles is spent on consumption and 350 rubles are invested. If we calculate the regression of the size of investment on income, i.e. I = y − C, then the regression equation will be I = −K + 0.35·y. This equation need not even be estimated, since it is derived from the consumption function. The regression coefficients of the two equations are related by the equality 0.65 + 0.35 = 1. If the regression coefficient turns out to be greater than one, then consumption exceeds income, and not only income but also savings are spent on consumption.



The regression coefficient in the consumption function is used to calculate the multiplier:

m = 1/(1 − b) = 1/(1 − 0.65) ≈ 2.86.

Here m ≈ 2.86, so an additional investment of 1 thousand rubles will, in the long term and other things being equal, lead to an additional income of 2.86 thousand rubles.

In linear regression, the linear correlation coefficient r acts as the indicator of the closeness of the connection:

r = b·(S_x / S_y).

Its values lie within the boundaries −1 ≤ r ≤ 1. If b > 0, then 0 ≤ r ≤ 1; if b < 0, then −1 ≤ r ≤ 0. For the cost-function example, r = 0.991, which means a very close dependence of production costs on the volume of output.

To assess the quality of the fit of the linear function, the coefficient of determination is calculated as the square of the linear correlation coefficient, r². It characterizes the share of the variance of the resulting characteristic y explained by the regression in the total variance of the resulting characteristic:

r² = 1 − ∑(y − ŷ)² / ∑(y − ȳ)².

The value 1 − r² characterizes the share of the variance of y caused by the influence of other factors not taken into account in the model.

In the example, r² = 0.982. The regression equation explains 98.2% of the variance of y; the remaining 1.8%, the residual variance, is accounted for by other factors.

Preconditions of OLS (Gauss-Markov conditions)

As mentioned above, the connection between y and x in paired regression is correlational rather than functional. Therefore the parameter estimates a and b are random variables whose properties depend significantly on the properties of the random component ε. To obtain the best results by least squares, the following prerequisites regarding the random deviation (the Gauss-Markov conditions) must be met:

1°. The expected value of the random deviation is zero for all observations: M(εᵢ) = 0.

2°. The variance of the random deviations is constant: D(εᵢ) = σ² for all observations.

The fulfillment of this prerequisite is called homoscedasticity (constancy of the variance of the deviations); its violation is called heteroscedasticity (non-constancy of the variance of the deviations).

3°. The random deviations εᵢ and εⱼ are independent of each other for i ≠ j.

The fulfillment of this condition is called the absence of autocorrelation.

4°. The random deviation must be independent of the explanatory variables.

Typically this condition is satisfied automatically if the explanatory variables in the model are not random. In addition, its fulfillment is not as critical for econometric models as that of the first three.

If the above prerequisites are met, the Gauss-Markov theorem holds: estimates (7) and (8) obtained by OLS have the smallest variance in the class of all linear unbiased estimates.

Thus, if the Gauss-Markov conditions are met, estimates (7) and (8) are not only unbiased estimates of the regression coefficients but also the most efficient ones, i.e. they have the smallest dispersion compared with any other estimates of these parameters that are linear with respect to the values yᵢ.

It is the understanding of the importance of the Gauss-Markov conditions that distinguishes a competent researcher using regression analysis from an incompetent one. If these conditions are not met, the researcher must be aware of it. If corrective action is possible, the analyst should be able to take it. If the situation cannot be corrected, the researcher must be able to assess how seriously it might affect the results.
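A minimal way to eyeball the first three conditions on real data is to inspect the residuals. The sketch below (our own illustration, not from the source) fits a line and reports the residual mean, a crude variance split, and the lag-1 autocorrelation:

```python
import numpy as np

def residual_diagnostics(x, y):
    """Rough checks of the Gauss-Markov conditions for paired regression."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # slope: cov(x, y) / var(x)
    a = y.mean() - b * x.mean()
    e = y - (a + b * x)                             # residuals

    half = len(e) // 2
    e_sorted = e[np.argsort(x)]                     # order residuals by x
    return {
        "mean residual (0 expected)": e.mean(),
        "var(low x) vs var(high x)": (e_sorted[:half].var(), e_sorted[half:].var()),
        "lag-1 autocorrelation": np.corrcoef(e[:-1], e[1:])[0, 1],
    }

# e.g. with the wage data used later in this text:
x = [78, 82, 87, 79, 89, 106, 67, 88, 73, 87, 76, 115]
y = [133, 148, 134, 154, 162, 195, 139, 158, 152, 162, 159, 173]
print(residual_diagnostics(x, y))
```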

To predict using a regression equation, its coefficients must be calculated. And here another problem affecting forecasting accuracy arises: usually not all possible values of the variables X and Y, i.e. the general population of their joint distribution, are known in forecasting problems; only a sample from that general population is known. As a result, in addition to the random component, another source of error appears in forecasting: errors caused by the incomplete correspondence of the sample to the general population and the resulting errors in determining the coefficients of the regression equation.

In other words, because the population is unknown, the exact values of the coefficients of the regression equation cannot be determined. Using a sample from this unknown population, one can only obtain estimates of the true coefficients α and β.

In order for the prediction errors resulting from such a replacement to be minimal, estimation must be carried out by a method that guarantees unbiased and efficient estimates. A method gives unbiased estimates if, on repeated application to new samples from the same population, the conditions M(a) = α and M(b) = β are satisfied. A method gives efficient estimates if, on repeated application to new samples from the same population, the minimum dispersion of the coefficients a and b is ensured, i.e. D(a) and D(b) are minimal.

In probability theory a theorem has been proved according to which the efficiency and unbiasedness of the estimates of the coefficients of a linear regression equation obtained from sample data are ensured by applying the method of least squares.

The essence of the least squares method is as follows. For each sample point an equation of the form yᵢ = a + b·xᵢ + eᵢ is written. Then the error eᵢ between the calculated and actual values is found. Solving the optimization problem of finding the values of a and b that minimize the sum of squared errors over all n points, i.e. ∑eᵢ² → min, gives unbiased and efficient estimates of the coefficients α and β. For paired linear regression this solution has the form of expressions (7) and (8) above.

It should be noted that the unbiased and efficient estimates of the true regression coefficients of the general population obtained in this way from a sample do not guarantee against errors in any single application. The guarantee is that, with repeated application of the procedure to other samples from the same population, a smaller total error is ensured than with any other method, and the spread of these errors is minimal.

The obtained coefficients of the regression equation determine the position of the regression line, the main axis of the cloud formed by the points of the original sample. Both coefficients have a definite meaning. The coefficient a shows the value of y at x = 0; in many cases this point does not make sense, and the coefficient itself may have no meaning either, so the interpretation just given must be used carefully. A more universal interpretation is the following: if a < 0, the relative (percentage) change in the independent variable is always less than the relative change in the dependent variable.

The coefficient b shows by how many units the dependent variable changes when the independent variable changes by one unit. It is often called the regression coefficient, emphasizing that it is more important than a. In particular, if instead of the values of the dependent and independent variables we take their deviations from their average values, the regression equation is transformed to the form y′ = b·x′. In other words, in the transformed coordinate system any regression line passes through the origin of coordinates (Fig. 13), and the coefficient a disappears.

Figure 13. Position of the regression dependence in the transformed coordinate system.

The parameters of the regression equation tell us how the dependent and independent variables are related to each other; they show the position of the main axis of the data cloud but say nothing about the degree of closeness of the relationship, i.e. about how narrow or wide the cloud is.

For the territories of the region, data for 200X is provided.

Region number Average per capita living wage per day of one able-bodied person, rub., x Average daily wage, rub., y
1 78 133
2 82 148
3 87 134
4 79 154
5 89 162
6 106 195
7 67 139
8 88 158
9 73 152
10 87 162
11 76 159
12 115 173

Exercise:

1. Construct a correlation field and formulate a hypothesis about the form of the connection.

2. Calculate the parameters of the linear regression equation

4. Using the average (general) elasticity coefficient, give a comparative assessment of the strength of the relationship between the factor and the result.

7. Calculate the predicted value of the result if the predicted value of the factor increases by 10% from its average level. Determine the confidence interval of the forecast for significance level α = 0.05.

Solution:

Let us solve this task using Excel.

1. By comparing the available data x and y, for example by ranking them in increasing order of the factor x, one can observe a direct relationship between the characteristics: as the average per capita subsistence level rises, the average daily wage rises as well. Based on this, we can assume that the connection between the characteristics is direct and can be described by a straight-line equation. The same conclusion is confirmed by graphical analysis.

To build a correlation field, you can use Excel. Enter the initial data in sequence: first x, then y.

Select the area of ​​cells that contains data.

Then choose: Insert / Scatter Plot / Scatter with Markers as shown in Figure 1.

Figure 1 Construction of the correlation field

Analysis of the correlation field shows a dependence close to rectilinear, since the points lie almost on a straight line.

2. To calculate the parameters of the linear regression equation, let us use the built-in statistical function LINEST.

For this:

1) Open an existing file containing the analyzed data;
2) Select a 5x2 area of ​​empty cells (5 rows, 2 columns) to display the results of regression statistics.
3) Activate Function Wizard: in the main menu select Formulas / Insert Function.
4) In the Category window select Statistical, and in the function window select LINEST. Click OK as shown in Figure 2;

Figure 2 Function Wizard Dialog Box

5) Fill in the function arguments:

Known_y's — the range containing the data of the resultant characteristic;

Known_x's — the range containing the data of the factor characteristic;

Const — a logical value indicating the presence or absence of a free term in the equation; if Const = 1, the free term is calculated in the usual way, if Const = 0, the free term is set to 0;

Stats — a logical value indicating whether to display additional regression-analysis information; if Stats = 1, the additional information is displayed, if Stats = 0, only the estimates of the equation parameters are displayed.

Click the button OK;

Figure 3 LINEST Function Arguments Dialog Box

6) The first element of the resulting table will appear in the upper left cell of the selected area. To open the entire table, press F2 and then the key combination CTRL+SHIFT+ENTER.

Additional regression statistics will be output in the order shown in the following diagram:

Value of coefficient b              Value of coefficient a
Standard error of b                 Standard error of a
Coefficient of determination R²     Standard error of y
F-statistic                         Number of degrees of freedom
Regression sum of squares           Residual sum of squares

Figure 4 Result of calculating the LINEST function

We obtain the regression equation:

ŷ = 76.98 + 0.92·x

We conclude: with an increase in the average per capita subsistence level by 1 rub., the average daily wage rises by an average of 0.92 rub.

R² = 0.52 means that 52% of the variation in wages (y) is explained by the variation of the factor x, the average per capita subsistence level, and 48% by other factors not included in the model.

Using the calculated coefficient of determination, the correlation coefficient can be found: r = √0.52 ≈ 0.72.

The connection is assessed as close.
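These numbers are easy to reproduce outside Excel, for instance with SciPy (a sketch on the same twelve observations):

```python
from scipy.stats import linregress

x = [78, 82, 87, 79, 89, 106, 67, 88, 73, 87, 76, 115]   # living wage, rub.
y = [133, 148, 134, 154, 162, 195, 139, 158, 152, 162, 159, 173]  # daily wage, rub.

res = linregress(x, y)
print(f"b = {res.slope:.2f}")        # ≈ 0.92
print(f"a = {res.intercept:.2f}")    # ≈ 76.98
print(f"r = {res.rvalue:.2f}")       # ≈ 0.72
print(f"R^2 = {res.rvalue**2:.2f}")  # ≈ 0.52
```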

4. Using the average (general) elasticity coefficient, we determine the strength of the factor’s influence on the result.

For a straight-line equation, the average (general) elasticity coefficient is determined by the formula:

Ē = b·(x̄ / ȳ)

We find the average values by selecting the area of cells with the x values and choosing Formulas / AutoSum / Average; we do the same for the values of y.

Figure 5 Calculation of average function values ​​and argument

Thus, if the average per capita cost of living changes by 1% from its average value, the average daily wage will change by an average of 0.51%.
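The same coefficient can be obtained from the fitted slope (a sketch using the data of this example):

```python
from statistics import mean

x = [78, 82, 87, 79, 89, 106, 67, 88, 73, 87, 76, 115]
y = [133, 148, 134, 154, 162, 195, 139, 158, 152, 162, 159, 173]

b = 0.92                   # slope of the fitted regression
E = b * mean(x) / mean(y)  # average elasticity coefficient E = b * x-bar / y-bar
print(f"E = {E:.2f}")      # ≈ 0.51
```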

Using the Regression data analysis tool, the following are available:
- regression statistics results,
- analysis-of-variance results,
- confidence intervals,
- residual plots and regression line fit plots,
- residuals and normal probability plots.

The procedure is as follows:

1) Check access to the Analysis ToolPak. In the main menu, select File / Options / Add-ins.

2) In the Manage drop-down list, select Excel Add-ins and click Go.

3) In the Add-ins window, check the Analysis ToolPak box and click OK.

If Analysis ToolPak is not in the Available add-ins list, click Browse to locate it.

If you receive a message indicating that the Analysis ToolPak is not installed on your computer, click Yes to install it.

4) In the main menu, select Data / Data Analysis / Analysis Tools / Regression, then click OK.

5) Fill out the data input and output parameters dialog box:

Input Y Range — the range containing the data of the resultant attribute;

Input X Range — the range containing the data of the factor characteristic;

Labels — a flag indicating whether or not the first row contains column names;

Constant is Zero — a flag indicating the presence or absence of a free term in the equation;

Output Range — it is enough to indicate the upper left cell of the future range;

6) New Worksheet Ply — you can specify an arbitrary name for the new sheet.

Then click OK.

Figure 6 Dialog box for entering parameters for the Regression tool

The results of the regression analysis for the problem data are presented in Figure 7.

Figure 7 Result of using the regression tool

5. Let us evaluate the quality of the equation using the average approximation error. We use the regression analysis results presented in Figure 8.

Figure 8 Result of using the Regression tool: Residual Output

Let us create a new table as shown in Figure 9. In column C we calculate the relative approximation error using the formula:

Aᵢ = |(yᵢ − ŷᵢ) / yᵢ| · 100%

Figure 9 Calculation of average approximation error

The average approximation error is calculated using the formula:

Ā = (1/n)·∑Aᵢ

The quality of the constructed model is assessed as good, since the error does not exceed 8-10%.
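The same computation off-spreadsheet (a sketch; ŷ is taken from the equation fitted above):

```python
x = [78, 82, 87, 79, 89, 106, 67, 88, 73, 87, 76, 115]
y = [133, 148, 134, 154, 162, 195, 139, 158, 152, 162, 159, 173]
a, b = 76.98, 0.92  # fitted coefficients

y_hat = [a + b * xi for xi in x]
A = sum(abs((yi - yh) / yi) for yi, yh in zip(y, y_hat)) / len(y) * 100
print(f"average approximation error = {A:.1f}%")  # ≈ 6%, under the 8-10% threshold
```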

6. From the regression statistics table (Figure 4) we write down the actual value of Fisher's F-test:

F_fact = (R² / (1 − R²))·(n − 2) = (0.52 / 0.48)·10 ≈ 10.8

Since F_fact > F_table ≈ 4.96 at the 5% significance level, we conclude that the regression equation is statistically significant (the relationship is proven).

8. We assess the statistical significance of the regression parameters using Student's t-statistics and by calculating the confidence interval of each indicator.

We put forward the hypothesis H₀ of a statistically insignificant difference between each indicator and zero:

a = 0, b = 0, r_xy = 0.

t_table for the number of degrees of freedom df = n − 2 = 10 and α = 0.05 is 2.228.

Figure 7 contains the actual t-statistic values:

The t-test for the correlation coefficient can be calculated in two ways:

Method I:

t_r = r / m_r, where m_r = √((1 − r²)/(n − 2)) is the random error of the correlation coefficient.

We will take the data for calculation from the table in Figure 7.

Method II:

The actual t-statistic values exceed the table values:
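A sketch of Method I in code (SciPy supplies the critical value; r and n are taken from this example):

```python
from math import sqrt
from scipy.stats import t

n, r = 12, 0.72                   # sample size and correlation coefficient
m_r = sqrt((1 - r**2) / (n - 2))  # random error of the correlation coefficient
t_fact = r / m_r
t_table = t.ppf(0.975, n - 2)     # two-sided test at the 5% level

print(f"t_fact = {t_fact:.2f}, t_table = {t_table:.2f}")
# t_fact ≈ 3.3 > t_table ≈ 2.23, so the correlation coefficient is significant
```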

Therefore the hypothesis H₀ is rejected: the regression parameters and the correlation coefficient do not differ from zero by chance, but are statistically significant.

The confidence interval for parameter a is defined as a ± t_table·m_a.

For parameter a, the 95% limits as shown in Figure 7 were:

The confidence interval for the regression coefficient b is defined as b ± t_table·m_b.

For the regression coefficient b, the 95% limits as shown in Figure 7 were:

Analysis of the upper and lower limits of the confidence intervals leads to the conclusion that, with probability 0.95, the parameters a and b stay within the indicated limits and do not take zero values, i.e. they are not statistically insignificant and differ significantly from zero.

7. The obtained estimates of the regression equation allow it to be used for forecasting. Let the predicted value of the living wage be x_p = 1.1·x̄.

Then the predicted value of the average daily wage will be ŷ_p = a + b·x_p.

We calculate the forecast error using the formula:

m_ŷ = S_res·√(1 + 1/n + (x_p − x̄)² / ∑(x − x̄)²),

where S²_res is the residual variance per degree of freedom.

We will also calculate the variance using Excel. For this:

1) Activate Function Wizard: in the main menu select Formulas / Insert Function.

3) Fill in the range containing the numerical data of the factor characteristic. Click OK.

Figure 10 Calculation of variance

We obtain the value of the variance.

To calculate the residual variance per degree of freedom, we use the analysis-of-variance results shown in Figure 7.

Confidence intervals for predicting individual values of y with probability 0.95 are determined by the expression:

ŷ_p ± t_table·m_ŷ

The interval is quite wide, primarily because of the small number of observations. Overall, the forecast of the average daily wage turned out to be reliable.
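For completeness, here is a sketch of the whole forecast step in code (assumptions: coefficients from the fit above, x_p = 1.1·x̄, a 95% interval):

```python
from math import sqrt
from statistics import mean
from scipy.stats import t

x = [78, 82, 87, 79, 89, 106, 67, 88, 73, 87, 76, 115]
y = [133, 148, 134, 154, 162, 195, 139, 158, 152, 162, 159, 173]
a, b = 76.98, 0.92
n = len(x)

x_p = 1.1 * mean(x)   # factor grows 10% from its mean
y_p = a + b * x_p     # point forecast

resid_var = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
ssx = sum((xi - mean(x)) ** 2 for xi in x)
m_f = sqrt(resid_var * (1 + 1 / n + (x_p - mean(x)) ** 2 / ssx))

t_crit = t.ppf(0.975, n - 2)
print(f"forecast: {y_p:.1f} ± {t_crit * m_f:.1f} rub.")
# the interval is wide, reflecting the small sample size
```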

The problem statement is taken from: Workshop on Econometrics: Textbook / I.I. Eliseeva, S.V. Kurysheva, N.M. Gordeenko et al.; ed. I.I. Eliseeva. Moscow: Finance and Statistics, 2003. 192 pp.: ill.


