Construct an interval distribution series. Construction of interval variation series for continuous quantitative data

In many cases, when a statistical population contains a large or even infinite number of variants, which is most common with continuous variation, it is practically impossible and impractical to form a group of units for each variant. In such cases statistical units can be combined into groups only on the basis of an interval, i.e. a group that has certain limits for the values of the varying characteristic. These limits are given by two numbers indicating the lower and upper limits of each group. The use of intervals leads to the formation of an interval distribution series.

An interval series is a variation series whose variants are presented in the form of intervals.

An interval series can be formed with equal or unequal intervals; the choice of principle depends mainly on how representative and homogeneous the statistical population is. If the population is large enough (representative) in terms of the number of units and is homogeneous in composition, it is advisable to base the interval series on equal intervals. This principle is usually applied to populations in which the range of variation is relatively small, i.e. the maximum and minimum variants differ by no more than a factor of several. In this case the equal interval width is the ratio of the range of variation of the characteristic to the chosen number of intervals. To determine the equal interval, the Sturges formula can be used (usually when the variation of the characteristic is small and the number of units in the population is large):

x_i = (X_max - X_min) / (1 + 3.322 lg n),   (5.1)

where x_i is the equal interval width; X_max and X_min are the maximum and minimum variants in the statistical population; n is the number of units in the population.

Example. Suppose we need to calculate the equal interval width for the density of radioactive contamination with cesium-137 in 100 settlements of the Krasnopolsky district of the Mogilev region, given that the initial (minimum) variant is 1 Ci/km² and the final (maximum) variant is 65 Ci/km². Using formula 5.1, we get:

x_i = (65 - 1) / (1 + 3.322 lg 100) = 64 / 7.644 ≈ 8.4 Ci/km².

Therefore, to form an interval series with equal intervals for the density of cesium-137 contamination in the settlements of the Krasnopolsky district, the equal interval width can be taken as 8 Ci/km².
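
The calculation in this example can be checked with a short script. This is only a sketch: the function name `sturges_interval_width` is made up for illustration, and it assumes the minimum variant is 1 and the maximum is 65, as read from the example.

```python
import math

def sturges_interval_width(x_min, x_max, n):
    """Equal interval width: range of variation divided by Sturges' group count."""
    groups = 1 + 3.322 * math.log10(n)  # Sturges' number of groups (not rounded)
    return (x_max - x_min) / groups

# Assumed inputs from the example: 100 settlements, variants from 1 to 65.
width = sturges_interval_width(x_min=1, x_max=65, n=100)
print(round(width, 2))  # about 8.37, which the text rounds to 8
```

Rounding the width to a convenient whole number, as the text does, is a practical choice rather than part of the formula.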

When the distribution is uneven, i.e. when the maximum and minimum variants differ by a factor of hundreds, the principle of unequal intervals can be applied in forming an interval series. Unequal intervals usually increase as we move to larger values of the characteristic.

Intervals can be closed or open. Closed intervals are those that have both lower and upper boundaries. Open intervals have only one boundary: the first interval has only an upper boundary, and the last has only a lower boundary.

When evaluating an interval series, especially one with unequal intervals, it is advisable to take into account the distribution density, which is most simply calculated as the ratio of the local frequency (or relative frequency) to the interval width.
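
As a rough illustration of distribution density, the sketch below divides each frequency by its interval width. The intervals and frequencies are hypothetical and are not taken from the text.

```python
def distribution_density(frequencies, lower_bounds, upper_bounds):
    """Frequency per unit of interval width, for each interval."""
    return [
        f / (hi - lo)
        for f, lo, hi in zip(frequencies, lower_bounds, upper_bounds)
    ]

# Hypothetical series with unequal intervals that widen for larger values.
lower = [0, 5, 15, 40]
upper = [5, 15, 40, 100]
freq = [10, 20, 25, 30]

density = distribution_density(freq, lower, upper)
print(density)  # [2.0, 2.0, 1.0, 0.5]
```

Although the last interval has the largest frequency, its density is the lowest, which is why density rather than raw frequency is compared when intervals are unequal.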

To form an interval series in practice, the layout of Table 5.3 can be used.

Table 5.3. Procedure for forming an interval series of settlements of the Krasnopolsky district by density of radioactive contamination with cesium-137

The main advantage of the interval series is its maximum compactness. At the same time, in an interval distribution series the individual variants of the characteristic are hidden within the corresponding intervals.

When an interval series is depicted graphically in a system of rectangular coordinates, the interval boundaries are plotted on the abscissa axis and the local frequencies of the series on the ordinate axis. The graphical construction of an interval series differs from that of a distribution polygon in that each interval has a lower and an upper boundary, so two abscissas correspond to one ordinate value. Therefore, on the graph of an interval series what is marked is not a point, as in a polygon, but a line connecting two points. These horizontal lines are joined by vertical lines, producing a stepped figure that is commonly called a distribution histogram (Fig. 5.3).

When an interval series is plotted for a sufficiently large statistical population, the histogram approaches a symmetrical form of distribution. When the population is small, the histogram is, as a rule, asymmetrical.

In some cases it is advisable to form a series of accumulated frequencies, i.e. a cumulative series. A cumulative series can be formed on the basis of a discrete or an interval distribution series. When a cumulative series is depicted in a system of rectangular coordinates, the variants are plotted on the abscissa axis and the accumulated frequencies (or relative frequencies) on the ordinate axis. The resulting curve is usually called the cumulate, or cumulative distribution curve (Fig. 5.4).
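
A cumulative series is simply a running total of the frequencies. A minimal sketch, with hypothetical frequencies chosen only for illustration:

```python
from itertools import accumulate

# Hypothetical local frequencies of six consecutive intervals.
frequencies = [3, 3, 5, 7, 6, 6]

# Accumulated frequencies: each entry is the sum of all frequencies so far.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Accumulated relative frequencies, the values plotted on the cumulate.
cumulative_relative = [c / total for c in cumulative]

print(cumulative)  # [3, 6, 11, 18, 24, 30]
```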

Forming and graphically representing the various types of variation series simplifies the calculation of the main statistical characteristics, which are discussed in detail in topic 6, and helps to better understand the laws of distribution of the statistical population. Analysis of a variation series becomes particularly important when it is necessary to identify and trace the relationship between variants and frequencies (relative frequencies). This dependence shows itself in the fact that the number of cases per variant is related in a certain way to the size of that variant, i.e. as the values of the varying characteristic increase, the frequencies (relative frequencies) of these values change systematically. This means that the numbers in the frequency column do not fluctuate chaotically, but change in a certain direction, order and sequence.

If the frequencies change systematically, we are on the way to identifying a pattern. System, order and sequence in the changes of frequencies reflect common causes and general conditions characteristic of the entire population.

It should not be assumed that the distribution pattern always appears in ready-made form. There are quite a lot of variation series in which the frequencies jump erratically, sometimes increasing, sometimes decreasing. In such cases it is advisable to find out what kind of distribution the researcher is dealing with: either the distribution has no inherent pattern at all, or its nature has not yet been revealed. The first case is rare; the second is quite common.

For example, when an interval series is formed, the total number of statistical units may be small, so that each interval contains only a few variants (for example, 1-3 units). In such cases one cannot expect any pattern to show itself. For a regular result to emerge from random observations, the law of large numbers must come into force, i.e. each interval must contain not a few but tens or hundreds of statistical units. To this end, the number of observations should be increased as much as possible. This is the surest way of detecting patterns in mass processes. If there is no real opportunity to increase the number of observations, a pattern can be identified by reducing the number of intervals in the distribution series. Reducing the number of intervals in a variation series increases the frequency in each interval. As a result, the random fluctuations of individual statistical units overlap and smooth each other out, turning into a pattern.

The formation and construction of variation series gives only a general, approximate picture of the distribution of the statistical population. For example, a histogram expresses the relationship between the values of a characteristic and its frequencies (relative frequencies) only in rough form. Therefore, variation series are essentially only the basis for further, in-depth study of the internal regularity of the statistical distribution.

TEST QUESTIONS FOR TOPIC 5

1. What is variation? What causes variation in a trait in a statistical population?

2. What types of varying characteristics can occur in statistics?

3. What is a variation series? What types of variation series can there be?

4. What is a ranked series? What are its advantages and disadvantages?

5. What is a discrete series and what are its advantages and disadvantages?

6. What is the procedure for forming an interval series, what are its advantages and disadvantages?

7. How are ranked, discrete and interval distribution series represented graphically?

8. What is the cumulate of distribution and what does it characterize?

Posted on http://www.allbest.ru/

TASK 1

The following information is available about the wages of employees at the enterprise:

Table 1.1

Wages, conventional monetary units

It is required to construct an interval distribution series and use it to find:

1) average wage;

2) average linear deviation;

3) dispersion (variance);

4) standard deviation;

5) range of variation;

6) oscillation coefficient;

7) linear coefficient of variation;

8) simple coefficient of variation;

9) mode;

10) median;

11) asymmetry coefficient;

12) Pearson asymmetry index;

13) kurtosis coefficient.

Solution

As is known, variants (values of the characteristic) arranged in ascending order form a discrete variation series. When the number of variants is large (more than 10), interval series are constructed even in the case of discrete variation.

If an interval series is compiled with equal intervals, the range of variation is divided by the specified number of intervals. If the resulting value is a whole, single-digit number (which is rare), the interval length is taken equal to this number. In other cases the value is rounded, always upward, so that the last remaining digit is even. Obviously, as the interval length increases, the range of variation expands by an amount equal to the product of the number of intervals and the difference between the rounded and the initial interval length.

a) If the expansion of the range of variation is insignificant, it is either added to the largest or subtracted from the smallest value of the characteristic;

b) If the expansion of the range of variation is noticeable, then, so that the center of the range does not shift, it is divided approximately in half, with one half added to the largest and the other subtracted from the smallest value of the characteristic.

If an interval series with unequal intervals is compiled, the process is simpler, but the interval lengths must still be expressed as numbers whose last digit is even, which greatly simplifies subsequent calculations of numerical characteristics.

The sample size is n = 30.

Let's determine the number of groups using the Sturges formula:

K = 1 + 3.322 lg n,

where K is the number of groups;

K = 1 + 3.322 lg 30 = 5.91 ≈ 6

We find the range of the attribute, the wages of workers at the enterprise (x), using the formula

R = x_max - x_min = 195 - 112 = 83, and divide it by 6.

Then the length of the interval will be l = 83 : 6 = 13.83

The first interval begins at 112. Adding l = 13.83 to 112, we get its upper limit 125.83, which is also the beginning of the second interval, and so on; the sixth interval ends at 195.
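
The interval boundaries described above can be reproduced with a few lines. This is a sketch under the figures quoted in the solution (n = 30, minimum 112, maximum 195); closing the last interval at the observed maximum is an assumption made here to absorb the rounding of the interval length.

```python
import math

n = 30                    # sample size from the solution
x_min, x_max = 112, 195   # smallest and largest wage from the solution

k = round(1 + 3.322 * math.log10(n))    # Sturges' number of groups: 6
width = round((x_max - x_min) / k, 2)   # interval length: 13.83

# Lower limit of the first interval plus successive multiples of the width.
bounds = [round(x_min + i * width, 2) for i in range(k + 1)]
bounds[-1] = x_max  # the computed 194.98 is closed at the observed maximum

print(bounds)  # [112, 125.83, 139.66, 153.49, 167.32, 181.15, 195]
```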

When finding frequencies, one should be guided by the rule: “if the value of a feature coincides with the boundary of the internal interval, then it should be attributed to the previous interval.”

We obtain an interval series of frequencies and cumulative frequencies.

Table 1.2

Therefore, 3 employees have wages from 112 to 125.83 conventional monetary units. The highest wages, from 181.15 to 195 conventional monetary units, are received by only 6 employees.

To calculate numerical characteristics, we transform the interval series into a discrete series, taking the midpoints of the intervals as the variants:

Table 1.3

Total of the column (x_i - x̄)² f_i: 14131.83

Using the weighted arithmetic mean formula:

x̄ = Σ x_i f_i / Σ f_i = 159.485 conventional monetary units

Average linear deviation:

L = Σ |x_i - x̄| f_i / Σ f_i,

where x_i is the value of the characteristic being studied for the i-th unit of the population,

f_i is its frequency,

x̄ is the average value of the studied characteristic.

L = 17.765 conventional monetary units

Standard deviation:

σ = √(Σ (x_i - x̄)² f_i / Σ f_i) = √(14131.83 / 30) = 21.704 conventional monetary units

Dispersion:

σ² = 14131.83 / 30 = 471.06

Relative range of variation (oscillation coefficient): c = R : x̄,

Relative linear deviation: q = L : x̄

The coefficient of variation: V = σ : x̄

The oscillation coefficient shows the relative fluctuation of the extreme values of a characteristic around the arithmetic mean, and the coefficient of variation characterizes the degree of variation and the homogeneity of the population.

c = R : x̄ = 83 / 159.485 × 100% = 52.043%

Thus, the difference between the extreme values is 47.957% (= 100% - 52.043%) less than the average wage of employees at the enterprise.

q = L : x̄ = 17.765 / 159.485 × 100% = 11.139%

V = σ : x̄ = 21.704 / 159.485 × 100% = 13.609%

The coefficient of variation is less than 33%, which indicates weak variation in the wages of workers at the enterprise, i.e. the average value is a typical characteristic of workers' wages (the population is homogeneous).
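
The figures above can be cross-checked with the sketch below. The frequency table itself did not survive in the text, so the frequencies used here are an assumption: 3, 3, 5 and 7 are quoted for the first four intervals, 6 for the last, and the fifth is inferred as 6 from the total of 30.

```python
# Interval boundaries from the solution; frequencies partly inferred (see above).
bounds = [112, 125.83, 139.66, 153.49, 167.32, 181.15, 195]
freqs = [3, 3, 5, 7, 6, 6]

# Interval midpoints serve as the variants of the discrete series.
mids = [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]
n = sum(freqs)

mean = sum(x * f for x, f in zip(mids, freqs)) / n
mean_abs_dev = sum(abs(x - mean) * f for x, f in zip(mids, freqs)) / n
variance = sum((x - mean) ** 2 * f for x, f in zip(mids, freqs)) / n
std = variance ** 0.5

value_range = bounds[-1] - bounds[0]
oscillation = value_range / mean * 100   # c, in percent
linear_cv = mean_abs_dev / mean * 100    # q, in percent
cv = std / mean * 100                    # V, in percent

print(round(mean, 3), round(mean_abs_dev, 3), round(std, 3))
print(round(oscillation, 2), round(linear_cv, 3), round(cv, 3))
```

With these assumed frequencies the script reproduces the mean, average linear deviation, standard deviation and coefficients quoted in the solution to within rounding.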

In interval distribution series the mode is determined by the formula

Mo = x_Mo + h_Mo × (f_Mo - f_Mo-1) / ((f_Mo - f_Mo-1) + (f_Mo - f_Mo+1)),

where f_Mo is the frequency of the modal interval, i.e. the interval containing the largest number of variants;

f_Mo-1 is the frequency of the interval preceding the modal;

f_Mo+1 is the frequency of the interval following the modal;

h_Mo is the modal interval length;

x_Mo is the lower limit of the modal interval.

To determine the median in an interval series we use the formula

Me = x_Me + h_Me × (Σf / 2 - S_Me-1) / f_Me,

where S_Me-1 is the cumulative (accumulated) frequency of the interval preceding the median;

x_Me is the lower limit of the median interval;

f_Me is the median interval frequency;

h_Me is the length of the median interval.

The median interval is the first interval whose accumulated frequency (3 + 3 + 5 + 7 = 18) exceeds half the sum of frequencies (30 : 2 = 15): (153.49; 167.32).
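
The mode and median formulas for an interval series can be sketched as follows. The numerical results are not given in the text; the values printed here follow only from the assumed frequencies 3, 3, 5, 7, 6, 6 (the fifth frequency is inferred from the total of 30), and the function names are illustrative.

```python
def interval_mode(lowers, width, freqs):
    """Mode of an equal-width interval series by the standard interpolation formula."""
    m = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal interval
    prev_f = freqs[m - 1] if m > 0 else 0
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0
    return lowers[m] + width * (freqs[m] - prev_f) / (
        (freqs[m] - prev_f) + (freqs[m] - next_f)
    )

def interval_median(lowers, width, freqs):
    """Median of an equal-width interval series by linear interpolation."""
    half = sum(freqs) / 2
    accumulated = 0  # accumulated frequency of the intervals before the current one
    for lower, f in zip(lowers, freqs):
        if accumulated + f >= half:  # first interval reaching half of the total
            return lower + width * (half - accumulated) / f
        accumulated += f

lowers = [112, 125.83, 139.66, 153.49, 167.32, 181.15]
freqs = [3, 3, 5, 7, 6, 6]  # assumed, see the note above

mode = interval_mode(lowers, 13.83, freqs)
median = interval_median(lowers, 13.83, freqs)
print(round(mode, 2), round(median, 2))
```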

Let's calculate asymmetry and kurtosis, for which we will create a new worksheet:

Table 1.4

Factual data

Calculated data

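Let's calculate the third-order central moment:

μ3 = Σ (x_i - x̄)³ f_i / Σ f_i

Therefore, the asymmetry is equal to

As = μ3 / σ³

Since |As| = 0.3553 > 0.25, the asymmetry is considered significant.

Let's calculate the fourth-order central moment:

μ4 = Σ (x_i - x̄)⁴ f_i / Σ f_i

Therefore, the kurtosis is equal to

Ex = μ4 / σ⁴ - 3

Since Ex < 0, the distribution is flat-topped (platykurtic).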
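
The moment-based asymmetry and kurtosis can be sketched as below, again under the assumption that the interval frequencies are 3, 3, 5, 7, 6, 6 (the worksheet itself is not preserved in the text).

```python
bounds = [112, 125.83, 139.66, 153.49, 167.32, 181.15, 195]
freqs = [3, 3, 5, 7, 6, 6]  # assumed frequencies, inferred from the solution

mids = [(lo + hi) / 2 for lo, hi in zip(bounds, bounds[1:])]
n = sum(freqs)
mean = sum(x * f for x, f in zip(mids, freqs)) / n

def central_moment(order):
    """Weighted central moment of the given order for the grouped data."""
    return sum((x - mean) ** order * f for x, f in zip(mids, freqs)) / n

sigma = central_moment(2) ** 0.5
asymmetry = central_moment(3) / sigma ** 3      # As
kurtosis = central_moment(4) / sigma ** 4 - 3   # Ex (excess kurtosis)

print(round(asymmetry, 4), round(kurtosis, 3))
```

Under these assumptions the asymmetry comes out negative with an absolute value close to the 0.3553 quoted in the text, and the excess kurtosis is negative.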

The degree of asymmetry can also be determined using the Pearson asymmetry coefficient (As):

As = (x̄ - Mo) / σ,

where x̄ is the arithmetic mean of the distribution series; Mo is the mode; σ is the standard deviation.

With a symmetric (normal) distribution x̄ = Mo, so the asymmetry coefficient is zero. If As > 0, then x̄ is greater than the mode, so there is right-sided asymmetry.

If As < 0, then x̄ is less than the mode, so there is left-sided asymmetry. The asymmetry coefficient can vary from -3 to +3.

The distribution is not symmetrical, but has left-sided asymmetry.

TASK 2

What should the sample size be so that with probability 0.954 the sampling error does not exceed 0.04 if, based on previous surveys, it is known that the variance is 0.24?

Solution

The sample size for non-repetitive sampling is calculated using the formula:

n = t² σ² N / (Δx² N + t² σ²),

where t is the confidence coefficient (with a probability of 0.954 it is equal to 2.0; determined from tables of probability integrals),

σ² = 0.24 is the variance;

N = 10,000 people is the size of the general population;

Δx = 0.04 is the maximum error of the sample mean.

n = 2² × 0.24 × 10,000 / (0.04² × 10,000 + 2² × 0.24) = 9600 / 16.96 ≈ 566

With a probability of 95.4%, it can be stated that the sample size ensuring a sampling error of no more than 0.04 should be at least 566 families.
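
The sample size calculation can be reproduced with a short function. This is a sketch; the function name is made up for illustration, and the population size of 10,000 is the value used in the solution rather than one stated in the task.

```python
def sample_size_without_replacement(t, variance, margin, population):
    """Required sample size for non-repetitive (without replacement) sampling."""
    return (t ** 2 * variance * population) / (
        margin ** 2 * population + t ** 2 * variance
    )

n_required = sample_size_without_replacement(
    t=2.0, variance=0.24, margin=0.04, population=10_000
)
print(round(n_required, 2))  # about 566.04
```

The text reports 566; strictly, rounding up to the next whole unit (567) is what keeps the error within the stated limit.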

TASK 3

The following data is available on income from the main activities of the enterprise, million rubles.

To analyze the dynamics series, determine the following indicators:

1) chain and basic:

Absolute increases;

Growth rates;

Rates of increase;

2) average

Level of the dynamics series;

Absolute increase;

Growth rate;

Rate of increase;

3) absolute value of 1% increase.

Solution

1. Absolute increase (Δy) is the difference between a given level of the series and the previous (or base) level:

chain: Δy = y_i - y_(i-1),

basic: Δy = y_i - y_0,

where y_i is the level of the series,

i is the number of the level,

y_0 is the base-year level.

2. Growth rate (Tg) is the ratio of a given level of the series to the previous level (or to the base year, 2001), expressed in %:

chain: Tg = y_i / y_(i-1) × 100%;

basic: Tg = y_i / y_0 × 100%.

3. Rate of increase (Ti) is the ratio of the absolute increase to the previous (or base) level, expressed in %:

chain: Ti = (y_i - y_(i-1)) / y_(i-1) × 100%;

basic: Ti = (y_i - y_0) / y_0 × 100%.

4. Absolute value of 1% increase (A) is the ratio of the chain absolute increase to the chain rate of increase, expressed in %:

A = Δy_chain / Ti_chain = 0.01 × y_(i-1)

The average level of the series is calculated using the arithmetic mean formula:

ȳ = Σ y_i / n

Average level of income from core activities for 4 years:

The average absolute increase is calculated by the formula:

Δȳ = (y_n - y_0) / (n - 1),

where n is the number of levels of the series.

On average, income from core activities increased by 3.333 million rubles per year.

The average annual growth rate Tg is calculated using the geometric mean formula:

Tg = (y_n / y_0)^(1/(n-1)) × 100%,

where y_n is the final level of the series,

y_0 is the initial level of the series.

Tg = 102.174%

The average annual rate of increase Ti is calculated by the formula:

Ti = Tg - 100% = 102.174% - 100% = 2.174%.

Thus, on average, income from the main activities of the enterprise increased by 2.174% per year.
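
The chain, basic and average indicators can be sketched as follows. The source table is not preserved, so the four levels below are hypothetical; the endpoints 150 and 160 are an assumption chosen only because they reproduce the averages 3.333 and 102.174% quoted in the text.

```python
# Hypothetical income levels for four years, million rubles (not from the text).
levels = [150, 152, 157, 160]
pairs = list(zip(levels, levels[1:]))  # (previous level, current level)

chain_abs = [cur - prev for prev, cur in pairs]
base_abs = [cur - levels[0] for cur in levels[1:]]

chain_growth = [cur / prev * 100 for prev, cur in pairs]      # growth rates, %
base_growth = [cur / levels[0] * 100 for cur in levels[1:]]

chain_increase = [g - 100 for g in chain_growth]              # rates of increase, %
base_increase = [g - 100 for g in base_growth]

one_percent_value = [0.01 * prev for prev, _ in pairs]        # value of 1% increase

steps = len(levels) - 1
avg_level = sum(levels) / len(levels)
avg_abs_increase = (levels[-1] - levels[0]) / steps
avg_growth = (levels[-1] / levels[0]) ** (1 / steps) * 100    # geometric mean, %
avg_increase = avg_growth - 100

print(round(avg_abs_increase, 3), round(avg_growth, 3), round(avg_increase, 3))
```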

TASK 4

Calculate:

1. Individual price indices;

2. General trade turnover index;

3. Aggregate price index;

4. Aggregate index of the physical volume of sales of goods;

5. Break down the absolute increase in the value of trade turnover by factors (due to changes in prices and the number of goods sold);

6. Draw brief conclusions on all obtained indicators.

Solution

1. According to the condition, the individual price indices for products A, B and C are:

i_pA = 1.20; i_pB = 1.15; i_pC = 1.00.

2. We calculate the general trade turnover index using the formula:

I_w = Σ p1q1 / Σ p0q0 = 1470 / 1045 × 100% = 140.67%

Trade turnover increased by 40.67% (140.67% - 100%).

3. Aggregate price index:

I_p = Σ p1q1 / Σ p0q1 = 1470 / 1333.478 × 100% = 110.24%

On average, commodity prices increased by 10.24%.

The amount of additional costs of buyers from price increases:

Δw(p) = Σ p1q1 - Σ p0q1 = 1470 - 1333.478 = 136.522 million rubles.

As a result of rising prices, buyers had to spend an additional 136.522 million rubles.

4. General index of physical volume of trade turnover:

I_q = Σ p0q1 / Σ p0q0 = 1333.478 / 1045 × 100% = 127.61%

The physical volume of trade turnover increased by 27.61%.

5. Let's determine the overall change in trade turnover in the second period compared to the first period:

Δw = 1470 - 1045 = 425 million rubles,

due to price changes:

Δw(p) = 1470 - 1333.478 = 136.522 million rubles,

due to changes in physical volume:

Δw(q) = 1333.478 - 1045 = 288.478 million rubles.

6. Trade turnover increased by 40.67%. Prices for the 3 goods increased on average by 10.24%. The physical volume of trade turnover increased by 27.61%.

In general, trade turnover increased by 425 million rubles, of which 136.522 million rubles was due to rising prices and 288.478 million rubles was due to an increase in the physical volume of sales.
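
The index system can be verified from the three aggregate sums used in the solution. This sketch takes 1045, 1333.478 and 1470 as given; the variable names are illustrative.

```python
# Aggregate sums from the solution, million rubles.
sum_p0q0 = 1045.0      # base-period turnover
sum_p0q1 = 1333.478    # current quantities at base prices
sum_p1q1 = 1470.0      # current-period turnover

turnover_index = sum_p1q1 / sum_p0q0 * 100   # about 140.67 %
price_index = sum_p1q1 / sum_p0q1 * 100      # about 110.24 %
volume_index = sum_p0q1 / sum_p0q0 * 100     # about 127.61 %

delta_total = sum_p1q1 - sum_p0q0            # 425
delta_price = sum_p1q1 - sum_p0q1            # 136.522
delta_volume = sum_p0q1 - sum_p0q0           # 288.478

# The factor effects add up to the total change, and the price and volume
# indices multiply to the turnover index.
print(round(delta_price + delta_volume, 3))
print(round(price_index * volume_index / 100, 2))
```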

TASK 5

The following data is available for 10 factories in one industry.

Plant number

Product output, thousand pcs. (X)

Based on the given data:

1) to confirm the provisions of logical analysis about the presence of a linear correlation between the factor characteristic (product volume) and the resultant characteristic (electricity consumption), plot the initial data on the graph of the correlation field and draw conclusions about the form of the relationship, indicate its formula;

2) determine the parameters of the connection equation and plot the resulting theoretical line on the graph of the correlation field;

3) calculate the linear correlation coefficient;

4) explain the meaning of the indicators obtained in paragraphs 2) and 3);

5) using the resulting model, make a forecast about the possible energy consumption at a plant with a production volume of 4.5 thousand units.

Solution

Let's denote the values of the factor characteristic, the volume of production, by x_i, and the values of the resultant characteristic, electricity consumption, by y_i; the points with coordinates (x, y) are plotted on the correlation field OXY.

The points of the correlation field are located along a certain straight line. Therefore, the relationship is linear; we will look for a regression equation in the form of a straight line ŷ_x = ax + b. To find it, we use the system of normal equations:

a Σx² + b Σx = Σxy,
a Σx + b n = Σy.

Let's create a calculation table.

Using the averages found, we compose a system and solve it with respect to parameters a and b:

So, we get the regression equation of y on x: ŷ_x = 3.57692x + 3.19231

We build a regression line on the correlation field.

Substituting the x values from column 2 into the regression equation, we obtain the calculated values ŷ_x (column 7) and compare them with the y data, which is reflected in column 8. Incidentally, the correctness of the calculations is confirmed by the coincidence of the average values of y and ŷ_x.

The linear correlation coefficient evaluates the closeness of the relationship between characteristics x and y and is calculated using the formula

r = (mean(xy) - mean(x) × mean(y)) / (σ_x × σ_y)

The slope of the regression line a (the coefficient at x) characterizes the direction of the identified dependence between the characteristics: for a > 0 they change in the same direction, for a < 0 in opposite directions. Its absolute value is a measure of the change in the resultant characteristic when the factor characteristic changes by one unit of measurement.

The free term of the regression line reveals the direction, and its absolute value is a quantitative measure, of the influence of all other factors on the resultant characteristic.

If < 0, then the resource of the factor characteristic of an individual object is used with less efficiency, and when > 0 with greater efficiency, than the average for the entire set of objects.

Let's conduct a post-regression analysis.

1. The coefficient at x of the regression line is 3.57692 > 0; therefore, with an increase (decrease) in production output, electricity consumption increases (decreases). An increase in production output by 1 thousand units gives an average increase in electricity consumption of 3.57692 thousand kWh.

2. The free term of the regression line is equal to 3.19231; therefore, the influence of other factors increases the strength of the impact of product output on electricity consumption in absolute terms by 3.19231 thousand kWh.

3. The correlation coefficient of 0.8235 reveals a very close dependence of electricity consumption on product output.

Forecasts are easy to make with the equation of the regression model. To do this, values of x, the volume of production, are substituted into the regression equation and electricity consumption is predicted. The values of x can be taken not only within the given range, but also outside it.

Let's make a forecast about the possible energy consumption at a plant with a production volume of 4.5 thousand units.

ŷ = 3.57692 × 4.5 + 3.19231 = 19.28845 thousand kWh.
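
The least-squares fit and the forecast can be sketched as below. The factory data table is not preserved in the text, so `xs` and `ys` are hypothetical values used only to show the method; the forecast at the end uses the coefficients reported in the solution.

```python
def fit_line(xs, ys):
    """Least-squares slope a and intercept b of the line y = a*x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def pearson_r(xs, ys):
    """Linear correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: output (thousand units) and electricity use (thousand kWh).
xs = [1, 2, 3, 4, 5]
ys = [7, 10, 14, 18, 21]
a, b = fit_line(xs, ys)
r = pearson_r(xs, ys)

# Forecast with the coefficients reported in the solution.
forecast = 3.57692 * 4.5 + 3.19231
print(round(a, 2), round(b, 2), round(r, 4), round(forecast, 5))
```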

An example of solving a test on mathematical statistics

Problem 1

Initial data: the students of a group of 30 people took an exam in the “Informatics” course. The grades received by the students form the following series of numbers:

I. Let's create a variation series

(Table not recovered. Columns: m_x; w_x; m_x nak (accumulated); w_x nak (accumulated). Last row: Total.)

II. Graphic representation of statistical information.

III. Numerical characteristics of the sample.

1. Arithmetic mean

2. Geometric mean

3. Mode

4. Median

2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 | 3 3 4 4 4 4 4 4 4 4 4 5 5 5 5

5. Sample variance

6. Sample standard deviation

7. Coefficient of variation

8. Asymmetry

9. Asymmetry coefficient

10. Kurtosis (excess)

11. Kurtosis coefficient

Problem 2

Initial data: the students of a group wrote their final test. The group consists of 30 people. The points scored by the students form the following series of numbers:

Solution

I. Since the characteristic takes on many different values, we will construct an interval variation series for it. To do this, we first set the interval width h. Let's use the Sturges formula:

h = (x_max - x_min) / (1 + 3.322 lg n)

Let's create an interval scale. We take as the upper limit of the first interval the value determined by the formula:

We determine the upper limits of subsequent intervals using the following recurrence formula:

We finish constructing the interval scale when the upper limit of the next interval becomes greater than or equal to the maximum sample value.

II. Graphic display of interval variation series

III. Numerical characteristics of the sample

To determine the numerical characteristics of the sample, we will compile an auxiliary table

Sum:

1. Arithmetic mean

2. Geometric mean

3. Mode

4. Median

10 11 12 12 13 13 13 13 14 14 14 14 15 15 15 | 15 15 15 16 16 16 16 16 17 17 18 19 19 20 20

5. Sample variance

6. Sample standard deviation

7. Coefficient of variation

8. Asymmetry

9. Asymmetry coefficient

10. Kurtosis (excess)

11. Kurtosis coefficient
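
Several of the listed characteristics can be computed directly from the ordered sample shown under the median. The sketch below assumes that this ordered row is the full sample of 30 values; the results are computed here and are not quoted from the original solution.

```python
import math
import statistics

# Ordered sample as listed under "4. Median" (assumed to be the complete data).
sample = [10, 11, 12, 12, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15,
          15, 15, 15, 16, 16, 16, 16, 16, 17, 17, 18, 19, 19, 20, 20]

n = len(sample)
# Interval width by the Sturges formula.
h = (max(sample) - min(sample)) / (1 + 3.322 * math.log10(n))

mean = statistics.fmean(sample)
median = statistics.median(sample)         # average of the 15th and 16th values
variance = statistics.pvariance(sample)    # sample variance with divisor n
std = variance ** 0.5
cv = std / mean * 100                      # coefficient of variation, %

print(n, round(h, 2), round(mean, 2), median)
```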

Problem 3

Condition: the scale division value of an ammeter is 0.1 A. Readings are rounded to the nearest whole division. Find the probability that a reading error exceeding 0.02 A will be made.

Solution.

The rounding error of a reading can be considered a random variable X that is uniformly distributed in the interval between two adjacent whole divisions. The uniform distribution density is

f(x) = 1 / (b - a),

where (b - a) is the length of the interval containing the possible values of X; outside this interval f(x) = 0. In this problem the length of the interval containing the possible values of X is 0.1, so f(x) = 1 / 0.1 = 10.

The reading error will exceed 0.02 if it lies in the interval (0.02; 0.08). Then

P(0.02 < X < 0.08) = 10 × (0.08 - 0.02) = 0.6

Answer: P = 0.6
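
The same probability can be obtained with a small helper for the uniform distribution. The function name is illustrative, not part of the original solution.

```python
def uniform_interval_probability(a, b, lo, hi):
    """P(lo < X < hi) for X uniformly distributed on (a, b)."""
    lo, hi = max(lo, a), min(hi, b)     # clip the event to the support
    return max(hi - lo, 0.0) / (b - a)  # length of the event over length of support

# Rounding error uniform on one scale division of 0.1 A.
p = uniform_interval_probability(a=0.0, b=0.1, lo=0.02, hi=0.08)
print(round(p, 2))  # 0.6
```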

Problem 4

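Initial data: the mathematical expectation and standard deviation of a normally distributed characteristic X are 10 and 2, respectively. Find the probability that as a result of the test X will take a value contained in the interval (12, 14).

Solution.

Let's use the formula

P(α < X < β) = Φ((β - a) / σ) - Φ((α - a) / σ),

where Φ is the Laplace function, a is the mathematical expectation and σ is the standard deviation:

P(12 < X < 14) = Φ(2) - Φ(1) = 0.4772 - 0.3413 = 0.1359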
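
This probability can be checked with the normal distribution class from Python's standard library, as a sketch of the same calculation without tables.

```python
from statistics import NormalDist

# Normal distribution with mathematical expectation 10 and standard deviation 2.
dist = NormalDist(mu=10, sigma=2)

# P(12 < X < 14) as a difference of cumulative distribution values.
p = dist.cdf(14) - dist.cdf(12)
print(round(p, 4))  # 0.1359
```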

    A discrete variation series is constructed for discrete characteristics.

    To construct a discrete variation series, perform the following steps:

    1) arrange the units of observation in increasing order of the studied value of the characteristic;

    2) determine all possible values of the attribute x_i and arrange them in ascending order;

    3) count how many times each value occurs, that is, find the frequency f_i of each attribute value;

    4) write the data into a table of two rows (columns), x_i and f_i.

    The sum of all frequencies of a series is equal to the number of elements in the population being studied.

    Example 1 .

    List of grades received by students in exams: 3; 4; 3; 5; 4; 2; 2; 4; 4; 3; 5; 2; 4; 5; 4; 3; 4; 3; 3; 4; 4; 2; 2; 5; 5; 4; 5; 2; 3; 4; 4; 3; 4; 5; 2; 5; 5; 4; 3; 3; 4; 2; 4; 4; 5; 4; 3; 5; 3; 5; 4; 4; 5; 4; 4; 5; 4; 5; 5; 5.

    Here the grade X is a discrete random variable, and the resulting list of grades is the statistical (observed) data.

    1) arrange the observation units in ascending order of the studied characteristic value:

    2; 2; 2; 2; 2; 2; 2; 2; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 4; 5; 5; 5; 5; 5; 5; 5; 5; 5; 5; 5; 5; 5; 5; 5; 5; 5.

    2) determine all possible values of the attribute x_i and arrange them in ascending order:

    In this example, all grades fall into four groups with the following values: 2; 3; 4; 5.

    The value of a random variable corresponding to a particular group of observed data is called a value of the attribute, or a variant, and is denoted x_i.

    3) count the frequency of each value:

    A number that shows how many times the corresponding value of a characteristic occurs in a series of observations is called the frequency of the attribute value and is denoted f_i.

    In our example:

    grade 2 occurs 8 times,

    grade 3 occurs 12 times,

    grade 4 occurs 23 times,

    grade 5 occurs 17 times.

    There are 60 grades in total.

    4) write the data obtained into a table of two rows (columns), x_i and f_i.

    From these data, a discrete variation series can be constructed:

    x_i (grade)        2    3    4    5
    f_i (frequency)    8   12   23   17

    A discrete variation series is a table in which the observed values of the characteristic being studied are listed as individual values in ascending order together with their frequencies.
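    The steps of Example 1 can be reproduced with a few lines of code. This is only an illustrative sketch: it uses the grades listed in the example and the standard library's `Counter`; the variable names are chosen here for illustration.

```python
from collections import Counter

# Grades from Example 1 (60 observations)
grades = [
    3, 4, 3, 5, 4, 2, 2, 4, 4, 3,
    5, 2, 4, 5, 4, 3, 4, 3, 3, 4,
    4, 2, 2, 5, 5, 4, 5, 2, 3, 4,
    4, 3, 4, 5, 2, 5, 5, 4, 3, 3,
    4, 2, 4, 4, 5, 4, 3, 5, 3, 5,
    4, 4, 5, 4, 4, 5, 4, 5, 5, 5,
]

# Steps 1-3: count the frequency f_i of each distinct value x_i
frequencies = Counter(grades)

# Step 4: the discrete variation series as (x_i, f_i) pairs in ascending order of x_i
series = sorted(frequencies.items())
for x_i, f_i in series:
    print(x_i, f_i)
# 2 8
# 3 12
# 4 23
# 5 17

# The frequencies sum to the number of observations
print(sum(frequencies.values()))  # 60
```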

    1. Construction of an interval variation series

    In addition to the discrete variation series, data are often grouped into an interval variation series.

    An interval series is constructed if:

      the characteristic varies continuously;

      there are many distinct discrete values (more than 10);

      the frequencies of the discrete values are very small (do not exceed 1-3 with a relatively large number of observation units);

      many discrete values of the characteristic have the same frequency.

    An interval variation series is a way of grouping data in the form of a table that has two columns (the values of the characteristic as intervals of values, and the frequency of each interval).

    Unlike a discrete series, the values of the attribute in an interval series are represented not by individual values but by an interval of values ("from ... to").

    The number that shows how many observation units fell into each selected interval is called the frequency of the attribute value and is denoted f_i. The sum of all frequencies of a series is equal to the number of elements (units of observation) in the population being studied.

    If a unit has a characteristic value equal to the upper limit of an interval, it should be assigned to the next interval.

    For example, a child with a height of 100 cm falls into the second interval, not the first, and a child with a height of 130 cm falls into the last interval, not the third.

    Based on these data, an interval variation series can be constructed.

    Each interval has a lower bound (x_lower), an upper bound (x_upper) and an interval width (i).

    The interval boundary is the value of the attribute that lies on the border of two intervals.

    [Table: children's height (cm) | number of children; last interval: more than 130]

    If an interval has both an upper and a lower boundary, it is called a closed interval. If an interval has only a lower or only an upper boundary, it is an open interval. Only the very first or the very last interval can be open. In the above example, the last interval is open.

    Interval width (i) is the difference between the upper and lower limits of the interval:

    i = x_upper - x_lower

    The width of the open interval is assumed to be the same as the width of the adjacent closed interval.

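    [Table: children's height (cm) | number of children | interval width (i)]

    For the open interval "more than 130", the upper limit used in calculations is 130 + 20 = 150, and the width is taken as 20 (because the width of the adjacent closed interval is 20).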

    All interval series are divided into interval series with equal intervals and interval series with unequal intervals. In interval series with equal intervals, the width of all intervals is the same. In interval series with unequal intervals, the widths of the intervals differ.

    The example under consideration is an interval series with unequal intervals.

    If the random variable under study is continuous, then ranking and grouping the observed values often fails to reveal the characteristic features of its variation. This is because individual values of a random variable can differ from each other by arbitrarily small amounts, so identical values rarely occur in the observed data and the frequencies of the variants differ little from each other.

    It is also impractical to construct a discrete series for a discrete random variable that has a large number of possible values. In such cases, an interval variation series of the distribution should be constructed.

    To construct such a series, the entire range of variation of the observed values of the random variable is divided into a series of partial intervals, and the frequency with which values fall into each partial interval is counted.

    An interval variation series is an ordered set of intervals of values of a random variable together with the corresponding frequencies or relative frequencies of the values falling into each of them.

    To build an interval series you need to:

    1. determine the number of partial intervals;
    2. determine the width of the intervals;
    3. set the upper and lower limits of each interval;
    4. group the observation results.

    1. The number and width of the grouping intervals must be chosen in each specific case based on the goals of the study, the sample size and the degree of variation of the characteristic in the sample.

    The approximate number of intervals k can be estimated from the sample size n alone in one of the following ways:

    • by Sturges' formula: k = 1 + 3.32 lg n, where lg is the base-10 logarithm;
    • using Table 1.

    Table 1

    2. Intervals of equal width are generally preferred. To determine the interval width h, calculate:

    • the range of variation R of the sample values: R = x_max - x_min,

    where x_max and x_min are the maximum and minimum sample values;

    • the width of each interval h by the formula: h = R/k.
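    As a rough illustration of steps 1 and 2, the sketch below estimates the number of intervals with Sturges' formula and then the interval width h = R/k. Rounding k to the nearest integer is an assumption made here (the text does not prescribe a rounding rule), and the function names and sample values are hypothetical.

```python
import math

def sturges_intervals(n):
    """Approximate number of intervals: k = 1 + 3.32 * lg(n), rounded to the nearest integer (assumed rule)."""
    return round(1 + 3.32 * math.log10(n))

def interval_width(values, k):
    """Interval width h = R / k, where R = x_max - x_min is the range of variation."""
    r = max(values) - min(values)
    return r / k

print(sturges_intervals(100))  # 8, since 1 + 3.32 * 2 = 7.64
print(sturges_intervals(20))   # 5

# Hypothetical sample with x_min = 150 and x_max = 180, split into 5 intervals
h = interval_width([150, 165, 180], k=5)
print(h)  # 6.0
```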

    3. The lower limit of the first interval, x_h1, is chosen so that the minimum sample value x_min falls approximately in the middle of this interval: x_h1 = x_min - 0.5·h.

    The subsequent limits are obtained by adding the length of the partial interval h to the previous limit:

    x_hi = x_h(i-1) + h.

    The construction of the interval scale by calculating interval limits continues as long as the value x_hi satisfies the relation:

    x_hi < x_max + 0.5·h.

    4. In accordance with the interval scale, the values of the characteristic are grouped: for each partial interval, the sum of the frequencies n_i of the variants falling into the i-th interval is calculated. An interval includes the values of the random variable that are greater than or equal to its lower limit and less than its upper limit.
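    Steps 3 and 4 can be sketched in code as follows. This is a minimal illustration under stated assumptions: the sample is hypothetical, the number of intervals k = 4 is picked by hand so that all limits are exact, and the function names are not from the original text. Each x_hi is treated as the lower limit of the i-th interval, and intervals are half-open, [lower, lower + h).

```python
def interval_lower_limits(x_min, x_max, h):
    """Lower limits x_h1, x_h2, ... of the partial intervals."""
    limits = []
    lower = x_min - 0.5 * h            # x_h1 = x_min - 0.5*h
    while lower < x_max + 0.5 * h:     # continue while x_hi < x_max + 0.5*h
        limits.append(lower)
        lower += h                     # x_hi = x_h(i-1) + h
    return limits

def group_into_intervals(values, lower_limits, h):
    """Frequencies n_i: how many values fall into each interval [lower, lower + h)."""
    counts = [0] * len(lower_limits)
    for v in values:
        for i, lower in enumerate(lower_limits):
            if lower <= v < lower + h:
                counts[i] += 1
                break
    return counts

sample = [10, 12, 13, 17, 18, 18, 21, 22, 25, 30]  # hypothetical data
k = 4                                              # chosen by hand for this sketch
h = (max(sample) - min(sample)) / k                # h = R / k = 5.0
limits = interval_lower_limits(min(sample), max(sample), h)
counts = group_into_intervals(sample, limits, h)

for lower, n_i in zip(limits, counts):
    print(f"[{lower}, {lower + h}): {n_i}")
# [7.5, 12.5): 2
# [12.5, 17.5): 2
# [17.5, 22.5): 4
# [22.5, 27.5): 1
# [27.5, 32.5): 1
```

    Note that starting half a width below x_min yields one more interval than k. With limits that are not exactly representable in floating point, the strict comparison in the loop may need a small tolerance.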

    Polygon and histogram

    For clarity, various statistical distribution graphs are constructed.

    From the data of a discrete variation series, a polygon of frequencies or of relative frequencies is constructed.

    A frequency polygon is a broken line whose segments connect the points (x_1; n_1), (x_2; n_2), ..., (x_k; n_k). To construct a frequency polygon, the variants x_i are plotted on the abscissa axis and the corresponding frequencies n_i on the ordinate axis. The points (x_i; n_i) are connected by straight segments to obtain the frequency polygon (Fig. 1).

    A polygon of relative frequencies is a broken line whose segments connect the points (x_1; W_1), (x_2; W_2), ..., (x_k; W_k). To construct a polygon of relative frequencies, the variants x_i are plotted on the abscissa axis and the corresponding relative frequencies W_i on the ordinate axis. The points (x_i; W_i) are connected by straight segments to obtain the polygon of relative frequencies.
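    The vertices of both polygons are easy to compute. The sketch below uses the grade frequencies from Example 1 (8, 12, 23 and 17 for grades 2 to 5) and takes the relative frequency as W_i = n_i / n, where n is the total number of observations. The variable names are illustrative, and plotting is left to any charting tool.

```python
# Frequencies n_i of the variants x_i from Example 1
frequencies = {2: 8, 3: 12, 4: 23, 5: 17}
n = sum(frequencies.values())  # total number of observations

# Vertices of the frequency polygon: (x_i, n_i) in ascending order of x_i
frequency_points = sorted(frequencies.items())

# Vertices of the polygon of relative frequencies: (x_i, W_i) with W_i = n_i / n
relative_points = [(x_i, n_i / n) for x_i, n_i in frequency_points]

for x_i, w_i in relative_points:
    print(x_i, round(w_i, 3))
# 2 0.133
# 3 0.2
# 4 0.383
# 5 0.283
```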

    For a continuous characteristic, it is advisable to construct a histogram.

    A frequency histogram is a stepped figure consisting of rectangles whose bases are the partial intervals of length h and whose heights are equal to the ratio n_i/h (the frequency density).

    To construct a frequency histogram, the partial intervals are laid out on the abscissa axis, and segments parallel to the abscissa axis are drawn above them at a height of n_i/h.
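    The rectangle heights of a frequency histogram follow directly from the interval frequencies. The sketch below assumes hypothetical frequencies for five intervals of equal width h = 5; the function name is illustrative. It also checks that the total area of the rectangles equals the sample size, which is a direct consequence of using n_i/h as the height.

```python
def frequency_density(counts, h):
    """Heights of the histogram rectangles: n_i / h for each partial interval."""
    return [n_i / h for n_i in counts]

counts = [2, 2, 4, 1, 1]  # hypothetical frequencies n_i
h = 5.0                   # common interval width

heights = frequency_density(counts, h)
print(heights)  # [0.4, 0.4, 0.8, 0.2, 0.2]

# Each rectangle has area (n_i / h) * h = n_i, so the total area is the sample size
total_area = sum(height * h for height in heights)
```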


