Statistical summary and grouping. Statistical distribution series

The concept of summary, grouping, classification

Summary is the systematization and totaling of data, for example weather reports or reports from the fields. On its own, a summary does not allow the information to be analyzed in detail. Any summary must be based on grouping: the data are first grouped and then summarized.

Grouping is the division of a population into groups according to its most significant characteristics.

Groupings are either qualitative or quantitative. Qualitative groupings are attributive; quantitative ones are variational. Variational groupings, in turn, are divided into structural and analytical. A structural grouping calculates the share of each group in the total. Example: at an enterprise, 80% of employees are workers and 20% are office staff (5% managers, 3% clerical staff, and 12% specialists, all as shares of the total). The purpose of analytical groupings is to identify the relationship between characteristics: length of service and average earnings, length of service and output, and so on.
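The structural grouping in the example can be sketched in Python. This is only an illustration: the headcounts are hypothetical, chosen so that the shares match the percentages quoted above, and the function name `structure_shares` is not from the source.

```python
def structure_shares(counts):
    """Return each group's share of the total, in percent."""
    total = sum(counts.values())
    return {group: 100 * n / total for group, n in counts.items()}


# Hypothetical headcounts (assumption): 500 employees in total.
staff = {
    "workers": 400,
    "managers": 25,
    "clerical staff": 15,
    "specialists": 60,
}

shares = structure_shares(staff)
for group, share in shares.items():
    print(f"{group}: {share:.0f}%")
```

The shares of all groups always sum to 100%, which is a convenient check on any structural grouping.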

When carrying out a grouping, it is necessary to:

Analyze comprehensively the nature of the phenomenon being studied;

Identify the grouping characteristic (one or several);

Set the group boundaries so that the groups differ significantly from one another and each group combines homogeneous elements.

By degree of complexity, groupings are simple (based on one characteristic) or combinational (based on several characteristics).

By the information used, groupings are primary or secondary: a primary grouping is based on the original observation data, while a secondary grouping uses the results of a primary grouping.

The number of groups is determined by Sturges' formula:

n = 1 + 3.32 * lg N

where n is the number of groups and N is the size of the population.

If equal intervals are used, the interval width is H = (Xmax - Xmin) / n.
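A minimal Python sketch of these two calculations follows. It assumes the common form of Sturges' formula, n = 1 + 3.32 lg N with a base-10 logarithm, and rounds the result to the nearest whole number; the function names and the sample values are illustrative assumptions.

```python
import math


def sturges_groups(population_size):
    """Number of groups by Sturges' formula, rounded to a whole number."""
    groups = 1 + 3.32 * math.log10(population_size)
    return round(groups)


def interval_width(x_min, x_max, groups):
    """Width of each interval when all intervals are equal."""
    return (x_max - x_min) / groups


# Hypothetical population of 100 units with values from 10 to 50.
n = sturges_groups(100)
h = interval_width(10.0, 50.0, n)
print(n, h)
```

The rounding rule is a design choice: some authors always round up so that no group is lost.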

Intervals may be equal or unequal. Unequal intervals, in turn, may change in arithmetic or geometric progression. The first and last intervals may be open or closed. Closed intervals may or may not include their boundaries.

If the intervals are closed and nothing is said about whether the upper bounds are included, we assume that they are.

If the intervals are open, then we focus on the last interval.

The characteristic within these intervals may vary discretely or continuously. For a continuous characteristic the boundaries of adjacent intervals coincide: 1-10, 10-20, 20-30. For a discrete characteristic the following notation can be used: 1-10, 11-20, 21-30.

If the first and last intervals are open, the width of the last interval is taken equal to that of the preceding one, and the width of the first equal to that of the second.
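The rule for open intervals can be illustrated with a short Python sketch. The representation is an assumption made for the example: each interval is a `(lower, upper)` pair, and `None` marks a missing (open) boundary.

```python
def close_open_intervals(intervals):
    """Close an open first or last interval using the neighbour's width."""
    closed = list(intervals)

    first_lower, first_upper = closed[0]
    if first_lower is None:
        # The first interval takes the width of the second one.
        second_width = closed[1][1] - closed[1][0]
        closed[0] = (first_upper - second_width, first_upper)

    last_lower, last_upper = closed[-1]
    if last_upper is None:
        # The last interval takes the width of the preceding one.
        previous_width = closed[-2][1] - closed[-2][0]
        closed[-1] = (last_lower, last_lower + previous_width)

    return closed


# "Up to 10", 10-20, 20-30, "30 and over" (hypothetical series).
series = [(None, 10), (10, 20), (20, 30), (30, None)]
print(close_open_intervals(series))
```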

Classification is grouping according to qualitative characteristics. It is relatively stable, standardized, and approved by the state statistical authorities.


3.2. Distribution series: types and main characteristics

A distribution series is a series of data characterizing a socio-economic phenomenon according to one characteristic. It is the simplest form of grouping.

Distribution series are divided into qualitative and quantitative, ranked and unranked, grouped and ungrouped, and into series with discrete and continuous distribution of the characteristic.

An example of an ungrouped, unranked series of wages is a payroll statement. The list of employees in it may nevertheless be ordered alphabetically or by personnel number. Examples of ranked series are a league table of teams or a ranking of tennis players.

A ranked distribution series is a series of data arranged in ascending or descending order of a characteristic.

For grouped ranked series, the following characteristics are distinguished: variant, frequency or relative frequency, cumulative frequency, and distribution density.

Variant: the middle value of the characteristic in an interval. Because a grouping should keep the characteristic uniformly distributed within each interval, the variant can be calculated as half the sum of the interval boundaries.

Frequency shows how many times a given value of the characteristic occurs. Its relative expression is the relative frequency, i.e. the share of a given frequency in the sum of all frequencies.

Cumulative frequency: the accumulated frequency or relative frequency, calculated as a running total. Running totals are calculated for volume, costs, and income, i.e. for the results of activity.
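The characteristics defined above can be computed together for a grouped series. The following Python sketch uses hypothetical intervals and frequencies; the function and key names are illustrative, not from the source.

```python
from itertools import accumulate


def describe_series(intervals, frequencies):
    """Variant, frequency, relative and cumulative frequency per interval."""
    total = sum(frequencies)
    cumulative = list(accumulate(frequencies))  # running total of frequencies
    rows = []
    for (lower, upper), f, cum in zip(intervals, frequencies, cumulative):
        rows.append({
            "variant": (lower + upper) / 2,   # half the sum of the boundaries
            "frequency": f,
            "relative_frequency": f / total,  # share in the sum of frequencies
            "cumulative_frequency": cum,
        })
    return rows


# Hypothetical grouped series (assumption): three intervals, 20 units.
intervals = [(10, 20), (20, 30), (30, 40)]
frequencies = [5, 12, 3]

rows = describe_series(intervals, frequencies)
for row in rows:
    print(row)
```

The last cumulative frequency always equals the total number of units, here 20.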

Table 1

Grouping of operating credit institutions by size of registered authorized capital in the Russian Federation in 2008

Let us construct an interval variation series for the distribution of districts by the ratio of the average monthly pension accrued to pensioners registered with the social security authorities to the average monthly nominal accrued wage of workers in the economy.

The number of groups required for the grouping is calculated using Sturges' formula:

n = 1 + 3.32 * lg N (1.1)

where n is the number of groups;

N is the number of units in the population.

n = 1 + 3.32 * lg 24 = 1 + 3.32 * 1.38 = 5.5816 ≈ 6

Let us divide the set of districts into 6 groups and find the interval width using the formula:

H = (Xmax - Xmin) / n (1.2)

where Xmax = 65.9 is the maximum value of the characteristic in the ranked series (district No. 24);

Xmin = 28.1 is the minimum value (district No. 1).

The interval width is:

H = (65.9 - 28.1) / 6 = 6.3

Let us construct the distribution series of districts. With this interval width and Xmin = 28.1, the upper boundary of the first group is:

28.1 + 6.3 = 34.4, and so on.
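The remaining group boundaries follow in the same way. A short Python sketch of the calculation, using the values from the example (Xmin = 28.1, Xmax = 65.9, 6 groups); rounding to one decimal place is an assumption that matches the precision of the data, and the function name is illustrative.

```python
def group_bounds(x_min, x_max, groups):
    """Lower and upper boundary of each equal-width group."""
    width = round((x_max - x_min) / groups, 1)
    # Compute each edge from x_min to avoid accumulating rounding error.
    edges = [round(x_min + k * width, 1) for k in range(groups + 1)]
    return list(zip(edges, edges[1:]))


bounds = group_bounds(28.1, 65.9, 6)
for number, (lower, upper) in enumerate(bounds, start=1):
    print(number, lower, upper)
# The first group is 28.1 to 34.4, as in the text.
```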

We distribute the districts among the established groups and count the number of districts in each group (Table 1.2).

Table 1.2

Interval series of the distribution of districts

Group number

Groups of districts by the ratio of the average monthly accrued pension to the average monthly nominal accrued wage

Number of districts

For clarity, the interval series is shown as a histogram (Fig. 1.2).


A variation series is an arrangement of the values of a characteristic for each statistical unit in a certain order. The individual values of the characteristic are called variants. Each member of a variation series is called an order statistic, and the number of a variant in the series is called the rank (order) of the statistic.

The most important characteristics of a variation series are its extreme variants (X1 = Xmin; Xn = Xmax) and the range of variation (Rx = Xn - X1).

Variation series are widely used in the initial processing of statistical information obtained through statistical observation. They serve as the basis for constructing the empirical distribution function of statistical units within a statistical population. For this reason variation series are also called distribution series.

Statistics distinguishes the following types of variation series: ranked, discrete, and interval.

A ranked series is a distribution series of the units of a statistical population in which the variants of the characteristic are arranged in ascending or descending order. Any ranked series consists of rank numbers (from 1 to n) and the corresponding variants. The number of variants in a ranked series formed by an essential characteristic is usually equal to the number of units in the statistical population.

To form a ranked series by a given characteristic (for example, the number of livestock workers in 100 agricultural enterprises), the layout of Table 5.1 can be used.
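As an illustration, a ranked series and its extreme variants can be obtained with a few lines of Python. The numbers of workers below are hypothetical, and the function name is an assumption made for the example.

```python
def ranked_series(values, descending=False):
    """Pair each variant with its rank number, starting from 1."""
    ordered = sorted(values, reverse=descending)
    return list(enumerate(ordered, start=1))


# Hypothetical numbers of livestock workers in five enterprises.
workers = [42, 17, 65, 30, 17]

series = ranked_series(workers)
x_min = series[0][1]           # X1, the first variant of the ascending series
x_max = series[-1][1]          # Xn, the last variant
variation_range = x_max - x_min

print(series)
print(x_min, x_max, variation_range)
```

Equal values keep separate rank numbers here, so the number of variants equals the number of units, as the definition above states.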

Table 5.1. Procedure for forming a ranked series

Source: Shundalov B.M. General Theory of Statistics. Textbook for economic specialties of higher agricultural educational institutions.

In statistics, grouping is the division of a statistical population into groups that are homogeneous with respect to some significant characteristic, together with the description of the selected groups by a system of indicators, in order to identify types of phenomena and to study their structure and interrelationships. In the process of summarizing the primary material, phenomena are divided into groups according to various varying characteristics.

A varying characteristic is a characteristic that takes on different values for individual units of the population.

Tasks of grouping:

1. Identification of those parts of a mass phenomenon that are homogeneous in quality and conditions of development, and in which the same factors operate;

2. Study and characterization of the structure and structural changes in the populations under study;

3. Identification of the relationships between individual characteristics of the phenomenon being studied.

The main issue of the grouping method is the choice of the grouping characteristic; the results of the grouping, and of the work as a whole, depend on choosing it correctly.

After selecting a grouping characteristic, it is important to divide the population units into groups.

The selected groups must be qualitatively homogeneous and must also contain a sufficiently large number of units, so that they display the typical features characteristic of mass phenomena. That is why great attention is paid to determining the number of groups and their boundaries. In solving this issue, the type of grouping, the nature of the grouping characteristic and the objectives of the study are taken into account.

Let us group the farms, taking the annual milk yield per cow, in kg, as the grouping characteristic. The level of milk productivity differs greatly among the farms in this zone. This characteristic varies

Using the method of statistical grouping, we study the differences between farms in the level of milk productivity of cows.

The first stage of work is the construction of a ranked series. In the ranked series, all values ​​are arranged in ascending or descending order of the grouping characteristic.

The ranked series shows the intensity of change in the grouping characteristic, whose values range from 1364 to 6270 kg. It makes it possible to detect sharp transitions and to identify units that differ greatly in the value of the characteristic.

To compile a ranked series, we use data on the milk productivity of cows on farms in the Achinsk zone for 2003.

We will present the results in Table 2.1.

Table 2.1.

Farm name

Milk yield from 1 cow per year, kg

JSC "Beloozerskoe"

JSC Sharypovskoe

JSC "Ivanovskoe"

CJSC "Orakskoe"

JSC "Sakhaptinskoe"

SJSC "Anashenskoe"

CJSC "Energetik"

SZAO "Baraitskoe"

SZAOOT "Igryshenskoe"

Agricultural production complex "Beloyarsky"

AOZT "Pavlovskoe"

JSC "Adadymskoe"

JSC "Krasnopolyanskoye"

JSC "Dorokhovskoe"

JSC "Glyadenskoye"

SKhAOZT "Legostaevskoe"

CJSC "Altaiskoye"

JSC "Svetlolobovskoe"

JSC "Podsosensky"

JSC "Krutoyarskoe"

LLP p/z "Achinsky"

JSC "Avangard"

JSC "Malinovsky"

SAZT "Navoselovskoye"

JSC "Nazarovskoye"

For greater clarity, we will depict the ranked series graphically by constructing a Galton ogive.

To do this, we place the farms along the x-axis in ascending order of the grouping characteristic, and plot along the y-axis the milk productivity of cows corresponding to each farm (Fig. 2.1).

Ranked series of farms according to the level of milk productivity of cows.

Let us analyze the data of the ranked series and its graph: we evaluate the nature and intensity of the differences between farms and try to identify significantly different groups of farms. There are significant differences between farms in the level of milk productivity of cows: the range of variation is 6270 - 1364 = 4906 kg per cow, and the level of milk production in farm No. 25 is 4.6 times higher than in farm No. 1 (6270/1364).

Milk productivity increases from farm to farm mainly gradually and smoothly, without large jumps, although the milk yield per cow of the last farm differs significantly from that of the other farms. This single farm cannot be separated into a group of its own. Since the differences between the other farms are small, there are no jumps, and there is no other evidence of boundaries between groups, typical groups cannot be distinguished in this case from the analysis of the ranked series alone. Therefore, the next step is to construct an interval distribution series of the farms.

An interval variation series gives an idea of the number and nature of the groups. First, we decide how many groups the population of farms should be divided into. The approximate number n can be determined using formula (2.1):

n = 1 + 3.322 lg N, (2.1)

where n is the number of groups and N is the number of units in the population.

This relationship can serve as a guideline for the number of groups if the distribution of population units by the given characteristic is close to normal and equal intervals are used.

n = 1 + 3.322 lg 25 = 1 + 3.322 * 1.398 ≈ 5.64, which rounds to 6 groups.

The interval width is determined using formula (2.2):

i = (X max - X min) / n, (2.2)

where

X max - maximum value of the characteristic in the ranked series under study,

X min - minimum value of the characteristic in the ranked series under study,

n - number of groups.

i = (6270 - 1364) / 6 ≈ 818 kg

Now we construct the distribution series of farms with this interval width. With X min = 1364 kg and i = 818 kg, the upper limit of the first group is X min + i = 2182 kg. This boundary is also the lower limit of the second group. The boundaries of the other groups are determined similarly. The data obtained are presented in Table 2.2.
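The calculation above can be retraced with a short sketch. It assumes only the figures quoted in the text (25 farms, minimum 1364 kg, maximum 6270 kg); the variable names are illustrative, and rounding to the nearest integer is an assumption that matches the rounded values used here.

```python
import math

N = 25                        # number of farms in the population
x_min, x_max = 1364, 6270     # milk yield per cow, kg (from the ranked series)

# Formula (2.1): approximate number of groups, rounded to an integer
n = round(1 + 3.322 * math.log10(N))

# Formula (2.2): width of an equal interval, rounded to whole kilograms
i = round((x_max - x_min) / n)

# Group boundaries: each upper limit is the lower limit of the next group
bounds = [x_min + k * i for k in range(n + 1)]

print(n, i)
print(bounds)
```

Because the width is rounded up to 818 kg, the last upper boundary slightly exceeds the observed maximum, so the largest value still falls inside the last group.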

Table 2.2

The interval distribution series of farms (Table 2.2) shows that farms with a milk yield per cow from 1364 to 2182 kg predominate in the population (11 farms). The groups of farms with high productivity are small: there is not a single farm in the fourth group and only one in the fifth, whereas each group must contain at least three farms. These groups should therefore be combined, that is, a secondary grouping should be carried out.


Interval series of distribution of farms according to the level of milk productivity of cows.

Table 2.3

Secondary grouping of farms according to the level of milk productivity of cows.

Comparing the number of farms in each group, we can say that farms with a low level of productivity considerably outnumber farms with a high level.

The results of grouping are presented in the form of distribution series.

A distribution series is one of the types of groupings.

A distribution series is an ordered distribution of the units of the population under study into groups according to a certain varying characteristic.

Depending on the characteristic underlying the formation of the distribution series, attributive and variation distribution series are distinguished:

  • Attributive series are distribution series constructed according to qualitative characteristics.
  • Variation series are distribution series constructed in ascending or descending order of the values of a quantitative characteristic.
A variation distribution series consists of two columns:

The first column gives the quantitative values of the varying characteristic, which are called variants. A discrete variant is expressed as an integer. An interval variant is given as a range "from ... to ...". Depending on the type of variants, a discrete or an interval variation series can be constructed.
The second column gives the number of occurrences of each variant, expressed as frequencies or relative frequencies:

Frequencies are absolute numbers that show how many times a given value of the characteristic occurs in the population. The sum of all frequencies must equal the number of units in the entire population.

Relative frequencies are frequencies expressed as a share of the total. The sum of all relative frequencies must equal 100% when they are expressed as percentages, or 1 when they are expressed as fractions of one.

Graphic representation of distribution series

Distribution series are presented visually by means of graphs.

Distribution series are depicted as:
  • Polygon
  • Histogram
  • Cumulate
  • Ogive

Polygon

When constructing a polygon, the values of the varying characteristic are plotted on the horizontal axis (x-axis), and the frequencies or relative frequencies are plotted on the vertical axis (y-axis).

The polygon in Fig. 6.1 is based on data from the micro-census of the population of Russia in 1994.

Fig. 6.1. Household size distribution

Condition: Data are given on the tariff categories of 25 employees of an enterprise:
4; 2; 4; 6; 5; 6; 4; 1; 3; 1; 2; 5; 2; 6; 3; 1; 2; 3; 4; 5; 4; 6; 2; 3; 4
Task: Construct a discrete variation series and depict it graphically as a distribution polygon.
Solution:
In this example, the variants are the employees' tariff categories. To determine the frequencies, we count the number of employees in each tariff category.

The polygon is used for discrete variation series.

To construct a distribution polygon (Figure 1), we plot the quantitative values of the varying characteristic (the variants) along the abscissa (X) axis, and the frequencies or relative frequencies along the ordinate axis.
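As an illustration, the discrete variation series for this example can be tabulated with a few lines of Python. This is a minimal sketch using only the 25 tariff categories listed in the condition; the variable names are illustrative and the plotting itself is left out.

```python
from collections import Counter

# Tariff categories of the 25 employees (from the condition)
grades = [4, 2, 4, 6, 5, 6, 4, 1, 3, 1, 2, 5, 2,
          6, 3, 1, 2, 3, 4, 5, 4, 6, 2, 3, 4]

# Frequencies: how many employees have each tariff category (variant)
freq = dict(sorted(Counter(grades).items()))

# Relative frequencies: frequencies as a percentage of the total
total = sum(freq.values())
rel_freq = {x: 100 * f / total for x, f in freq.items()}

for x in freq:
    print(f"category {x}: frequency {freq[x]}, relative frequency {rel_freq[x]:.0f}%")
```

The pairs (variant, frequency) are the vertices of the distribution polygon; the frequencies sum to 25 and the relative frequencies to 100%.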

If the values of a characteristic are expressed as intervals, the series is called an interval series.
Interval distribution series are depicted graphically as a histogram, cumulate or ogive.

Statistical table

Condition: Data are given on the size of the deposits of 20 individuals in one bank (thousand rubles): 60; 25; 12; 10; 68; 35; 2; 17; 51; 9; 3; 130; 24; 85; 100; 152; 6; 18; 7; 42.
Task: Construct an interval variation series with equal intervals.
Solution:

  1. The initial population consists of 20 units (N = 20).
  2. Using the Sturgess formula, we determine the required number of groups: n = 1 + 3.322 * lg 20 ≈ 5.32, which rounds to 5.
  3. Let's calculate the width of the equal interval: i = (152 - 2) / 5 = 30 thousand rubles.
  4. Let's divide the initial population into 5 groups with an interval of 30 thousand rubles.
  5. We present the grouping results in the table:

When a continuous characteristic is recorded in this way and the same value occurs twice (as the upper limit of one interval and the lower limit of the next), the value is assigned to the group in which it is the upper limit.
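The grouping steps of this example can be sketched as follows. The sketch uses the 20 deposit values from the condition and the boundary rule just stated (a boundary value goes to the group where it is the upper limit); the helper `group_index` is a hypothetical name introduced only for illustration.

```python
import math

# Deposit sizes of 20 individuals, thousand rubles (from the condition)
deposits = [60, 25, 12, 10, 68, 35, 2, 17, 51, 9,
            3, 130, 24, 85, 100, 152, 6, 18, 7, 42]

N = len(deposits)
n = round(1 + 3.322 * math.log10(N))          # Sturgess formula, rounded
lower = min(deposits)
width = (max(deposits) - lower) / n           # width of the equal interval

def group_index(x):
    """Zero-based group number; a boundary value belongs to the group
    in which it is the upper limit."""
    if x == lower:
        return 0
    return math.ceil((x - lower) / width) - 1

counts = [0] * n
for x in deposits:
    counts[group_index(x)] += 1

for k, c in enumerate(counts):
    lo, hi = lower + k * width, lower + (k + 1) * width
    print(f"{lo:.0f} - {hi:.0f}: {c}")
```

With these data the groups are 2 - 32, 32 - 62, 62 - 92, 92 - 122 and 122 - 152 thousand rubles, and the group frequencies sum to 20.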

Histogram

To construct a histogram, the boundaries of the intervals are marked along the abscissa axis, and rectangles are built on them whose heights are proportional to the frequencies (or relative frequencies).

Fig. 6.2 shows a histogram of the distribution of the Russian population in 1997 by age group.

Fig. 6.2. Distribution of the Russian population by age groups

Condition: The distribution of 30 employees of the company by monthly salary is given

Task: Display the interval variation series graphically in the form of a histogram and cumulate.
Solution:

  1. The width of the open (first) interval is taken equal to the width of the second interval: 7000 - 5000 = 2000 rubles. Using this width, we find the lower limit of the first interval: 5000 - 2000 = 3000 rubles.
  2. To construct a histogram in a rectangular coordinate system, we plot along the abscissa axis the segments that correspond to the intervals of the variation series.
    These segments serve as the bases of the rectangles, and the corresponding frequency (relative frequency) serves as their height.
  3. Let's build a histogram:

To construct a cumulate, it is necessary to calculate the accumulated frequencies (relative frequencies). They are determined by adding to the frequency (relative frequency) of each interval those of all preceding intervals, and are designated S. The accumulated frequencies show how many units of the population have a characteristic value no greater than the one under consideration.

Cumulate

The distribution of a characteristic in a variation series by accumulated frequencies (relative frequencies) is depicted using a cumulate.

A cumulate, or cumulative curve, unlike a polygon, is constructed from accumulated frequencies or relative frequencies. The values of the characteristic are placed on the abscissa axis, and the accumulated frequencies or relative frequencies are placed on the ordinate axis (Fig. 6.3).

Fig. 6.3. Cumulate of household size distribution

4. Let's calculate the accumulated frequencies:
The cumulative frequency of the first interval is calculated as follows: 0 + 4 = 4, for the second: 4 + 12 = 16; for the third: 4 + 12 + 8 = 24, etc.
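The running sums above can be checked with a minimal sketch. It uses only the three interval frequencies quoted in the text (4, 12 and 8); the frequencies of the remaining intervals are not given here and are therefore left out.

```python
from itertools import accumulate

# Frequencies of the first three intervals, as quoted in the text
freqs = [4, 12, 8]

# Accumulated frequencies S: each value adds the current frequency
# to the sum of all preceding ones
cumulative = list(accumulate(freqs))

print(cumulative)
```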

When constructing a cumulate, the accumulated frequency (relative frequency) of each interval is assigned to its upper limit.

Ogive

An ogive is constructed in the same way as a cumulate, with the only difference that the accumulated frequencies are placed on the abscissa axis and the characteristic values are placed on the ordinate axis.

A variety of the cumulate is the concentration curve, or Lorenz curve. To construct a concentration curve, a scale in percentages from 0 to 100 is plotted on both axes of the rectangular coordinate system. The accumulated relative frequencies are plotted on the abscissa axis, and the accumulated shares (in percent) of the total volume of the characteristic are plotted on the ordinate axis.

A uniform distribution of the characteristic corresponds to the diagonal of the square on the graph (Fig. 6.4). With an uneven distribution, the graph is a concave curve whose curvature depends on the level of concentration of the characteristic.

Fig. 6.4. Concentration curve
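As a rough illustration of how the points of a concentration curve are obtained, the sketch below reuses the 20 deposit sizes from the interval-series example. Applying the curve to those data is an assumption made here for illustration only; it is not a calculation taken from the text, and the variable names are illustrative.

```python
from itertools import accumulate

# Deposit sizes (thousand rubles) from the interval-series example,
# sorted in ascending order; reusing them here is an assumption
deposits = sorted([60, 25, 12, 10, 68, 35, 2, 17, 51, 9,
                   3, 130, 24, 85, 100, 152, 6, 18, 7, 42])

total_units = len(deposits)
total_volume = sum(deposits)

# Abscissa: accumulated share of units, in percent
unit_share = [100 * k / total_units for k in range(1, total_units + 1)]

# Ordinate: accumulated share of the total volume of the characteristic, in percent
volume_share = [100 * s / total_volume for s in accumulate(deposits)]

# The curve starts at the origin and ends at (100, 100)
points = [(0.0, 0.0)] + list(zip(unit_share, volume_share))

for x, y in points:
    print(f"{x:6.1f}% of depositors hold {y:6.1f}% of deposits")
```

The further the points lie below the diagonal of the square, the more unevenly the characteristic is distributed among the units.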
