Standard deviation in Excel. Calculating variance, root-mean-square (standard) deviation and the coefficient of variation in Excel


Variance is a measure of dispersion that describes how far data values deviate from the mean. It is the most widely used measure of dispersion in statistics. It is calculated by squaring the deviation of each data value from the mean, summing these squares, and dividing the sum by the sample size minus one. The formula for calculating the sample variance is given below:

$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$

where:

$s^2$ is the sample variance;

$\bar{x}$ is the sample mean;

$n$ is the sample size (the number of data values);

$(x_i - \bar{x})$ is the deviation of each data value from the mean.

To better understand the formula, let's look at an example. I don't really like cooking, so I rarely do it. However, so as not to starve, from time to time I have to go to the stove to keep my body supplied with proteins, fats and carbohydrates. The data set below shows how many times Renat cooks each month:

The first step in calculating variance is to determine the sample mean, which in our example is 7.8 times per month. The rest of the calculations can be made easier using the following table.

The final phase of calculating variance looks like this:

For those who like to do all the calculations in one go, the equation would look like this:

Using the raw-score method (cooking example)

There is a more efficient method of calculating variance, known as the "raw-score" method. Although the equation may look cumbersome at first glance, it is actually not that scary. See for yourself, and then decide which method you prefer.

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$

where:

$\sum x_i^2$ is the sum of the squared data values;

$\left(\sum x_i\right)^2$ is the square of the sum of all data values.

Don't panic. Let's put all of this into a table, and you will see that there are fewer calculations here than in the previous example.

As you can see, the result is the same as with the previous method. The advantages of this method become apparent as the sample size (n) increases.
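To check that the deviation method and the raw-score method really give the same result, here is a minimal Python sketch of both formulas. The function names and the monthly counts are hypothetical (the cooking data set from the example is not reproduced here), and `statistics.variance` from the Python standard library serves as an independent cross-check.

```python
from statistics import mean, variance


def variance_deviation(data):
    """Sample variance via deviations: sum((x - mean)^2) / (n - 1)."""
    n = len(data)
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / (n - 1)


def variance_raw_score(data):
    """Sample variance via raw scores: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


# Hypothetical monthly counts, for illustration only
counts = [5, 8, 10, 6, 9]

print(variance_deviation(counts))  # deviation method, approximately 4.3
print(variance_raw_score(counts))  # raw-score method, approximately 4.3
print(variance(counts))            # standard-library cross-check
```

The raw-score form needs only one pass over the data (accumulating the sum of x and the sum of x squared), which is what makes it convenient for large samples. In floating-point code, however, subtracting two large, nearly equal sums can lose precision when the values are large compared with their spread.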

Variance calculation in Excel

As you have probably guessed, Excel has functions for calculating variance. Starting with Excel 2010, you can find four variance functions:

1) VAR.S (ДИСП.В in the Russian version of Excel) returns the variance of a sample. Logical values and text are ignored.

2) VAR.P (ДИСП.Г) returns the variance of a population. Logical values and text are ignored.

3) VARA (ДИСПА) returns the variance of a sample, taking logical and text values into account.

4) VARPA (ДИСПРА) returns the variance of a population, taking logical and text values into account.

First, let's understand the difference between a sample and a population. The purpose of descriptive statistics is to summarize or display data so that you quickly get the big picture, an overview so to speak. Statistical inference lets you draw conclusions about a population based on a sample of data from that population. The population comprises all possible outcomes or measurements that are of interest to us. A sample is a subset of a population.

For example, suppose we are interested in a group of students at one Russian university and need to determine the group's average score. We can calculate the students' average performance, and the resulting figure will be a parameter, since the whole population is involved in the calculation. However, if we want to estimate the average score of all students in the country, this group becomes our sample.

The sample and population variance formulas differ only in the denominator: for a sample it is (n - 1), and for the population it is simply n.
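The effect of the denominator can be seen in a short Python sketch. The helper `var_with_denominator` and the scores are hypothetical; `statistics.variance` (sample, n - 1) and `statistics.pvariance` (population, n) are the standard-library counterparts of VAR.S and VAR.P.

```python
from statistics import pvariance, variance


def var_with_denominator(data, sample=True):
    """Sum of squared deviations divided by n - 1 (sample) or by n (population)."""
    n = len(data)
    m = sum(data) / n
    ss = sum((x - m) ** 2 for x in data)
    return ss / (n - 1) if sample else ss / n


scores = [4, 5, 3, 4, 4]  # hypothetical scores

print(var_with_denominator(scores, sample=True), variance(scores))
print(var_with_denominator(scores, sample=False), pvariance(scores))
```

The two results differ by the factor n / (n - 1), so the gap shrinks as the sample grows.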

Now let's look at the functions ending in A, whose descriptions say that the calculation takes text and logical values into account. In this case, when calculating the variance of a data array that contains non-numeric values, Excel interprets text and FALSE as 0, and TRUE as 1.
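The difference between the plain and the A-suffixed functions can be mimicked in Python. In this sketch, cell contents are modeled as Python values (True/False for logical cells, strings for text, None for empty cells, which are assumed to be skipped by both functions). The function names and the sample range are hypothetical, and only the coercion rule stated above is reproduced, not every Excel edge case.

```python
from statistics import variance


def is_plain_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def var_s_like(cells):
    """VAR.S-style: use only numeric cells; skip text, logical values and empty cells."""
    return variance([c for c in cells if is_plain_number(c)])


def vara_like(cells):
    """VARA-style: TRUE -> 1, FALSE and text -> 0; empty cells (None) are skipped."""
    values = []
    for c in cells:
        if c is None:
            continue
        if isinstance(c, bool):
            values.append(1 if c else 0)
        elif isinstance(c, str):
            values.append(0)
        else:
            values.append(c)
    return variance(values)


cells = [4, 8, True, "n/a", 6, None]  # hypothetical range contents

print(var_s_like(cells))  # uses 4, 8, 6
print(vara_like(cells))   # uses 4, 8, 1, 0, 6
```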

So, if you have a data array, calculating its variance will not be difficult using one of the Excel functions listed above.

Good afternoon

In this article, I decided to look at how standard deviation works in Excel, using the STDEV.P function (СТАНДОТКЛОН.Г in the Russian version of Excel). I haven't described or commented on it for a very long time, and it is also a very useful function for those who study higher mathematics. Helping students is sacred; I know from experience how difficult the subject is to master. In practice, standard deviation functions can be used to assess the stability of product sales, set prices, build or adjust the assortment, and carry out other equally useful analyses of your sales.

Excel has several variants of the standard deviation function:


Mathematical theory

First, a little theory: how the standard deviation can be described in mathematical language before we use it in Excel, for example to analyze sales statistics (more on that later). I warn you right away: there will be a lot of incomprehensible words :) If that's not for you, skip straight to the practical application in the program further down.

What exactly does the standard deviation do? It gives an estimate of the root-mean-square deviation of a random variable X from its mathematical expectation, based on an unbiased estimate of its variance. Admittedly, it sounds confusing, but I think students will understand what we are talking about!

First, we need to define the root-mean-square deviation, from which the standard deviation is then calculated. The following formula helps with this:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

where $x_i$ are the values, $\bar{x}$ is their arithmetic mean and $n$ is the number of values. The root-mean-square deviation is measured in the same units as the random variable and is used when calculating the standard error of the arithmetic mean, when constructing confidence intervals, when testing statistical hypotheses, and when analyzing a linear relationship between random variables. It is defined as the square root of the variance of the random variable.

Now we can define the standard deviation: an estimate of the root-mean-square deviation of a random variable X from its mathematical expectation, based on an unbiased estimate of its variance. The formula is written like this:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Note that both estimates are biased. In general, it is impossible to construct an unbiased estimate. However, the estimate based on the unbiased variance estimate is consistent.
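A quick numerical check of how the two estimates relate: Python's `statistics.pstdev` divides by n (the root-mean-square deviation σ) and `statistics.stdev` divides by n - 1 (the standard deviation s), so s should equal sqrt(n / (n - 1)) · σ. The data below are hypothetical.

```python
from math import isclose, sqrt
from statistics import pstdev, stdev

data = [12, 15, 11, 18, 14]  # hypothetical measurements
n = len(data)

sigma = pstdev(data)  # root-mean-square deviation, denominator n
s = stdev(data)       # standard deviation from the unbiased variance, denominator n - 1

# s = sqrt(n / (n - 1) * sigma^2)
print(sigma, s, isclose(s, sqrt(n / (n - 1) * sigma ** 2)))
```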

Practical implementation in Excel

Well, now let's move away from the boring theory and see in practice how the STDEV.P function works. I will not go through every variant of the standard deviation function in Excel; one is enough, shown with examples. As an example, let's look at how sales stability is determined.

First, look at the syntax of the function; as you can see, it is very simple:

STDEV.P(number1;number2;…) (СТАНДОТКЛОН.Г in the Russian version), where number1, number2, … are from 1 to 255 numbers or references to cells containing them.


Now let's create an example file and use it to see how this function works. Analytical calculations need at least three values, as does practically any statistical analysis, so I took three periods; these could be years, quarters, months or weeks. In my case they are months. For maximum reliability, I recommend taking as many periods as possible, but no fewer than three. All the data in the table is deliberately simple so that it is clear how the formula works.

First, we need to calculate the average across the months. We use the AVERAGE function for this and get the formula =AVERAGE(C4:E4).
Now we can find the standard deviation using the STDEV.P function, passing it the product's sales for each period. The result is a formula of the form =STDEV.P(C4;D4;E4).
Well, half the work is done. Next we calculate the variation: divide the standard deviation by the average and format the result as a percentage. We get the following table:
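The variation described here is the coefficient of variation. A minimal Python sketch of the same calculation, assuming `statistics.pstdev` as the counterpart of STDEV.P and `statistics.mean` as the counterpart of AVERAGE; the function name and the sales figures are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (STDEV.P / AVERAGE)."""
    return pstdev(values) / mean(values)


monthly_sales = [100, 120, 110]  # hypothetical sales over three months
print(f"{coefficient_of_variation(monthly_sales):.0%}")  # formatted as a percentage
```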
Well, the basic calculations are done; all that remains is to determine whether sales are stable. Let's set the following conditions: a variation below 10% is considered stable, from 10% to 25% means small deviations, and anything above 25% is no longer stable. To get the result according to these conditions, we use the logical IF function and write the formula:

=IF(H4<0,1;"stable";IF(H4<0,25;"normal";"unstable"))

This is the Russian-locale notation with decimal commas and semicolons; in an English locale the same formula reads =IF(H4<0.1,"stable",IF(H4<0.25,"normal","unstable")).
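The nested IF is simply a chain of checks evaluated from the smallest threshold up. The same logic as a Python sketch (the function name is hypothetical; the thresholds are the ones from the text):

```python
def stability_label(variation):
    """Mirror of the nested IF: below 10% -> stable, below 25% -> normal, else unstable."""
    if variation < 0.10:
        return "stable"
    if variation < 0.25:
        return "normal"
    return "unstable"


for v in (0.05, 0.10, 0.20, 0.25, 0.93):
    print(v, stability_label(v))
```

Because the comparisons are strict (<), a variation of exactly 10% falls into the middle band and exactly 25% into the last one, just as with the Excel formula.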

All thresholds are chosen for illustration; your tasks may have completely different conditions.
To make the results easier to read when your table has thousands of items, use conditional formatting to highlight each category with its own color; this makes the table very clear.

First, select the cells to which you will apply conditional formatting. On the Home tab, click Conditional Formatting, choose Highlight Cells Rules in the drop-down menu, and then click Text that Contains…. A dialog box appears in which you enter your conditions.

After you have set the conditions, for example "stable" in green, "normal" in yellow and "unstable" in red, you get a clear, attractive table that shows what to pay attention to first.

Using VBA for the STDEV.P function

Anyone interested can automate their calculations using macros and use the following function:

Function MyStDevP(Arr)
    ' Population standard deviation, the same quantity STDEV.P returns.
    ' Assumes every element of Arr is a number.
    Dim x, aCnt&, aSum#, aAver#, tmp#
    For Each x In Arr
        aSum = aSum + x                ' sum of the array elements
        aCnt = aCnt + 1                ' number of elements
    Next x
    aAver = aSum / aCnt                ' average value
    For Each x In Arr
        tmp = tmp + (x - aAver) ^ 2    ' sum of squared deviations from the average
    Next x
    MyStDevP = Sqr(tmp / aCnt)         ' square root of (sum / n), i.e. STDEV.P()
End Function
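For readers who want to test the macro's logic outside Excel, here is the same two-pass algorithm ported to Python and cross-checked against `statistics.pstdev`. The name `my_stdev_p` is hypothetical, and like the macro it assumes every element is a number.

```python
from math import isclose, sqrt
from statistics import pstdev


def my_stdev_p(arr):
    """Two-pass population standard deviation, following the MyStDevP macro."""
    a_sum = 0.0
    a_cnt = 0
    for x in arr:             # first pass: sum and count
        a_sum += x
        a_cnt += 1
    a_aver = a_sum / a_cnt    # average value
    tmp = 0.0
    for x in arr:             # second pass: sum of squared deviations
        tmp += (x - a_aver) ** 2
    return sqrt(tmp / a_cnt)  # divide by n, as STDEV.P does


values = [45, 46, 45]  # hypothetical sales figures
print(my_stdev_p(values), isclose(my_stdev_p(values), pstdev(values)))
```

The function iterates over its argument twice, so it should be given a list or tuple rather than a one-shot generator.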


Statistics uses a huge number of indicators, and one of them is variance. Calculating it manually takes a lot of time and invites mistakes, so today we'll look at how Excel turns the mathematical formulas into simple functions. We'll cover some of the simplest, fastest and most convenient calculation methods, which let you do everything in a matter of minutes.

Calculate variance

The variance of a random variable is the mathematical expectation of the squared deviation of a random variable from its mathematical expectation.
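This definition can be applied directly when the distribution of a discrete random variable is known. A minimal Python sketch, using a fair six-sided die as a hypothetical example and exact fractions to avoid rounding (the function name is also hypothetical):

```python
from fractions import Fraction


def variance_of_distribution(outcomes):
    """Var(X) = E[(X - E[X])^2] for a discrete distribution given as {value: probability}."""
    expectation = sum(x * p for x, p in outcomes.items())
    return sum((x - expectation) ** 2 * p for x, p in outcomes.items())


# A fair six-sided die: each face has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(variance_of_distribution(die))  # 35/12
```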

We calculate based on the general population

To calculate the variance of a population, Excel uses the VAR.P function (ДИСП.Г in the Russian version); its syntax is "=VAR.P(Number1;Number2;…)".

A maximum of 255 arguments can be used. Arguments can be plain numbers or references to the cells that contain them. Let's look at how to calculate variance in Microsoft Excel:

1. The first step is to select the cell where the calculation result will be displayed, and then click on the “Insert Function” button.

2. The Function Wizard will open. Look for the VAR.P function, which you can find in the "Statistical" category or in the full alphabetical list. When you find it, select it and click "OK".

3. A window with the function arguments will open. Click in the "Number1" field and select the range of cells with the number series on the sheet.


4. After this, the calculation results will be displayed in the cell where the function was entered.

This is how you can easily find variance in Excel.

We make calculations based on the sample

In this case, the sample variance in Excel is calculated with a denominator that is not the total count of numbers but one less. This correction removes the bias of the estimate, and Excel implements it in the special function VAR.S (ДИСП.В in the Russian version), whose syntax is =VAR.S(Number1;Number2;...). Steps:

  • As in the previous method, select the cell for the result.
  • In the Function Wizard, find VAR.S in the "Statistical" category or in the full alphabetical list.
  • Next, a window will appear; proceed in the same way as in the previous method.


Conclusion

Variance in Excel is calculated very simply, much faster and more conveniently than by hand, because the variance formula is fairly involved and evaluating it manually can take a lot of time and effort.

Among the many indicators that are used in statistics, it is necessary to highlight the calculation of variance. It should be noted that performing this calculation manually is a rather tedious task. Fortunately, Excel has functions that allow you to automate the calculation procedure. Let's find out the algorithm for working with these tools.

Variance is a measure of variation: the mean of the squared deviations from the mathematical expectation. It therefore expresses how widely numbers are spread around the mean. Variance can be calculated both for a population and for a sample.

Method 1: calculation based on the population

To calculate this indicator in Excel for a population, use the VAR.P function (ДИСП.Г in the Russian version). Its syntax is as follows:

VAR.P(Number1;Number2;…)

In total, from 1 to 255 arguments can be used. The arguments can be either numeric values or references to the cells that contain them.

Let's see how to calculate this value for a range with numeric data.


Method 2: calculation by sample

Unlike the population calculation, the sample calculation uses a denominator that is not the total count of numbers but one less. This is done to correct the bias of the estimate. Excel takes this nuance into account in a special function designed for this type of calculation, VAR.S (ДИСП.В in the Russian version). Its syntax is:

VAR.S(Number1;Number2;…)

The number of arguments, as in the previous function, can also range from 1 to 255.


As you can see, the Excel program can greatly facilitate the calculation of variance. This statistic can be calculated by the application, either from the population or from the sample. In this case, all user actions actually come down to specifying the range of numbers to be processed, and Excel does the main work itself. Of course, this will save a significant amount of user time.

The standard deviation function belongs to higher mathematics, specifically statistics. Excel offers several variants of the standard deviation function, for example:

  • STDEV (sample);
  • STDEVP (population);
  • STDEVA and STDEVPA (which also count logical and text values).

We will need these functions in sales statistics to identify the stability of sales (XYZ analysis). This data can be used both for pricing and for creating (adjusting) the assortment matrix and for other useful sales analyses, which I will definitely talk about in future articles.

Preface

Let's look at the formulas first in mathematical language; then, further down, we will analyze the Excel formula in detail and see how the result is used in the analysis of sales statistics.

So, the standard deviation is an estimate of the root-mean-square deviation of a random variable x relative to its mathematical expectation, based on an unbiased estimate of its variance :) Don't be afraid of the incomprehensible words; be patient and you will understand everything!

Root-mean-square deviation:

$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

Description of the formula: the root-mean-square deviation is measured in the units of the random variable itself and is used when calculating the standard error of the arithmetic mean, when constructing confidence intervals, when statistically testing hypotheses, and when measuring the linear relationship between random variables. It is defined as the square root of the variance of the random variable.

Now, the standard deviation is an estimate of the root-mean-square deviation of a random variable x relative to its mathematical expectation, based on an unbiased estimate of its variance:

$$s = \sqrt{\frac{n}{n-1}\sigma^2} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

where:

$\sigma^2$ is the variance;

$x_i$ is the i-th element of the sample;

$n$ is the sample size;

$\bar{x}$ is the arithmetic mean of the sample:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

It should be noted that both estimates are biased. In the general case, it is impossible to construct an unbiased estimate. However, the estimate based on the unbiased variance estimate is consistent.

Three-sigma rule (3σ): almost all values of a normally distributed random variable lie in the interval $(\bar{x} - 3\sigma;\ \bar{x} + 3\sigma)$. More strictly, with a probability of approximately 0.9973, the value of a normally distributed random variable lies in this interval (provided that $\bar{x}$ is the true value and not one obtained by processing a sample). We will use a rounded interval of 0.1.

If the true value of σ is unknown, you should use s instead of σ. The three-sigma rule then turns into the three-s rule. It is this rule that will help us determine the stability of sales, but more on that later...
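The probability of about 0.9973 can be checked with the error function: for a normal distribution, the probability of landing within k standard deviations of the mean is erf(k / sqrt(2)). A minimal Python sketch (the function name is hypothetical):

```python
from math import erf, sqrt


def prob_within_k_sigma(k):
    """P(|X - mu| < k * sigma) for a normally distributed X: erf(k / sqrt(2))."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within_k_sigma(k), 4))
```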

Now, the standard deviation function in Excel

I hope I didn't bore you too much with the math. Perhaps someone will need this information for an essay or other purposes. Now let's look at how these formulas work in Excel...

To determine the stability of sales, we do not need to delve into all the options for the standard deviation functions. We will use just one:

STDEVP function

STDEVP(number1;number2;…) (СТАНДОТКЛОНП in the Russian version)

number1, number2, … are from 1 to 30 numeric arguments corresponding to a population.

Now let's look at an example:

Let's create a workbook with a simple table. You can download this Excel example at the end of the article.

To be continued!!!

Hello again! I had a free minute, so let's continue.

So: sales stability with the STDEVP function

For clarity, let's take a few made-up products:

In analytics, be it a forecast, research or anything else related to statistics, you always need at least three periods. These could be weeks, months, quarters or years. It is possible, and even best, to take as many periods as possible, but no fewer than three.

I deliberately used exaggerated sales figures, so that the naked eye can see what sells consistently and what does not. This makes it easier to understand how the formulas work.

So we have the sales; now we need to calculate the average sales across the periods.

The formula for the average is AVERAGE(period data); in my case it looks like this: =AVERAGE(C6:E6)

We apply the formula to all products. You can do this by grabbing the bottom-right corner of the selected cell (the fill handle) and dragging it to the end of the list. Or place the cursor in the product column and press the following key combinations:

Ctrl + Down moves the cursor to the bottom of the list.

Ctrl + Right moves the cursor to the right side of the table. Press it once more and you reach the column with the formula.

Now hold

Ctrl + Shift and press Up. This selects the area the formula will be copied into.

The key combination Ctrl + D then fills the formula down where we need it.

Remember these combinations; they really increase your speed in Excel, especially when you work with large arrays.

The next stage is the standard deviation function itself; as I already said, we will use only one, STDEVP.

We write the function and pass it the sales values of each period. If the sales in the table are in adjacent cells, you can use a range, as in my formula =STDEVP(C6:E6), or list the required cells separated by semicolons: =STDEVP(C6;D6;E6)

Now all the calculations are ready: dividing the standard deviation by the average gives the variation. But how do you know what sells consistently and what doesn't? Let's introduce the XYZ convention, where:

X: stable;

Y: small deviations;

Z: not stable.

To do this, we use error intervals. If fluctuations stay within 10%, we assume that sales are stable (X).

If they are between 10% and 25%, the product is Y.

And if the variation exceeds 25%, sales are not stable (Z).

To assign the right letter to each product, we use the IF function. In my table this function looks like this:

=IF(H6<0,1;"X";IF(H6<0,25;"Y";"Z"))

Accordingly, we extend all the formulas for all names.

Let me answer one question right away: why intervals of 10% and 25%?

In fact, the intervals may be different, it all depends on the specific task. I specifically showed you exaggerated sales values, where the difference is visible to the eye. Obviously, product 1 is not sold consistently, but the dynamics show an increase in sales. We leave this product alone...

But here is product 2, there is already obvious destabilization. And our calculations show Z, which tells us that sales are not stable. Product 3 and Product 5 show stable performance, please note that the variation is within 10%.

That is, Product 5, with sales of 45, 46 and 45, shows a variation of 1%, which is a stable number series.

But Product 2, with sales of 10, 50 and 5, shows a variation of 93%, which is NOT a stable number series.

After all the calculations, you can apply a filter to the stability column, so even if your table contains several thousand items, you can easily identify which ones sell unstably or, conversely, which ones sell consistently.

None of the products in my table got a "Y", so for clarity let's add one. I'll add Product 6...

You see, the number series 40, 50 and 30 shows 20% variation. There doesn’t seem to be a big error, but the spread is still significant...

So, to summarize:

10, 50, 5: Z, not stable. Variation above 25%.

40, 50, 30: Y, you can pay attention to this product and improve its sales. Variation below 25% but above 10%.

45, 46, 45: X, stable; you don't need to do anything with this product yet. Variation below 10%.
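The three series from the summary can be recomputed in Python. The percentages quoted (about 1%, 93% and 20%) appear to match the population standard deviation (STDEVP, here `statistics.pstdev`) divided by the average; the sample standard deviation (STDEV, `statistics.stdev`) gives noticeably larger values and would even move 40, 50, 30 into Z. The function name `xyz_class` is hypothetical; the thresholds and sales figures come from the text.

```python
from statistics import mean, pstdev, stdev


def xyz_class(variation):
    """X below 10%, Y below 25%, Z otherwise."""
    if variation < 0.10:
        return "X"
    if variation < 0.25:
        return "Y"
    return "Z"


series = {
    "Product 2": [10, 50, 5],
    "Product 5": [45, 46, 45],
    "Product 6": [40, 50, 30],
}

for name, sales in series.items():
    cv_population = pstdev(sales) / mean(sales)  # STDEVP / AVERAGE
    cv_sample = stdev(sales) / mean(sales)       # STDEV / AVERAGE
    print(f"{name}: {cv_population:.0%} ({xyz_class(cv_population)}), "
          f"sample-based {cv_sample:.0%} ({xyz_class(cv_sample)})")
```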

That's all! I hope I explained everything clearly, if not, ask what is not clear. And I will be grateful to you for every comment, be it praise or criticism. This way I will know that you are reading me and that you, which is very IMPORTANT, are interested. And accordingly, new lessons will appear.


