I am sometimes asked, “Why doesn’t the PayScale Salary Report and Research Center show the standard deviation of the data?” (See Wikipedia for the (useless) mathematical definition of standard deviation.)

People are interested in the standard deviation because it attempts to give the *typical variation* in salaries. The first thing they calculate with it is a *typical range* of salaries, by simply adding and subtracting it from the mean.

The mean salary or median salary alone only gives the *typical* salary, which is an incomplete story. The range of salaries paid is also useful. For example, the median salary of nurses in our database is about $56,850/year. Is $70,000/year *atypically* high, or within the *typical range*?

The variation in salary is caused by many factors such as location, experience, skills, shift, etc. In our salary report, we attempt to make this range as small as possible by asking about all these factors. However, there always is some residual variation, for the multitude of reasons that we don’t capture (e.g., employee is married to the boss’s daughter).

I hope I have just convinced you that it is good to know what the typical range of salaries is, and explained that standard deviation is an attempt to measure that. So why don’t we report it?

Because it is a horrible, horrible way to measure the typical variation in salaries, even worse than the mean is for calculating the typical salary. That it was ever invented is a curse upon the community of knowledge seeking people.

The root of the problem is once again in what is *typical*. An average person would expect the typical range of salaries for nurses to cover what “most” nurses are paid. For example, if 100 nurses were asked what they were paid, most should report a salary within the range. “Most” is a little fuzzy, but a good range would have 50 to 90 of the 100 nurses reporting salaries within it.

**In theory**, standard deviation fits common sense: for a perfect normal distribution, 68% of samples drawn from it fall within +/- one standard deviation of the mean. If salaries followed a normal distribution perfectly, we would expect 68 of those 100 nurses to have a salary within 1 standard deviation of the mean.
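If you want to see the 68% rule in action, here is a quick simulation sketch (illustrative only; the $50,000 mean and $19,000 spread are stand-in numbers, not fitted values from our data):

```python
# Sanity check of the 68% rule: sample a perfect normal distribution
# and count how many draws land within one standard deviation of the
# mean. The parameters below are stand-in values, not fitted numbers.
import random
import statistics

random.seed(0)
samples = [random.gauss(50_000, 19_000) for _ in range(100_000)]
mean = statistics.mean(samples)
sd = statistics.stdev(samples)
within = sum(mean - sd <= s <= mean + sd for s in samples)
print(f"{100 * within / len(samples):.1f}% within one standard deviation")
```

For a true normal distribution like this one, the printed share comes out very close to the theoretical 68.3%.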

**In reality**, when real data is collected from imperfect people, and salaries are set by human consent, the relationship between standard deviation and the typical range of salaries is weak at best.

Is there a better way to measure the typical range? Absolutely, and that is what we use in PayScale salary reports. The range of salaries between the 25th and 75th percentile is what most (50%) of employees are paid. If you want a broader concept of “most”, look at the range between the 10th and 90th percentiles: 80% of people are paid in this range.

nb. Percentile is a simple concept: If the 25th percentile nurse salary is $40,000, that means 25% of nurses (25 out of 100) make less than $40,000, and 75% make more than $40,000. The median is also called the 50th percentile: 50% of nurses make less than the $48,000 median, and 50% make more. Similarly, a 75th percentile nurse salary of $58,000 means that 75% of nurses make less than $58,000, and 25% make more.

Using the range from 25th percentile to 75th for the *typical range* of salaries passes the “just makes sense” test: if 100 nurses are asked their salary, “most” (about 50%) will have a salary between $40,000 and $58,000.

How do I know this? Because we have asked thousands of registered nurses in the US exactly this question. The range of $40,000 to $58,000 comes from finding the 25th and 75th percentile in this data.
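For the curious, the percentile arithmetic is easy to sketch in a few lines of Python (the salaries below are made-up illustrative numbers, not the actual nurse data):

```python
# Percentile arithmetic on a tiny made-up sample. The 25th-75th
# percentile range covers the middle half of the values reported.
import statistics

salaries = [36_000, 40_000, 42_000, 45_000, 48_000,
            51_000, 54_000, 58_000, 62_000, 70_000]
q1, median, q3 = statistics.quantiles(salaries, n=4)
inside = sum(q1 <= s <= q3 for s in salaries)
print(f"25th percentile: ${q1:,.0f}, 75th percentile: ${q3:,.0f}")
print(f"{inside} of {len(salaries)} salaries fall in the typical range")
```

By construction, roughly half of any sample lands between its own 25th and 75th percentiles, no matter what shape the distribution has.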

Now for why standard deviation is so bad. The standard deviation for this same data is $19,000 and the mean is $50,000. Hence the mean +/- 1 standard deviation is $31,000 to $69,000. If reality were theory, 68% of nurses would be paid in this range. However, it is not, and **over 80%** of nurses’ salaries fall within one standard deviation of the mean.

By looking at the nurse salaries, I found that the middle 68% of nurses’ salaries fall within about $13,000 of the median (the 16th percentile is $36,000 and the 84th percentile is $62,000). If reality were theory, and salaries followed normal distributions, the standard deviation would have been $13,000. But reality is just not *normal* (in the statistical sense 🙂).
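You can reproduce this effect with any right-skewed distribution. Here is a sketch using a lognormal distribution, a common rough model for salaries (the parameters are made up, and the numbers are simulated, not our data):

```python
# Real salary distributions are right-skewed. A lognormal sample
# (parameters made up for illustration) shows how the 68% rule stops
# holding once the data is not normal.
import random
import statistics

random.seed(1)
salaries = [random.lognormvariate(10.8, 0.5) for _ in range(100_000)]
mean = statistics.mean(salaries)
sd = statistics.stdev(salaries)
share = 100 * sum(mean - sd <= s <= mean + sd for s in salaries) / len(salaries)
print(f"{share:.1f}% within one standard deviation of the mean")
```

On this skewed sample the share comes out noticeably above the textbook 68%, echoing the over-80% figure in the real nurse data.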

Even worse than being misleading for non-normal distributions, standard deviation is incredibly sensitive to the most *atypical* values. For example, typos happen. Two nurses in this data sample added an extra zero to their salary, boosting their pay to ~$400,000.

Fixing these two data points has no measurable effect on the 25th and 75th percentile salaries: after all, these are only 2 data points out of thousands. However, fixing these two data points changes the standard deviation by **10%** ($19,000 to $17,000)!
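Here is a sketch of that sensitivity (with simulated salaries, not the actual data set; the two $400,000 entries play the role of the extra-zero typos):

```python
# Two typo'd data points move the standard deviation but barely touch
# the quartiles. The salaries are simulated, not PayScale's data.
import random
import statistics

random.seed(2)
clean = [random.gauss(50_000, 13_000) for _ in range(5_000)]
typos = clean[:-2] + [400_000, 400_000]  # two "extra zero" typos

for label, data in (("fixed", clean), ("with typos", typos)):
    q1, _, q3 = statistics.quantiles(data, n=4)
    sd = statistics.stdev(data)
    print(f"{label}: sd = ${sd:,.0f}, 25th-75th = ${q1:,.0f}-${q3:,.0f}")
```

The quartiles shift by a few dollars at most, while the standard deviation jumps by a double-digit percentage, from just two bad entries out of 5,000.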

Why would anyone ever use a measure of the *typical range* of salaries that is so incredibly sensitive to the most *atypical* salaries? I blame the lack of computers in the 1800s, but that is the subject for another post. 🙂

Needless to say, we won’t be reporting standard deviations anytime soon in PayScale Salary Reports.

Distribution of salaries follows a Pareto distribution. Claiming that the standard deviation of a normal distribution is a poor approximation of the underlying population makes YOU useless, not the mathematical concept of variance. Irresponsible article. Your existence is futile.

Then eliminate the outliers and report the st. dev of that.

The point being made is that outliers are very hard to eliminate: for example, a person being married to the boss’s daughter, or the fact that salaries are based on what people type in, which may be biased or contain a typo. There will always be outliers that cannot be eliminated, so you cannot achieve a true normal distribution. Hence Sd(X) is not a good indicator of spread. Just use the inter-quartile range as suggested, which is probably more precise than Sd(X).