I am sometimes asked, “Why doesn’t the PayScale Salary Report and Research Center show the standard deviation of the data?” (See Wikipedia for the (useless) mathematical definition of standard deviation.)
People are interested in the standard deviation because it attempts to give the typical variation in salaries. The first thing they calculate with it is a typical range of salaries, by simply adding it to and subtracting it from the mean.
The mean or median salary alone gives only the typical salary, which is an incomplete story. The range of salaries paid is also useful. For example, the median salary of nurses in our database is about $56,850/year. Is $70,000/year atypically high, or within the typical range?
The variation in salary is caused by many factors such as location, experience, skills, shift, etc. In our salary report, we attempt to make this range as small as possible by asking about all these factors. However, there always is some residual variation, for the multitude of reasons that we don’t capture (e.g., employee is married to the boss’s daughter).
I hope I have just convinced you that it is good to know what the typical range of salaries is, and explained that standard deviation is an attempt to measure that. So why don’t we report it?
Because it is a horrible, horrible way to measure the typical variation in salaries, even worse than the mean is for calculating the typical salary. That it was ever invented is a curse upon the community of knowledge-seeking people.
The root of the problem is once again in what is typical. An average person would expect the typical range of salaries for nurses to cover what “most” nurses are paid. For example, if 100 nurses were asked what they were paid, most should report a salary within the range. “Most” is a little fuzzy, but a good range would have 50 to 90 of the 100 nurses reporting salaries within it.
In theory, standard deviation fits common sense: for a perfect normal distribution, 68% of samples drawn from it fall within +/- one standard deviation of the mean. If salaries followed a normal distribution perfectly, we would expect 68 of those 100 nurses to have a salary within one standard deviation of the mean.
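If you want to check the 68% figure yourself, here is a minimal sketch in Python. It uses only the standard library's `statistics.NormalDist` (Python 3.8 or later) and is about the textbook normal distribution, not about any real salary data.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability that a draw lands within one standard deviation of the mean.
within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print(f"Share within +/- 1 standard deviation: {within_one_sd:.1%}")
```

The result is a little over 68%, which is where the rule of thumb comes from. It holds only when the data really is normal.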
In reality, when real data is collected from imperfect people, and salaries are set by human consent, the relationship between standard deviation and the typical range of salaries is weak at best.
Is there a better way to measure the typical range? Absolutely, and that is what we use in PayScale salary reports. The range of salaries between the 25th and 75th percentile is what most (50%) of employees are paid. If you want a broader concept of “most”, look at the range between the 10th and 90th percentiles: 80% of people are paid in this range.
N.B. Percentile is a simple concept: If the 25th percentile nurse salary is $40,000, that means 25% of nurses (25 out of 100) make less than $40,000, and 75% make more than $40,000. The median is also called the 50th percentile: 50% of nurses make less than the $48,000 median, and 50% make more. Similarly, a 75th percentile nurse salary of $58,000 means that 75% of nurses make less than $58,000, and 25% make more.
Using the range from 25th percentile to 75th for the typical range of salaries passes the “just makes sense” test: if 100 nurses are asked their salary, “most” (about 50%) will have a salary between $40,000 and $58,000.
How do I know this? Because we have asked thousands of registered nurses in the US exactly this question. The range of $40,000 to $58,000 comes from finding the 25th and 75th percentile in this data.
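For the curious, finding a percentile range takes only a few lines. The sketch below is just an illustration: the salaries are made-up random numbers (not our nurse data), `percentile_range` is a hypothetical helper name, and the percentile method is whatever Python's `statistics.quantiles` uses by default, which may differ slightly from other tools.

```python
import random
from statistics import quantiles

random.seed(42)  # fixed seed so the sketch is repeatable

# Made-up, right-skewed "salaries"; illustrative only, NOT PayScale data.
salaries = [random.lognormvariate(10.8, 0.3) for _ in range(1000)]


def percentile_range(data, low, high):
    """Return the values at the low-th and high-th percentiles (1 to 99)."""
    cuts = quantiles(data, n=100)  # 99 cut points; cuts[k - 1] is the k-th percentile
    return cuts[low - 1], cuts[high - 1]


p25, p75 = percentile_range(salaries, 25, 75)
share_inside = sum(p25 <= s <= p75 for s in salaries) / len(salaries)

print(f"25th to 75th percentile: ${p25:,.0f} to ${p75:,.0f}")
print(f"Share of salaries inside that range: {share_inside:.0%}")
```

By construction, about half of the values land inside the 25th to 75th percentile range no matter how skewed the data is. Calling `percentile_range(salaries, 10, 90)` gives the broader range that holds about 80% of them.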
Now for why standard deviation is so bad. The standard deviation for this same data is $19,000 and the mean is $50,000. Hence the mean +/- 1 standard deviation is $31,000 to $69,000. If reality were theory, 68% of nurses would be paid in this range. However, it is not, and over 80% of nurses’ salaries fall within one standard deviation of the mean.
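Here is a rough way to see the same effect at home. The sketch assumes a made-up, right-skewed sample (a lognormal distribution I picked for illustration, not our nurse data) and counts how many values fall within one standard deviation of the mean. `share_within_one_sd` is a hypothetical helper name.

```python
import random
from statistics import mean, stdev

random.seed(7)  # fixed seed so the sketch is repeatable

# Made-up, right-skewed sample; illustrative only, NOT PayScale data.
salaries = [random.lognormvariate(10.8, 0.5) for _ in range(5000)]


def share_within_one_sd(data):
    """Fraction of values within mean +/- one standard deviation."""
    m, sd = mean(data), stdev(data)
    return sum(m - sd <= x <= m + sd for x in data) / len(data)


share = share_within_one_sd(salaries)
print(f"Share within one standard deviation of the mean: {share:.0%}")
```

For skewed data like this, the share comes out noticeably above the 68% that the normal distribution promises. The exact figure depends on the shape of the data, which is the whole problem.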
By looking at the nurse salaries, I found that the middle 68% of nurses’ salaries fall within about $13,000 of the median (the 16th percentile is $36,000 and the 84th percentile is $62,000). If reality were theory, and salaries followed normal distributions, the standard deviation would have been $13,000. But reality is just not normal (in the statistical sense) 🙂
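The arithmetic behind that $13,000 is simple enough to spell out. This sketch uses only the figures quoted above; `implied_standard_deviation` is a hypothetical helper name, and halving the 16th to 84th percentile width is exact only for a perfectly normal distribution.

```python
def implied_standard_deviation(p16, p84):
    """Half the width of the middle 68% of the data.

    For a perfectly normal distribution this equals the standard deviation.
    """
    return (p84 - p16) / 2


p16, p84 = 36_000, 62_000  # percentiles quoted in the text
reported_sd = 19_000       # standard deviation quoted in the text

implied_sd = implied_standard_deviation(p16, p84)
print(f"Implied by the middle 68%: ${implied_sd:,.0f}")
print(f"Reported standard deviation is {reported_sd / implied_sd:.2f}x larger")
```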
Even worse than being misleading for non-normal distributions, standard deviation is incredibly sensitive to the most atypical values. For example, typos happen. Two nurses in this data sample added an extra zero to their salary, boosting their pay to ~$400,000.
Fixing these two data points has no measurable effect on the 25th and 75th percentile salaries: after all, these are only 2 data points out of thousands. However, fixing these two data points changes the standard deviation by 10% ($19,000 to $17,000)!
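You can reproduce this kind of sensitivity with a small experiment. The sketch below is an illustration under my own assumptions: 2,000 made-up salaries (not our nurse data), two of which are set to $40,000 and then "mistyped" as $400,000. The exact percentages will differ from the real data; the pattern is what matters.

```python
import random
from statistics import quantiles, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

# Made-up sample of 2,000 salaries; illustrative only, NOT PayScale data.
clean = [random.lognormvariate(10.78, 0.3) for _ in range(2000)]
clean[0] = clean[1] = 40_000.0  # two ordinary salaries

# Same data, but those two entries were typed with an extra zero.
typos = list(clean)
typos[0] = typos[1] = 400_000.0


def summarize(data):
    """Return (25th percentile, 75th percentile, standard deviation)."""
    q1, _median, q3 = quantiles(data, n=4)
    return q1, q3, stdev(data)


clean_q1, clean_q3, clean_sd = summarize(clean)
typo_q1, typo_q3, typo_sd = summarize(typos)

print(f"25th percentile: ${clean_q1:,.0f} vs ${typo_q1:,.0f} with typos")
print(f"75th percentile: ${clean_q3:,.0f} vs ${typo_q3:,.0f} with typos")
print(f"Std deviation:   ${clean_sd:,.0f} vs ${typo_sd:,.0f} with typos")
```

Two bad entries out of 2,000 barely move the percentiles, because a percentile only cares about how many values sit on each side of it. The standard deviation squares each value's distance from the mean, so a couple of wild values count for a great deal.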
Why would anyone ever use a measure of the typical range of salaries that is so incredibly sensitive to the most atypical salaries? I blame the lack of computers in the 1800s, but that is the subject for another post. 🙂
Needless to say, we won’t be reporting standard deviations anytime soon in PayScale Salary Reports.