Sometimes we are asked, are the PayScale salary reports unbiased? Since PayScale does not explicitly select the people who complete our salary survey, how do we know the ~2% of employees in the US who have completed our survey are representative of the US working population as a whole?
The extent to which any aggregate statistic, like typical salary, is biased (inaccurate) depends on how it is constructed, and what question it is trying to answer. In this post, I’ll look at three aspects of bias in reporting:
- All aggregate statistics lie: There is no such thing as a "true" typical answer
- The case of the civil engineer: Even well-measured typical values are wrong, simply because they are aggregates
- Sampling bias: A sample may not be suitable for the question being asked
In future posts, I will look at bias in government wage calculations, and then come back to whether PayScale data are biased for the questions we try to answer.
All Aggregate Statistics Lie
Before we go any further, contemplate this: all aggregate statistics lie. Aggregate statistics, like the mean, median, and percentiles we show in our salary reports, are designed to ignore the differences between individual data points in order to form a "typical" or "aggregate" answer.
By ignoring these differences, aggregate statistics lie, at a minimum, by omission. Eric was right to complain that the average wage for civil engineering graduates we reported was not representative of his pay. Kirin, who has only been on the job a couple of years, is yet another example of a recent civil engineering grad not earning the typical salary we reported.
Elsewhere on the PayScale site, we even report a different typical beginning civil engineer salary. The differences between these two aggregate statistics are caused by different definitions of "typical": mean vs. median values, total cash compensation vs. base salary only, different data collection dates, etc. Given all of those differences, it is surprising that the two values are within 5% of each other.
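To make the mean vs. median point concrete, here is a minimal sketch. The ten salaries are invented for illustration (they are not PayScale data), with one high earner included on purpose to show how a single value pulls the mean away from the median.

```python
from statistics import mean, median

# Hypothetical annual salaries for ten recent civil engineering graduates.
# These figures are invented for illustration; they are not PayScale data.
salaries = [48_000, 50_000, 51_000, 52_000, 53_000,
            54_000, 55_000, 56_000, 58_000, 75_000]

typical_mean = mean(salaries)
typical_median = median(salaries)

# Relative gap between the two definitions of "typical".
gap = (typical_mean - typical_median) / typical_median

print(f"mean:   {typical_mean:,.0f}")
print(f"median: {typical_median:,.0f}")
print(f"gap:    {gap:.1%}")
```

Both numbers are honest summaries of the same ten people, yet they differ by about 3%, and neither one matches any individual's pay.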
This is a second way aggregate statistics lie: by not collecting and processing data in exactly the way that will answer the reader’s question. Eric expected our aggregate data to be representative of graduates of his program. The data we used for civil engineering graduates were not close to representing Eric or his fellow students, and this was by design.
What is a biased sample?
Aggregate statistics can lie even when they are based on all the data, as the government's census and income-tax-based aggregate statistics are. For the case above, it is not that PayScale has too little civil engineering data. More data on all civil engineering graduates’ pay would not make our typical starting civil engineer salary any closer to Eric’s expectation.
Sample bias is another way aggregate statistics lie: the sample (e.g., the sub-set of employees surveyed) examined for a report is not like the population (e.g., all US employees) about which the question was asked. When the sample differs from the population as a whole in ways that matter for the purposes of the report, the sample is called biased.
Does it matter that PayScale does not have all the data on all employees? Like the Bureau of Labor Statistics, we only have a sample. Since we do not select the people who complete our salary survey, statisticians ask whether the PayScale sample is "unbiased" and "random" for the purpose for which it will be used. What do these terms mean?
What do unbiased and random mean?
Start by defining a set (the population), e.g., the 50,000 people who went to see the Rolling Stones in Seattle in October. A sample (sub-set), e.g., 100 of those people who are asked whether the show was good or bad, is random if the people in it are unbiased representatives of the set, except for possible statistical fluctuations.
Before talking about what the biases could be, let’s understand statistical fluctuations. To make it simple, consider 50,000 flips of a coin as the population. What would be a sample? The first 10 flips, the last 10 flips, flips number 23,101 to 23,110, etc. What would be a random sample?
- Put the result of each of the 50,000 flips on a piece of paper
- Put the paper slips in a really big hat
- Shake up the hat
- Pick out ten slips of paper
This is random, because there is nothing special about the 10 that are picked. In particular, the mechanism used to pick the sub-set of 10 does not make any of the 10 special.
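The hat procedure can be sketched in a few lines of Python. This is only an illustration: the coin flips are simulated, the seed is an arbitrary choice made so the run is repeatable, and `random.sample` stands in for shaking the hat and drawing slips without looking.

```python
import random

rng = random.Random(2024)  # arbitrary fixed seed, only so the run is repeatable

# The population: 50,000 simulated flips of a fair coin.
population = [rng.choice("HT") for _ in range(50_000)]

# Slips in a hat: every flip is equally likely to be drawn, without replacement.
sample = rng.sample(population, k=10)

population_heads = population.count("H") / len(population)
sample_heads = sample.count("H") / len(sample)

print(f"heads in population:   {population_heads:.1%}")
print(f"heads in sample of 10: {sample_heads:.0%}")
```

Changing the seed draws a different ten slips. The population percentage barely moves from run to run, while the sample percentage jumps around in steps of 10%.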
Note that none of these samples of 10 will have exactly the same ratio of heads to tails as the 50,000. For the 50,000, a typical result might be that 49.8% of the flips are heads, and it would be very rare for less than 49% or more than 51% to be heads.
What will be typical results for the random sample? Occasionally (about 1 time in 100 for each extreme) there will be as few as 10% heads or as many as 90%. This range of possibilities is caused by statistical fluctuations. Fortunately, it is easy to calculate how large statistical fluctuations can be.
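For a sample of 10 fair flips, the calculation is a direct application of the binomial distribution. The sketch below assumes a fair coin (p = 0.5) and independent flips; the helper name `prob_heads` is just for this example.

```python
from math import comb

def prob_heads(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k heads in n independent flips."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 10
# 10% heads or fewer means 0 or 1 head out of 10.
p_at_most_1 = sum(prob_heads(k, n) for k in range(0, 2))
# 90% heads or more means 9 or 10 heads out of 10.
p_at_least_9 = sum(prob_heads(k, n) for k in range(9, n + 1))

print(f"P(at most 1 head of 10):   {p_at_most_1:.4f}")
print(f"P(at least 9 heads of 10): {p_at_least_9:.4f}")
```

Each extreme works out to 11 chances in 1,024, or roughly 1 in 100.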
Statistical fluctuations become less important the more data the sample has. For example, if the random sample is 100 flips, ratios less than 38% or more than 62% are very unlikely. Bump the sample up to 1000 flips, still only 2% of the total, and less than 46% or more than 54% become unlikely.
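The same kind of calculation shows how quickly the fluctuations shrink. This sketch again assumes a fair coin; `prob_outside` is a hypothetical helper that counts outcomes exactly, and the `sqrt(0.25 / n)` line is the standard deviation of the fraction of heads for a fair coin.

```python
from math import comb, sqrt

def prob_outside(n: int, low_pct: int, high_pct: int) -> float:
    """Exact chance that fewer than low_pct% or more than high_pct%
    of n fair coin flips come up heads."""
    outside = sum(
        comb(n, k)
        for k in range(n + 1)
        if 100 * k < low_pct * n or 100 * k > high_pct * n
    )
    return outside / 2**n

for n, low, high in [(100, 38, 62), (1000, 46, 54)]:
    spread = sqrt(0.25 / n)  # standard deviation of the fraction of heads
    chance = prob_outside(n, low, high)
    print(f"n={n}: typical spread {spread:.1%}, "
          f"P(below {low}% or above {high}%) = {chance:.4f}")
```

Ten times more data cuts the typical spread by about a factor of three (the square root of ten), which is why the acceptable band narrows from 38-62% to 46-54%.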
Which takes us back to the Rolling Stones. If I select people to ask whether the concert was good by picking randomly (something like the hat trick) from all the tickets scanned in at the concert, and I make sure I find the person who sat in each of the 100 seats I select, the sample will be a random, unbiased sample of everyone at the concert.
If 50 of the 100 say the concert was bad, I will be 99% confident that at least 38% of all people going to the concert thought it was bad, even though I asked only 0.2% of all people who were at the concert. That is the amazing power of a carefully selected random, unbiased sample: it does not take much data in the sample to represent the population as a whole.
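One way to check a confidence statement like this is to ask: if only 38% of the whole crowd thought the show was bad, how often would a random sample of 100 turn up 50 or more unhappy people? The sketch below assumes a simple one-sided binomial test, which is one reasonable reading of the claim and not necessarily the exact method behind the numbers above.

```python
from math import comb

def prob_at_least(k_min: int, n: int, p: float) -> float:
    """Probability of k_min or more 'bad' answers among n random attendees
    when a fraction p of the whole crowd thought the show was bad."""
    return sum(
        comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(k_min, n + 1)
    )

# If the true share were only 38%, how surprising is seeing 50 of 100?
chance = prob_at_least(50, 100, 0.38)
print(f"P(50 or more of 100 say 'bad' | true share 38%) = {chance:.4f}")
```

The result should land close to 1 in 100. A true share any lower than 38% makes the observed 50 even less likely, which is what supports the 99% figure.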
If I ask people to call into a morning radio show the next day, the mechanism for choosing the sample is no longer constructed to be random and unbiased.
How typical of all people who go to concerts is a person who calls into a radio show? Are Rolling Stones fans more likely to call? In decades of listening to morning radio programs, I have heard hundreds of phone-in reviews; I never heard anyone say a concert was bad.
I have attended concerts that were bad, in my opinion. Funny, I never called in to report this 😉
However, the radio shows never claimed they were giving the typical experience of all attendees at the show, just the typical experience of people who call into the radio program. If I make the leap to assuming that is everyone’s "typical" experience, that is my problem.
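A small simulation shows how a self-selected sample can drift away from the crowd. Every number here is an assumption made up for illustration: half the crowd liked the show, and happy fans are ten times more likely to phone in than unhappy ones.

```python
import random

rng = random.Random(7)  # arbitrary fixed seed, only so the run is repeatable

ATTENDEES = 50_000
SHARE_LIKED = 0.50             # assumed: half the crowd liked the show
CALL_RATE_IF_LIKED = 0.010     # assumed: 1% of happy fans call in
CALL_RATE_IF_DISLIKED = 0.001  # assumed: 0.1% of unhappy attendees call in

# True means this attendee liked the show.
liked = [rng.random() < SHARE_LIKED for _ in range(ATTENDEES)]

# Random sample: 100 attendees drawn like slips from the hat.
random_sample = rng.sample(liked, k=100)

# Self-selected sample: each attendee decides whether to call in.
callers = [
    opinion
    for opinion in liked
    if rng.random() < (CALL_RATE_IF_LIKED if opinion else CALL_RATE_IF_DISLIKED)
]

def share(opinions):
    """Fraction of the given opinions that are positive."""
    return sum(opinions) / len(opinions)

print(f"liked it, whole crowd:   {share(liked):.0%}")
print(f"liked it, random sample: {share(random_sample):.0%}")
print(f"liked it, callers:       {share(callers):.0%} of {len(callers)} calls")
```

Under these assumptions the callers are overwhelmingly positive even though the crowd was evenly split. The callers are a fine sample of people who call radio shows and a badly biased sample of people who attended.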
This is why the question being asked is critical for determining whether a given sample is biased.
In a future post, I’ll come back to the questions we can answer with PayScale data. For those pay questions, our data are as unbiased and random as any other source available.
Of course, our aggregate statistics will still lie 🙂
Is your pay, in aggregate, all that you deserve? Find out with the PayScale salary calculator.
Dr. Al Lee