PayScale often receives questions about how many salary survey employee profiles we have.
Our answer is we have enough, and the number is growing rapidly. 🙂
This raises the question: how large a salary survey data set is enough? How many data points are required for PayScale data to be truly significant? The number needed depends on what questions are being asked. In this post, I’ll look at the questions the United States Bureau of Labor Statistics and PayScale typically ask, and the amount of data each needs to handle statistical fluctuations.
You can experience our data techniques first-hand by trying the PayScale salary survey.
People are often surprised at how small a sample can be and still be significant, meaning accurate and useful.
For example, the United States Bureau of Labor Statistics produces monthly reports on the country’s work force using a sample of only 60,000 U.S. households to represent the more than 110 million U.S. households; that is only about 1 in 2000.
This is the right size sample for the United States Bureau of Labor Statistics, because they are trying to answer broad questions such as, “What is the average pay of all workers in the U.S.?” or “What is the national unemployment rate?” As long as they have sampled the 110 million U.S. households in an unbiased fashion (a subject for another post), this data sample is more than adequate for those questions.
For example, those 60,000 U.S. households will likely have at least 50,000 people who are working or looking for work. If the true or population unemployment rate (the rate for all workers and would-be workers) is 4%, then this sample should have about 4% x 50,000 = 2000 unemployed. What are the possible statistical fluctuations? The sample unemployment rate will be between 3.78% and 4.22% about 99% of the time. Close enough for government work. 🙂
I calculated this statistical range using the binomial distribution function in Excel. My results are consistent with the errors quoted by the United States Bureau of Labor Statistics. For a shortcut way to calculate this (formally called the 99% confidence interval), simply use +/-2.5 times the square root of the count being measured, here the 2000 unemployed in the sample. 2.5 x SQRT(2000) is 112, which gives (2000-112)/50,000 to (2000+112)/50,000 or 3.78% to 4.22%.
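For readers who would rather check this outside Excel, here is a minimal Python sketch of the same arithmetic. It is not the spreadsheet used for this post. The function names are made up for illustration, and the exact check simply sums binomial probabilities for 50,000 workers at a 4% rate over the counts 1888 through 2112.

```python
import math

def shortcut_interval(n_sample, rate, z=2.5):
    """Range for the sample rate using +/- z * sqrt(expected count)."""
    expected = n_sample * rate              # 4% x 50,000 = 2000 unemployed
    half_width = z * math.sqrt(expected)    # 2.5 x SQRT(2000), about 112
    return ((expected - half_width) / n_sample,
            (expected + half_width) / n_sample)

def binomial_log_pmf(k, n, p):
    """Log of the binomial probability of exactly k successes in n trials."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

def binomial_coverage(n, p, low_count, high_count):
    """Probability that the count falls in [low_count, high_count]."""
    return sum(math.exp(binomial_log_pmf(k, n, p))
               for k in range(low_count, high_count + 1))

n_workers, true_rate = 50_000, 0.04

low, high = shortcut_interval(n_workers, true_rate)
print(f"Shortcut range: {low:.2%} to {high:.2%}")   # 3.78% to 4.22%

# Exact check: how often does the sample land within 2000 +/- 112?
coverage = binomial_coverage(n_workers, true_rate, 2000 - 112, 2000 + 112)
print(f"Exact binomial coverage: {coverage:.4f}")
```

The coverage printed on the last line should come out at roughly 99%, which is why the 2.5 multiplier works as a quick rule of thumb when the rate is small.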
The one thing I learned in physics research, that has universal applicability, is how to take the square root of the right number to understand the uncertainty or error in a measurement. It is a miracle to me that nature follows such a simple rule.
Enough about the United States Bureau of Labor Statistics. What about PayScale?
PayScale’s focus is on what I like to call “micro-markets.” We do not answer national questions like the United States Bureau of Labor Statistics. Instead, we answer local market questions like, “What is the typical pay for a dental hygienist with 5 years of experience, working in the city of Seattle?” or “What is the typical range of hourly wages for a warehouse laborer in Dallas who can drive a forklift and has 2 years of experience?”
For each of these questions, surprisingly small data sets are sufficient to give a good answer. For the “typical pay” type of question, as few as 5 to 10 of the right data points are enough.
PayScale attempts to answer literally millions of slightly different questions (e.g., dental hygienist pay in Oklahoma City, 10 years of experience, etc.). We need our data set of millions in order to answer all these questions accurately.
In a future post, I will explain why (as few as) 5 well-matched salary survey profiles are enough to find the “typical pay.” In the meantime, figure out whether you are being paid the typical amount by completing our salary survey.
Dr. Al Lee