We want to figure out how average pay differs for men and women. The tricky thing is that not everybody agrees about what “average” means. In fact, if you want to start a tiny war within the data analytics department at PayScale, just ask if the word “average” can refer to anything but the mean. Since this is my post, you get my opinion: “average” is a blanket term that describes any statistic that reasonably gets at what statisticians call central tendency. In simpler terms, averages show us the center of the data.
This gif shows the distribution of pay for men and women, and where the means, medians and modes lie. This pay data is from the PayScale salary survey and includes only workers earning less than $200,000.[1] Since the blue curve is taller than the red one at higher pay values, this distribution shows that men are more likely to earn higher salaries than women. We can use the difference in averages to describe this as the gender pay gap.[2] Here we can see that while the mean, median and mode return very different values, the pay gap does not change much from one measure to the next. In other words, the ratio of women’s pay to men’s pay is roughly the same whether we look at the mean, median or mode. So, what are these averages and why are they different?
What is the mean?
The first average we’ll discuss is the mean. The mean is what most people think of when we talk about the average. The mean indicates what each value would be if every observation were made equal. In the case of compensation, this would mean taking a group’s pay, putting it into a single pile, and then distributing it equally across all members of the group.
Imagine the group in question is you and nine other friends, and that you are all putting your pay into one big pile. If one of your friends becomes the CEO of a large company, that will significantly increase the money in the pile that we are dividing up equally, and the mean will go up a lot. For this reason, we say that the mean is sensitive to outliers; one very large or very small data point can dramatically shift the mean. This isn’t good or bad, but it is important for understanding why means, medians and modes can be different. Mean pay is $73,800 for men and $57,800 for women, which translates to a gender pay gap of 78 cents on the dollar.
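The CEO example can be sketched in a few lines of Python. The salaries below are made up for illustration (they are not PayScale data), and the CEO's $2,000,000 pay is an assumption chosen to make the effect obvious.

```python
from statistics import mean

# Hypothetical salaries for you and nine friends (illustrative, not PayScale data).
salaries = [38_000, 42_000, 45_000, 48_000, 50_000,
            52_000, 55_000, 60_000, 65_000, 70_000]
mean_before = mean(salaries)

# Assumption: the friend earning $70,000 becomes a CEO earning $2,000,000.
salaries_with_ceo = salaries[:-1] + [2_000_000]
mean_after = mean(salaries_with_ceo)

print(f"Mean before the promotion: ${mean_before:,.0f}")
print(f"Mean after the promotion:  ${mean_after:,.0f}")
```

With these numbers, a single promotion moves the mean from $52,500 to $245,500, even though nine of the ten salaries did not change.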
What is the median?
The median and the mean are the most common measures of central tendency, but the median uses a different approach. The median is the number that describes the exact midpoint of the data points in a given set. To find it, we look for the point that splits the observations cleanly in half, so that 50 percent of the observations are larger than the median and 50 percent are smaller. If we are thinking about compensation, we want to pick a number such that half of the people in our group earn less than this, and half earn more.
If we go back to your group of friends, five people in the group earn less than the median and the other five earn more. In this case, if your friend becomes the CEO, the median will only change if their previous salary was lower than the median. If so, a new observation will become the middle data point, and therefore we will have a new median. If the new CEO already earned more than the median, it will not move at all. Usually the median will move less than the mean.[3] As a result, medians are less sensitive to outliers than means. Median pay is $62,400 for men and $50,000 for women, both much lower than mean pay for those groups. However, the ratio is not much different: the gender pay gap we calculate using the median is 80 cents on the dollar.
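Here is the same thought experiment for the median, as a minimal Python sketch. The salaries and the CEO's $2,000,000 pay are hypothetical values chosen for illustration.

```python
from statistics import median

# Hypothetical salaries for you and nine friends (illustrative, not PayScale data).
salaries = [38_000, 42_000, 45_000, 48_000, 50_000,
            52_000, 55_000, 60_000, 65_000, 70_000]
median_before = median(salaries)

# Case 1: the new CEO already earned more than the median ($70,000 -> $2,000,000).
ceo_from_above = salaries[:-1] + [2_000_000]
median_from_above = median(ceo_from_above)

# Case 2: the new CEO used to earn less than the median ($38,000 -> $2,000,000).
ceo_from_below = salaries[1:] + [2_000_000]
median_from_below = median(ceo_from_below)

print(f"Median before:                 ${median_before:,.0f}")
print(f"Median, CEO came from above:   ${median_from_above:,.0f}")
print(f"Median, CEO came from below:   ${median_from_below:,.0f}")
```

In the first case the median stays at $51,000. In the second it shifts to $53,500, a far smaller move than the mean would make for the same promotion.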
What is the mode?
The mode is the most common value in a data set. Because it can behave a little strangely, the mode is less popular than the mean and the median. When looking at salary data, the mode is the amount that the largest number of individuals earn. We can fit a smoothing curve that shows the relative likelihood of people earning roughly a certain value. The mode will be the highest point on that curve, but that may not accurately depict the “middle” of the dataset. In some cases, there may even be multiple values tied for the highest number of observations. In these instances, we have multiple modes, which may be similar values or may be values that are far apart within the distribution.
One positive characteristic of the mode is that it is the least sensitive to outliers. If your friend becomes a CEO, that will not change what is the most common income among your friends.[4] However, if everyone earns a different amount, as is often the case with salaries, then the mode is useless for describing the dataset. This variability makes the mode an unreliable measure of central tendency. The modes are lower than both the median and the mean: $50,900 for men and $39,700 for women. Once again, this leads to a similar pay gap of 78 cents on the dollar.
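The mode's quirks are easy to see with Python's standard `statistics` module. This is only a sketch on made-up salaries: `mode` returns a single most common value, while `multimode` (Python 3.8+) returns every value tied for most common.

```python
from statistics import mode, multimode

# Hypothetical salaries (illustrative only); round values repeat, as they often do.
friends = [40_000, 50_000, 50_000, 50_000, 60_000, 75_000, 75_000, 100_000]
mode_before = mode(friends)

# Assumption: the friend earning $100,000 becomes a CEO earning $2,000,000.
friends_with_ceo = friends[:-1] + [2_000_000]
mode_after = mode(friends_with_ceo)

# Two values tied for most common: there are two modes, far apart.
two_peaks = [40_000, 40_000, 55_000, 90_000, 90_000]
tied_modes = multimode(two_peaks)

# Everyone earns a different amount: every value ties, so the mode says nothing.
all_different = [41_250, 48_900, 52_300, 57_750, 63_100]
uninformative_modes = multimode(all_different)

print(mode_before, mode_after)    # the CEO does not change the most common salary
print(tied_modes)                 # both tied values are reported
print(len(uninformative_modes))   # as many "modes" as there are people
```

This is also why the pay figures in this post come from a smoothed curve rather than from raw counts: with exact salaries, ties and one-off values make the raw mode unstable.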
Which average is right?
In this case, the pay gap barely changes depending on which average we use to calculate it. We think that the median is the best measure of typical compensation, so we use that to calculate the gender pay gap.
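To check how close the three versions of the pay gap are, we can divide women's pay by men's pay for each average. The snippet below uses only the dollar figures reported in this post; the variable names are my own.

```python
# (men's pay, women's pay) for each average, as reported in this post.
reported_pay = {
    "mean": (73_800, 57_800),
    "median": (62_400, 50_000),
    "mode": (50_900, 39_700),
}

# The pay gap is expressed as women's pay per dollar of men's pay.
pay_gap = {name: women / men for name, (men, women) in reported_pay.items()}

for name, ratio in pay_gap.items():
    print(f"{name:>6}: {ratio * 100:.0f} cents on the dollar")
```

All three ratios land within a couple of cents of each other, even though the underlying dollar amounts differ by more than $20,000.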
In some datasets, the mean, median and mode are all very similar. This is a common scenario that happens when we have neat, symmetric bell curves, and it is the easiest to handle. However, there is nothing that says that data need to follow a bell curve, and as the shape of our data gets weirder, the values of these averages start to diverge. In cases where these measures do not yield similar values, the correct approach is not to pick the statistic that best matches the point you want to make. Instead, you should evaluate the distribution to decide which measure is the best summary of the data. The pay distributions we presented here have “fat right tails”, meaning that values far above the peak of the curve are more likely than values equally far below it. That fat right tail pulled the mean up more than the median and the mode. If the left tail were fat instead, the mean would be the lowest of the three averages. The most important thing to remember is to be transparent about what you choose to report and why.
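A toy example shows how the tail decides the ordering of the three averages. The numbers below are invented (think of them as pay in thousands of dollars), and the left-skewed sample is simply the mirror image of the right-skewed one.

```python
from statistics import mean, median, mode

def averages(values):
    """Return the three averages discussed in this post for a list of numbers."""
    return {"mean": mean(values), "median": median(values), "mode": mode(values)}

# Hypothetical pay in thousands of dollars, with a fat right tail.
right_skewed = [30, 40, 40, 40, 50, 50, 60, 70, 90, 150]

# Mirror image of the same data, so the fat tail is on the left instead.
left_skewed = [180 - x for x in right_skewed]

right = averages(right_skewed)
left = averages(left_skewed)

print("Fat right tail:", right)  # mode < median < mean
print("Fat left tail: ", left)   # mean < median < mode
```

In the right-skewed sample the mean is the largest of the three, as with the pay data here; flip the tail and the mean becomes the smallest.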
What about the gender pay gap? Lots of variables determine why some people are paid more or less than others, and some of these make sense (unlike the case of two actors being compensated very differently for a reshoot). At PayScale, we believe that data-informed discussions surrounding pay ultimately lead to better outcomes for both employees and their employers. You can ensure that your pay is appropriate for your position by completing our salary survey and engaging in a more sophisticated conversation with your employer about why you are paid what you are.
1. I only include pay numbers under $200,000 to keep the charts easy to read. Since men are more likely than women to earn over $200,000, this restriction actually slightly reduces the gender pay gap. I used data from roughly 104,000 women and 88,000 men who submitted salary profile data in the first few months of 2018. I fit a smoothing curve to show what the distribution of the pay data looks like. Nevertheless, there are some humps at “round” values, like $50,000, $75,000 and $100,000.
2. We typically use the median to describe the gender pay gap in our research at PayScale.
3. We can cook up a case where the median will move more than the mean, but these are strange cases. For example, if five of your friends were already lottery winners, the median could jump more than the mean. Usually, though, it doesn’t.
4. Again, there’s a case where one person’s pay changing could change the mode. If a lot of your other friends are CEOs, adding one more CEO to the mix might make a very high level of pay the most common, changing the mode. For the most part, though, we expect the mode to be the least sensitive to outliers.
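For the curious, the two strange cases in the last two notes can be reproduced with invented numbers. Everything below is hypothetical: the lottery winners earn $1,000,000, and CEOs earn $2,000,000.

```python
from statistics import mean, median, mode

# Strange case 1: five of the ten people are already lottery winners.
group = [40_000, 45_000, 50_000, 55_000, 60_000] + [1_000_000] * 5
# The person earning $60,000 becomes a CEO earning $2,000,000.
group_after = [40_000, 45_000, 50_000, 55_000] + [1_000_000] * 5 + [2_000_000]

median_shift = median(group_after) - median(group)
mean_shift = mean(group_after) - mean(group)
print(f"Median moved by ${median_shift:,.0f}, mean moved by ${mean_shift:,.0f}")

# Strange case 2: several friends are already CEOs.
friends = [60_000, 60_000, 60_000, 70_000, 80_000, 2_000_000, 2_000_000]
# One of the friends earning $60,000 becomes a CEO as well.
friends_after = [60_000, 60_000, 70_000, 80_000,
                 2_000_000, 2_000_000, 2_000_000]

print(f"Mode before: ${mode(friends):,.0f}, after: ${mode(friends_after):,.0f}")
```

In the first case the median jumps by $470,000 while the mean moves by $194,000. In the second, one promotion is enough to flip the mode from $60,000 to $2,000,000.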