In the last post, I looked at how hard it is to calculate the median vs. the arithmetic mean ("average"), to understand how we ever got into the mess of using mean salary to identify a typical annual salary. To keep things simple, I used the example of counting checks and check sizes.
Even for the small data set of 7 days and 15 checks, calculating the median number of checks per day and dollars per check was starting to get laborious. What if you were interested in these numbers for a whole year? How much harder is it to calculate medians vs. means for 365 days and ~750 checks?
How Does a Computer Calculate Numbers?
Let’s look at the typical number of checks per day, for a year. The mean is just as easy to calculate for a year as for a week. It is just the total number of checks (~750) divided by 365 days.
For the median number of checks per day for a year, you need to:
- Do 52 times as much work as for a week to count the number of checks per day for each day.
- Do much more than 52 times as much work to sort the list of numbers of checks per day. In geek speak, if you use the simplest sorting method, the work grows as O(n^2), so sorting a year’s worth of daily counts takes roughly 52*52 (~2,700) times as much work as sorting a week’s.
- Count up 183 from the bottom of the sorted list to find the median number of checks per day.
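For readers who would rather let a computer do the counting, here is a minimal Python sketch of the two calculations above. The daily counts are made-up random numbers, not real check data, and the function names `mean` and `median_by_sorting` are just illustrative choices.

```python
import random

def mean(values):
    """Arithmetic mean: one running total, then one division."""
    return sum(values) / len(values)

def median_by_sorting(values):
    """Median the by-hand way: sort the list, then count up to the middle."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: the single middle value is the median.
        return ordered[middle]
    # Even number of values: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

# Hypothetical data: a made-up number of checks (0 to 4) for each of 365 days.
random.seed(0)
checks_per_day = [random.randint(0, 4) for _ in range(365)]

print("mean checks per day:  ", mean(checks_per_day))
print("median checks per day:", median_by_sorting(checks_per_day))
```

With 365 values, the middle of the sorted list is the 183rd entry (index 182 when counting from zero), which matches the "count up 183" step.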
Finding the Median Number in Data
The same increase in work holds for the typical size or dollar value of checks. To calculate the mean size, just divide the total dollar value of the checks by the number of checks written. The hardest part is if you waited until the end of the year to total up the dollar value of all the checks, since this will take about 52 times as long as for a week.
For the median size of check, it is just as bad as for the median number of checks per day:
- Do 52 times as much work as for a week to write down the dollar value of each check.
- Do much more than 52 times as much work to sort this list. With the simplest sorting method, going from 15 checks to ~750 (50 times as many) could easily mean ~2,500 (50*50) times as much work.
- Count up ~375 (half of 750) from the bottom of the sorted list to find the median check size.
Calculating the median is much more labor intensive than calculating the mean. When you find the median by sorting, as you would by hand, this always holds: the work for the mean grows in step with the amount of data, while the work for the median grows much faster.
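To put rough numbers on that growth, the sketch below counts the steps for a week versus a year of daily counts. It assumes selection sort as the "simplest sorting method" (my choice for illustration; a person sorting by hand might well work differently), and the data is again made up.

```python
import random

def selection_sort_comparisons(values):
    """Sort a copy with selection sort; return (sorted_list, comparison_count)."""
    items = list(values)
    comparisons = 0
    for i in range(len(items) - 1):
        smallest = i
        # Scan the unsorted remainder for the smallest value.
        for j in range(i + 1, len(items)):
            comparisons += 1
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items, comparisons

def additions_for_mean(values):
    """Totaling n numbers takes n - 1 additions (plus one final division)."""
    return len(values) - 1

random.seed(1)
for days in (7, 365):
    # Hypothetical daily check counts, 0 to 4 per day.
    counts = [random.randint(0, 4) for _ in range(days)]
    ordered, comparisons = selection_sort_comparisons(counts)
    assert ordered == sorted(counts)
    print(days, "days:", additions_for_mean(counts), "additions for the mean,",
          comparisons, "comparisons to sort for the median")
```

Selection sort always makes n(n-1)/2 comparisons, so this reports 21 comparisons for 7 days and 66,430 for 365 days, against 6 and 364 additions for the mean. That is roughly 3,160 times the sorting work for about 61 times the adding work. The exact ratio comes out a bit higher than the rough 52*52 estimate because a 7-item list is too small for the n^2 approximation to be tight.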
Imagine the labor involved in calculating something like the median salary of everyone in the US! No wonder mean salary was picked for typical or average salary.
Median Mean and Mode Relation
If you are reasonably mentally stable, you just would not do this much work by hand for large data sets. While the mean is a less accurate measure of “typical” than the median, the difficulty of calculating the median for anything but the smallest data sets rules it out for hand calculations.
The scientists of the 1800s were reasonably mentally stable, so they made do with the mean as the “typical” value for a data set. In the next post, we will look at why the mean stuck around even after the first personal computers made calculation easy, and why the standard deviation was used for the “typical range” of values for a data set.
Until then, find out where your salary stands. Are you above or below the median?
Dr. Al Lee