In descriptive statistics, there are three concepts which help us describe our data: central tendency (how typical it is), dispersion (how varied it is) and distribution (how data is distributed). In this post I will talk about the central tendency, in other words, determining the AVERAGE of your data. I will discuss three main concepts here: mean, median and mode. I will also discuss the mild and extreme outliers and explain the tricky procedure of identifying them. They actually describe dispersion of data, but because finding them involves some operations with the mean, I decided to show it in this post.
Mean
Ex means sum of the scores X, thus Ex = X1+X2+X3...+Xn.
Median
For example, let's say we have a set of following scores: 8, 20, 1, 3, 1, 12, 5, 1, 6. If we put them in order we get 1, 1, 1, 3, 5, 6, 8, 12, 20. Thus, the median is the fifth score (since there are 9 of them), which is 5.
When there is an odd number of scores, we would normally calculate the average of the two middle scores. Say, if there was another score in the set above - for example 2, - the median would be an average of the 5th and 6th scores, which equal 3 and 5. So, the median equals 4 for the set of
1, 1, 1, 2, 3, 5, 6, 8, 12, 20.
If there was a score of 13 in that set (instead of 2), the median would still equal the average between the 5th and 6th scores, but in this case they would be 5 and 6. Thus, the median equals 5.5 for the set of 1, 1, 1, 3, 5, 6, 8, 12, 13, 20.
Mode
In the example above, the frequency (f) of the score 1 equals 3 (since it occurs three times in the set), which means that 1 is the mode of the set of 1, 1, 1, 3, 5, 6, 8, 12, 20
Sometimes there can be more than 1 mode, and in this case it is called Bi-modal distribution.
Outliers
To do so, you need to calculate the interquartile range, which is the range of scores unaffected by outliers. To do so, arrange the scores from the smallest to the largest one and simply ignore 25% of the lowest and 25% of the highest scores. Since the outliers are unusually high or low scores, they will not affect the remaining 50%. Interquartile range equals the difference between the highest and the lowest score of these 50%.
Lets imagine that we have a set of numbers: 124, 118, 115, 65, 122, 120. Let us put them in order: 65, 115, 118, 120, 122, 124. Deleting the extreme quarters leaves us with 118, 120. Now, the interquartile range is 120 - 118 = 2.
Then, to identify outliers, we multiply the interquartile range by 1.5, which in our case equals 2*1.5 = 3. The outliers among the low scores are the scores which are smaller than the smallest score of the interquartile range minus this number; in our example,
118 - 3 = 115. Any score smaller than 115 is an outlier, so in our set 65 is an outlier.
The outliers among the high scores are the scores which are bigger than a sum of the biggest score of the interquartile range and this number; in our example, 120 + 3 = 123. Thus, 124 is an outlier.
This formula, however, identifies mild outliers. To find extreme outliers, multiply the interquartile range by 3 rather than 1.5. In our case it will be 2*3 = 6. Then the procedure is the same: lowest score of the interquartile range minus this number equals 118 - 6 = 112; biggest score of the interquartile range plus this number equals 120 + 6 = 126. Thus, the score of 65 is an extreme outlier (since it is smaller than 112), while 124 is only a mild outlier (since it is NOT bigger than 126).