By Professor Omar Hasan Kasule Sr.

Learning Objectives:

Use of averages in data summary

Definition, properties, advantages, and disadvantages of various types of averages

Relations among the various averages

Choice of average to use


Key Words and Terms:

Arithmetic mean, indexed mean

Arithmetic mean, robust mean

Arithmetic mean, the midrange

Arithmetic mean, weighted mean

Mean, arithmetic mean

Mean, geometric mean

Mean, harmonic mean




Unit Outline


A. Biological Basis

B. Theoretical Basis

C. Purpose

D. Common Measures



A. Arithmetic Mean

B. Geometric Mean

C. Harmonic Mean

D. Weighted Mean

E. Indexed Mean



A. Definition

B. Properties

C. Advantages

D. Disadvantages



A. Definition

B. Mathematical Properties

C. Advantages

D. Disadvantages




Biological phenomena vary around the average. The average represents what is normal by being the point of equilibrium. The average is a representative summary of the data using one value. Three averages are commonly used: the mean, the mode, and the median. There are 3 types of means: the arithmetic mean, the geometric mean, and the harmonic mean. The most popular is the arithmetic mean. The arithmetic mean is considered the most useful measure of central tendency in data analysis. The geometric and harmonic means are not usually used in public health. The median is gaining popularity. It is the basis of some non-parametric tests as will be discussed later. The mode has very little public health importance.



The arithmetic mean is the sum of the observations' values divided by the total number of observations and reflects the impact of all observations. The robust arithmetic mean is the mean of the remaining observations when a fixed percentage of the smallest and largest observations is eliminated. The mid-range is the arithmetic mean of the values of the smallest and the largest observations. The weighted arithmetic mean is used when there is a need to place extra emphasis on some values by using different weights. The indexed arithmetic mean is stated with reference to an index mean. The consumer price index (CPI) is an example of an indexed mean. The arithmetic mean has 4 properties under the central limit theorem (CLT) assumptions: the sample mean is an unbiased estimator of the population mean, the mean of all sample means is the population mean, the variance of the sample means is smaller than the population variance, and the distribution of sample means tends toward the normal distribution as the sample size increases, regardless of the shape of the underlying population distribution.
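The variants of the arithmetic mean described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the data values, the weights, and the helper names `trimmed_mean`, `midrange`, and `weighted_mean` are all hypothetical, and the robust mean is assumed here to drop the same proportion of observations from each tail. The last part simulates repeated sampling from a skewed population to illustrate that the sample means centre on the population mean and vary less than the individual observations.

```python
import random
import statistics


def trimmed_mean(values, proportion):
    """Robust mean: drop `proportion` of the observations from each tail (assumed convention)."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of observations dropped per tail
    return statistics.fmean(ordered[k:len(ordered) - k])


def midrange(values):
    """Arithmetic mean of the smallest and the largest observation."""
    return (min(values) + max(values)) / 2


def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical data with one extreme value (43).
data = [2, 4, 4, 5, 6, 7, 9, 43]

arithmetic = statistics.fmean(data)   # 80 / 8 = 10.0
robust = trimmed_mean(data, 0.125)    # drops 2 and 43, leaving 35 / 6
mid = midrange(data)                  # (2 + 43) / 2 = 22.5

# Hypothetical scores where the third counts double.
weighted = weighted_mean([70, 80, 90], [1, 1, 2])  # 330 / 4 = 82.5

# Sampling distribution of the mean from a skewed (exponential) population.
rng = random.Random(0)  # fixed seed so the run is repeatable
population = [rng.expovariate(1.0) for _ in range(10_000)]
sample_means = [
    statistics.fmean(rng.choices(population, k=30))  # sample of size 30, with replacement
    for _ in range(1_000)
]
population_variance = statistics.pvariance(population)
variance_of_means = statistics.pvariance(sample_means)
```

In this data set the single extreme value pulls the arithmetic mean up to 10.0 and the mid-range up to 22.5, while the trimmed mean stays close to the bulk of the observations.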


The arithmetic mean enjoys 4 desirable statistical advantages: best single summary statistic, rigorous mathematical definition, further mathematical manipulation, and stability with regard to sampling error. Its disadvantage is that it is affected by extreme values. It is more sensitive to extreme values than the median or the mode. The geometric mean (GM) is defined as the nth root of the product of n positive observations and is never greater than the arithmetic mean for the same data; the two are equal only when all observations are identical. It is used if the observations vary by a constant proportion, such as in serological and microbiological assays, to summarize divergent tendencies of highly skewed data. It exaggerates the impact of small values while it diminishes the impact of big values. Its disadvantages are that it is cumbersome to compute and it is not intuitive. The harmonic mean (HM) is defined as the reciprocal of the arithmetic mean of the reciprocals of a series of values. It is used in economics and business and not in public health. Its computation is cumbersome and it is not intuitive.
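As a rough illustration of the geometric and harmonic means, the sketch below computes both for a hypothetical series of doubling dilution titres and compares them with the arithmetic mean. The helper names `geometric_mean` and `harmonic_mean` and the titre values are assumptions made for this example. The geometric mean is computed through logarithms, which is equivalent to taking the nth root of the product but avoids very large intermediate products.

```python
import math
import statistics


def geometric_mean(values):
    """nth root of the product of n positive values, computed via logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(statistics.fmean([math.log(v) for v in values]))


def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean requires positive values")
    return len(values) / sum(1 / v for v in values)


# Hypothetical titres that change by a constant proportion (doubling).
titres = [10, 20, 40, 80, 160]

am = statistics.fmean(titres)  # 310 / 5 = 62.0
gm = geometric_mean(titres)    # approximately 40, the middle titre
hm = harmonic_mean(titres)     # 5 / 0.19375, approximately 25.8
```

For these values the harmonic mean is the smallest, the geometric mean lies in between, and the arithmetic mean is the largest. Python's standard library also provides `statistics.geometric_mean` and `statistics.harmonic_mean`, which should agree with the hand-written versions here.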



The mode is the value of the most frequent observation. It is rarely used in science and its mathematical properties have not been explored. It is intuitive, easy to compute, and is the only average suitable for nominal data. It is useless for small samples because it is unstable due to sampling fluctuation. It cannot be manipulated mathematically. It is not a unique average; one data set can have more than 1 mode.
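A short sketch of how the mode might be computed, using hypothetical data: a nominal variable (blood groups) with a single mode, and a numeric series with two modes. The helper name `modes` is an assumption for this example; it returns every value that shares the highest frequency, which makes the non-uniqueness of the mode visible.

```python
from collections import Counter
import statistics


def modes(values):
    """Return all values that occur with the highest frequency, in first-seen order."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Hypothetical nominal data: the mode is the only average that applies.
blood_groups = ["O", "A", "O", "B", "A", "AB", "O"]
nominal_modes = modes(blood_groups)  # "O" occurs 3 times

# Hypothetical numeric data with two equally frequent values.
bimodal = [1, 2, 2, 3, 4, 4, 5]
numeric_modes = modes(bimodal)       # both 2 and 4 occur twice
```

The standard library function `statistics.multimode` gives the same result for these inputs.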



The median is the value of the middle observation in a series ordered by magnitude. It is intuitive and is best used for erratically spaced or heavily skewed data. The median can be computed even if the extreme values are unknown in open-ended distributions. It is less stable under sampling fluctuation than the arithmetic mean.
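The following sketch, using hypothetical numbers, illustrates two of the points above: the median is unaffected when an extreme value becomes even more extreme, and it can still be computed when the largest value is known only to lie beyond some limit. Representing the unknown upper value as `math.inf` is an assumption made for this example.

```python
import math
import statistics

# Hypothetical skewed data with one extreme value.
data = [2, 4, 4, 5, 6, 7, 9, 43]
median_before = statistics.median(data)  # average of the two middle values, 5 and 6
mean_before = statistics.fmean(data)     # 80 / 8 = 10.0

# Make the extreme value ten times larger: the mean moves, the median does not.
more_extreme = [2, 4, 4, 5, 6, 7, 9, 430]
median_after = statistics.median(more_extreme)
mean_after = statistics.fmean(more_extreme)  # 467 / 8 = 58.375

# Open-ended distribution: the last value is only known to exceed the others,
# so it is stood in for by infinity (an assumption of this sketch).
open_ended = [3, 5, 8, 12, math.inf]
median_open = statistics.median(open_ended)  # the middle value, 8
```

The arithmetic mean of the open-ended series cannot be computed as a finite number, whereas the median depends only on the ordering of the observations around the middle.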

Prof Omar Hasan Kasule, Sr. August 2005