By Professor Omar Hasan Kasule Sr.

Learning Objectives:

Interpretation of a scatter-gram

Definition of simple linear correlation

Interpretation of the Pearson linear correlation coefficient


Key Words and Terms:

Correlation, correlation coefficient

Correlation, linear correlation

Correlation, negative correlation

Correlation, perfect correlation

Correlation, positive correlation

Dependent Variable

Independent Variable

Relation, linear relation

Relation, non-linear relation


Scatter-plot matrix




Correlation analysis is used as preliminary data analysis before applying more sophisticated methods. Correlation describes the relation between two random variables (a bivariate relation) measured on the same person or object, with no prior evidence of inter-dependence. Correlation indicates only association; the association is not necessarily causal. The objectives of correlation analysis are to describe the relation between x and y, to predict y when x is known, to predict x when y is known, to study trends, and to study the effect of a third factor on the relation between x and y.


The first step in correlation analysis is to inspect a scatter plot of the data to obtain a visual impression of the data layout and to identify outliers. The next step is to compute Pearson’s coefficient of correlation (the product-moment correlation), r, which is the most commonly used statistic for linear correlation. Its formula looks complicated, but it is easily computed by modern computers. It is essentially a measure of how closely the data points scatter about a straight line.
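The computation of r can be sketched in a few lines of Python. This is only an illustrative sketch using the usual product-moment formula (the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations). The function name pearson_r and the height and weight figures are assumptions made up for the example; they do not come from any real data set.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements, invented for illustration only
height_cm = [150, 160, 165, 172, 180]
weight_kg = [52, 58, 63, 70, 77]
r = pearson_r(height_cm, weight_kg)
print(f"r = {r:.3f}")
```

With these made-up numbers the points lie close to a rising straight line, so r comes out close to +1.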



Inspecting a scatter-gram helps interpret the coefficient. The correlation is not interpretable for small samples. Values of r of 0.25 to 0.50 (in absolute size) indicate a fair degree of association. Values of 0.50 to 0.75 indicate a moderate to good relation. Values above 0.75 indicate a good to excellent relation. In perfect positive correlation, r = 1. In perfect negative correlation, r = -1. A value of r = 0 indicates either no correlation or that the two variables are related in a non-linear way; in cases of no correlation with r = 0, the scatter-plot is circular. The linear correlation coefficient is not used when the relation is non-linear, when outliers exist, when the observations are clustered in 2 or 4 groups, or when one of the variables is fixed in advance.
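The three limiting cases (r = 1, r = -1, and r = 0 despite a strong non-linear relation) can be illustrated with a small Python sketch. The data are artificial and chosen only to make the point; the helper pearson_r is a hypothetical name defined here so that the example is self-contained.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Artificial x values, symmetric about zero
x = [-3, -2, -1, 0, 1, 2, 3]

r_positive = pearson_r(x, [2 * v + 1 for v in x])   # points on a rising line
r_negative = pearson_r(x, [-2 * v + 1 for v in x])  # points on a falling line
r_curved = pearson_r(x, [v * v for v in x])         # y = x squared, a perfect but non-linear relation

print(r_positive, r_negative, r_curved)
```

In the third case y is completely determined by x, yet the linear coefficient is zero because the rising and falling halves of the curve cancel out. This is why the scatter-gram should be inspected before r is interpreted.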



The Spearman rank correlation coefficient is used for small data sets for which the Pearson linear correlation coefficient would be invalid.
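One common way to obtain the Spearman coefficient is to replace each observation by its rank and then compute the Pearson coefficient on the ranks. The Python sketch below follows that approach under stated assumptions: tied values receive the average of their ranks, the function names average_ranks, pearson_r and spearman_rho are hypothetical, and the data are artificial.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of paired samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman rank correlation, computed as Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Artificial small data set: y rises with x, but not along a straight line
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

Because the ranks of x and y agree exactly, the Spearman coefficient is 1 here, while the Pearson coefficient is lower since the points do not lie on a straight line.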

Professor Omar Hasan Kasule Sr. August 2005