Interpretation of a scatter-gram

Definition of simple linear correlation

Interpretation of the Pearson linear correlation coefficient

*Key Words and Terms:*

Correlation, correlation coefficient

Correlation, linear correlation

Correlation, negative correlation

Correlation, perfect correlation

Correlation, positive correlation

Dependent Variable

Independent Variable

Relation, linear relation

Relation, non-linear relation

Scatter-gram

Scatter-plot matrix

UNIT SYNOPSIS

DESCRIPTION

Correlation analysis is used as a preliminary data analysis before more sophisticated methods are applied. Correlation
describes the relation between 2 random variables (a bivariate relation) measured on the same person or object, with no prior
evidence of inter-dependence. Correlation indicates association only; the association is not necessarily causal. The objectives
of correlation analysis are describing the relation between x and y, predicting y when x is known, predicting x when y is known,
studying trends, and studying the effect of a third factor on the relation between x and y.

The first step in correlation analysis is to inspect a scatter plot of the data to obtain a visual impression of
the data layout and to identify outliers. Pearson’s coefficient of correlation (the product-moment correlation), r,
is the most commonly used statistic for linear correlation. Its formula is cumbersome to evaluate by hand but is easily computed by software.
It is essentially a measure of how closely the data points scatter around a straight line.
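
As a minimal sketch of the computation, the coefficient can be written as the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the small data set (hours studied against exam score) are hypothetical and chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 70, 72]
r = pearson_r(hours, scores)
print(round(r, 2))  # 0.98
```

Here the points lie close to a rising straight line, so r is close to 1.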

PEARSON'S CORRELATION COEFFICIENT, r

Inspecting a scatter-gram helps interpret the coefficient. The coefficient is not reliably interpretable for small samples.
Values (in absolute size) of 0.25 - 0.50 indicate a fair degree of association. Values of 0.50 - 0.75 indicate a moderate to good
relation. Values above 0.75 indicate a good to excellent relation. A value of r = 0 indicates either no correlation or that the
two variables are related in a non-linear way. In perfect positive correlation, r = 1. In perfect negative correlation, r = -1.
When there is no correlation and r = 0, the scatter-plot is roughly circular. The linear correlation coefficient should not be used when
the relation is non-linear, when outliers exist, when the observations are clustered in 2 or 4 groups, or when one of the variables is fixed in advance.
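
The limiting cases described above can be checked numerically. The sketch below uses made-up data and a hypothetical helper, `pearson_r`, to show a perfect positive line, a perfect negative line, and a symmetric curve (y = x squared) for which r is 0 even though y is completely determined by x.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative x values, symmetric around zero
x = [-2, -1, 0, 1, 2]

r_pos = pearson_r(x, [2 * v + 1 for v in x])    # rising straight line
r_neg = pearson_r(x, [-3 * v + 4 for v in x])   # falling straight line
r_curve = pearson_r(x, [v ** 2 for v in x])     # non-linear (parabola)

print(r_pos, r_neg, r_curve)
```

The third result illustrates why a scatter-gram should be inspected before r = 0 is read as "no relation".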

NON-PARAMETRIC CORRELATION ANALYSIS

The Spearman rank correlation coefficient is used for small data sets
for which the Pearson linear correlation coefficient would be invalid.
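
One common way to obtain the Spearman coefficient is to replace each observation by its rank and then compute the Pearson coefficient on the ranks. The sketch below follows that approach, assigning tied values the average of their ranks; the helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the dose-response data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical small data set: monotonic but clearly not linear
dose = [1, 2, 3, 4, 5]
response = [1, 8, 27, 64, 125]
rho = spearman_rho(dose, response)
print(rho)
```

Because the ranks of the two variables agree exactly, the rank coefficient equals 1 here, whereas the Pearson coefficient on the raw values would be below 1.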