This paper describes our experience of using SAS to carry out conditional logistic regression on
a paired data set. The aim of the analysis was to establish whether specific health education interventions changed
opinions about cancer. Respondents were asked for their opinions on cancer before and after the health education intervention.
SAS conditional logistic regression was used to identify interventions that were independent predictors of change in opinions
about cancer.
INTRODUCTION
We had experience in logistic regression
using SAS routines for independent data. However, while working on a large government-funded project on cancer education, we
came across paired data that required conditional logistic regression. There was no documentation in the SAS manuals and products
available to us on how to carry out the analysis. We made several calls to the SAS representative office in Kuala Lumpur but did not get an
immediate answer. They, however, contacted the SAS Institute in the US, who faxed the relevant
instructions, and we carried out the analysis successfully. The analysis turned out to be surprisingly simple. It involved
using the difference between the indicator variables of the paired data values as a response variable and running the regression
like that for independent data. The results of the analysis were published in the International Medical Journal in June 2002.
Logistic regression is a type of nonlinear regression that has become very
popular for the analysis of categorical data. In logistic regression, the dependent variable, y, is expressed as the logit, which
is the logarithmic transformation of the odds derived from a proportion or a probability. The outcome variable, y, is binary
or dichotomized. The derivation of the logistic model is straightforward. If the outcome y is dichotomous, it can
take on only two values, 0 and 1. We can define p = Pr(y=1). The odds of y=1 can be computed as p/(1 - p). The logit is defined
as the log transformation of the odds, thus logit(p_{i}) = natural logarithm of p_{i}/(1 - p_{i}) =
a + b_{1}x_{1} + … + b_{n}x_{n}. By simple mathematical
manipulation we can define p from the above as p_{i} = 1 / [1 + e^{-(a + b_{1}x_{1} + … + b_{n}x_{n})}]. The parameters are fitted by maximum likelihood estimation (MLE). Fitting by least squares is less satisfactory. The logistic
function, written as e^{x}/(1 + e^{x}), is the inverse of the logit function, written as log[p/(1 - p)].
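The relation between the logit and the logistic function can be checked numerically. The short Python sketch below is illustrative only: the function names `logit` and `inv_logit` and the coefficient values are assumptions made for this example and are not part of the original analysis.

```python
import math

def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Logistic function: maps a linear predictor x back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical intercept, coefficient and predictor value (illustration only).
a, b1 = -1.0, 0.5
x1 = 2.0

linear_predictor = a + b1 * x1       # a + b1*x1
p = inv_logit(linear_predictor)      # Pr(y = 1) under the logistic model
odds = p / (1.0 - p)                 # odds of y = 1

print("p =", p, "odds =", odds, "logit(p) =", logit(p))
```

Applying `logit` to the output of `inv_logit` returns the linear predictor, which is what is meant by one function being the inverse of the other.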
Logistic regression is very useful in epidemiological analysis for two reasons:
(a) the outcome variable is dichotomized and (b) the odds ratio is derived directly from the regression coefficient.
The techniques of logistic regression
used for independent data can be adapted for paired or matched data. For 1:1 matched data, variables
are manipulated in such a way that the difference between each pair is used as an explanatory variable and the usual logistic
regression model is fitted. For 1:M or N:M matched data, the proportional hazards procedure is used to fit a conditional
logistic regression model. A stratum is formed for each matched set based on age or some other matching variable. A survival time variable is created such that all cases in a stratum have the same event time and the controls
are censored at a later time. The censoring (status) variable is 1 for cases and 0 for controls.
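For the 1:1 case, the difference-based fit can be sketched in a few lines of Python. This is a textbook formulation under stated assumptions, not the SAS code used in the study: it assumes a single explanatory variable, no intercept, and a likelihood contribution of 1/(1 + e^{-bd}) for each pair, where d is the within-pair difference. The function name `fit_paired_clogit` and the data are hypothetical.

```python
import math

def fit_paired_clogit(diffs, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of b for 1:1 matched pairs.

    diffs[i] is the within-pair difference in the explanatory variable
    (case minus control). Pairs with a difference of 0 carry no information.
    Returns the estimate and its standard error.
    """
    b = 0.0
    for _ in range(max_iter):
        score = 0.0   # first derivative of the conditional log-likelihood
        info = 0.0    # negative second derivative (information)
        for d in diffs:
            p = 1.0 / (1.0 + math.exp(-b * d))
            score += d * (1.0 - p)
            info += d * d * p * (1.0 - p)
        if info == 0.0:
            raise ValueError("no informative (discordant) pairs")
        step = score / info
        b += step
        if abs(step) < tol:
            break
    return b, 1.0 / math.sqrt(info)

# Hypothetical data: 30 pairs with d = 1, 10 pairs with d = -1, 20 pairs with d = 0.
diffs = [1] * 30 + [-1] * 10 + [0] * 20
b_hat, se_hat = fit_paired_clogit(diffs)

print("b =", round(b_hat, 4), "se =", round(se_hat, 4), "OR =", round(math.exp(b_hat), 3))
```

With a binary explanatory variable the estimate reduces to the log of the ratio of the two kinds of discordant pairs, which gives a simple hand check on the fitted value.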
When using statistical packages to model the logistic relation, care must
be taken to make sure that the right response is being modelled. The packages normally model the logit of the non-event (y=0)
by default. The usual outputs of a logistic regression are: the parameter estimate (the logistic regression coefficient, b),
the standard error of the estimate, the Wald chi-square, which is defined as {b/se(b)}^{2}, the p-value, being the probability
of a result higher than the given value of chi-square, the standardized estimate, which is defined as b/{(π^{2}/3)^{1/2}/s}, where s is the sample standard deviation of the explanatory variable, the odds ratio (OR), defined as the exponent of b = e^{b}, the 95% confidence intervals
for the OR, the global chi-square, and 6 statistics that describe the association of predicted and observed probabilities: concordant
pairs, discordant pairs, tied pairs, Somers' D, Gamma, and Tau-a.
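Several of these outputs follow directly from the coefficient and its standard error. The Python sketch below shows the arithmetic under stated assumptions: the function name `wald_summary` and the values of b and se(b) are hypothetical, the p-value is taken from a chi-square distribution with 1 degree of freedom, and the 95% confidence interval uses the normal multiplier 1.96.

```python
import math

def wald_summary(b, se, z=1.96):
    """Wald chi-square, p-value, odds ratio and 95% CI from b and se(b)."""
    wald = (b / se) ** 2
    # Upper tail of a chi-square with 1 df: P(X > wald) = erfc(sqrt(wald / 2)).
    p_value = math.erfc(math.sqrt(wald / 2.0))
    odds_ratio = math.exp(b)
    ci_low = math.exp(b - z * se)
    ci_high = math.exp(b + z * se)
    return wald, p_value, odds_ratio, (ci_low, ci_high)

# Hypothetical coefficient and standard error (illustration only).
wald, p_value, odds_ratio, ci = wald_summary(b=0.5, se=0.25)

print("Wald chi-square =", wald)
print("p-value =", round(p_value, 4))
print("OR =", round(odds_ratio, 3), "95% CI =", tuple(round(v, 3) for v in ci))
```

Because the confidence interval is computed on the coefficient scale and then exponentiated, it is not symmetric about the odds ratio.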
METHODS
Study subjects
The study was carried out in three out of 6 districts. Two secondary schools were selected at
random from each district; all selected schools agreed to participate in the study. Some schools were allocated to the intervention
group whereas others were allocated to the non-intervention or control group. Form 4 students and their teachers completed
baseline questionnaires on knowledge, attitudes, and practice (KAP). This was followed by an intervention package that consisted of lectures, video shows, posters,
and distribution of brochures on cancer. A post-intervention questionnaire survey of KAP was carried out after the end of
the intervention, about six months after the baseline questionnaire.
Interventions
Cancer education materials, brochures and posters, currently used in Malaysia were collected. Some new brochures were made.
A video on cancer education was prepared in a question and answer format. Seminars were held in the schools at which students
and teachers were informed about various aspects relating to cancer KAP. They were given an opportunity to ask questions.
Posters on cancer were displayed in the school. Brochures on cancer KAP were distributed. A video was shown. In some schools
pathological specimens of cancer lesions were displayed.
Statistical Analysis
Matched analysis using the McNemar statistic was carried out to identify KAP indicators that
changed significantly at the repeat survey. Conditional logistic multivariate analysis appropriate for paired data was used
to study how specific intervention modalities affected change of KAP while controlling for putative confounding factors. The
response variables were defined by subtracting the score of the first survey from that of the second survey. Since responses were scored
as 1 or 2, the response variables took on the dichotomous values of 0 and 1. The independent variables were: intervention
group, attending a seminar, seeing a poster, receiving a brochure, and watching a video.
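The McNemar statistic for a paired before-and-after comparison depends only on the discordant pairs, that is, respondents whose answer changed between the two surveys. The Python sketch below is a minimal illustration under stated assumptions: the counts are hypothetical, the function name `mcnemar_test` is ours, and no continuity correction is applied (the source does not say whether one was used).

```python
import math

def mcnemar_test(n_01, n_10):
    """McNemar chi-square (1 df, no continuity correction) and its p-value.

    n_01: pairs that changed from 0 at baseline to 1 at the repeat survey.
    n_10: pairs that changed from 1 at baseline to 0 at the repeat survey.
    """
    discordant = n_01 + n_10
    if discordant == 0:
        raise ValueError("no discordant pairs")
    chi2 = (n_01 - n_10) ** 2 / discordant
    # Upper tail of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts of respondents who changed their answer (illustration only).
chi2, p_value = mcnemar_test(n_01=25, n_10=10)

print("McNemar chi-square =", round(chi2, 3), "p =", round(p_value, 4))
```

Respondents who gave the same answer at both surveys do not enter the statistic.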
RESULTS
Tables 1 and 2 show the data and the results of the analysis on opinions about
cancer.
TABLE 1: OPINIONS ON CANCER: EFFECT OF INTERVENTION MODALITIES ON CHANGE AT REPEAT SURVEY (BIVARIATE ANALYSIS)

                          Seminar               Poster                Brochure              Video
Opinion      Survey       n     %      P        n     %      P        n     %      P        n     %      P

Cancer is a serious problem
  Agree      Pre          592   96.1   0.336    907   96.4   0.037    567   97.1   0.123    425   95.7   0.032
             Post         585   95.0            888   94.4            557   95.4            410   92.3
  Disagree   Pre          693   97.1   0.069    382   97.5   0.655    712   96.4   0.197    859   97.2   0.414
             Post         680   95.2            380   96.9            702   95.0            853   96.5

Some cancers can be detected early
  Agree      Pre          552   91.4   0.286    854   91.3   0.274    531   92.0   0.052    402   91.2   0.346
             Post         542   89.7            841   90.0            513   88.9            394   89.3
  Disagree   Pre          644   91.9   0.147    345   92.5   0.102    659   91.7   0.331    793   92.1   0.070
             Post         629   89.7            333   89.3            649   90.3            773   89.8

Cancer is rare and affects the unlucky
  Agree      Pre          97    16.9   0.439    144   16.2   0.403    76    14.0   0.033    67    16.3   0.128
             Post         106   18.4            156   17.6            99    18.2            82    20.0
  Disagree   Pre          100   15.0   0.100    50    14.0   0.020    119   17.3   0.695    129   15.6   0.270
             Post         120   18.0            70    19.6            124   18.0            144   17.4

Some cancers are hereditary
  Agree      Pre          309   53.1   0.001    495   55.4   0.008    322   58.6   0.026    237   55.6   0.086
             Post         360   61.9            541   60.5            352   64.0            257   60.3
  Disagree   Pre          403   59.9   0.618    220   60.1   0.933    386   55.4   0.383    474   57.3   0.207
             Post         395   58.7            219   59.8            400   57.4            496   60.0
TABLE 2: OPINIONS ON CANCER: INDEPENDENT EFFECTS OF INTERVENTION MODALITIES ON CHANGE AT REPEAT SURVEY (MULTIVARIATE
CONDITIONAL LOGISTIC REGRESSION ANALYSIS)

  Intervention School      Seminar                  Poster                   Brochure                 Video
   b       P       OR       b       P       OR       b       P       OR       b       P       OR       b       P       OR

Cancer can be cured if detected early
  -0.4763  0.1111  0.621   -0.7249  0.0179  0.484   -0.5948  0.0628  0.552    0.0348  0.9108  1.035    0.1354  0.6687  1.145

Cancer is a serious problem
   0.2836  0.2286  1.328    0.1594  0.4770  1.173    0.5793  0.0460  1.785   -0.4711  0.0403  0.624    0.6520  0.0040  1.919

All cancers can be detected early
   0.2721  0.0448  1.313    0.1206  0.3544  1.128    0.2234  0.1467  1.250    0.0117  0.9312  1.012   -0.1080  0.4322  0.898

Some cancers are hereditary
   0.1746  0.1821  1.191   -0.2848  0.0258  0.752   -0.0876  0.5499  0.916   -0.0225  0.8658  0.978   -0.1090  0.4232  0.897

Cancer is contagious
   0.0169  0.9067  1.017    0.0448  0.7444  1.046    0.0954  0.5573  1.100   -0.3510  0.0145  0.704    0.2811  0.0509  1.325
CONCLUSION
SAS can easily be employed for logistic
regression analysis of matched or paired data with satisfactory results.