
ISLAMIC MEDICAL EDUCATION RESOURCES-03

0309-CONDITIONAL LOGISTIC REGRESSION FOR PAIRED DATA

Paper written for the 17th Annual SAS Users’ Malaysia Conference, held at the Renaissance Hotel, Kuala Lumpur, on 4th September 2003, by Professor Omar Hasan Kasule, MB ChB (MUK), MPH, DrPH (Harvard), Deputy Dean, Faculty of Medicine, UIA, Kuantan, Pahang. Email: omarkasule@yahoo.com

ABSTRACT

This paper describes our experience of using SAS to carry out conditional logistic regression on a paired data set. The aim of the analysis was to establish whether specific health education interventions changed opinions about cancer. Respondents were asked for their opinions on cancer before and after the health education intervention. SAS conditional logistic regression was used to identify interventions that were independent predictors of change in opinions about cancer.

 

INTRODUCTION

We had experience of logistic regression using SAS routines for independent data. However, while working on a large government-funded project on cancer education, we came across paired data that required conditional logistic regression. The SAS manuals and products available to us contained no documentation on how to carry out the analysis. We made several calls to the SAS representative office in Kuala Lumpur and did not get an immediate answer. They did, however, contact the SAS Institute in the US, which faxed the relevant instructions, and we carried out the analysis successfully. The analysis turned out to be surprisingly simple. It involved using the difference between the indicator variables of the paired data values as a response variable and running the regression like that for independent data. The results of the analysis were published in the International Medical Journal of June 2002.

 

Logistic regression is a type of non-linear regression that has become very popular for the analysis of categorical data. In logistic regression, the dependent variable is stated as the logit, which is a logarithmic transformation of a proportion or a probability. The outcome variable, y, is binary or dichotomized, so it can take on only two values, 0 and 1. The derivation of the logistic model is straightforward. We define $p = \Pr(y = 1)$. The odds of y = 1 are $p/(1-p)$. The logit is the natural logarithm of the odds: $\operatorname{logit}(p_i) = \ln\{p_i/(1-p_i)\} = a + b_1x_1 + \dots + b_nx_n$. Solving for $p_i$ gives $p_i = 1/[1 + e^{-(a + b_1x_1 + \dots + b_nx_n)}]$. The parameters are fitted by maximum likelihood estimation (MLE); fitting by least squares is less satisfactory. The logistic function, written as $e^x/(1 + e^x)$, is the inverse of the logit function, written as $\ln\{p/(1-p)\}$.
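The relationship between the logit and the logistic function described above can be checked numerically. The following is a minimal Python sketch, not part of the original SAS analysis; the function names and the intercept and slope values are hypothetical and chosen only for illustration.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def logistic(x):
    """Inverse of the logit: maps a linear predictor x to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

def predicted_probability(a, coefficients, values):
    """p = 1 / (1 + exp(-(a + b1*x1 + ... + bn*xn)))."""
    linear_predictor = a + sum(b * x for b, x in zip(coefficients, values))
    return logistic(linear_predictor)

# Hypothetical intercept and slopes: -1.0 + 0.5*1.0 + 0.25*2.0 = 0.0
p = predicted_probability(-1.0, [0.5, 0.25], [1.0, 2.0])
print(p)         # 0.5, since the linear predictor is 0
print(logit(p))  # 0.0, the linear predictor is recovered
```

Applying `logit` to the output of `logistic` returns the original linear predictor, which is what it means for the two functions to be inverses.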

 

Logistic regression is very useful in epidemiological analysis for 2 reasons: (a) it handles a dichotomized outcome variable and (b) the odds ratio is derived directly from the regression coefficient.

 

The techniques of logistic regression used for independent data can be adapted for paired or matched data. For 1:1 matched data, the variables are manipulated in such a way that the difference within each pair is used as the explanatory variable, and the usual logistic regression model is fitted. For 1:M or N:M matched data, the proportional hazards procedure is used to fit the conditional logistic regression model. A stratum is formed for each matched set, matched on age or some other variable. A survival time variable is created such that all cases in a stratum have the same event time and the controls are censored at a later time. The accompanying status variable is 1 for cases (events) and 0 for controls (censored).
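The 1:1 approach described above can be illustrated outside SAS. For matched pairs, the conditional likelihood contribution of a pair with within-pair difference d is 1 / (1 + exp(-b*d)), which is a logistic model with no intercept fitted to the differences. The sketch below is a minimal Python illustration under stated assumptions: a single binary explanatory variable, a hand-written Newton-Raphson fit, and hypothetical pair counts. It is not the SAS routine used in the study.

```python
import math

def fit_paired_logistic(differences, iterations=25):
    """Newton-Raphson fit of a no-intercept logistic model to within-pair
    differences d_i, maximizing sum(log(1 / (1 + exp(-b * d_i))))."""
    b = 0.0
    for _ in range(iterations):
        gradient = 0.0
        information = 0.0
        for d in differences:
            s = 1.0 / (1.0 + math.exp(-b * d))
            gradient += d * (1.0 - s)             # first derivative of the log-likelihood
            information += d * d * s * (1.0 - s)  # negative second derivative
        if information == 0.0:
            break  # no discordant pairs, so b cannot be estimated
        b += gradient / information
    return b

# Hypothetical data: 30 pairs with difference +1, 10 with -1, 60 with 0.
differences = [1] * 30 + [-1] * 10 + [0] * 60
b = fit_paired_logistic(differences)
odds_ratio = math.exp(b)
print(round(odds_ratio, 3))  # 3.0, the ratio of discordant pairs 30/10
```

Pairs with a difference of 0 contribute nothing to the estimate, so with one binary variable the fitted odds ratio reduces to the ratio of the two kinds of discordant pairs.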

 

When using statistical packages to model the logistic relation, care must be taken to make sure that the right response is being modelled. A package may model the logit of the non-event (y=0) by default, as SAS does. The usual outputs of a logistic regression are: the parameter estimate (the logistic regression coefficient, b); the standard error of the estimate, se(b); the Wald chi-square, defined as $\{b/\mathrm{se}(b)\}^2$; the p-value, which is the probability of a chi-square value higher than the one observed; the standardized estimate, defined as $b/\{(\pi^2/3)/s\}^{1/2}$; the odds ratio (OR), defined as the exponent of b, $e^b$; the 95% confidence interval for the OR; the global chi-square; and 6 statistics that describe the association of predicted and observed probabilities: concordant pairs, discordant pairs, tied pairs, Somers’ D, Gamma, and Tau-a.
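Several of the outputs listed above follow directly from the coefficient and its standard error. The sketch below is a minimal Python illustration, assuming a Wald-type 95% confidence interval on the log-odds scale and using hypothetical values of b and se(b); the function name is not a SAS name.

```python
import math

def wald_summary(b, se, z=1.959964):
    """Wald chi-square, p-value (1 df), odds ratio and 95% CI from b and se(b)."""
    wald = (b / se) ** 2
    # Upper tail of a chi-square with 1 df: P(X > w) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return {
        "wald_chi_square": wald,
        "p_value": p_value,
        "odds_ratio": math.exp(b),
        "ci_95": (math.exp(b - z * se), math.exp(b + z * se)),
    }

# Hypothetical coefficient and standard error, for illustration only.
summary = wald_summary(0.5, 0.25)
for name, value in summary.items():
    print(name, value)
```

With these hypothetical inputs the Wald chi-square is 4.0, the p-value is just under 0.05, and the lower confidence limit for the OR is just above 1, which shows how the three quantities move together.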

 

METHODS

Study subjects

The study was carried out in three out in 6 districts. Two secondary schools were selected at random from each district; all selected schools agreed to participate in the study. Some schools were allocated to the intervention group and the others to the non-intervention (control) group. Form 4 students and their teachers completed baseline questionnaires on knowledge, attitudes, and practice (KAP). This was followed by an intervention package that consisted of lectures, video shows, posters, and distribution of brochures on cancer. A post-intervention KAP questionnaire survey was carried out after the end of the intervention, about six months after the baseline questionnaire.

 

Interventions

Cancer education materials, brochures and posters, currently used in Malaysia were collected. Some new brochures were made. A video on cancer education was prepared in a question and answer format. Seminars were held in the schools at which students and teachers were informed about various aspects relating to cancer KAP. They were given an opportunity to ask questions. Posters on cancer were displayed in the school. Brochures on cancer KAP were distributed. A video was shown. In some schools pathological specimens of cancer lesions were displayed.

 

Statistical Analysis

Matched analysis using the McNemar statistic was carried out to identify KAP indicators that changed significantly at the repeat survey. Conditional logistic multivariate analysis appropriate for paired data was used to study how specific intervention modalities affected change in KAP while controlling for putative confounding factors. The response variables were defined by subtracting the score at the first survey from that at the second survey. Since responses were scored as 1 or 2, the response variables took on the dichotomous values of 0 and 1. The independent variables were: intervention group, attending a seminar, seeing a poster, receiving a brochure, and watching a video.
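The McNemar statistic mentioned above depends only on the discordant pairs, that is, respondents whose answer changed between the two surveys. The following minimal Python sketch shows the calculation. It assumes the version without continuity correction (the paper does not say which version was used), and the counts are hypothetical.

```python
import math

def mcnemar(changed_to_agree, changed_to_disagree):
    """McNemar chi-square (1 df, no continuity correction) and its p-value,
    computed from the two counts of discordant pairs."""
    b, c = changed_to_agree, changed_to_disagree
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Hypothetical counts: 40 respondents changed to "agree", 20 to "disagree".
statistic, p_value = mcnemar(40, 20)
print(round(statistic, 3), round(p_value, 4))
```

Respondents who gave the same answer at both surveys do not enter the statistic at all, which is why the test is suited to before-and-after data.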

 


RESULTS

Tables 1 and 2 show the data and the results of the analysis on opinions about cancer.

 

TABLE 1: OPINIONS ON CANCER: EFFECT OF INTERVENTION MODALITIES ON CHANGE AT REPEAT SURVEY (BIVARIATE ANALYSIS)

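| Opinion | | Survey | Seminar n | Seminar % | Seminar P | Poster n | Poster % | Poster P | Brochure n | Brochure % | Brochure P | Video n | Video % | Video P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cancer is a serious problem | Agree | Pre | 592 | 96.1 | 0.336 | 907 | 96.4 | 0.037 | 567 | 97.1 | 0.123 | 425 | 95.7 | 0.032 |
| | | Post | 585 | 95.0 | | 888 | 94.4 | | 557 | 95.4 | | 410 | 92.3 | |
| | Disagree | Pre | 693 | 97.1 | 0.069 | 382 | 97.5 | 0.655 | 712 | 96.4 | 0.197 | 859 | 97.2 | 0.414 |
| | | Post | 680 | 95.2 | | 380 | 96.9 | | 702 | 95.0 | | 853 | 96.5 | |
| Some cancers can be detected early | Agree | Pre | 552 | 91.4 | 0.286 | 854 | 91.3 | 0.274 | 531 | 92.0 | 0.052 | 402 | 91.2 | 0.346 |
| | | Post | 542 | 89.7 | | 841 | 90.0 | | 513 | 88.9 | | 394 | 89.3 | |
| | Disagree | Pre | 644 | 91.9 | 0.147 | 345 | 92.5 | 0.102 | 659 | 91.7 | 0.331 | 793 | 92.1 | 0.070 |
| | | Post | 629 | 89.7 | | 333 | 89.3 | | 649 | 90.3 | | 773 | 89.8 | |
| Cancer is rare and affects the unlucky | Agree | Pre | 97 | 16.9 | 0.439 | 144 | 16.2 | 0.403 | 76 | 14.0 | 0.033 | 67 | 16.3 | 0.128 |
| | | Post | 106 | 18.4 | | 156 | 17.6 | | 99 | 18.2 | | 82 | 20.0 | |
| | Disagree | Pre | 100 | 15.0 | 0.100 | 50 | 14.0 | 0.020 | 119 | 17.3 | 0.695 | 129 | 15.6 | 0.270 |
| | | Post | 120 | 18.0 | | 70 | 19.6 | | 124 | 18.0 | | 144 | 17.4 | |
| Some cancers are hereditary | Agree | Pre | 309 | 53.1 | 0.001 | 495 | 55.4 | 0.008 | 322 | 58.6 | 0.026 | 237 | 55.6 | 0.086 |
| | | Post | 360 | 61.9 | | 541 | 60.5 | | 352 | 64.0 | | 257 | 60.3 | |
| | Disagree | Pre | 403 | 59.9 | 0.618 | 220 | 60.1 | 0.933 | 386 | 55.4 | 0.383 | 474 | 57.3 | 0.207 |
| | | Post | 395 | 58.7 | | 219 | 59.8 | | 400 | 57.4 | | 496 | 60.0 | |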

 

TABLE 2: OPINIONS ON CANCER: INDEPENDENT EFFECTS OF INTERVENTION MODALITIES ON CHANGE AT REPEAT SURVEY (MULTIVARIATE CONDITIONAL LOGISTIC REGRESSION ANALYSIS)

 

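| Opinion | Intervention School b | Intervention School P | Intervention School OR | Seminar b | Seminar P | Seminar OR | Poster b | Poster P | Poster OR | Brochure b | Brochure P | Brochure OR | Video b | Video P | Video OR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cancer can be cured if detected early | -0.4763 | 0.1111 | 0.621 | -0.7249 | 0.0179 | 0.484 | -0.5948 | 0.0628 | 0.552 | 0.0348 | 0.9108 | 1.035 | 0.1354 | 0.6687 | 1.145 |
| Cancer is a serious problem | 0.2836 | 0.2286 | 1.328 | 0.1594 | 0.4770 | 1.173 | 0.5793 | 0.0460 | 1.785 | -0.4711 | 0.0403 | 0.624 | 0.6520 | 0.0040 | 1.919 |
| All cancers can be detected early | 0.2721 | 0.0448 | 1.313 | 0.1206 | 0.3544 | 1.128 | 0.2234 | 0.1467 | 1.250 | 0.0117 | 0.9312 | 1.012 | -0.1080 | 0.4322 | 0.898 |
| Some cancers are hereditary | 0.1746 | 0.1821 | 1.191 | -0.2848 | 0.0258 | 0.752 | -0.0876 | 0.5499 | 0.916 | -0.0225 | 0.8658 | 0.978 | -0.1090 | 0.4232 | 0.897 |
| Cancer is contagious | 0.0169 | 0.9067 | 1.017 | 0.0448 | 0.7444 | 1.046 | 0.0954 | 0.5573 | 1.100 | -0.3510 | 0.0145 | 0.704 | 0.2811 | 0.0509 | 1.325 |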
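The odds ratios in Table 2 are the exponents of the regression coefficients, so each OR can be recomputed from its b. The short Python check below copies the (b, OR) pairs from Table 2 in column order (intervention school, seminar, poster, brochure, video). The tolerance of 0.001 is an assumption that allows for rounding in the published values.

```python
import math

# (b, OR) pairs copied from Table 2, one list per opinion.
table2 = {
    "Cancer can be cured if detected early": [
        (-0.4763, 0.621), (-0.7249, 0.484), (-0.5948, 0.552),
        (0.0348, 1.035), (0.1354, 1.145)],
    "Cancer is a serious problem": [
        (0.2836, 1.328), (0.1594, 1.173), (0.5793, 1.785),
        (-0.4711, 0.624), (0.6520, 1.919)],
    "All cancers can be detected early": [
        (0.2721, 1.313), (0.1206, 1.128), (0.2234, 1.250),
        (0.0117, 1.012), (-0.1080, 0.898)],
    "Some cancers are hereditary": [
        (0.1746, 1.191), (-0.2848, 0.752), (-0.0876, 0.916),
        (-0.0225, 0.978), (-0.1090, 0.897)],
    "Cancer is contagious": [
        (0.0169, 1.017), (0.0448, 1.046), (0.0954, 1.100),
        (-0.3510, 0.704), (0.2811, 1.325)],
}

# Largest absolute gap between exp(b) and the published OR.
max_gap = max(
    abs(math.exp(b) - odds_ratio)
    for row in table2.values()
    for b, odds_ratio in row
)
print(max_gap < 0.001)  # True: every OR agrees with exp(b) after rounding
```

A coefficient below 0 corresponds to an OR below 1 and a coefficient above 0 to an OR above 1, as in the seminar column of the first row (b = -0.7249, OR = 0.484).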

 

CONCLUSION

SAS can easily be employed for logistic regression analysis of matched or paired data with satisfactory results.

Professor Omar Hasan Kasule Sr. September 2003