
ISLAMIC MEDICAL EDUCATION RESOURCES-03

0308-SURVIVAL ANALYSIS (PART I)

Lecture at the FIFTH ADVANCED ASIAN COURSE IN TROPICAL EPIDEMIOLOGY, INSTITUTE FOR MEDICAL RESEARCH, KUALA LUMPUR, 18-29 AUGUST 2003. By Professor Dr Omar Hasan Kasule, Sr. MB ChB (MUK), MPH, DrPH (Harvard), Deputy Dean for Research, Faculty of Medicine, UIA, PO Box 141, Kuantan, Pahang, MALAYSIA. Tel 609 513 2797, Fax 609 513 3615, E-mail omarkasule@yahoo.com

Learning Objectives:

Definition and use of survival analysis

Construction of the survival curve using the life-table method

Construction of the survival curve using the Kaplan-Meier method

Visual comparison of survival curves

Interpretation and testing of survival curves

Use of Cox’s Proportional Hazards Regression Model

 

Key Words and Terms:

Censoring

Kaplan-Meier

Life Table

Cox proportional hazards regression

Survival curve

 

AGENDA

 

I. LEARNING OBJECTIVES

 

Pre-test #1

 

II. INTRODUCTION TO SURVIVAL ANALYSIS

 

Pre-test #2

 

III. PROBLEMS OF SURVIVAL ANALYSIS

 

                Pre-test #3

 

IV. METHODS OF SURVIVAL ANALYSIS

 

                Pre-test #4

 

V. COMPARING SURVIVAL CURVES

 

                Survival data set

 

VI. PRACTICAL ASSIGNMENTS

 

PRE TEST #1

1. The following statements are true about survival analysis in general

Survival analysis studies effects of covariates on survival

Survival analysis cannot be used to compare treatments

The use of survival analysis is confined to the medical field

Parametric methods are not used in survival analysis

Life-table and Kaplan-Meier are parametric methods

 

2. The following statements are true about survival analysis in general

The outcome in survival analysis  is time until the event of interest

The term ‘event’ is the same as ‘failure’

Relapse is not considered an event

The outcome in survival analysis is always dichotomous

Survival analysis can be used for the ‘survival’ of machine parts

 

3. The following statements are time periods in survival analysis

Remission duration

Time to relapse

Survival after relapse

Time to diagnosis

Incubation period

 

4. The following are valid ‘zero times’ for survival analysis

Time at point of randomization.

Time at enrolment

Time of the first visit

Time of diagnosis

Time at loss to follow up

 

INTRODUCTION TO SURVIVAL ANALYSIS I

 

Definition of survival analysis

Study of survival duration

Study of the effects of covariates on survival

 

Objectives of survival analysis

Estimation and interpretation of the hazard function

Comparing survival/hazard functions

Assessment of the relationship of explanatory covariates to survival

 

Disciplines that use survival analysis

Clinical medicine

Sociology – event history

Engineering – reliability analysis & failure time analysis

Economics – duration analysis

 

Uses of survival analysis - medical

Follow-up of patients on treatment by experimental therapies

Evaluation of survival after diagnosis

Summarizing and evaluating mortality in different groups

 

Uses of survival analysis – non medical

Survival of electric bulbs

Survival of machine tools

Survival of equipment

Survival of friendships

Survival in political office

Survival of marriage (time to divorce)

 

INTRODUCTION TO SURVIVAL ANALYSIS II

 

Measurement of Time durations in survival analysis

Time to relapse

Remission duration

Survival after relapse

Time to death

Time to a complication

 

Zero times used in survival analysis

Point of randomization (best)

Enrolment

First visit

First symptoms

Diagnosis

Start of treatment

 

Survival data

32 elderly acute leukemia patients

Randomization to 2 treatment groups: full dose (17) and half dose (15)

Full dose group: died 14; censored 3

Half dose group: died 8; censored 7  

 

Data set-up for survival analysis (rectangular file)

ID    Survival duration      Status             Treatment group    Covariates
      (days/weeks/months)    (died/censored)                       (age, sex, etc.)
1.                                                                 V1  V2  V3  V4  V5
2.
3.
4.
.

PRE TEST #2

1. The following statements are true about causes of censoring

Censoring is loss of information due to incomplete observation

Censoring is caused by withdrawal from the study

Study termination is never a cause of censoring

Loss to follow-up is a cause of censoring

Death due to a competing risk is a cause of censoring

 

2. The following statements are true about types of censoring

In left censoring observation ends before a given point in time

In right censoring the subject last seen alive at a given time is not seen after

Interval censoring is loss to follow up between diagnosis and treatment

Right censoring is less common than left censoring

Random censoring occurs uniformly throughout the study

 

3. The following statements are true about types of censoring

Non-random censoring is due to investigator manipulation

Non-random censoring leads to bias

Random censoring is not related to outcome

In progressive censoring entry and censoring times are different for each subject

Analysis based on the intention to treat is more conservative than censored analysis.

4. The following statements are true about truncation & competing causes of death

In left truncation, only patients surviving a certain time are considered

In right truncation only patients experiencing an event by a given time are considered.

Competing causes of death are one cause of censoring that bias survival estimates.

A study can be designed to ensure that competing causes of death are eliminated

The term competing causes of death refers only to neoplastic diseases

 

PROBLEMS OF SURVIVAL ANALYSIS

The Ideal situation for constructing survival curve

Complete follow up until moment of death

No losses to follow-up

No other causes of death

 

3 Problems of survival analysis

Censoring = loss of information

Truncation = exclusion of subjects from the analysis

Competing causes of death (bias survival estimates)

 

Causes of censoring

Withdrawal from the study

Study termination

Loss to follow-up

Death due to a competing risk

 

Types of censoring

Left: the event occurred before a given point in time (for example, before observation began), so its exact time is unknown

Right: subject is last seen alive at a given time and is not followed up subsequently

Interval: the event is known only to have occurred between 2 given time points (a mixture of left and right censoring)

Random: occurs uniformly throughout the study, not related to outcome (no bias)

Non-random: due to investigator manipulation (bias)

Progressive: in studies in which entry and censoring times are different for each subject

 

Types of truncation

Left: only individuals surviving a certain time are included in the sample

Right: only individuals experiencing event of interest by a given time are included

 

PRE-TEST #3

1. The following statements are true about non-parametric survival analysis

The life-table method works better with large data sets

The life table method is preferred if time of occurrence is not measured precisely

The life table method assumes withdrawals at the end of the interval

The Kaplan-Meier method is best used for small data sets

The Kaplan-Meier method is best used for precisely measured occurrence times

In the Kaplan-Meier method time intervals are not fixed in advance.

 

2. The following statements are true about regression methods for survival analysis

The proportional hazards regression method is semi-parametric

Sir David Cox proposed the proportional hazards regression method in 1972

The proportional hazards regression method is unpopular for survival analysis

The proportional hazards regression method can be used if data distribution is unknown

The proportional hazards regression method gives valid results only for normally distributed data

 

METHODS OF SURVIVAL ANALYSIS

Parametric methods of survival analysis (used if underlying distribution is known)

Weibull

Lognormal

Gamma

 

Non-parametric methods of survival analysis (used if underlying distribution is unknown)

The life-table method

The Kaplan-Meier method

The proportional hazards method (semi-parametric)

 

The life-table (LT) method for drawing a survival curve

Suitable for large data sets

Resorted to if time of occurrence of an event cannot be measured precisely

Leads to bias due to assumptions about the point of withdrawal

 

The Kaplan-Meier (KM) method for drawing a survival curve

Suitable for small data sets

Used only if the time of event occurrence is measured precisely

Improvement on the life-table method in the handling of withdrawals

 

SURVIVAL CURVE USING THE LT METHOD

A table with 8 columns is set up with suitable time intervals fixed by the analyst

 

Col 1   Col 2   Col 3   Col 4   Col 5   Col 6            Col 7        Col 8
t       O       D       W       O-W     P = D/(O-W)      Q = 1 – P    S = ΠQ = Π(1 – P)
0

Column #1 (t) is the time at the start of the time interval. The first row of the table is assigned time 0.

 

Column #2 (O) is the number of subjects under observation at the start of the time interval.

 

Column #3 (D) is the number who died during the time interval.

 

Column #4 (W) is the number withdrawn during the time interval.

 

Column #5 (O-W) is the number under observation during the interval computed as col 2 – col 4

 

Column #6 (P) is the probability of dying in the interval computed as P = D / (O-W)

 

Column #7 (Q) is the probability of surviving to the end of the interval computed as Q=1-P.

 

Column #8 (S) is the probability of survival from time 0 until the end of the interval. The probability for the first row is 1.0. Subsequent probabilities are computed by multiplying Q into the cumulative survival probability (S) of the prior row.

 

The survival probabilities in column #8 are plotted against time in column #1 to generate a survival curve.

 

Two or more curves can be generated depending on the treatment or experimental groups.
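The column rules above can be sketched in a few lines of Python. This is an illustration only: the function name `life_table` and the counts passed to it are hypothetical, not the lecture's data. The sketch takes S = 1.0 as the value at time 0 and reports, for each row, survival to the end of that interval, which is one reading of the Column #8 rule. The denominator is O-W as defined in Column #5; many texts use the actuarial adjustment O - W/2 instead, which would change only the `at_risk` line.

```python
def life_table(n_start, deaths, withdrawals, starts):
    """Life-table columns from per-interval counts.

    Follows the handout's definitions: column 5 is O - W and
    column 6 is P = D / (O - W). All names here are illustrative.
    """
    rows = []
    o = n_start   # column 2: under observation at the start of the interval
    s = 1.0       # cumulative survival at time 0
    for t, d, w in zip(starts, deaths, withdrawals):
        at_risk = o - w                           # column 5
        p = d / at_risk if at_risk > 0 else 0.0   # column 6
        q = 1.0 - p                               # column 7
        s *= q                                    # column 8: running product of Q
        rows.append({"t": t, "O": o, "D": d, "W": w,
                     "O-W": at_risk, "P": p, "Q": q, "S": s})
        o -= d + w   # those neither dead nor withdrawn enter the next interval
    return rows


# Hypothetical counts for three 10-day intervals (not the lecture's data).
table = life_table(n_start=20, deaths=[4, 3, 2], withdrawals=[0, 2, 1],
                   starts=[0, 10, 20])
for row in table:
    print(row["t"], row["O"], row["D"], row["W"], round(row["S"], 3))
```

Plotting the S values against the interval start times gives the survival curve; running the function once per treatment group gives one curve per group.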

 

SURVIVAL CURVE USING THE KM METHOD

A table with 7 columns is set up by the analyst

 

Col 1   Col 2   Col 3   Col 4   Col 5      Col 6        Col 7
t       O       D       W       P = D/O    Q = 1 – P    S = ΠQ = Π(1 – P)
0                                                       1.0

Column #1 (t) is the time of occurrence of an event (death or withdrawal)

 

Column #2 (O) is the number of subjects at risk at time, t

 

Column #3 (D) is the number of deaths at time t

 

Column #4 (W) is the number of withdrawals at time t

 

Column #5 (P) is the probability of death at time t computed as P = D/O

 

Column #6 (Q) is the probability of survival at time t computed as 1 – P

 

Column #7 (S) indicates cumulative survival from time 0 to time t. It is computed by multiplying the row probability of survival (Q) into the cumulative probability of survival (S) of the previous row.
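The seven columns can be computed directly from a list of durations and a died/censored flag. The sketch below is a minimal illustration: the function name `kaplan_meier` and the seven observations are hypothetical. It uses one row per distinct time and assumes, as is conventional, that a subject withdrawn at time t is still counted at risk for deaths occurring at that same t.

```python
def kaplan_meier(durations, died):
    """Return rows (t, O, D, W, P, Q, S), one per distinct event time.

    durations -- time of death or withdrawal for each subject
    died      -- True if the subject died, False if withdrawn (censored)
    """
    rows = []
    at_risk = len(durations)   # column 2 for the first row
    s = 1.0                    # cumulative survival at time 0
    for t in sorted(set(durations)):
        d = sum(1 for u, e in zip(durations, died) if u == t and e)
        w = sum(1 for u, e in zip(durations, died) if u == t and not e)
        p = d / at_risk        # column 5
        q = 1.0 - p            # column 6
        s *= q                 # column 7: product of Q up to time t
        rows.append((t, at_risk, d, w, p, q, s))
        at_risk -= d + w       # deaths and withdrawals leave the risk set
    return rows


# Hypothetical data: 7 subjects, durations in days (not the lecture's data).
durations = [2, 3, 3, 5, 8, 8, 9]
died = [True, True, False, True, True, True, False]
for t, o, d, w, p, q, s in kaplan_meier(durations, died):
    print(t, o, d, w, round(s, 3))
```

A row in which only withdrawals occur has P = 0 and Q = 1, so S is unchanged there; the withdrawal affects the curve only by shrinking O for later rows.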

 

SURVIVAL ANALYSIS BY REGRESSION

LIFEREG

Parametric regression that uses MLE

Accommodates left censoring

Tests hypotheses about the shape of the hazard function

Cannot handle time-varying covariates

 

THE POISSON REGRESSION

Used for rare outcomes (<5%) for which PHREG is not appropriate

Models the log transform of incidence at time t as a linear function of covariates

Assumes that incidence does not depend on the elapsed time, t.
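The last two points can be written compactly. The notation below (λ for the incidence rate, x for covariates, β for regression coefficients) is standard textbook notation added here for illustration; it is not taken from the lecture.

```latex
\log \lambda(x_1, \dots, x_k) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
```

Because the elapsed time t does not appear on the right-hand side, the modelled incidence is constant over time for a given set of covariate values.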

 

LOGISTIC, PROBIT, AND GENERAL LINEAR MODELS

Easier and more intuitive than Cox’s regression

Treat time like any other variable

Treat covariates as fixed.

 

PHREG (Cox’s Proportional Hazards)

Robust semi-parametric regression that fits the data well

No selection or assumption of any particular probability model

Uses partial likelihood estimation (partial MLE) and assumes proportional hazards

Handles only right censoring

Handles time-dependent covariates

Can handle competing risks

Controls confounding variables

Lacks built-in graphics capability.
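For reference, the model fitted by Cox's regression is usually written as follows. The symbols (h_0 for the baseline hazard, x for covariates, β for coefficients) are standard notation added here as an aid; they do not appear in the lecture.

```latex
h(t \mid x_1, \dots, x_k) = h_0(t)\, \exp(\beta_1 x_1 + \dots + \beta_k x_k)
\qquad
\frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = e^{\beta_j}
```

The baseline hazard h_0(t) is left unspecified, which is why no particular probability model has to be chosen. The hazard ratio on the right does not involve t, which is the proportional hazards assumption.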

 

PRETEST #4

1. The following statements are true about survival curves

The life-table method can be used to construct the survival curve

The Kaplan-Meier method cannot be used to construct survival curves

Survival curves can be compared using statistical methods

If the survival curve for group A is above and parallel to that of group B, then group A has a longer median survival

The survival curve is approximately a reverse-J shape

 

2. The following statements are true about comparing survival curves

Visual inspection of survival curves is completely useless.

Gehan’s generalized Wilcoxon test is a non-parametric test

Cox’s F test is a parametric method

The Wilcoxon test is more sensitive to differences between the curves at the earlier failure times

The log-rank test attaches equal importance to all failure times irrespective of whether they are early or late.

 

3. The following statements are true about comparing survival curves

The MH test is used to compare survival curves

The Log-rank is a parametric test for comparing survival curves

The generalized Wilcoxon is a parametric test for comparing survival curves

The Cox proportional hazards test is a non-parametric test

Survival analysis with 1 curve only and no comparison curve makes no sense

 

COMPARING SURVIVAL CURVES

Visual comparison of the curves

Cross-overs

Trends

 

The non-parametric methods for comparing 2 survival distributions

Gehan’s generalized Wilcoxon test

Peto’s generalized Wilcoxon test

Cox-Mantel test

Log-rank test

Mantel-Haenszel test

Cox’s F test

 

The parametric tests for comparing 2 survival distributions

Likelihood ratio test

Cox’s F test

 

The Wilcoxon test

More sensitive to differences between the curves at the earlier failure times

Less sensitive than the log-rank test for later failure times

Gives more weight to the earlier part of the survival curve

 

The Mantel-Haenszel test

Relies on methods of analyzing incidence density ratios

 

The Log-rank test

More sensitive if the assumptions of proportional hazards hold

Attaches equal importance to early and late failure times

Modification by Peto attaches more importance to earlier failure times
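The difference in weighting between the log-rank and Wilcoxon tests can be made concrete with a small sketch. It is a simplified illustration under stated assumptions, not the procedure of any particular package: the function name `two_group_test` and the data are hypothetical, the variance is the usual hypergeometric one, and the "gehan" option uses the number at risk as the weight (the Gehan-Breslow form of the generalized Wilcoxon test).

```python
def two_group_test(times_a, died_a, times_b, died_b, weight="logrank"):
    """Chi-square statistic (1 degree of freedom) comparing two groups.

    weight="logrank": every failure time gets weight 1.
    weight="gehan":   each failure time is weighted by the number at risk,
                      so early failures (large risk sets) count for more.
    """
    data = ([(t, e, 0) for t, e in zip(times_a, died_a)] +
            [(t, e, 1) for t, e in zip(times_b, died_b)])
    score = 0.0
    variance = 0.0
    for t in sorted({u for u, e, _ in data if e}):
        n = sum(1 for u, _, _ in data if u >= t)                  # at risk, both groups
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)     # at risk, group A
        d = sum(1 for u, e, _ in data if u == t and e)            # deaths at t
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        w = 1.0 if weight == "logrank" else float(n)
        score += w * (d_a - d * n_a / n)   # observed minus expected deaths in A
        if n > 1:
            variance += w * w * d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no failure time has both groups at risk")
    return score * score / variance


# Hypothetical durations in days (not the lecture's data).
a_times, a_died = [3, 5, 7, 9, 12], [True, True, True, False, True]
b_times, b_died = [8, 11, 14, 16, 20], [True, False, True, True, False]
print(two_group_test(a_times, a_died, b_times, b_died, "logrank"))
print(two_group_test(a_times, a_died, b_times, b_died, "gehan"))
```

Either statistic is referred to a chi-square distribution with 1 degree of freedom. When the curves separate early and converge later, the Gehan-weighted statistic tends to be the larger of the two; when hazards are proportional throughout, the log-rank statistic tends to be.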

 

ANALYSIS FOR PROGNOSTIC COVARIATES

A KM curve can be drawn for each level of covariate and visual inspection is used to compare

Cox’s regression is a semi-parametric method for studying several covariates simultaneously

The loglinear exponential and the linear exponential regression methods are parametric approaches to studying prognostic covariates.

Risk factors for death can be identified using linear discriminant functions and the linear logistic regression method.

 

 SURVIVAL DATA SET

The study was carried out by the Eastern Cooperative Oncology Group based at Harvard University and was published in the Journal of Clinical Oncology 2:865-870, 1984. A randomized clinical trial involving 32 elderly patients with acute leukemia was carried out to compare two treatment regimens. The patients were randomized to either full dose induction therapy or half dose attenuated therapy. The table below gives the case-by-case information on the study.

 

ID    Survival duration    Status                 Treatment group               Age
      (days)               (1=died 2=censored)    (1=full dose, 2=half dose)    (years)
1     4                    1                      1                             79
8     20                   1                      1                             77
9     31                   1                      1                             82
10    26                   1                      1                             74
13    14                   1                      1                             77
16    24                   2                      1                             75
17    16                   1                      1                             83
19    115                  1                      1                             72
23    19                   1                      1                             75
24    11                   1                      1                             72
26    14                   1                      1                             73
27    4                    1                      1                             76
29    67                   2                      1                             83
31    12                   1                      1                             78
37    10                   1                      1                             83
39    84                   1                      1                             73
41    100                  2                      1                             71
4     155                  1                      2                             76
7     378                  1                      2                             78
11    19                   1                      2                             71
15    208                  2                      2                             79
18    18                   1                      2                             73
20    291                  2                      2                             81
21    102                  2                      2                             77
22    64                   1                      2                             75
28    188                  1                      2                             70
33    111                  2                      2                             75
34    1                    1                      2                             78
38    57                   2                      2                             -
40    63                   2                      2                             91
42    71                   1                      2                             80
43    36                   2                      2                             73
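For readers who prefer to work outside SPSS, the table above can be typed in as a list of records and checked against the group totals given earlier in the handout (full dose: 17 patients, 14 died, 3 censored; half dose: 15 patients, 8 died, 7 censored). The variable and function names below are illustrative, and the missing age for ID 38 is stored as `None`.

```python
# (ID, survival days, status, treatment group, age)
# status: 1 = died, 2 = censored; group: 1 = full dose, 2 = half dose
DATA = [
    (1, 4, 1, 1, 79), (8, 20, 1, 1, 77), (9, 31, 1, 1, 82),
    (10, 26, 1, 1, 74), (13, 14, 1, 1, 77), (16, 24, 2, 1, 75),
    (17, 16, 1, 1, 83), (19, 115, 1, 1, 72), (23, 19, 1, 1, 75),
    (24, 11, 1, 1, 72), (26, 14, 1, 1, 73), (27, 4, 1, 1, 76),
    (29, 67, 2, 1, 83), (31, 12, 1, 1, 78), (37, 10, 1, 1, 83),
    (39, 84, 1, 1, 73), (41, 100, 2, 1, 71),
    (4, 155, 1, 2, 76), (7, 378, 1, 2, 78), (11, 19, 1, 2, 71),
    (15, 208, 2, 2, 79), (18, 18, 1, 2, 73), (20, 291, 2, 2, 81),
    (21, 102, 2, 2, 77), (22, 64, 1, 2, 75), (28, 188, 1, 2, 70),
    (33, 111, 2, 2, 75), (34, 1, 1, 2, 78), (38, 57, 2, 2, None),
    (40, 63, 2, 2, 91), (42, 71, 1, 2, 80), (43, 36, 2, 2, 73),
]


def summarize(data, group):
    """Return (number of patients, number died, number censored) for a group."""
    rows = [r for r in data if r[3] == group]
    died = sum(1 for r in rows if r[2] == 1)
    censored = sum(1 for r in rows if r[2] == 2)
    return len(rows), died, censored


print(summarize(DATA, 1))  # (17, 14, 3)
print(summarize(DATA, 2))  # (15, 8, 7)
```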

 

PRACTICAL ASSIGNMENTS

 

ASSIGNMENT #1 (LIFETABLE METHOD)

Using intervals of 20 days for survival duration, manually construct 2 survival curves on the same graph paper for the full and the half dose treatment groups respectively. What conclusions can you make about the two treatments from visual inspection of the curves?

Repeat the above using the SPSS program. Interpret results of statistical tests comparing the 2 curves

 

ASSIGNMENT #2 (KAPLAN-MEIER METHOD)

Manually construct 2 survival curves on the same graph paper for the full and the half dose treatment groups respectively. What conclusions can you make about the two treatments from visual inspection of the curves?

Repeat the above using the SPSS program. Interpret results of statistical tests comparing the 2 curves

 

ASSIGNMENT #3 (PROPORTIONAL HAZARDS METHOD)

Using the SPSS program carry out a comparison of the two treatments

Repeat the above with age in the model. What conclusions do you make about age as a prognostic factor?

 

SOLUTION ASSIGNMENT #1 - LT

Full dose group

Sort table by survival time

ID    Survival duration    Status                 Treatment group               Age
      (days)               (1=died 2=censored)    (1=full dose, 2=half dose)    (years)
1     4                    1                      1                             79
27    4                    1                      1                             76
37    10                   1                      1                             83
24    11                   1                      1                             72
31    12                   1                      1                             78
13    14                   1                      1                             77
26    14                   1                      1                             73
17    16                   1                      1                             83
23    19                   1                      1                             75
8     20                   1                      1                             77
16    24                   2                      1                             75
10    26                   1                      1                             74
9     31                   1                      1                             82
29    67                   2                      1                             83
39    84                   1                      1                             73
41    100                  2                      1                             71
19    115                  1                      1                             72
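Once the table is sorted, the O, D and W columns of the life table can be tallied interval by interval. The sketch below does this for the full dose group using 20-day intervals. The function name `bin_counts` is illustrative, and it assumes that an interval labelled "20-" covers day 20 up to but not including day 40, so a death on day 20 falls in the second interval; the handout does not state how boundary days are assigned.

```python
def bin_counts(durations, status, width=20, n_intervals=6):
    """Tally life-table columns 1-4 for fixed-width intervals.

    status: 1 = died, 2 = censored (withdrawn).
    Returns rows of (interval start, O, D, W).
    Assumption: each interval is [start, start + width).
    """
    rows = []
    o = len(durations)   # under observation at time 0
    for k in range(n_intervals):
        lo, hi = k * width, (k + 1) * width
        d = sum(1 for t, s in zip(durations, status) if lo <= t < hi and s == 1)
        w = sum(1 for t, s in zip(durations, status) if lo <= t < hi and s == 2)
        rows.append((lo, o, d, w))
        o -= d + w       # carried forward to the next interval
    return rows


# Full dose group, copied from the sorted table above.
full_days = [4, 4, 10, 11, 12, 14, 14, 16, 19, 20, 24, 26, 31, 67, 84, 100, 115]
full_status = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 2, 1]
for row in bin_counts(full_days, full_status):
    print(row)
```

The remaining columns (O-W, P, Q and S) follow from these counts by the column rules given for the life-table method.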

 

                Compute survival

Col 1   Col 2   Col 3   Col 4   Col 5   Col 6            Col 7        Col 8
T       O       D       W       O-W     P = D/(O-W)      Q = 1 – P    S = ΠQ = Π(1 – P)
0-
20-
40-
60-
80-
100-
120-
140-
160-
180-
200-
220-

 

SOLUTION ASSIGNMENT #1- LT (CONT…1)

Half dose group

Sort table by survival duration

ID    Survival duration    Status                 Treatment group               Age
      (days)               (1=died 2=censored)    (1=full dose, 2=half dose)    (years)
34    1                    1                      2                             78
18    18                   1                      2                             73
11    19                   1                      2                             71
43    36                   2                      2                             73
38    57                   2                      2                             -
40    63                   2                      2                             91
22    64                   1                      2                             75
42    71                   1                      2                             80
21    102                  2                      2                             77
33    111                  2                      2                             75
4     155                  1                      2                             76
28    188                  1                      2                             70
15    208                  2                      2                             79
20    291                  2                      2                             81
7     378                  1                      2                             78

 

                Compute survival

Col 1   Col 2   Col 3   Col 4   Col 5   Col 6            Col 7        Col 8
T       O       D       W       O-W     P = D/(O-W)      Q = 1 – P    S = ΠQ = Π(1 – P)
0-
20-
40-
60-
80-
100-
120-
140-
160-
180-
200-
220-

 

SOLUTION ASSIGNMENT #1 - LT (CONT…2)

Draw survival curves

 

Compare survival curves by visual inspection

SOLUTION ASSIGNMENT #1 - LT (CONT…3)

 

Draw survival curves using the SPSS program

Interpret the statistical tests

 

 

SOLUTION TO ASSIGNMENT #2 - KM

Full dose group

Col 1   Col 2   Col 3   Col 4   Col 5      Col 6        Col 7
t       O       D       W       P = D/O    Q = 1 – P    S = ΠQ = Π(1 – P)
4               1       0
4               1       0
10              1       0
11              1       0
12              1       0
14              1       0
14              1       0
16              1       0
19              1       0
20              1       0
24              0       1
26              1       0
31              1       0
67              0       1
84              1       0
100             0       1
115             1       0

 

Half dose group

Col 1   Col 2   Col 3   Col 4   Col 5      Col 6        Col 7
t       O       D       W       P = D/O    Q = 1 – P    S = ΠQ = Π(1 – P)
1               1       0
18              1       0
19              1       0
36              0       1
57              0       1
63              0       1
64              1       0
71              1       0
102             0       1
111             0       1
155             1       0
188             1       0
208             0       1
291             0       1
378             1       0

 

SOLUTION TO ASSIGNMENT #2 – KM (CONT..1)

Draw survival curves

Compare survival curves by visual inspection

 

SOLUTION TO ASSIGNMENT #2 –KM (CONT..3)

 

Draw survival curves using the SPSS program

Interpret statistical tests

 

SOLUTION TO ASSIGNMENT #3 - COX

Carry out Cox’s regression

 

Determine the effect of the age covariate on survival

 

Prof Omar Hasan Kasule, Sr. August 2003