Definition and use of survival analysis
Construction of the survival curve using the life-table method
Construction of the survival curve using the Kaplan-Meier method
Graphical comparison of the curves by visual inspection
Interpretation and testing of survival curves
Use of Cox’s Proportional Hazards Regression Model
Key Words and Terms:
Censoring
Kaplan-Meier
Life Table
Cox proportional hazards regression
Survival curve
AGENDA
I. LEARNING OBJECTIVES
Pretest #1
II. INTRODUCTION TO SURVIVAL ANALYSIS
Pretest #2
III. PROBLEMS OF SURVIVAL ANALYSIS
Pretest #3
IV. METHODS OF SURVIVAL ANALYSIS
Pretest #4
V. COMPARING SURVIVAL CURVES
Survival data set
VI. PRACTICAL ASSIGNMENTS
PRETEST #1
1. The following statements are true about survival analysis in general
Survival analysis studies effects of covariates on survival
Survival analysis cannot be used to compare treatments
The use of survival analysis is confined to the medical field
Parametric methods are not used in survival analysis
Life-table and Kaplan-Meier are parametric methods
2. The following statements are true about survival analysis in general
The outcome in survival analysis is time until the event of interest
The term ‘event’ is the same as ‘failure’
Relapse is not considered an event
The outcome in survival analysis is always dichotomous
Survival analysis can be used for the ‘survival’ of machine parts
3. The following are time periods used in survival analysis
Remission duration
Time to relapse
Survival after relapse
Time to diagnosis
Incubation period
4. The following are valid ‘zero times’ for survival analysis
Time at point of randomization.
Time at enrolment
Time of the first visit
Time of diagnosis
Time at loss to follow up
INTRODUCTION TO SURVIVAL ANALYSIS I
Definition of survival analysis
Study of survival duration
Study of the effects of covariates on survival
Objectives of survival analysis
Estimation and interpretation of the hazard function
Comparing survival/hazard functions
Assessment of the relationship of explanatory covariates to survival
Disciplines that use survival analysis
Clinical medicine
Sociology – event history
Engineering – reliability analysis & failure time analysis
Economics – duration analysis
Uses of survival analysis – medical
Follow-up of patients on treatment with experimental therapies
Evaluation of survival after diagnosis
Summarizing and evaluating mortality in different groups
Uses of survival analysis – non-medical
Survival of electric bulbs
Survival of machine tools
Survival of equipment
Survival of friendships
Survival in political office
Survival of marriage (time to divorce)
INTRODUCTION TO SURVIVAL ANALYSIS II
Measurement of time durations in survival analysis
Time to relapse
Remission duration
Survival after relapse
Time to death
Time to a complication
Zero times used in survival analysis
Point of randomization (best)
Enrolment
First visit
First symptoms
Diagnosis
Start of treatment
Survival data
32 elderly acute leukemia patients
Randomization to 2 treatment groups: full dose (17) and half dose (15)
Full dose group: died 14; censored 3
Half dose group: died 8; censored 7
Data setup for survival analysis (rectangular file)
ID | Survival duration (days/weeks/months) | Status (died/censored) | Treatment group | Covariates (age, sex, gender, etc.)
1. | | | | V1, V2, V3, V4, V5
2. | | | |
3. | | | |
4. | | | |
. | | | |
PRETEST #2
1. The following statements are true about causes of censoring
Censoring is loss of information due to incomplete observation
Censoring is caused by withdrawal from the study
Study termination is never a cause of censoring
Loss to follow-up is a cause of censoring
Death due to a competing risk is a cause of censoring
2. The following statements are true about types of censoring
In left censoring, observation ends before a given point in time
In right censoring, the subject is last seen alive at a given time and is not seen afterwards
Interval censoring is loss to follow-up between diagnosis and treatment
Right censoring is less common than left censoring
Random censoring occurs uniformly throughout the study
3. The following statements are true about types of censoring
Nonrandom censoring is due to investigator manipulation
Nonrandom censoring leads to bias
Random censoring is not related to outcome
In progressive censoring entry and censoring times are different for each subject
Analysis based on the intention to treat is more conservative than censored analysis.
4. The following statements are true about truncation & competing causes of death
In left truncation, only patients surviving a certain time are considered
In right truncation, only patients experiencing an event by a given time are considered
Competing causes of death are one cause of censoring that biases survival estimates
A study can be designed to ensure that competing causes of death are eliminated
The term competing causes of death refers only to neoplastic diseases
PROBLEMS OF SURVIVAL ANALYSIS
The ideal situation for constructing a survival curve
Complete follow-up until the moment of death
No losses to follow-up
No other causes of death
3 Problems of survival analysis
Censoring = loss of information
Truncation = exclusion of subjects from the analysis
Competing causes of death (bias survival estimates)
Causes of censoring
Withdrawal from the study
Study termination
Loss to follow-up
Death due to a competing risk
Types of censoring
Left: the event is known only to have occurred before a given point in time
Right: subject is last seen alive at a given time and is not followed up subsequently
Interval: the event is known only to have occurred between 2 given time points
Random: occurs uniformly throughout the study, not related to outcome (no bias)
Nonrandom: due to investigator manipulation (bias)
Progressive: in studies in which entry and censoring times are different for each subject
Types of truncation
Left: only individuals surviving a certain time are included in the sample
Right: only individuals experiencing event of interest by a given time are included
PRETEST #3
1. The following statements are true about nonparametric survival analysis
The life-table method is better with large data sets
The life table method is preferred if time of occurrence is not measured precisely
The life table method assumes withdrawals at the end of the interval
The Kaplan-Meier method is best used for small data sets
The Kaplan-Meier method is best used for precisely measured occurrence times
In the Kaplan-Meier method, time intervals are not fixed in advance
2. The following statements are true about regression methods for survival analysis
The proportional hazards regression method is semiparametric
Sir David Cox proposed the proportional hazards regression method in 1972
The proportional hazards regression method is unpopular for survival analysis
The proportional hazards regression method can be used if data distribution is unknown
The proportional hazards regression method gives valid results only for normally distributed data
METHODS OF SURVIVAL ANALYSIS
Parametric methods of survival analysis (used if underlying distribution is known)
Weibull
Lognormal
Gamma
Nonparametric and semiparametric methods of survival analysis (used if underlying distribution is unknown)
The life-table method
The Kaplan-Meier method
The proportional hazards method (semiparametric)
The life-table (LT) method for drawing a survival curve
Suitable for large data sets
Resorted to if time of occurrence of an event cannot be measured precisely
Leads to bias due to assumptions about the point of withdrawal
The Kaplan-Meier (KM) method for drawing a survival curve
Suitable for small data sets
Used only if the time of event occurrence is measured precisely
Improvement on the life-table method in the handling of withdrawals
SURVIVAL CURVE USING THE LT METHOD
A table with 8 columns is set up with suitable time intervals fixed by the analyst
Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7 | Col 8
t | O | D | W | O – W | P = D/(O – W) | Q = 1 – P | S = ∏Q = ∏(1 – P)
0 | | | | | | |
 | | | | | | |
 | | | | | | |
Column #1 (t) is the time at the start of the time interval. The first row of the table is assigned time 0.
Column #2 (O) is the number of subjects under observation at the start of the time interval.
Column #3 (D) is the number who died during the time interval.
Column #4 (W) is the number withdrawn during the time interval.
Column #5 (O – W) is the number under observation during the interval, computed as col 2 – col 4.
Column #6 (P) is the probability of dying in the interval, computed as P = D/(O – W).
Column #7 (Q) is the probability of surviving to the end of the interval, computed as Q = 1 – P.
Column #8 (S) is the probability of survival from time 0 until the end of the interval. The probability for the first row is 1.0. Subsequent probabilities are computed by multiplying Q into the cumulative survival probability (S) of the prior row.
The survival probabilities in column #8 are plotted against time in column #1 to generate a survival curve.
Two or more curves can be generated depending on the treatment or experimental groups.
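The column rules above can be sketched in a few lines of Python. This is a minimal illustration, not the handout's own program: the function name life_table and the toy counts are hypothetical, and the sketch assumes that the value 1.0 stands for survival at time 0 and that S is reported at the end of each interval.

```python
from dataclasses import dataclass


@dataclass
class LifeTableRow:
    t: int      # col 1: start of the interval
    o: int      # col 2: under observation at the start of the interval
    d: int      # col 3: died during the interval
    w: int      # col 4: withdrawn during the interval
    ow: int     # col 5: O - W
    p: float    # col 6: D / (O - W)
    q: float    # col 7: 1 - P
    s: float    # col 8: cumulative survival to the end of the interval


def life_table(starts, n_start, deaths, withdrawals):
    """Build life-table rows from per-interval counts (hypothetical helper)."""
    rows = []
    o = n_start
    s = 1.0  # survival at time 0 (assumption: S accumulates from here)
    for t, d, w in zip(starts, deaths, withdrawals):
        ow = o - w
        p = d / ow if ow > 0 else 0.0  # guard against an empty interval
        q = 1.0 - p
        s *= q
        rows.append(LifeTableRow(t, o, d, w, ow, p, q, s))
        o -= d + w  # subjects carried into the next interval
    return rows


# Toy data, invented for illustration: 10 subjects, three 20-day intervals.
rows = life_table(starts=[0, 20, 40], n_start=10,
                  deaths=[2, 1, 1], withdrawals=[0, 2, 0])
for r in rows:
    print(r.t, r.o, r.d, r.w, r.ow, round(r.p, 3), round(r.q, 3), round(r.s, 3))
```

Plotting the s values against the interval boundaries, one series per treatment group, gives the survival curves described above.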
SURVIVAL CURVE USING THE KM METHOD
A table with 7 columns is set up by the analyst
Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7
t | O | D | W | P = D/O | Q = 1 – P | S = ∏Q = ∏(1 – P)
0 | | | | | | 1.0
 | | | | | |
Column #1 (t) is the time of occurrence of an event (death or withdrawal).
Column #2 (O) is the number of subjects at risk at time t.
Column #3 (D) is the number of deaths at time t.
Column #4 (W) is the number of withdrawals at time t.
Column #5 (P) is the probability of death at time t, computed as P = D/O.
Column #6 (Q) is the probability of survival at time t, computed as Q = 1 – P.
Column #7 (S) indicates cumulative survival from time 0 to time t. It is computed by multiplying the row probability of survival (Q) into the cumulative probability of survival (S) of the previous row.
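The KM table can be filled in mechanically once the observations are sorted by time. The following is a small sketch under stated assumptions: the function name kaplan_meier and the five toy observations are hypothetical, and subjects sharing the same time are grouped into one row rather than listed one row per subject.

```python
def kaplan_meier(times, died):
    """Return KM rows (t, O, D, W, P, Q, S), one row per distinct event time.

    times: observed durations; died: True for a death, False for a withdrawal.
    """
    at_risk = len(times)
    s = 1.0  # cumulative survival at time 0
    table = []
    for t in sorted(set(times)):
        d = sum(1 for ti, di in zip(times, died) if ti == t and di)
        w = sum(1 for ti, di in zip(times, died) if ti == t and not di)
        p = d / at_risk          # col 5: P = D/O
        q = 1.0 - p              # col 6: Q = 1 - P
        s *= q                   # col 7: running product of Q
        table.append((t, at_risk, d, w, p, q, s))
        at_risk -= d + w         # deaths and withdrawals leave the risk set
    return table


# Toy data, invented for illustration.
table = kaplan_meier(times=[2, 3, 3, 5, 8],
                     died=[True, True, False, True, False])
for row in table:
    print(row)
```

A withdrawal leaves S unchanged at its own time (P = 0) but shrinks the number at risk for every later row, which is how the method handles censored subjects.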
SURVIVAL ANALYSIS BY REGRESSION
LIFEREG
Parametric regression that uses MLE
Accommodates left censoring
Tests hypotheses about the shape of the hazard function
Cannot handle time-varying covariates
THE POISSON REGRESSION
Used for rare outcomes (<5%) for which PHREG is not appropriate
Models the log transform of incidence at time t as a linear function of covariates
Assumes that incidence does not depend on the elapsed time, t.
LOGISTIC, PROBIT, AND GENERAL LINEAR MODELS
Easier and more intuitive than Cox’s regression
Treat time like any other variable
Treat covariates as fixed.
PHREG (Cox’s Proportional Hazards)
Robust semiparametric regression that fits the data well
No selection or assumption of any particular probability model
Uses partial maximum likelihood estimation (partial MLE) and proportional hazards
Handles only right censoring
Handles time-dependent covariates
Can handle competing risks
Controls confounding variables
Lacks built-in graphics capability.
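For orientation, the model behind Cox's proportional hazards regression is usually written as below. The notation (baseline hazard h_0, coefficients beta, covariates x) is the conventional one and is not taken from these notes.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)
\qquad\Longrightarrow\qquad
\frac{h(t \mid x_A)}{h(t \mid x_B)} = \exp\!\left\{\sum_{j=1}^{p} \beta_j \,(x_{Aj} - x_{Bj})\right\}
```

The ratio on the right does not involve t, which is the proportional hazards assumption. The baseline hazard h_0(t) is left unspecified, which is why the method is called semiparametric.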
PRETEST #4
1. The following statements are true about survival curves
The life-table method can be used to construct the survival curve
The Kaplan-Meier method cannot be used to construct survival curves
Survival curves can be compared using statistical methods
If the survival curve for group A is above and parallel to that of group B, then group A has a longer median survival
The survival curve is approximately a reverse-J shape
2. The following statements are true about comparing survival curves
Visual inspection of survival curves is completely useless.
Gehan’s generalized Wilcoxon test is a nonparametric test
Cox’s F test is a parametric method
The Wilcoxon test is more sensitive to differences between the curves at the earlier failure times
The log-rank test attaches equal importance to all failure times irrespective of whether they are early or late.
3. The following statements are true about comparing survival curves
The MH test is used to compare survival curves
The log-rank test is a parametric test for comparing survival curves
The generalized Wilcoxon is a parametric test for comparing survival curves
The Cox proportional hazards test is a nonparametric test
Survival analysis with 1 curve only and no comparison curve makes no sense
COMPARING SURVIVAL CURVES
Visual comparison of the curves
Crossovers
Trends
The nonparametric methods for comparing 2 survival distributions
Gehan’s generalized Wilcoxon test
Peto’s generalized Wilcoxon test
Cox-Mantel test
Log-rank test
Mantel-Haenszel test
Cox’s F test
The parametric tests for comparing 2 survival distributions
Likelihood ratio test
Cox’s F test
The Wilcoxon test
More sensitive to differences between the curves at the earlier failure times
Less sensitive than the log-rank test for later failure times
Gives more weight to the earlier part of the survival curve
The Mantel-Haenszel test
Relies on methods of analyzing incidence density ratios
The log-rank test
More sensitive if the assumptions of proportional hazards hold
Attaches equal importance to early and late failure times
Modification by Peto attaches more importance to earlier failure times
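To make the log-rank test concrete, here is a minimal two-group sketch. It is an illustration only: the function name logrank_test and the toy data are hypothetical, the variance is the usual hypergeometric one, and the p-value assumes a chi-square distribution with 1 degree of freedom. Every death time enters the sums with weight 1, which is the sense in which early and late failures count equally.

```python
import math


def logrank_test(times_a, died_a, times_b, died_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    death_times = sorted(
        {t for t, d in zip(times_a, died_a) if d}
        | {t for t, d in zip(times_b, died_b) if d}
    )
    observed_a = expected_a = variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, d in zip(times_a, died_a) if x == t and d)
        d_b = sum(1 for x, d in zip(times_b, died_b) if x == t and d)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n                 # expected deaths in A
        if n > 1:
            variance += n_a * n_b * d * (n - d) / (n * n * (n - 1))
    chi2 = (observed_a - expected_a) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))            # chi-square tail, 1 df
    return chi2, p


# Toy data, invented for illustration: all subjects die, group A earlier.
chi2, p = logrank_test([1, 2], [True, True], [3, 4], [True, True])
print(round(chi2, 3), round(p, 3))
```

Replacing the implicit weight of 1 at each death time with a weight that is larger early on (for example, the number at risk) gives tests of the generalized Wilcoxon type, which emphasize early failures.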
ANALYSIS FOR PROGNOSTIC COVARIATES
A KM curve can be drawn for each level of covariate and visual inspection is used to compare them
Cox’s regression is a semiparametric method for studying several covariates simultaneously
The log-linear exponential and the linear exponential regression methods are parametric approaches to studying prognostic covariates.
Risk factors for death can be identified using linear discriminant functions and the linear logistic regression method.
SURVIVAL DATA SET
The study was carried out by the Eastern Cooperative Oncology Group, based at Harvard University, and was published in the Journal of Clinical Oncology 2:865-870, 1984. A randomized clinical trial involving 32 elderly patients with acute leukemia was carried out to compare two treatment regimens. The patients were randomized to either full-dose induction therapy or half-dose attenuated therapy. The table below gives the case-by-case information on the study.
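ID | Survival duration (days) | Status (1=died, 2=censored) | Treatment group (1=full dose, 2=half dose) | Age (years)
1 | 4 | 1 | 1 | 79
8 | 20 | 1 | 1 | 77
9 | 31 | 1 | 1 | 82
10 | 26 | 1 | 1 | 74
13 | 14 | 1 | 1 | 77
16 | 24 | 2 | 1 | 75
17 | 16 | 1 | 1 | 83
19 | 115 | 1 | 1 | 72
23 | 19 | 1 | 1 | 75
24 | 11 | 1 | 1 | 72
26 | 14 | 1 | 1 | 73
27 | 4 | 1 | 1 | 76
29 | 67 | 2 | 1 | 83
31 | 12 | 1 | 1 | 78
37 | 10 | 1 | 1 | 83
39 | 84 | 1 | 1 | 73
41 | 100 | 2 | 1 | 71
4 | 155 | 1 | 2 | 76
7 | 378 | 1 | 2 | 78
11 | 19 | 1 | 2 | 71
15 | 208 | 2 | 2 | 79
18 | 18 | 1 | 2 | 73
20 | 291 | 2 | 2 | 81
21 | 102 | 2 | 2 | 77
22 | 64 | 1 | 2 | 75
28 | 188 | 1 | 2 | 70
33 | 111 | 2 | 2 | 75
34 | 1 | 1 | 2 | 78
38 | 57 | 2 | 2 |
40 | 63 | 2 | 2 | 91
42 | 71 | 1 | 2 | 80
43 | 36 | 2 | 2 | 73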
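For readers who want to work on the data outside SPSS, the table can be typed in as a rectangular file. The sketch below is one possible encoding in Python; the variable names are hypothetical, and the age that is blank in the table (ID 38) is stored as None rather than guessed. The tally at the end checks the entry against the counts stated earlier (full dose: 14 died, 3 censored; half dose: 8 died, 7 censored).

```python
from collections import Counter

# (id, days, status, group, age)
# status: 1 = died, 2 = censored; group: 1 = full dose, 2 = half dose
PATIENTS = [
    (1, 4, 1, 1, 79), (8, 20, 1, 1, 77), (9, 31, 1, 1, 82),
    (10, 26, 1, 1, 74), (13, 14, 1, 1, 77), (16, 24, 2, 1, 75),
    (17, 16, 1, 1, 83), (19, 115, 1, 1, 72), (23, 19, 1, 1, 75),
    (24, 11, 1, 1, 72), (26, 14, 1, 1, 73), (27, 4, 1, 1, 76),
    (29, 67, 2, 1, 83), (31, 12, 1, 1, 78), (37, 10, 1, 1, 83),
    (39, 84, 1, 1, 73), (41, 100, 2, 1, 71),
    (4, 155, 1, 2, 76), (7, 378, 1, 2, 78), (11, 19, 1, 2, 71),
    (15, 208, 2, 2, 79), (18, 18, 1, 2, 73), (20, 291, 2, 2, 81),
    (21, 102, 2, 2, 77), (22, 64, 1, 2, 75), (28, 188, 1, 2, 70),
    (33, 111, 2, 2, 75), (34, 1, 1, 2, 78), (38, 57, 2, 2, None),
    (40, 63, 2, 2, 91), (42, 71, 1, 2, 80), (43, 36, 2, 2, 73),
]

# Tally (group, status) pairs to verify the data entry.
counts = Counter((group, status) for _, _, status, group, _ in PATIENTS)
for group, label in ((1, "full dose"), (2, "half dose")):
    print(label, "died:", counts[(group, 1)], "censored:", counts[(group, 2)])
```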
PRACTICAL ASSIGNMENTS
ASSIGNMENT #1 (LIFETABLE METHOD)
Using intervals of 20 days for survival duration, manually construct 2 survival curves on the same graph paper for the full-dose and the half-dose treatment groups respectively. What conclusions can you make about the two treatments from visual inspection of the curves?
Repeat the above using the SPSS program. Interpret the results of the statistical tests comparing the 2 curves.
ASSIGNMENT #2 (KAPLAN-MEIER METHOD)
Manually construct 2 survival curves on the same graph paper for the full-dose and the half-dose treatment groups respectively. What conclusions can you make about the two treatments from visual inspection of the curves?
Repeat the above using the SPSS program. Interpret the results of the statistical tests comparing the 2 curves.
ASSIGNMENT #3 (PROPORTIONAL HAZARDS METHOD)
Using the SPSS program, carry out a comparison of the two treatments.
Repeat the above with age in the model. What conclusions do you make about age as a prognostic factor?
SOLUTION TO ASSIGNMENT #1 – LT
Full dose group
Sort table by survival time
ID | Survival duration (days) | Status (1=died, 2=censored) | Treatment group (1=full dose, 2=half dose) | Age (years)
1 | 4 | 1 | 1 | 79
27 | 4 | 1 | 1 | 76
37 | 10 | 1 | 1 | 83
24 | 11 | 1 | 1 | 72
31 | 12 | 1 | 1 | 78
13 | 14 | 1 | 1 | 77
26 | 14 | 1 | 1 | 73
17 | 16 | 1 | 1 | 83
23 | 19 | 1 | 1 | 75
8 | 20 | 1 | 1 | 77
16 | 24 | 2 | 1 | 75
10 | 26 | 1 | 1 | 74
9 | 31 | 1 | 1 | 82
29 | 67 | 2 | 1 | 83
39 | 84 | 1 | 1 | 73
41 | 100 | 2 | 1 | 71
19 | 115 | 1 | 1 | 72
Compute survival
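Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7 | Col 8
t | O | D | W | O – W | P = D/(O – W) | Q = 1 – P | S = ∏Q = ∏(1 – P)
0 | | | | | | |
20 | | | | | | |
40 | | | | | | |
60 | | | | | | |
80 | | | | | | |
100 | | | | | | |
120 | | | | | | |
140 | | | | | | |
160 | | | | | | |
180 | | | | | | |
200 | | | | | | |
220 | | | | | | |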
SOLUTION TO ASSIGNMENT #1 – LT (CONT…1)
Half dose group
Sort table by survival duration
ID | Survival duration (days) | Status (1=died, 2=censored) | Treatment group (1=full dose, 2=half dose) | Age (years)
34 | 1 | 1 | 2 | 78
18 | 18 | 1 | 2 | 73
11 | 19 | 1 | 2 | 71
43 | 36 | 2 | 2 | 73
38 | 57 | 2 | 2 |
40 | 63 | 2 | 2 | 91
22 | 64 | 1 | 2 | 75
42 | 71 | 1 | 2 | 80
21 | 102 | 2 | 2 | 77
33 | 111 | 2 | 2 | 75
4 | 155 | 1 | 2 | 76
28 | 188 | 1 | 2 | 70
15 | 208 | 2 | 2 | 79
20 | 291 | 2 | 2 | 81
7 | 378 | 1 | 2 | 78
Compute survival
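Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7 | Col 8
t | O | D | W | O – W | P = D/(O – W) | Q = 1 – P | S = ∏Q = ∏(1 – P)
0 | | | | | | |
20 | | | | | | |
40 | | | | | | |
60 | | | | | | |
80 | | | | | | |
100 | | | | | | |
120 | | | | | | |
140 | | | | | | |
160 | | | | | | |
180 | | | | | | |
200 | | | | | | |
220 | | | | | | |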
SOLUTION TO ASSIGNMENT #1 – LT (CONT…2)
Draw survival curves
Compare survival curves by visual inspection
SOLUTION TO ASSIGNMENT #1 – LT (CONT…3)
Draw survival curves using the SPSS program
Interpret the statistical tests
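As a cross-check on the manual work for Assignment #1, the full-dose durations can be grouped into the 20-day rows of the worksheet by program. This is a sketch under assumptions that the handout does not state: each interval is taken to include its start and exclude its end (so day 20 falls in the row starting at 20), any duration beyond the last row is placed in the last row, and the names bin_counts and FULL_DOSE are hypothetical.

```python
def bin_counts(durations, statuses, width=20, last_start=220):
    """Count deaths (status 1) and withdrawals (status 2) per interval."""
    starts = list(range(0, last_start + width, width))
    deaths = [0] * len(starts)
    withdrawn = [0] * len(starts)
    for days, status in zip(durations, statuses):
        k = min(days // width, len(starts) - 1)  # clamp to the last row
        if status == 1:
            deaths[k] += 1
        else:
            withdrawn[k] += 1
    return starts, deaths, withdrawn


# Full-dose group as (days, status) pairs, copied from the sorted table.
FULL_DOSE = [(4, 1), (4, 1), (10, 1), (11, 1), (12, 1), (14, 1), (14, 1),
             (16, 1), (19, 1), (20, 1), (24, 2), (26, 1), (31, 1), (67, 2),
             (84, 1), (100, 2), (115, 1)]

starts, deaths, withdrawn = bin_counts([d for d, _ in FULL_DOSE],
                                       [s for _, s in FULL_DOSE])

# Fill the remaining worksheet columns from the counts.
rows = []
o, s = len(FULL_DOSE), 1.0
for t, d, w in zip(starts, deaths, withdrawn):
    ow = o - w
    p = d / ow if ow > 0 else 0.0   # empty intervals contribute P = 0
    s *= 1.0 - p
    rows.append((t, o, d, w, ow, p, 1.0 - p, s))
    o -= d + w

for row in rows:
    print(row)
```

A different boundary convention (for example, counting day 20 in the first interval) changes the D and W counts and therefore the curve, so the convention should be fixed before comparing manual and program results.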
SOLUTION TO ASSIGNMENT #2 – KM
Full dose group
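Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7
t | O | D | W | P = D/O | Q = 1 – P | S = ∏Q = ∏(1 – P)
4 | | 1 | 0 | | |
4 | | 1 | 0 | | |
10 | | 1 | 0 | | |
11 | | 1 | 0 | | |
12 | | 1 | 0 | | |
14 | | 1 | 0 | | |
14 | | 1 | 0 | | |
16 | | 1 | 0 | | |
19 | | 1 | 0 | | |
20 | | 1 | 0 | | |
24 | | 0 | 1 | | |
26 | | 1 | 0 | | |
31 | | 1 | 0 | | |
67 | | 0 | 1 | | |
84 | | 1 | 0 | | |
100 | | 0 | 1 | | |
115 | | 1 | 0 | | |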
Half dose group
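Col 1 | Col 2 | Col 3 | Col 4 | Col 5 | Col 6 | Col 7
t | O | D | W | P = D/O | Q = 1 – P | S = ∏Q = ∏(1 – P)
1 | | 1 | 0 | | |
18 | | 1 | 0 | | |
19 | | 1 | 0 | | |
36 | | 0 | 1 | | |
57 | | 0 | 1 | | |
63 | | 0 | 1 | | |
64 | | 1 | 0 | | |
71 | | 1 | 0 | | |
102 | | 0 | 1 | | |
111 | | 0 | 1 | | |
155 | | 1 | 0 | | |
188 | | 1 | 0 | | |
208 | | 0 | 1 | | |
291 | | 0 | 1 | | |
378 | | 1 | 0 | | |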
SOLUTION TO ASSIGNMENT #2 – KM (CONT..1)
Draw survival curves
Compare survival curves by visual inspection
SOLUTION TO ASSIGNMENT #2 – KM (CONT..3)
Draw survival curves using the SPSS program
Interpret statistical tests
SOLUTION TO ASSIGNMENT #3 – COX
Carry out Cox’s regression
Determine the effect of the age covariate on survival