By Professor Omar Hasan Kasule

Learning Objectives:

Definition and use of survival analysis

Construction of the survival curve using the life-table and the Kaplan-Meier methods

Graphical comparison of survival curves by visual inspection

Interpretation and testing of survival curves


Key Words and Terms:

Actuarial Analysis



Life Table

Proportional Hazards

Survival Analysis

Survival curve

Survival Rate

Total Survival




Survival analysis is the study of the occurrence and timing of events. Covariates are studied to determine their effect on survival duration. Although the methods are applicable to both retrospective and prospective data, they are best suited to the latter. Two features of survival analysis are not found in conventional statistics: censoring and time-dependent covariates (time-varying explanatory variables).



Three methods of survival analysis are commonly used: the life-table method, the Kaplan-Meier method, and the proportional hazards method.



Survival curves are used for preliminary examination of data. Median survival can be read off the curves. Visual inspection can show whether there are obvious differences between two groups and whether those differences are increasing or decreasing over time.
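Reading the median off a curve can be sketched in a few lines of Python. This is only an illustration: it assumes the curve is available as a list of (time, cumulative survival) points in increasing time order, and it takes the median as the first time at which cumulative survival is 0.5 or less. The function name and the data are hypothetical.

```python
def median_survival(curve):
    """Return the first time at which cumulative survival is <= 0.5.

    curve: list of (time, cumulative_survival) pairs in increasing time order.
    Returns None if survival never falls to 0.5 within the follow-up period.
    """
    for time, survival in curve:
        if survival <= 0.5:
            return time
    return None


# Hypothetical curve, for illustration only.
example_curve = [(0, 1.0), (2, 0.8), (5, 0.6), (7, 0.4), (9, 0.2)]
median = median_survival(example_curve)

# A curve that never reaches 0.5 has no readable median.
no_median = median_survival([(0, 1.0), (3, 0.9), (6, 0.7)])
```

If the curve stays above 0.5 for the whole follow-up, the median cannot be read off and the function returns None.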



Survival analysis is used in the follow-up of patients treated with various experimental therapies. It is also used to evaluate survival after diagnosis of specific diseases and to summarize and evaluate mortality in different groups. The methods can be extended to non-medical uses such as survival of animals in drug trials, survival of electric light bulbs, survival of machine tools, survival of equipment, survival of friendships, time to promotion, and time to divorce. The techniques of survival analysis are employed in various disciplines, for example event history analysis in sociology, reliability analysis and failure time analysis in engineering, and duration analysis in economics.



There are several ways of measuring time to the event of interest. Time may be measured as duration, for example time since birth (age), time since a given event, or time since the last occurrence of the same event. Time may also be measured as calendar time, although this is less common in clinical trials. The following examples illustrate various descriptions of time periods: time to relapse, survival after relapse, time to death, and time to infection or any other complication.

In survival analysis our interest is in survival duration, which is usually measured from zero time until the event of interest (failure/relapse, death, or first response) or until censoring. Zero time may be defined as the point in time when the hazard starts operating, the point of randomization, the time of enrolment into the study, the date of the first visit, the date of the first symptoms, the date of diagnosis, or the date of starting treatment. The best zero time is the point of randomization. Use of the time of diagnosis or start of treatment may introduce bias because socio-economic factors may determine access to diagnosis and treatment facilities. Survival duration is computed by subtracting the zero time from the time of failure or censoring. Thus we may be interested in the time from start of treatment to the first response. Sometimes the interest is in the length of remission (remission duration) or in the tumor-free time.

Survival can be described as relative survival or absolute survival. Relative survival compares the survival of trial subjects, for example 1-year survival, with that of the general population. Absolute survival is the proportion of trial subjects who survive to a given time, for example 5 years. Absolute survival is more commonly used.
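The subtraction that gives survival duration can be illustrated with a short Python sketch. It assumes zero time and end time are recorded as calendar dates and that duration is wanted in days; the function name and the subject's dates are hypothetical.

```python
from datetime import date


def survival_duration_days(zero_time, end_time):
    """Days from zero time to failure or censoring (end_time - zero_time)."""
    if end_time < zero_time:
        raise ValueError("end time precedes zero time")
    return (end_time - zero_time).days


# Hypothetical subject: randomized on 1 March 2000, relapsed on 15 June 2000.
duration = survival_duration_days(date(2000, 3, 1), date(2000, 6, 15))

# Each subject can then be stored as (duration, event_observed);
# event_observed is False when follow-up ended by censoring.
record = (duration, True)
```

Storing the duration together with a flag for whether the event was observed keeps censored and uncensored subjects in the same data structure.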



A problem in survival analysis is censoring. Censoring occurs when an individual is not followed up until occurrence of the event of interest. Censoring leads to loss of information due to incomplete observation. Those not followed up fully may have a different experience, which could bias the study. Censoring is caused by loss to follow-up, withdrawal from the study, study termination when subjects had different dates of enrolment, or death due to a competing risk. Censored observations contribute to the analysis until the time of censoring. The analysis of censored data assumes that if censored subjects had been followed beyond the point in time at which they were censored, they would have had the same rates of outcomes as those not censored at that time. Similar censoring patterns in the different treatment groups suggest that the censoring assumptions hold.




In the life table, column #1 is the time at the start of the time interval. The first row of the table is assigned time 0. Column #2 is the number of subjects under observation at the start of the time interval, O. Column #3 is the number who died during the time interval, D. Column #4 is the number withdrawn during the time interval, W. Withdrawals are considered to occur at the start of the time interval. We assume that there are no secular trends in the risk of death in different calendar periods. We also assume that those who withdraw and those who stay under observation have the same probability of death. Column #5 is the number under observation during the interval. It is computed as O - W. Column #6 is the probability of dying in the interval. It is computed as P = D / (O - W). Column #7 is the probability of surviving to the end of the interval and is computed as Q = 1 - P. Column #8 is the probability of survival from time 0 until the end of the interval. The probability for the first row is 1.0. Subsequent probabilities are computed by multiplying Q by the survival probability of the prior row. The survival probabilities in column #8 are plotted against time in column #1 to generate a survival curve. Two or more curves can be generated depending on the treatment or experimental groups.
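The column arithmetic of the life table can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function name and the data are hypothetical, and it adopts one reading of the description in which the curve starts at 1.0 at time 0 and each row stores the cumulative survival reached at the end of its interval.

```python
def life_table(rows):
    """Build life-table columns from (start_time, O, D, W) rows.

    O = number under observation at the start of the interval,
    D = deaths in the interval, W = withdrawals in the interval.
    Withdrawals are treated as occurring at the start of the interval.
    """
    table = []
    cumulative = 1.0  # survival probability at time 0
    for start_time, observed, deaths, withdrawn in rows:
        at_risk = observed - withdrawn                       # column 5: O - W
        p_die = deaths / at_risk if at_risk > 0 else 0.0     # column 6: P = D / (O - W)
        q_survive = 1.0 - p_die                              # column 7: Q = 1 - P
        cumulative *= q_survive                              # column 8: running product of Q
        table.append({
            "time": start_time, "O": observed, "D": deaths, "W": withdrawn,
            "at_risk": at_risk, "P": p_die, "Q": q_survive, "S": cumulative,
        })
    return table


# Hypothetical data, for illustration only: (start_time, O, D, W).
rows = [(0, 100, 10, 0), (1, 90, 8, 10), (2, 72, 7, 2)]
table = life_table(rows)
survival_curve = [(0, 1.0)] + [(r["time"] + 1, r["S"]) for r in table]
```

In this example the intervals are assumed to be one time unit wide, so the cumulative survival of each row is plotted at the end of its interval.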



The life-table method works well with large data sets and when the time of occurrence of an event cannot be measured precisely. It has the advantage of allowing a credible analysis without knowledge of the exact times of censoring or withdrawal.



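The life-table method is not efficient in handling withdrawals, which could be a source of bias. The choice of the interval is arbitrary. The method assumes that all withdrawals in an interval occur at a single fixed point (the start of the interval in the version described here, mid-interval in the usual actuarial version), which may not be the case.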




The Kaplan-Meier (KM) method involves defining a risk set at each time at which a failure occurs and computing the instantaneous probability of death at that time.



In the Kaplan-Meier table, column #1 is the time of occurrence of an event, ti. It is an exact time and not a time interval. It is not fixed in advance but is defined by events of death or withdrawal. Deaths and withdrawals occur at different times. The notation ti refers to any time at which a death, withdrawal, or other censoring occurs. Column #2 is the number of subjects at risk at time ti. This number decreases progressively down the column as deaths, withdrawals, and censored observations are subtracted. Column #3 is the number of deaths at time ti. Column #4 is the number of withdrawals at time ti. Column #5 is the probability of death at time ti. It is computed as the number of deaths at time ti (column #3) divided by the number at risk just before time ti (column #2). Withdrawals are recorded in the table but are considered non-events: a withdrawal affects only the number at risk when the next death occurs. Column #6 is the probability of survival at time ti. It is computed as 1 minus the probability of death at time ti. Column #7 is the cumulative survival from time 0 to time ti. It is computed by multiplying the row's probability of survival by the cumulative survival of the previous row.
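The Kaplan-Meier table can be sketched in Python as follows. This is an illustration under stated assumptions: each subject is given as a (time, died) pair in which died is False for a withdrawal or other censoring, the function name and the data are hypothetical, and a row is written for every distinct event time, including times at which only a withdrawal occurs.

```python
def kaplan_meier(observations):
    """Build Kaplan-Meier table rows from (time, died) pairs.

    Each returned row is a tuple:
    (time, at_risk, deaths, withdrawals, p_death, p_survive, cumulative)
    """
    at_risk = len(observations)
    cumulative = 1.0  # cumulative survival at time 0
    table = []
    for t in sorted({time for time, _ in observations}):
        deaths = sum(1 for time, died in observations if time == t and died)
        withdrawals = sum(1 for time, died in observations if time == t and not died)
        p_death = deaths / at_risk      # deaths / number at risk just before t
        p_survive = 1.0 - p_death       # 1 - probability of death
        cumulative *= p_survive         # product with the previous row's value
        table.append((t, at_risk, deaths, withdrawals, p_death, p_survive, cumulative))
        at_risk -= deaths + withdrawals  # shrink the risk set for the next event time
    return table


# Hypothetical data, for illustration only: (time, died).
subjects = [(2, True), (3, False), (5, True), (7, True), (8, False)]
km_table = kaplan_meier(subjects)
km_curve = [(0, 1.0)] + [(row[0], row[6]) for row in km_table]
```

A withdrawal row leaves the cumulative survival unchanged because its probability of death is 0; its only effect is to reduce the number at risk at the next death.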



The Kaplan-Meier method is best used for small data sets in which the time of event occurrence is measured precisely. The Kaplan-Meier method is an improvement on the life-table method in the handling of withdrawals. The life-table method considers withdrawals to occur at the start of the interval, but in reality withdrawals occur throughout the interval. The assumption could therefore create bias or imprecision. The Kaplan-Meier method avoids this complication by not fixing the time intervals in advance. An interval ends in one of two ways: (a) when the end-point event of interest occurs, or (b) when a withdrawal occurs.



Proportional hazards regression is the most popular method. It uses regression methods proposed in 1972 by the British statistician Sir David Cox in his famous paper ‘Regression Models and Life-Tables’, published in the Journal of the Royal Statistical Society. It became one of the most cited papers in the statistical literature.



Survival curves can be compared by visual inspection, or specialized formulas can be used to test the differences between them.

Professor Omar Hasan Kasule Sr. April 2001