1.0 CONCEPTS OF DISEASE CAUSATION

A. Causal Triangle

B. Risk

C. Cause

D. Disease Cause Associations

E. Criteria Of Causality

2.0 CONCEPT OF EXPOSURE

A. Definition

B. Classification Of Exposures

C. Measurement Of Exposures

3.0 DISEASE DETERMINANTS

A. Definition Of Determinants

B. Biological Determinants

C. Behavioral Determinants

D. Environmental Determinants

E. Social Determinants

4.0 GENERAL CONCEPTS OF EPIDEMIOLOGICAL ANALYSIS

A. Analytic Epidemiology

B. Hypothesis Testing

C. Preliminaries to Data Analysis

D. Procedures Used

5.0 TESTS OF ASSOCIATION

A. Tests Of Association On Means

B. Tests Of Association On Proportions: Single 2x2 Contingency Table

C. Tests Of Association On Proportions: Single 2 X K Contingency Table

D. Tests Of Association On Proportions: Stratified 2 X 2 Contingency Tables

E. Properties Of The Chi-Square Statistic

6.0 MEASURES OF EFFECT

A. Comparison Of Proportions In Contingency Table

B. Measures Of Excess Risk

C. Regression Effect Estimates

D. Properties Of The Odds Ratio

E. Interaction and Effect Modification

7.0 VALIDITY and PRECISION

A. Validity

B. Internal Validity

C. External Validity

D. Precision

8.0 SOURCES and TREATMENT OF BIAS

A. Mis-classification bias

B. Selection bias

C. Confounding bias

D. Mis-specification bias

E. Survey error and sampling bias

CONCEPTS OF DISEASE CAUSATION

The causal triangle

Environment

Host

Disease

Risk

Disease risk is a probability

Risk factor = known empirically to be involved in disease causation

Risk indicator = likely to be a cause but not yet confirmed

Cause

Data on causes from animal or human experiments/observations

Causes: causative or preventive

Sufficient cause = constellation of RFs that triggers disease

Necessary cause = RF always part of every sufficient cause

Causes: weak and strong

Multi-causality of most diseases

Synergy = cooperative interaction in disease causation

Antagonism = causes acting against one another

The causal chain or causal pathway

Multi-stage

Initiated by the main risk factor

Final stages are due to promoters

Disease and Putative Risk Factor

Statistical or non-statistical association

Statistical association may be causal or non-causal

1 disease with 2 or more co-factors (RFs)

1 disease with 2 or more different independent causes

1 cause with 2 or more different diseases

Criteria of causality

Essential criteria

Specificity

Strength

Time sequence

Biological plausibility

Back-up criteria

Dose-effect relationship

Repetition

Consistency

Evidence from intervention

Experimental evidence.

CONCEPTS OF EXPOSURE

Exposure (personal attribute or environmental agent)

Physiological effect

Cause disease

Protect from disease

Description of exposures

Defined by subjective or objective data

Current or past exposures

Dichotomous (exposed vs unexposed)

Ranking by importance

Quantitatively or qualitatively

Instruments for measuring exposures

Questionnaires

Personal interviews

Biochemical analyses of biological material

Physical and chemical analysis of the environment

Dimensions of exposure measurement

Nature of the exposure

Dose

Time

Errors of exposure measurement

Differential errors bias the odds ratio

Non-differential errors attenuate the effect (bias towards the null)

DISEASE DETERMINANTS

Biological determinants

Demographic: age and gender

Genetic

Behavioral determinants

Lifestyle

Nutrition

Environmental determinants

Infections

Physical agents: heat, cold, and radiation

Social determinants

Socio-economic status

Occupation

Race & ethnicity

Medical care.

CONCEPTS OF EPIDEMIOLOGICAL ANALYSIS

Introduction

Data analysis affects practical decisions

Construction of hypotheses

Testing hypotheses

2-sided test covers p1>p2 and p2>p1

1-sided test covers either p1>p2 or p2>p1, not both

Simple manual inspection of the data

Identifying outliers

Assessing the normality of data

Identifying commonsense relationships

Alert the investigator to errors in computer analysis

Data models for continuous data

Straight line regression

Non-linear regression

Trends

Data models for categorical data

Maximum likelihood

Logistic models

Two procedures of analytic epidemiology

Measures of association

t-test

chi-square

linear correlation coefficient

linear regression coefficient

Measures of effect

Odds Ratio

Risk Ratio

Rate difference

Logistic regression coefficient

Trend analysis

Relationships missed by association measures

Relationships missed by effect measures.

VALIDITY and PRECISION

Validity = measure of accuracy

External validity = generalizability

Internal validity = results of individual study not biased

Precision

Measures variation in the estimate

Lack of random error = little variation in the estimate

Narrow CI = precise

Wide CI = imprecise

Reliability is reproducibility

Sources of bias

Misclassification bias

Selection bias

Confounding bias

Sampling bias

Types of bias

Negative bias = parameter estimate is below the true parameter

Positive bias = the parameter estimate is above the true parameter

Errors

Random (non-differential) errors lead to imprecise parameter estimates

Systematic (differential) errors lead to bias

CAUSAL INFERENCE

1.0 CONCEPTS OF DISEASE CAUSATION

The concept of the causal triangle (environment, host, and disease) has been used for many years to simplify epidemiological
reasoning. Disease risk is a probability. A risk factor is known empirically to be involved in disease causation. Risk indicators
are likely to be causes but are not yet confirmed. Data on causes can be obtained from animal or human experiments/observations.
Causes may be defined as causative or preventive. A risk factor is described as sufficient when its mere presence will trigger
the disease concerned. In practice a sufficient cause refers to a constellation of 2 or more risk factors, since most diseases
are multi-causal. One disease normally has more than 1 sufficient cause. Some risk factors are present in every sufficient cause
of the disease; these are referred to as necessary causes. Causes may be weak or strong. Causes may interact either
cooperatively in disease causation (synergy) or act against one another (antagonism). The causal chain or causal pathway is
multi-stage: it is initiated by the main risk factor, and its final stages are due to promoters. The association of a disease
with a putative risk factor may be statistical or non-statistical, and a statistical association may be causal or non-causal.
One disease may have 2 or more co-factors, and one disease may have 2 quite different independent causes. Conversely, one cause
may lead to 2 different diseases. The criteria of causality are either essential criteria or back-up criteria. The essential
causal criteria are four: specificity, strength, time sequence, and biological plausibility. The back-up causal criteria are
five: dose-effect relationship, repetition, consistency, evidence from intervention, and experimental evidence.
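The relationship between sufficient and necessary causes described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the component causes "A" to "E" and the three sufficient causes are hypothetical placeholders, not real risk factors. A sufficient cause is modelled as a set of risk factors, disease occurs once any complete set is present, and a necessary cause is whatever component all sufficient causes share.

```python
# Hypothetical sufficient causes; each is a set of component causes (risk factors).
# The labels A-E are placeholders, not real risk factors.
SUFFICIENT_CAUSES = [
    frozenset({"A", "B", "C"}),
    frozenset({"A", "D"}),
    frozenset({"A", "B", "E"}),
]


def disease_triggered(risk_factors, sufficient_causes=SUFFICIENT_CAUSES):
    """Return True if the risk factors present complete at least one sufficient cause."""
    present = set(risk_factors)
    return any(cause <= present for cause in sufficient_causes)


def necessary_causes(sufficient_causes=SUFFICIENT_CAUSES):
    """Return the components shared by every sufficient cause (the necessary causes)."""
    return frozenset.intersection(*sufficient_causes)


print(necessary_causes())                        # frozenset({'A'})
print(disease_triggered({"A", "D"}))             # True: completes the second sufficient cause
print(disease_triggered({"A", "B"}))             # False: no sufficient cause is complete
print(disease_triggered({"B", "C", "D", "E"}))   # False: the necessary cause A is absent
```

Removing the necessary cause "A" prevents disease through every pathway, whereas removing "D" blocks only one of the three sufficient causes.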

2.0 CONCEPT OF EXPOSURE

An exposure is defined as a substance, phenomenon, or event that has a physiological effect and that can cause disease or
protect from it. Exposures may be personal attributes or environmental agents, may be defined by subjective or objective data,
and may be current or past. Exposures can be dichotomous (exposed vs unexposed), ranked according to importance, or stratified.
Categorization may be based on statistical distributions, for example BMI. Exposures may be measured quantitatively or
qualitatively. The following instruments are used to measure exposures: questionnaires, personal interviews, biochemical
analyses of biological material, and physical and chemical analysis of the environment. Measurement of an exposure involves
three dimensions: the nature of the exposure, the dose, and time. Differential errors in exposure measurement result in a
biased odds ratio; the bias remains even if the sample size is increased. Non-differential errors make the odds ratio tend
towards the null value (attenuation of effect). Non-differential error also lowers study power, so a larger sample size is
needed to detect a given difference. Measurement errors can be reduced by multiple assessments of the exposure, such as repeat
assessments of cholesterol. The effect measure can also be adjusted to account for the effect of the error. The best approach
is to apply rigorous quality control measures at the stage of data collection to minimize errors.
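The contrast between differential and non-differential exposure errors can be checked with expected counts rather than a random simulation. The sketch below assumes a hypothetical case-control table (true OR = 3.0) and hypothetical sensitivity and specificity values for exposure classification. When the same values apply to cases and controls, the observed OR shrinks towards 1; when cases report exposure more completely than controls, as in a recall-bias scenario, the OR here is pushed away from the null.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts in one group after imperfect exposure classification."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return observed_exposed, observed_unexposed


# Hypothetical true counts: cases 100 exposed / 100 unexposed, controls 50 / 150.
true_or = odds_ratio(100, 100, 50, 150)

# Non-differential error: the same sensitivity and specificity in cases and controls.
a, b = misclassify(100, 100, sensitivity=0.8, specificity=0.9)
c, d = misclassify(50, 150, sensitivity=0.8, specificity=0.9)
nondiff_or = odds_ratio(a, b, c, d)

# Differential error: cases recall exposure better than controls (hypothetical values).
a, b = misclassify(100, 100, sensitivity=0.95, specificity=0.9)
c, d = misclassify(50, 150, sensitivity=0.7, specificity=0.9)
diff_or = odds_ratio(a, b, c, d)

print(f"true OR = {true_or:.2f}, non-differential OR = {nondiff_or:.2f}, differential OR = {diff_or:.2f}")
```

With these numbers the non-differential OR is roughly 2.16 and the differential OR about 3.32. Because expected counts are used, increasing the sample size would not change either biased value, which matches the point that bias does not disappear with larger samples.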

3.0 DISEASE DETERMINANTS

Biological determinants are demographic or genetic. The age and gender structure of a population has an impact on
mortality and morbidity. Predisposition to many diseases is inherited. Some diseases are known to be genetically caused,
while the genetic basis of others is still being unravelled. Behavioral determinants are lifestyle and nutrition.
Environmental determinants are infections and physical agents such as heat, cold, and radiation. Social determinants
are socio-economic status, occupation, race, ethnicity, and medical care.
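The effect of age structure on mortality can be shown with direct age standardization. The sketch below is illustrative only: the three age groups, the population sizes, the death counts, and the standard population are all hypothetical. Both populations share the same age-specific death rates, yet the older population has a much higher crude rate; applying the rates to a common standard population removes that difference.

```python
def crude_rate(deaths, population):
    """Crude rate: all deaths divided by the whole population."""
    return sum(deaths) / sum(population)


def directly_standardized_rate(deaths, population, standard_population):
    """Apply the age-specific rates to a common standard population."""
    age_specific_rates = [d / p for d, p in zip(deaths, population)]
    expected_deaths = sum(r * s for r, s in zip(age_specific_rates, standard_population))
    return expected_deaths / sum(standard_population)


# Hypothetical data for three age groups (young, middle, old).
# Both populations have identical age-specific rates (0.001, 0.005, 0.02).
young_pop, young_deaths = [6000, 3000, 1000], [6, 15, 20]
old_pop, old_deaths = [2000, 3000, 5000], [2, 15, 100]
standard = [5000, 5000, 5000]  # hypothetical standard population

for label, deaths, pop in [("young", young_deaths, young_pop), ("old", old_deaths, old_pop)]:
    print(label,
          f"crude = {crude_rate(deaths, pop) * 1000:.1f} per 1000,",
          f"standardized = {directly_standardized_rate(deaths, pop, standard) * 1000:.1f} per 1000")
```

The crude rates differ (4.1 vs 11.7 per 1000) only because of the age structure; the standardized rates are equal.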

4.0 GENERAL CONCEPTS OF EPIDEMIOLOGICAL ANALYSIS

Data analysis affects practical decisions. It involves constructing hypotheses and testing them. The 2-sided
test covers both p1 > p2 and p2 > p1. The 1-sided test covers either p1 > p2 or p2 > p1, but not both. The 2-sided test
is preferred because it is more conservative. Simple manual inspection of the data is needed: it can help identify outliers,
assess the normality of the data, identify commonsense relationships, and alert the investigator to errors in computer analysis.
Data models for continuous data can be straight-line regression, non-linear regression, or trends. Data models for categorical
data are the maximum likelihood and the logistic models. Two procedures are employed in analytic epidemiology. The test for
association is done first, and the effect measures are assessed only after an association has been found. Effect measures are
useless in situations in which tests for association are negative. The common tests for association are the t-test, the
chi-square test, the linear correlation coefficient, and the linear regression coefficient. The effect measures commonly
employed are the odds ratio, the risk ratio, and the rate difference. Measures of trend can discover relationships that are
not picked up by association and effect measures.
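The difference between 1-sided and 2-sided tests can be seen in a two-proportion z-test, one common way of comparing p1 and p2 (the choice of this particular test is an assumption; the text does not prescribe it). The counts are hypothetical. For the same data the 2-sided p-value is twice the 1-sided one, which is why the 2-sided test is the more conservative choice.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for p1 - p2, using the pooled proportion under H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def p_values(z):
    """Return (1-sided p for H1: p1 > p2, 2-sided p for H1: p1 != p2)."""
    phi = NormalDist().cdf
    one_sided = 1 - phi(z)
    two_sided = 2 * (1 - phi(abs(z)))
    return one_sided, two_sided


z = two_proportion_z(30, 100, 20, 100)  # hypothetical counts: 30/100 vs 20/100
one_sided, two_sided = p_values(z)
print(f"z = {z:.3f}, one-sided p = {one_sided:.4f}, two-sided p = {two_sided:.4f}")
```

Here the 1-sided p-value falls just above 0.05 while the 2-sided value is about twice as large.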

5.0 TESTS OF ASSOCIATION

The tests below are used for continuous measurement data. The t-test is used to compare two sample means. Analysis of
variance, ANOVA (F test), is used for more than 2 sample means. Multivariate analysis of variance, MANOVA, is used when more
than one outcome variable is tested at the same time. Linear regression is used in conjunction with the t-test for data that
require modeling. Dummy variables in the regression model can be used to control for confounding factors like age and sex.
The following tests are used for proportions. The chi-square test is used to test the association of 2 or more proportions
in contingency tables. Fisher's exact test is used to test proportions when sample sizes are small. The Mantel-Haenszel
chi-square statistic is used to test for association in stratified 2 x 2 tables. The chi-square statistic is valid if at
least 80% of cells have an expected count of 5 or more and no cell has an expected count below 1.0. If the observations are
not independent of one another, as in paired or matched studies, the McNemar chi-square test is used instead of the usual
Pearson chi-square test. The chi-square approximation works best when the counts are large enough for their sampling
distribution to be approximately Gaussian.
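The Pearson chi-square for a single 2 x 2 table, the expected-count check, and the McNemar statistic for matched pairs can be sketched as follows. This is a minimal version under assumptions: no continuity correction is applied, the counts are hypothetical, and the p-value uses the fact that with 1 degree of freedom the chi-square statistic equals z squared.

```python
from statistics import NormalDist


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and the list of expected cell counts.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2, expected = 0.0, []
    for i in range(2):
        for j in range(2):
            e = row_totals[i] * col_totals[j] / n
            expected.append(e)
            chi2 += (observed[i][j] - e) ** 2 / e
    return chi2, expected


def expected_counts_adequate(expected):
    """Rule of thumb: at least 80% of cells with expected count >= 5 and no cell below 1."""
    share_at_least_5 = sum(e >= 5 for e in expected) / len(expected)
    return share_at_least_5 >= 0.8 and min(expected) >= 1.0


def p_value_1df(chi2):
    """Upper-tail p-value for 1 df: chi2 = z**2, so p = 2 * (1 - Phi(sqrt(chi2)))."""
    return 2 * (1 - NormalDist().cdf(chi2 ** 0.5))


def mcnemar_chi2(b, c):
    """McNemar chi-square (no continuity correction); b, c = discordant pair counts."""
    return (b - c) ** 2 / (b + c)


chi2, expected = pearson_chi2_2x2(30, 70, 20, 80)  # hypothetical counts
print(f"chi2 = {chi2:.3f}, p = {p_value_1df(chi2):.4f}, adequate = {expected_counts_adequate(expected)}")
print(f"McNemar chi2 = {mcnemar_chi2(15, 5):.2f}")  # hypothetical discordant pairs
```

Only the discordant pairs enter the McNemar statistic, because concordant pairs carry no information about a difference between the matched members.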

6.0 MEASURES OF EFFECT

The Mantel-Haenszel odds ratio is used to compare 2 proportions in a single or stratified 2 x 2 contingency table. Logistic
regression can be used as an alternative to the MH procedure. For paired proportions, a special form of the MH OR and a special
form of logistic regression, called conditional logistic regression, are used. Excess disease risk is measured by the
attributable risk, the attributable risk proportion, and the population attributable risk. Variation of an effect measure
across levels of a third variable is called effect modification by epidemiologists and interaction by statisticians.
Synergism (or antagonism) occurs when the interaction between two causative factors leads to an effect greater (or smaller)
than expected on the basis of additivity. Interaction can be conceptualized at 4 levels: statistical (additive and
multiplicative), biologic, public health, and decision making. The chi-square test for heterogeneity can be used to test for
effect modification/interaction.
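The measures of excess risk can be computed from cohort counts as below. Definitions vary between textbooks, so this sketch assumes one common set: attributable risk as the risk difference, attributable risk proportion as that difference divided by the risk in the exposed, and population attributable risk as the total risk minus the risk in the unexposed (also expressed as a proportion of the total risk). The cohort counts are hypothetical.

```python
def excess_risk_measures(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Risk-based measures of excess risk from cohort counts (one common set of definitions)."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    risk_total = (cases_exposed + cases_unexposed) / (n_exposed + n_unexposed)
    attributable_risk = risk_exposed - risk_unexposed  # risk difference
    population_ar = risk_total - risk_unexposed
    return {
        "risk_ratio": risk_exposed / risk_unexposed,
        "attributable_risk": attributable_risk,
        "attributable_risk_proportion": attributable_risk / risk_exposed,
        "population_attributable_risk": population_ar,
        "population_attributable_risk_proportion": population_ar / risk_total,
    }


# Hypothetical cohort: 40 cases among 1000 exposed, 20 cases among 2000 unexposed.
for name, value in excess_risk_measures(40, 1000, 20, 2000).items():
    print(f"{name}: {value:.3f}")
```

In this example three quarters of the risk among the exposed is attributable to the exposure, but only half of the risk in the whole population, because most of the population is unexposed.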

7.0 VALIDITY and PRECISION

An epidemiological study should be considered as a sort of measurement with parameters for validity and precision. Validity
is a measure of accuracy. Validity can be classified as internal validity and external validity; external validity is also
called generalizability. Precision measures variation in the estimate. Reliability is reproducibility. Bias is defined
technically as the situation in which the expected value of the parameter estimate differs from the true parameter value,
that is, E(estimate) - true value is not zero. The following types of bias are explained in the next section:
misclassification bias, selection bias, and confounding bias. Bias may move the effect parameter away from the null value or
toward the null value. In negative bias the parameter estimate is below the true parameter. In positive bias the parameter
estimate is above the true parameter. A study is not valid if it is biased. Systematic errors lead to bias and therefore to
invalid parameter estimates. Random errors lead to imprecise parameter estimates. Internal validity is concerned with the
results of each individual study and is impaired by study bias. External validity is the generalizability of results.
Traditionally, results are generalized if the sample is representative of the population. In practice, generalizability is
achieved by looking at the results of several studies, each of which is individually internally valid. It is therefore not
the objective of each individual study to be generalizable, because that would require assembling a representative sample.
Precision is a measure of the lack of random error. An effect measure with a narrow confidence interval is said to be precise;
an effect measure with a wide confidence interval is imprecise. Precision is increased in three ways: increasing the study
size, increasing study efficiency, and taking care in the measurement of variables to decrease mistakes.
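The link between study size and precision can be illustrated with a confidence interval for the odds ratio. This sketch assumes the Woolf (log-based) interval, one standard choice among several, and uses hypothetical counts. Doubling every cell leaves the odds ratio unchanged but narrows the 95% CI, so the larger study gives the more precise estimate.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio with a Woolf (log-based) confidence interval for a 2x2 table."""
    or_ = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return or_, exp(log(or_) - z * se_log_or), exp(log(or_) + z * se_log_or)


small = odds_ratio_ci(20, 10, 40, 40)  # hypothetical study
large = odds_ratio_ci(40, 20, 80, 80)  # same proportions, twice the size
for label, (or_, low, high) in [("small", small), ("large", large)]:
    print(f"{label}: OR = {or_:.2f}, 95% CI {low:.2f} to {high:.2f}, width {high - low:.2f}")
```

A larger study shrinks random error only; it does nothing to correct a biased estimate.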

8.0 SOURCES AND TREATMENT OF BIAS

Misclassification is inaccurate assignment of exposure or disease status. Random or non-differential misclassification
generally biases the effect measure towards the null, so that the effect is underestimated. Non-random or differential
misclassification is a systematic error that can bias the effect measure either away from or towards the null, exaggerating
or underestimating the effect. A positive association may become negative, and a negative association may become positive.
Misclassification bias is classified as information bias, detection bias, and protopathic bias. Information bias is systematic
incorrect measurement of responses due to questionnaire defects, observer errors, respondent errors, instrument errors,
diagnostic errors, and exposure mis-specification. Detection bias arises when disease or exposure is sought more vigorously in
some groups than in others. Protopathic bias arises when early signs of disease cause a change in behaviour with regard to the
risk factor. Misclassification bias can be prevented by using double-blind techniques to decrease observer and respondent bias.
Misclassification bias can be treated using the probabilistic approach and by measuring inter-rater variation.
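Inter-rater variation is often summarized with Cohen's kappa, which compares the observed agreement between two raters with the agreement expected by chance from their marginal totals. Using kappa here is an assumption (the text names inter-rater variation but not a specific statistic), and the ratings table is hypothetical.

```python
def cohens_kappa(table):
    """Cohen's kappa for a k x k agreement table (rows = rater 1, columns = rater 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    observed_agreement = sum(table[i][i] for i in range(k)) / n
    row_share = [sum(row) / n for row in table]
    col_share = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    chance_agreement = sum(r * c for r, c in zip(row_share, col_share))
    return (observed_agreement - chance_agreement) / (1 - chance_agreement)


# Hypothetical ratings of 100 subjects as exposed/unexposed by two interviewers.
ratings = [[40, 10],
           [5, 45]]
print(f"kappa = {cohens_kappa(ratings):.2f}")
```

The raw agreement here is 85%, but half of that is expected by chance, so kappa is 0.70.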

Selection bias arises when subjects included in the study differ in a systematic way from those not selected. Selection
bias due to biological factors includes the Neyman fallacy and susceptibility bias. The Neyman fallacy arises when the risk
factor is related to prognosis (survival), thus biasing prevalence studies. Susceptibility bias arises when susceptibility
to disease is indirectly related to the risk factor. Selection bias due to disease ascertainment procedures includes
publicity, exposure, diagnostic, detection, referral, self-selection, and Berkson biases. Self-selection bias in occupational
studies is also called the healthy worker effect, since sick people are not employed or are dismissed. The Berkson fallacy
arises from differential admission of some cases to hospital, in proportions such that studies based on the hospital give a
wrong picture of disease-exposure relations in the community. Selection bias during data collection is represented by
non-response bias and follow-up bias. Prevention: the study design should avoid the causes of selection bias mentioned above.
Treatment: there are no easy methods of adjusting for the effect of selection bias once it has occurred.
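The Berkson fallacy can be reproduced with expected counts. In this hypothetical sketch the community shows no association between disease and exposure (OR = 1), but exposed cases are assumed to be admitted to hospital more often than unexposed cases, while controls are admitted at the same rate regardless of exposure. A study restricted to the hospital then finds a spurious association.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def table_or(t):
    return odds_ratio(t["case_exp"], t["case_unexp"], t["ctrl_exp"], t["ctrl_unexp"])


# Hypothetical community counts with no disease-exposure association.
community = {"case_exp": 100, "case_unexp": 100, "ctrl_exp": 100, "ctrl_unexp": 100}

# Hypothetical admission probabilities: exposed cases are admitted more often.
admission = {"case_exp": 0.8, "case_unexp": 0.4, "ctrl_exp": 0.5, "ctrl_unexp": 0.5}

# Expected counts among hospital patients.
hospital = {cell: community[cell] * admission[cell] for cell in community}

print(f"community OR = {table_or(community):.2f}, hospital OR = {table_or(hospital):.2f}")
```

The hospital-based OR of 2.00 is produced entirely by the admission process, which is why design-stage prevention matters more than later adjustment.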

Confounding is a mixing up of effects. Confounding bias arises when the disease-exposure relationship is disturbed by an
extraneous factor called the confounding variable. The confounding variable is not actually involved in the exposure-disease
relationship. It is, however, predictive of disease and is unequally distributed between exposure groups. Being related both
to the disease and to the risk factor, the confounding variable could lead to a spurious apparent relation between disease
and exposure. A confounder must fulfil the following criteria: it must be related to both disease and exposure without being
part of the causal pathway, it must be a true risk factor for the disease, it must be associated with the exposure in the
source population, and it must not be affected by either the disease or the exposure. Confounding can be prevented at the
design stage by eliminating the effect of the confounding factor using 4 strategies: pair-matching, stratification,
randomisation, and restriction. Care must be taken to deal only with true confounders, because adjusting for non-confounders
reduces the precision of the study. Non-multivariate treatment of confounding employs standardization and stratified
Mantel-Haenszel analysis. Multivariate treatment of confounding employs multivariate adjustment procedures: multiple linear
regression, linear discriminant function, and multiple logistic regression.
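Stratified Mantel-Haenszel analysis can be sketched as follows. The two strata of a potential confounder (for example two age groups) and their counts are hypothetical. The odds ratio is 2.0 within each stratum, but collapsing the strata gives a crude OR of 1.6 because the stratifying factor is associated with both exposure and disease; the MH summary OR, computed as the sum of a*d/n divided by the sum of b*c/n across strata, recovers the stratum-specific value.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed/unexposed cases; c, d = exposed/unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """MH summary odds ratio: sum(a*d/n) / sum(b*c/n) over strata (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata of a potential confounder.
# Each tuple: (exposed cases, unexposed cases, exposed controls, unexposed controls).
strata = [(20, 10, 40, 40), (10, 20, 10, 40)]
crude = tuple(sum(cells) for cells in zip(*strata))  # collapse the strata into one table

print("stratum-specific ORs:", [odds_ratio(*s) for s in strata])
print(f"crude OR = {odds_ratio(*crude):.2f}, MH OR = {mantel_haenszel_or(strata):.2f}")
```

Here the crude OR underestimates the effect (negative bias); with other data confounding can just as easily inflate it.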

Mis-specification bias arises when a wrong statistical model is used. For example, using parametric methods on data that do
not meet their distributional assumptions (such as heavily skewed data) biases the findings.

Survey error and sampling bias: Total survey error is the sum of the sampling error and three non-sampling errors (measurement
error, non-response error, and coverage error). Sampling errors are easier to estimate than non-sampling errors. Sampling error
decreases with increasing sample size. Non-sampling errors may be systematic, like failure of the sampling frame to cover the
whole target population, or they may be non-systematic. Systematic errors cause severe bias. Sampling bias, positive or
negative, arises when results from the sample are consistently wrong (biased) away from the true population parameter. The
sources of bias are: an incomplete or inappropriate sampling frame, use of a wrong sampling unit, non-response bias,
measurement bias, coverage bias, and sampling bias. Sensitivity analysis can be carried out for the major types of bias, often
using simulations.
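The difference between sampling error and a systematic coverage error can be shown analytically. The population in this sketch is hypothetical: 80% of it is on the sampling frame with prevalence 0.25, and the 20% missed by the frame has prevalence 0.50, so the true prevalence is 0.30. The standard error of the estimated proportion halves each time the sample size quadruples, while the coverage bias of -0.05 (a negative bias) stays the same at every sample size.

```python
from math import sqrt


def standard_error(p, n):
    """Standard error of a sample proportion p estimated from n subjects."""
    return sqrt(p * (1 - p) / n)


# Hypothetical population: 80% is on the sampling frame (prevalence 0.25),
# 20% is missed by the frame (prevalence 0.50).
covered_share, covered_prev = 0.8, 0.25
missed_share, missed_prev = 0.2, 0.50
true_prev = covered_share * covered_prev + missed_share * missed_prev

# A sample drawn only from the frame estimates the covered prevalence on average,
# so its bias does not depend on n.
coverage_bias = covered_prev - true_prev

for n in (100, 400, 1600):
    print(f"n = {n:5d}: sampling error (SE) = {standard_error(covered_prev, n):.4f}, "
          f"coverage bias = {coverage_bias:+.3f}")
```

The same structure can serve as the starting point for a simulation-based sensitivity analysis by varying the assumed size and prevalence of the uncovered group.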