ISLAMIC MEDICAL EDUCATION RESOURCES-03

0308-HYPOTHESES

Lecture at the FIFTH ADVANCED ASIAN COURSE IN TROPICAL EPIDEMIOLOGY INSTITUTE FOR MEDICAL RESEARCH KUALA LUMPUR 18-29 AUGUST 2003 By Professor Dr Omar Hasan Kasule, Sr. MB ChB (MUK), MPH, DrPH (Harvard) Deputy Dean for Research, Faculty of Medicine, UIA PO Box 141 Kuantan Pahang MALAYSIA Tel 609 513 2797. Fax 609 513 3615 E-M omarkasule@yahoo.com

Learning Objectives:

Hypothesis testing and the scientific method

The null and the alternative hypotheses

Procedures and interpretation of hypothesis tests

The concept, meaning and significance of the p-value

Errors of statistical testing

 

Key Words and Terms:

Error type 2 (beta error)

Error, type 1 (alpha error)

False negative

False positive

Hypothesis testing, formal

Hypothesis testing, informal

Hypothesis, alternative

Hypothesis, null

Non-rejection region

Null value

Region, critical

Region, of non-rejection

Region, of rejection

Significance, clinical

Significance, practical

Significance, statistical

Statistical power

Statistical tests

Test statistic

Test, of significance

Test, one-tail

Test, statistic

Test, two-tail

True negative

True positive


AGENDA

1.0 EPIDEMIOLOGIC METHODOLOGY:

A. Epidemiological Research

B. Hypotheses:

C. Sources of Epidemiological Data

D. Empiricism, Induction, Refutation, and Bayesianism

E. Balance of Strengths and Weaknesses:

 

2.0 HYPOTHESES AND THE SCIENTIFIC METHOD

A. The Scientific Method

B. Formulation Of Hypotheses

C. Informal Hypothesis Testing

D. Formal Testing Of Hypotheses

E. Generation Of New Hypotheses

 

3.0 NULL HYPOTHESIS (Ho ) & ALTERNATIVE HYPOTHESIS (HA):

A. Types Of Hypotheses:

B. Conclusions About Hypotheses:

C. False Positive And False Negative

D. Summary Of Testing Errors

 

4.0  HYPOTHESIS TESTING USING TESTS OF SIGNIFICANCE

A. Parameters Used In Hypothesis Testing

B. Procedure Of Hypothesis Testing

C. Statistical Significance:

D. Interpretation Of P-Values:

 

5.0 HYPOTHESIS TESTING USING CONFIDENCE INTERVALS

A. Two Approaches To Hypothesis Testing

B. The Concept Of Null Value

C. Procedure Of Hypothesis Testing

D. Interpretation

 

6.0 CONCLUSIONS and INTERPRETATIONS

A. Implications Of Statistically Significant

B. Implications Of Not Statistically Significant

C. Statistical And Practical Significance

D. 1-Tail And 2-Tail Tests:

E. Errors Of Testing

 

 


 

EPIDEMIOLOGIC METHODOLOGY:

An epidemiologic investigation proceeds through identifying and describing a problem, using the scientific method to formulate and test hypotheses, and interpreting findings.

 

Epidemiological information is sourced from existing data or studies (observational or experimental). Existing data is from census, medical facilities, government, and private sector, health surveys, and vital statistics.

 

Experimental studies, whether natural or true experiments, involve deliberate human action or intervention whose outcome is then observed. They have the advantage of controlled conditions but raise the ethical problems of experimenting on humans.

 

Observational studies allow nature to take its course and just record the occurrences of disease, describing the what, where, when, and why of a disease. There are 4 types of observational studies: ecologic, cross-sectional, case control, and cohort (follow-up) studies. Their advantages are low cost and fewer ethical issues. They suffer from 3 disadvantages: disease aetiology is not studied directly because the investigator does not manipulate the exposures, information may be unavailable, and confounding may occur.

 

Epidemiological methodology, following the scientific method, is empirical, inductive, and refutative. Epidemiology relies on and respects only empirical findings. Empiricism refers to reliance on physical proof. Induction is building a theory on several individual observations. Refutation is basically refusal of a supposition until it is proved otherwise. Epidemiological investigation is not as deterministic as laboratory investigation but is cheap and easy.

 

HYPOTHESES AND THE SCIENTIFIC METHOD

The scientific method consists of hypothesis formulation, experimentation to test the hypothesis, and drawing conclusions. Hypotheses are statements of prior belief. They are modified by results of experiments to give rise to new hypotheses. The new ones then in turn become the basis for new experiments.

 

There are two traditions of formal hypothesis testing: significance testing and Neyman-Pearson testing. Significance testing depends on the use of a single p-value to reach a decision. The Neyman-Pearson approach uses the confidence interval, conventionally the 95% CI. The two approaches are related because a significance level of α in significance testing corresponds to a 100(1-α)% confidence interval in the Neyman-Pearson approach.

 

NULL HYPOTHESIS (Ho ) & ALTERNATIVE HYPOTHESIS (HA):

The null or research hypothesis, H0, states that there is no difference between the two comparison groups and that the apparent difference seen is due to sampling error. The alternative hypothesis, HA, disagrees with the null hypothesis. H0 and HA are complementary and exhaustive; between them they cover all the possibilities.

 

A hypothesis cannot be proved; you only give an objective measure of probability of its truth.

 

We can use concepts of conditional probability to define errors of statistical testing.

 

Type 1 error = α error = probability of rejecting a true H0 (false positive) = Pr (rejecting H0 | H0 is true).

Type 2 error = β error = probability of not rejecting a false H0 (false negative) = Pr (not rejecting H0 | H0 is false).

The confidence level (1 - α) = true negative = Pr (not rejecting H0 | H0 is true).

Power (1 - β) = true positive = Pr (rejecting H0 | H0 is false).

Whereas α relates to the error of wrongly rejecting H0 (significance error), β relates to the error of wrongly accepting H0 (error of acceptance).
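The conditional probabilities above can be illustrated by simulation. The following is a minimal sketch, not part of the original lecture: it assumes a two-sided z test with known standard deviation, and the sample size, true means, number of trials, and function names are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist, mean

def reject_h0(sample, mu0, sigma, alpha=0.05):
    """Two-sided z test of H0: mu = mu0 with known sigma; True means 'reject H0'."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=25, trials=4000, seed=1):
    """Fraction of simulated studies in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        rejections += reject_h0(sample, mu0, sigma)
    return rejections / trials

# H0 is true (true mean equals mu0): the rejection rate estimates the type 1 error.
type1_rate = rejection_rate(true_mu=0.0)
# H0 is false (hypothetical true mean of 0.5): the rejection rate estimates power.
power = rejection_rate(true_mu=0.5)
type2_rate = 1 - power  # estimates the type 2 error
print(type1_rate, power, type2_rate)
```

Under these assumptions the first rate should settle near the pre-set 0.05, while the second depends on how far the true mean lies from the hypothesized one.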

 

   TABULAR SUMMARY OF TESTING ERRORS

True situation | Result of testing | Decision | Type of error
H0 is true | Do not reject H0 | Correct | None
H0 is true | Reject H0 | Wrong | Type 1
H0 is false | Do not reject H0 | Wrong | Type 2
H0 is false | Reject H0 | Correct | None

 

The above table can be set out in a different way as follows:

DECISION MADE | TRUE SITUATION: H0 is true | TRUE SITUATION: H0 is false
Do not reject H0 | Correct decision (1-α) | Type 2 error (β)
Reject H0 | Type 1 error (α) | Correct decision (1-β)

 

HYPOTHESIS TESTING USING TESTS OF SIGNIFICANCE

Parameters of significance testing are the significance level, critical region, p-value, type 1 error, type II error, and power. 

 

The critical or rejection region is the far end (tail) of the distribution of the test statistic; its area is α. The non-rejection region has area 1-α. α, the pre-set level of significance, usually 0.05, is the probability that the test statistic falls in the rejection region when H0 is true, that is, the probability of wrongfully rejecting H0 (5% of the time, or 1 in 20).

 

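The p value, the observed significance level, is the probability, computed assuming H0 is true, of obtaining a test statistic as extreme as or more extreme than the one observed. It corresponds to the proportion of extreme observations, far from the null or mean value, expected under H0. It should not be confused with α, the pre-set probability of rejecting a true hypothesis by mistake.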

 

Hypothesis testing starts by stating H0 and HA, assuming a level of significance, usually 0.05, and selecting a test statistic which, when applied to the data, will yield a p-value. Test statistics based on the approximate Gaussian distribution, such as F, t, and χ² (chi-square), are employed. Exact methods based on the binomial distribution are used for small samples.

 

The decision rules are: if p < 0.05, H0 is rejected (test statistically significant). If p > 0.05, H0 is not rejected (test not statistically significant).
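The decision rule can be sketched in a few lines. This is an illustration only, assuming a large-sample z statistic for the difference between two means; the summary figures (means, standard deviations, group sizes) and the function names are hypothetical and do not come from the lecture.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Area in both tails of the standard Gaussian beyond |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Apply the decision rule at the pre-set level of significance."""
    return "reject H0" if p_value < alpha else "do not reject H0"

# Hypothetical summary data for two comparison groups.
mean1, sd1, n1 = 12.0, 4.0, 100
mean2, sd2, n2 = 10.5, 4.5, 100

standard_error = (sd1**2 / n1 + sd2**2 / n2) ** 0.5
z = (mean1 - mean2) / standard_error
p = two_sided_p_value(z)
print(round(z, 2), round(p, 4), decide(p))
```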

 

HYPOTHESIS TESTING USING CONFIDENCE INTERVALS

The 95% confidence interval is more informative than the p-value approach because it indicates precision.

 

Under H0 the null value is defined as 0 (when the difference between comparison groups=0) or as 1.0 (when the ratio between comparison groups=1).

 

The 95% CIs can be computed using approximate Gaussian or exact binomial methods.

The decision rules are: if the CI contains the null value, H0 is not rejected. If the CI does not contain the null value, H0 is rejected.
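As a rough sketch of the confidence interval rule, the code below computes an approximate Gaussian 95% CI for a difference between two means and checks whether it contains the null value 0. The summary figures and function names are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist

def ci_difference(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Approximate Gaussian confidence interval for mean1 - mean2."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = mean1 - mean2
    se = (sd1**2 / n1 + sd2**2 / n2) ** 0.5
    return diff - z_crit * se, diff + z_crit * se

def decide_from_ci(lower, upper, null_value=0.0):
    """H0 is not rejected when the interval contains the null value."""
    if lower <= null_value <= upper:
        return "do not reject H0"
    return "reject H0"

# Hypothetical summary data for two comparison groups.
lower, upper = ci_difference(12.0, 4.0, 100, 10.5, 4.5, 100)
print(round(lower, 2), round(upper, 2), decide_from_ci(lower, upper))
```

For a ratio measure the same rule would be applied with null_value=1.0 and an interval computed on the ratio scale.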

 

CONCLUSIONS and INTERPRETATIONS

 

IMPLICATIONS OF STATISTICALLY SIGNIFICANT

H0 is false

H0 is rejected

Observations are not compatible with H0

Observations are not due to sampling variation

Observations are real/true biological phenomenon

 

IMPLICATIONS OF NOT STATISTICALLY SIGNIFICANT

H0 is not false (we do not say true)

H0 is not rejected

Observations are compatible with H0

Observations are due to sampling variation or random errors of measurement.   

Observations are artificial, apparent and not real biological phenomena

 

A statistically significant difference may have no clinical/practical significance/importance. This is due to other factors being involved and not studied, and to measurements that are not valid.

A clinically important difference may not reach statistical significance due to small sample size and measurements that are not discriminating enough.

 

Hypothesis testing may be 1-sided or 2-sided. The 1-sided test considers extreme values on one side (1 tail) and is rarely used. The 2-sided test considers extreme values on 2 sides (2 tails), is a more popular conservative test, and looks for any change in the parameter whatever its direction.

 

 

PRACTICAL ASSIGNMENT (survival data set)

1. Test the null hypothesis that there is no difference in mean survival time between the 2 treatment groups

 

2. Test the hypothesis that there is no difference in the proportion of males and females between the two treatment groups
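One possible way to approach the second item is a Pearson chi-square test on a 2 x 2 table of sex by treatment group. The sketch below is only an assumption about how the assignment might be tackled: the counts are hypothetical and are not taken from the survival data set, no continuity correction is applied, and the function name is invented for illustration.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail area of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows are treatment groups, columns are males and females.
chi2, p_value = chi_square_2x2(30, 20, 25, 25)
print(round(chi2, 3), round(p_value, 3))
print("reject H0" if p_value < 0.05 else "do not reject H0")
```

With small cell counts an exact method would be preferable to this approximate one.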


HYPOTHESES AND THE SCIENTIFIC METHOD

A. THE SCIENTIFIC METHOD

The scientific method is currently the most powerful method available in empirical investigations. It proceeds in stages starting with formulation of a study hypothesis. An experiment is then designed based on the hypothesis. The data from the experimentation is used to draw objective conclusions about the hypothesis.

B. FORMULATION OF HYPOTHESES

In accordance with the scientific method a null hypothesis is formulated usually in the form that there is no difference between 2 groups being compared. Correct formulation of the null hypothesis is necessary for study design and study interpretation. A series of studies can be interpreted to yield a general explanatory law or hypothesis. Such generalizations are based on results of a series of valid studies.

C. INFORMAL HYPOTHESIS TESTING

The use of hypotheses and the scientific method is sometimes informal. For example when a patient walks into the doctor's office, the doctor will form a hypothesis based on preliminary observations. This will become the working hypothesis used to guide further clinical examination and investigations. The hypothesis may be changed or updated in view of new information that may be collected.

D. FORMAL TESTING OF HYPOTHESES

There are two traditions of formal hypothesis testing: significance testing and the Neyman-Pearson hypothesis testing.

Significance testing depends on the use of a single p-value to reach a decision. Significance testing has been criticized on various grounds. It does not incorporate any measure of the magnitude of association. It cannot assess the precision of the measurement. It developed historically in agriculture and industry, where problems are less complicated and simple choices are required, and so it does not fit well in the epidemiological paradigm. Significance testing involves putting the hypothesis in a mathematical formulation and computing the probability of obtaining the observed data if the hypothesis is correct. The bulk of inferential statistics is concerned with the formulation and testing of hypotheses. In formal testing the data are used to generate a test statistic, which in turn is used to generate a probability value (the p-value). This value is compared to a pre-set level of significance to make a decision about the null hypothesis.

The Neyman-Pearson approach avoids the criticisms of significance testing stated above. It does not give a conclusion based on a single probability value. It provides a range of parameter values that are compatible with the data. It is therefore more informative and makes more use of the data provided. The Neyman-Pearson approach uses the confidence interval, conventionally the 95% confidence interval. It must however be noted that the two approaches are related to one another. If significance testing uses a significance level of α, the corresponding confidence interval is the 100(1 - α)% interval.

E. GENERATION OF NEW HYPOTHESES

Hypotheses are statements of prior belief. They are modified by results of experiments to give rise to new hypotheses. The new ones then in turn become the basis for new experiments. This process is repeated continuously enabling scientific knowledge and understanding to grow. In this process no facts or knowledge can remain static for long. Changes are continually taking place.

2.4.2 NULL HYPOTHESIS (Ho ) & ALTERNATIVE HYPOTHESIS (HA):

A. TYPES OF HYPOTHESES:

A hypothesis is a statement of belief in something. Unlike other types of beliefs, scientific beliefs are subject to experimental verification. Two hypotheses are always stated for proper scientific investigation: the null and the alternative hypotheses. The null hypothesis or research hypothesis, H0, states that there is no difference between the two comparison groups and that the apparent difference seen is due to sampling error. The alternative hypothesis, HA, disagrees with the null hypothesis and states that there is a real difference not explained by sampling error. H0 and HA are complementary and exhaustive in that between them they cover all the possibilities. HA could be vague. When H0 is rejected, we do not accept HA as proved; we only fail to reject it.

B. CONCLUSIONS ABOUT HYPOTHESES:

The aim of hypothesis testing is to make a conclusion about H0. The conclusion is in the form of rejecting or not rejecting the hypothesis. If H0 is rejected, HA becomes the new working hypothesis. A hypothesis cannot be proved; you only give an objective measure of probability of its truth.

C. FALSE POSITIVE and FALSE NEGATIVE

Intersections of distributions of H0 and HA: The observed data can be plotted on 2 normal curves, one under the assumptions of the null hypothesis and the other under the assumptions of the alternative hypothesis. The 2 curves will naturally overlap. This overlap gives rise to two concepts that are basic in hypothesis testing: the probability of false positive and the probability of false negative. The probability of false positive is that part of the H0 curve lying in the rejection region, on the side of the HA curve. It is also referred to as the type 1 or alpha error. The probability of false negative is that part of the HA curve lying in the non-rejection region of the H0 curve. It is also referred to as the type 2 or beta error.
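The two overlapping areas can be computed directly from the two curves. The following is a minimal sketch under stated assumptions: an upper-tail test, Gaussian curves with a common standard error, and hypothetical values for the null mean, the alternative mean, and the standard error. The function name is invented for illustration.

```python
from statistics import NormalDist

def error_areas(mu0, mu_a, se, alpha=0.05):
    """Upper-tail test with mu_a > mu0: return (critical value, alpha area, beta area)."""
    h0_curve = NormalDist(mu0, se)
    ha_curve = NormalDist(mu_a, se)
    critical = h0_curve.inv_cdf(1 - alpha)
    alpha_area = 1 - h0_curve.cdf(critical)  # part of the H0 curve in the rejection region
    beta_area = ha_curve.cdf(critical)       # part of the HA curve in the non-rejection region
    return critical, alpha_area, beta_area

critical, alpha_area, beta_area = error_areas(mu0=0.0, mu_a=3.0, se=1.0)
print(round(critical, 3), round(alpha_area, 3), round(beta_area, 3))
```

Moving the critical value to the right shrinks the alpha area and enlarges the beta area, which is the trade-off between the two errors.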

D. SUMMARY OF TESTING ERRORS

True situation | Result of testing | Decision | Type of error
H0 is true | Do not reject H0 | Correct | None
H0 is true | Reject H0 | Wrong | Type 1
H0 is false | Do not reject H0 | Wrong | Type 2
H0 is false | Reject H0 | Correct | None

 

The above table can be set out in a different way as follows:

DECISION MADE | TRUE SITUATION: H0 is true | TRUE SITUATION: H0 is false
Do not reject H0 | Correct decision (1-α) | Type 2 error (β)
Reject H0 | Type 1 error (α) | Correct decision (1-β)

 

We can use concepts of conditional probability to define the parameters explained above as follows. Type 1 error = Pr (rejecting H0 | H0 is true). Type 2 error = Pr (not rejecting H0 | H0 is false). The confidence level (1 - α) = true negative = Pr (not rejecting H0 | H0 is true). Power (1 - β) = true positive = Pr (rejecting H0 | H0 is false). Whereas alpha relates to significance error, beta relates to error of acceptance.

2.4.3 HYPOTHESIS TESTING USING TESTS OF SIGNIFICANCE

A. PARAMETERS USED IN HYPOTHESIS TESTING

Several parameters or concepts are used in hypothesis or significance testing: critical region, significance level, p-value, type 1 error, type 2 error, and power.

The critical region is the far end of the distribution. We may talk of a 1-sided critical region or a 2-sided critical region. The critical region is also called the rejection region; its area is denoted by alpha. The area of the non-rejection region is 1-alpha. Alpha is the probability that a test statistic falls in the critical or rejection region when H0 is true. Alpha, the level of significance, usually set at 0.05, is the probability of wrongfully rejecting H0: 5% of the time, or 1 in 20.

The p value can be defined or described in various ways. The p value is a measure of the compatibility of the observed data with the null hypothesis. The p value is the observed significance level. The p value is the probability, under H0, of results as extreme as or more extreme than those observed. The p-value is the proportion of extreme observations away from the null or mean value. The p-value is the probability of observing the test statistic or a more extreme value. The p-value can also be defined as the area of the tail beyond the value of the test statistic. The p-value can be 1-tail or 2-tail (upper tail and lower tail). The upper tail p value is the probability that the test statistic is higher than the observed value. The lower tail p value is the probability that the test statistic is less than the observed value.

Type 1 error, also called alpha error, is the probability of a false positive. Stated in other words, it is the probability of rejecting a true null hypothesis. It can also be defined as the probability of incorrect rejection of the null hypothesis. Type 2 or beta error is the probability of a false negative. Stated in a different way, it is the probability of failing to reject a false null hypothesis. Power is 1-beta.
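The tail areas described above can be sketched as follows, assuming the test statistic follows a standard Gaussian distribution under H0. The function name and the example value of the statistic are hypothetical.

```python
from statistics import NormalDist

def p_values(z):
    """Lower-tail, upper-tail, and 2-tail p-values for an observed z statistic."""
    std_normal = NormalDist()
    lower = std_normal.cdf(z)          # Pr(statistic < observed value)
    upper = 1 - lower                  # Pr(statistic > observed value)
    two_tail = 2 * min(lower, upper)   # both tails beyond |z|
    return {"lower": lower, "upper": upper, "two_tail": two_tail}

result = p_values(2.0)
print({name: round(value, 4) for name, value in result.items()})
```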

B. PROCEDURE OF HYPOTHESIS TESTING

The procedure starts by stating H0 and HA. Then a level of significance is assumed; this is traditionally taken to be 0.05 or 1 in 20. Setting this level of significance is the same as saying that I am taking a 1 in 20 risk of wrongly rejecting a true H0. The next step is selecting a test statistic which, when applied to the data, will yield a p-value. If approximate methods are used, test statistics based on the Gaussian distribution are employed, such as the F-test, the t-test, and the chi-square test. These are computed from the data and the corresponding p-value is looked up in appropriate tables. In cases of small samples, for which the Gaussian approximation is not valid, exact methods of computing the p-value are used. These methods, based on the binomial distribution, yield the p-value directly from the data. The following decision rules are used in making conclusions about the null hypothesis. If the p-value is less than 0.05, H0 is rejected. If the p-value is greater than 0.05, H0 is not rejected.
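An exact method for a small sample might look like the sketch below. It assumes one particular convention for the 2-sided exact binomial p-value (summing the probabilities of all outcomes no more likely than the one observed); other conventions exist. The counts and function names are hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binomial_p_value(successes, n, p0=0.5):
    """2-sided exact p-value: total probability of outcomes no more likely than the observed one."""
    observed = binomial_pmf(successes, n, p0)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        binomial_pmf(k, n, p0)
        for k in range(n + 1)
        if binomial_pmf(k, n, p0) <= observed * tolerance
    )

# Hypothetical small sample: 9 successes in 10 trials, H0: p = 0.5.
p = exact_binomial_p_value(9, 10)
print(round(p, 4), "reject H0" if p < 0.05 else "do not reject H0")
```

No table lookup is needed here: the p-value comes directly from the binomial probabilities.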

C. STATISTICAL SIGNIFICANCE:

The results of hypothesis testing may reveal either a statistically significant difference or a statistically non-significant difference. H0 is rejected for statistically significant results.

H0 is not rejected for statistically non-significant results.

D. INTERPRETATION OF P-VALUES:

The following is a guideline on the interpretation of p-values. P value <0.01 indicates strong evidence against H0. P values 0.01 - 0.05 indicate moderate evidence against H0. P values 0.05 - 0.10 are suggestive evidence against H0. P values > 0.1 provide little or no evidence against H0.
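The guideline can be written as a small lookup. This is only a sketch: the guideline does not say to which band the boundary values 0.01, 0.05, and 0.10 belong, so the assignment of boundaries below is an assumption, and the function name is hypothetical.

```python
def evidence_against_h0(p_value):
    """Map a p-value to the strength of evidence against H0 (boundary handling is assumed)."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.01:
        return "strong"
    if p_value < 0.05:
        return "moderate"
    if p_value <= 0.10:
        return "suggestive"
    return "little or none"

for p in (0.005, 0.03, 0.07, 0.5):
    print(p, evidence_against_h0(p))
```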

2.4.4 HYPOTHESIS TESTING USING CONFIDENCE INTERVALS

A. TWO APPROACHES TO HYPOTHESIS TESTING

There are two approaches to hypothesis testing: the p-value and the 95% confidence interval. For practical purposes, a p-value below 0.05 corresponds to an observation lying outside the central 95% of the distribution expected under the null. The 95% confidence interval is used less often than the p-value, although many investigators are of the opinion that it is more informative. It gives information about precision, whereas the p-value approach only indicates whether there is a significant difference or not.

B. THE CONCEPT OF NULL VALUE

Under the null hypothesis of no real difference between summary statistics of 2 samples that are compared, the difference between the sample statistics is zero and their ratio is 1.0. Thus zero and 1.0 are called the null values. Hypothesis testing is a form of proof by contradiction.

C. PROCEDURE OF HYPOTHESIS TESTING

The 95% confidence interval of a parameter consists of all values of the parameter that would not be rejected at the 0.05 level of significance. The procedure of testing starts with stating H0 and HA. Under the null assumptions the null value is defined as 0 (when the difference between comparison groups=0) or as 1.0 (when the ratio between comparison groups=1). At the start we assume the level of significance, usually 0.05. The lower and upper 95% confidence limits can be computed in 2 ways. (a) Approximate methods based on the Gaussian distribution involve test statistics such as t and chi-square; using the test statistic and applying a special formula, the lower and upper confidence limits can be determined. (b) Exact methods based on the binomial distribution are used when the sample size is small. Exact methods require use of powerful computers and appropriate statistical software. The decision rule is that if the interval contains the null value, H0 is not rejected. When the interval does not contain the null value, H0 is rejected.
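The statement that the 95% confidence interval consists of all parameter values that would not be rejected at the 0.05 level can be checked numerically. The sketch below assumes a Gaussian estimate with a known standard error; the estimate, the standard error, the candidate null values, and the function names are all hypothetical.

```python
from statistics import NormalDist

def z_test_p_value(estimate, se, null_value):
    """2-sided p-value for H0: parameter = null_value."""
    z = (estimate - null_value) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def confidence_interval(estimate, se, level=0.95):
    """Approximate Gaussian confidence limits around the estimate."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return estimate - z_crit * se, estimate + z_crit * se

estimate, se = 1.5, 0.6  # hypothetical estimate and standard error
lower, upper = confidence_interval(estimate, se)

results = []
for candidate in (0.0, 0.5, 1.0, 2.5, 3.0):
    inside_ci = lower <= candidate <= upper
    not_rejected = z_test_p_value(estimate, se, candidate) >= 0.05
    results.append((candidate, inside_ci, not_rejected))
    print(candidate, inside_ci, not_rejected)
```

For every candidate value the two columns should agree: a value lies inside the interval exactly when the corresponding test does not reject it.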

D. INTERPRETATION

If the 95% CI does not contain the null value, we reject the null hypothesis and conclude that there is statistical significance. In other words, the null value is not among the parameter values compatible with the data. Our chance of error is 5%, i.e. if H0 were true there would be a 5% chance of obtaining an interval that excludes the null value. If, on the other hand, the 95% CI contains the null value, we do not reject the null hypothesis and conclude that there is no statistical significance.

2.4.5 CONCLUSIONS and INTERPRETATIONS

A. IMPLICATIONS OF STATISTICALLY SIGNIFICANT

H0 is false

H0 is rejected

Observations are not compatible with H0

Observations are not due to sampling variation

Observations are real/true biological phenomenon

B. IMPLICATIONS OF NOT STATISTICALLY SIGNIFICANT

H0 is not false (we do not say true)

H0 is not rejected

Observations are compatible with H0

Observations are due to sampling variation or random errors of measurement.   

Observations are artificial, apparent and not real biological phenomena

C. STATISTICAL AND PRACTICAL SIGNIFICANCE

A statistically significant difference may have no clinical/practical significance/importance. This may be due to (a) other factors being involved and not studied here, or (b) measurements that are not valid. A clinically important difference may not reach statistical significance for 2 main reasons: (a) small sample size, or (b) measurements that are not discriminating enough.
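The role of sample size in the gap between statistical and practical significance can be illustrated with a short sketch. It assumes a large-sample z test for two groups of equal size with a common standard deviation; the differences, standard deviation, and group sizes are hypothetical.

```python
from statistics import NormalDist

def two_sample_p_value(diff, sd, n_per_group):
    """2-sided p-value for a difference in means between two equal-sized groups."""
    se = sd * (2 / n_per_group) ** 0.5
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# A trivially small difference in a very large study.
tiny_effect_large_study = two_sample_p_value(diff=0.5, sd=15.0, n_per_group=50_000)
# A large difference in a very small study.
large_effect_small_study = two_sample_p_value(diff=10.0, sd=15.0, n_per_group=8)

print(tiny_effect_large_study < 0.05, large_effect_small_study < 0.05)
```

Under these assumptions the small difference is statistically significant while the large one is not, which is why the size of the difference has to be judged separately from the p-value.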

D. 1-TAIL AND 2-TAIL TESTS:

The test may be 2-tail or 1-tail. The 1-tail test may be right tail or left tail. The decision on which test to use depends on the intention of the investigation. The 1-sided test considers extreme values on one side (1 tail). Under the 1-tail test the hypotheses are: (a) upper tail: H0: mu = mu0 against HA: mu > mu0; (b) lower tail: H0: mu = mu0 against HA: mu < mu0. The 1-tail test is rarely used; it is used in situations in which the direction of the difference is known. The 2-sided test considers extreme values on 2 sides (2 tails). It is a more conservative test. The 2-tail test looks for any change in the parameter whatever its direction.
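The sense in which the 2-tail test is more conservative can be seen from the critical values. The sketch below assumes a standard Gaussian test statistic, a significance level of 0.05, and an upper-tail alternative for the 1-tail case; the names and the example statistic are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05
one_tail_critical = std_normal.inv_cdf(1 - alpha)      # all of alpha in the upper tail
two_tail_critical = std_normal.inv_cdf(1 - alpha / 2)  # alpha split between both tails

def rejects(z, tails):
    """Decision for an observed z under a 1-tail (upper) or 2-tail test."""
    if tails == 1:
        return z > one_tail_critical
    return abs(z) > two_tail_critical

print(round(one_tail_critical, 3), round(two_tail_critical, 3))
print(rejects(1.8, tails=1), rejects(1.8, tails=2))
```

The same observed statistic can therefore be significant under the 1-tail test and not significant under the 2-tail test.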

E. ERRORS OF TESTING

CLASSIFICATION

Type 1 = rejecting a true H0 (false positive)

Type 2 = not rejecting a false H0 (false negative)

Alpha = Pr (type 1 error) = Pr (rejecting H0 when H0 is true)

Beta = Pr (type 2 error) = Pr (not rejecting H0 when H0 is false).

DETERMINANTS OF TYPE 2 ERRORS

Type 2 error is determined by the discrepancy between the true and hypothesized value, the sample size, the variability/standard deviation, the significance level, and the tail of the test. The bigger the discrepancy between the true and hypothesized values, the lower the probability of type 2 error. The larger the sample size, the lower the probability of type 2 error. The lower the variability, the lower the probability of type 2 error. The lower the level of significance (type 1 error), the higher the probability of type 2 error. At the same level of significance, a 2-tail test has a higher probability of type 2 error than a 1-tail test in the direction of the true difference.
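The effect of each determinant can be explored with a small calculation. This is a sketch under stated assumptions: a z test of a single mean with known standard deviation, and, for the 1-tail case, a test in the direction of the true discrepancy. The baseline figures and the function name are hypothetical.

```python
from statistics import NormalDist

def type2_error(discrepancy, sd, n, alpha=0.05, tails=2):
    """Beta for a z test of H0: mu = mu0 when the true mean is mu0 + discrepancy."""
    std_normal = NormalDist()
    shift = abs(discrepancy) / (sd / n ** 0.5)
    if tails == 1:  # 1-tail test in the direction of the true discrepancy
        z_crit = std_normal.inv_cdf(1 - alpha)
        power = 1 - std_normal.cdf(z_crit - shift)
    else:
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        power = (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)
    return 1 - power

baseline = type2_error(discrepancy=5, sd=15, n=50)
print("baseline", round(baseline, 3))
print("bigger discrepancy", round(type2_error(8, 15, 50), 3))
print("larger sample", round(type2_error(5, 15, 100), 3))
print("lower variability", round(type2_error(5, 10, 50), 3))
print("lower alpha", round(type2_error(5, 15, 50, alpha=0.01), 3))
print("1-tail", round(type2_error(5, 15, 50, tails=1), 3))
```

Each line changes one determinant from the baseline, so the printed values can be compared against the statements in the text one at a time.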

Professor Omar Hasan Kasule Sr August 2003