· Biomathematics

· Biometry

· Mathematical Computing

· Numerical Analysis

· Numerical Data

· Statistical Conclusion

· Statistical Methods

· Statistical Questions

· Statistics and Decisions

· Statistics, analytic statistics

· Statistics, applied statistics

· Statistics, descriptive statistics

· Statistics, health statistics

· Statistics, inferential statistics

· Statistics, mathematical statistics

· Statistics, medical statistics

· Statistics, theoretical statistics

· Substantive Conclusion

· Substantive Question

*Unit Outline*

BIOSTATISTICS AS A DISCIPLINE

A. Statistics

B. Biostatistics

C. Importance of Biostatistics

D. Scope of Biostatistics

E. Rationale of Learning Biostatistics

HISTORY OF BIOSTATISTICS

A. Ancient Times

B. Era of Vital Records

C. Population Studies

D. Era of Descriptive Statistics

E. Era of Analytic Statistics

LIMITATIONS OF BIOSTATISTICS

A. Statistical vs Substantive

B. Analysis vs Interpretation

C. Misuse of Statistics

D. Misuse of the Computer

BIOSTATISTICS AS A DISCIPLINE

The term statistics can convey three meanings. Applied statistics is defined as the techniques of
articulating, summarizing, analyzing, and interpreting numerical information. Theoretical statistics deals
with probability. Statistics, in the plural sense, are indices or summary measures derived from data. Biostatistics is a
branch of applied statistics concerned with the management and analysis of numerical data on people, health, disease,
medical treatments, and procedures. It includes vital statistics, public health statistics, and demography. Biostatistics
is divided into two branches: descriptive and analytic. Descriptive statistics deals with the collection, organization,
presentation, and summarization of data. Analytic statistics deals with drawing logical and objective conclusions about
a sample or a population. Biostatistics provides the tools for summarizing and digesting large amounts of numerical
laboratory and clinical data, and for the critical reading and understanding of scientific literature.
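The distinction between the two branches can be illustrated with a minimal Python sketch. The blood pressure readings, the function names, and the choice of a normal-approximation confidence interval are all assumptions made for illustration; they are not taken from the unit itself. The first function summarizes the data in hand (descriptive), while the second uses the sample to say something about the population it came from (analytic).

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg) from 10 patients.
# These values are invented for illustration only.
sample = [118, 122, 125, 130, 131, 135, 138, 140, 142, 149]


def describe(values):
    """Descriptive step: summarize the data that were actually collected."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1 denominator)
    }


def mean_confidence_interval(values, z=1.96):
    """Analytic step: approximate 95% confidence interval for the population mean.

    Assumes a normal approximation (z = 1.96). For a sample this small,
    a t critical value would usually be preferred.
    """
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return mean - z * standard_error, mean + z * standard_error


summary = describe(sample)
low, high = mean_confidence_interval(sample)

print(summary)
print(f"Approximate 95% CI for the mean: {low:.1f} to {high:.1f}")
```

The summary describes only these 10 readings, whereas the interval is a statement about the wider population, and its validity depends on how the sample was drawn.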

HISTORY OF BIOSTATISTICS

Statistics has grown through successive eras: the era of censuses, the era of vital statistics,
the era of descriptive statistics, the era of analytic statistics, and the era of probability statistics. Ancient civilizations
counted their populations for taxation and military purposes. Complete censuses were first carried out in Sweden in 1749,
the US in 1790, Spain in 1798, England & Wales in 1801, and Canada in 1871. John Graunt is considered the founder of vital
statistics. He analyzed London mortality data and also laid the foundations of the science of demography. William Farr started
the modern procedures of vital statistics registration. Pierre Charles Alexandre Louis (1787-1872) introduced the numerical
method for describing medical facts quantitatively.

The 19th and early 20th centuries witnessed many theoretical
developments. Karl Pearson (1857-1936) introduced the mode, mean deviation, coefficient of variation, moments, measures of
symmetry and kurtosis, the chi-square, the symbol of the null hypothesis (H0), type I and type II errors, homoscedasticity
and heteroscedasticity, and the concept of partial correlation. Sir Ronald Aylmer Fisher (1890-1962) introduced variance, methods for
small samples, factorial designs, the null hypothesis, random allocation, ANOVA, ANCOVA, the relation between regression and ANOVA,
and testing the significance of the regression coefficient. Karl Pearson and RA Fisher developed contingency table analysis using
the chi-square test. Adolph Quetelet developed vital statistics in its modern form and introduced the concept of the mean.
KF Gauss (1777-1855) introduced the median and rediscovered the normal distribution, which had been discovered independently before
by Pierre Simon, Marquis de Laplace (1749-1827), and in 1733 by Abraham de Moivre (1667-1754). Sir Francis Galton used the term
‘normal’ to refer to the curve, applied statistical techniques to natural phenomena, and described correlation and
regression. W.F. Sheppard introduced the standard normal curve in 1899. C Kramp published the first table of the area under
the curve in 1799. J Neyman developed the concept of confidence intervals in 1934. Charles Spearman (1863-1945) and Maurice
George Kendall (1907-1983) introduced non-parametric tests.

The bulk of statistical theory is probability theory, since modern inferential statistics
depends on it. Christian Huygens (1629-1695) was the first to publish on probability and games. Modern probability theory
owes a lot to the pioneers: Blaise Pascal (1623-1662), Pierre de Fermat (1601-1665), Jacques Bernoulli (1654-1705),
Nicolas Bernoulli (1687-1759), Abraham de Moivre (1667-1754), Pierre Rémond de Montmort (1678-1719), and Pierre Simon,
Marquis de Laplace (1749-1827).

LIMITATIONS OF BIOSTATISTICS

An investigator starts with a substantive question that is formulated as a statistical
question. Data are then collected and analyzed to reach a statistical conclusion. The statistical conclusion is used with
other knowledge to reach a substantive conclusion.

Statistics has several limitations. It gives statistical and not substantive answers.
The statistical conclusion refers to groups and not to individuals. Statistics summarizes data but does not interpret them.

Statistics can be misused by selective presentation of desired results. Computation is
not an end in itself. It is a tool that can be used well or misused. The user must have a clear idea of what is required
of the computer and must instruct it accordingly. The user must also be able to interpret the output from the
computer intelligently. All who tinker with computers must remember the adage ‘rubbish in/rubbish out’.

EXERCISES: INTERNET RESEARCH

Use the internet to get more information on the following topics:

1. History of population censuses.

2. Brief biographies and contributions of the following pioneers of biostatistics: Fisher, John Graunt, William Farr,
Quetelet, Laplace, Bernoulli.

3. Review the abstract and methodology sections of one journal article in a current issue of any medical journal and draw
a table showing the frequency with which the following statistical terms are used: t-test, chi-square test, linear regression,
logistic regression, analysis of variance, and p-value.

4. List 2-3 statistical packages or programs that are used for (a) data management, (b) data analysis, (c) both data
management and data analysis.

5. Find out and describe the following information about a computer: random access memory, hard disk memory, speed,
type of chip used, byte, bit, local area network, server.


EXERCISES: PREPARING A QUESTIONNAIRE FOR DATA COLLECTION

Prepare a **mailed questionnaire** and collect the following data on members of the class:

· Identifying information: ID (not real), gender, year of study

· Sociodemographic information: home address (urban/rural), region of origin (East Coast, West Coast & Central, North, South, Other), primary school (religious, private, public)

· Family information: number of siblings, paternal grandfather living now? (yes/no) and, if dead, age at death

· Wearing glasses for refractive errors (yes/no), age at which glasses were first prescribed, does the father wear glasses (yes/no), does the mother wear glasses (yes/no), does any sibling wear glasses (yes/no)

· Color preference (choose one color only)

· Desire to specialize or work as a general practitioner (yes/no)

· Ideal age for marriage

· Desired number of children