By Professor Omar Hasan Kasule Sr.

Learning Objectives

               Definition, scope, and role of biostatistics in medicine

               Definition of descriptive and inferential biostatistics

               Difference between substantive and statistical conclusions

               Uses and limitations of biostatistics


Key Words and Terms




               Mathematical Computing

               Numerical Analysis

               Numerical Data

               Statistical Conclusion

               Statistical Methods

               Statistical questions

               Statistics And Decisions

               Statistics, analytic statistics

               Statistics, applied statistics

               Statistics, descriptive statistics

               Statistics, health Statistics

               Statistics, inferential statistics

               Statistics, mathematical statistics

               Statistics, medical Statistics

               Statistics, theoretical statistics

               Substantive Conclusion

               Substantive Question


Unit Outline



A. Statistics

B. Biostatistics

C. Importance of Biostatistics

D. Scope of Biostatistics

E. Rationale of Learning Biostatistics



A. Ancient Times

B. Era of Vital Records

C. Population Studies

D. Era of Descriptive Statistics

E. Era of Analytic Statistics



A. Statistical Vs Substantive

B. Analysis Vs Interpretation

C. Misuse of Statistics

D. Misuse of the Computer






The term statistics conveys three meanings. Applied statistics is the set of techniques for articulating, summarizing, analyzing, and interpreting numerical information. Theoretical statistics deals with probability. Statistics (in the plural) are indices or summary measures derived from data. Biostatistics is a branch of applied statistics concerned with the management and analysis of numerical data on people, health, disease, medical treatments, and procedures. It includes vital statistics, public health statistics, and demography. Biostatistics is divided into two branches: descriptive and analytic. Descriptive statistics deals with the collection, organization, presentation, and summarization of data. Analytic statistics deals with drawing logical and objective conclusions about a sample or a population. Biostatistics provides the tools for summarizing and digesting large amounts of numerical laboratory and clinical data, and for the critical reading and understanding of scientific literature.
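The distinction between the descriptive and analytic branches can be illustrated with a minimal sketch. The blood pressure readings, the function names, and the choice of a normal-approximation confidence interval are all assumptions made for illustration; they do not come from the lecture. With so few observations, a t-based interval would normally be preferred.

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg) from 10 patients.
readings = [118, 122, 130, 125, 140, 135, 128, 121, 132, 129]


def describe(data):
    """Descriptive statistics: summarize the sample itself."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample standard deviation
    }


def mean_confidence_interval(data, level=0.95):
    """Analytic statistics: approximate interval for the population mean.

    Uses the normal approximation (an assumption of this sketch).
    """
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    centre = statistics.mean(data)
    std_error = statistics.stdev(data) / math.sqrt(len(data))
    return (centre - z * std_error, centre + z * std_error)


summary = describe(readings)
low, high = mean_confidence_interval(readings)
print(summary)
print(f"Approximate 95% CI for the mean: {low:.1f} to {high:.1f} mmHg")
```

The first function only describes the ten patients who were measured. The second makes a statement about the wider population from which they were drawn, which is the step that depends on probability theory.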



Statistics has grown through successive eras: the era of censuses, the era of vital statistics, the era of descriptive statistics, the era of analytic statistics, and the era of probability statistics. Ancient civilizations counted their populations for taxation and military purposes. Complete censuses were first carried out in Sweden in 1749, the US in 1790, Spain in 1798, England & Wales in 1801, and Canada in 1871. John Graunt is considered the founder of vital statistics. He analyzed London mortality data and also laid the foundations of the science of demography. William Farr started the modern procedures of vital statistics registration. Pierre Charles Alexandre Louis (1787-1872) introduced the numerical method for describing medical facts quantitatively.


The 19th and early 20th centuries witnessed many theoretical developments. Karl Pearson (1857-1936) introduced the mode, mean deviation, coefficient of variation, moments, measures of symmetry and kurtosis, the chi-square, the symbol of the null hypothesis (H0), type I and type II errors, homoscedasticity and heteroscedasticity, and the concept of partial correlation. Sir Ronald Aylmer Fisher (1890-1962) introduced variance, methods for small samples, factorial designs, the null hypothesis, random allocation, ANOVA, ANCOVA, the relation between regression and ANOVA, and testing the significance of the regression coefficient. Karl Pearson and R.A. Fisher developed contingency table analysis using the chi-square test. Adolphe Quetelet developed vital statistics in its modern form and introduced the concept of the mean. Carl Friedrich Gauss (1777-1855) introduced the median and rediscovered the normal distribution, which had been discovered independently by Pierre Simon Marquis de Laplace (1749-1827) and, in 1733, by Abraham de Moivre (1667-1754). Sir Francis Galton used the term ‘normal’ to refer to the curve, applied statistical techniques to natural phenomena, and described correlation and regression. W.F. Sheppard introduced the standard normal curve in 1899. C. Kramp published the first table of the area under the curve in 1799. J. Neyman developed the concept of confidence intervals in 1934. Charles Spearman (1863-1945) and Maurice George Kendall (1907-1983) introduced non-parametric tests.


The bulk of statistical theory is probability theory, since modern inferential statistics depends on it. Christiaan Huygens (1629-1695) was the first to publish on probability and games. Modern probability theory owes a lot to the pioneers: Blaise Pascal (1623-1662), Pierre de Fermat (1601-1665), Jacques Bernoulli (1654-1705), Nicolas Bernoulli (1687-1759), Abraham de Moivre (1667-1754), Pierre Rémond de Montmort (1678-1719), and Pierre Simon Marquis de Laplace (1749-1827).



An investigator starts with a substantive question, which is then formulated as a statistical question. Data are collected and analyzed to reach a statistical conclusion. The statistical conclusion is used together with other knowledge to reach a substantive conclusion.
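The sequence from substantive question to substantive conclusion can be sketched with a small worked example. Everything in it is hypothetical: the treatment, the paired readings, the function name, the 0.05 threshold, and the choice of an exact sign-flip permutation test, which is used here only because it needs no statistical tables.

```python
from itertools import product

# Substantive question: does the treatment lower blood pressure?
# Statistical question: is the mean paired difference equal to zero?

# Hypothetical paired data: systolic pressure (mmHg) before and after treatment.
before = [150, 142, 160, 155, 148, 152, 158, 145]
after = [141, 138, 150, 151, 140, 149, 147, 139]


def sign_flip_p_value(diffs):
    """Exact two-sided sign-flip test of 'the mean paired difference is zero'.

    Under the null hypothesis each difference is equally likely to be
    positive or negative, so every sign pattern is enumerated. Sums are
    compared instead of means because the sample size is fixed.
    """
    observed = abs(sum(diffs))
    outcomes = list(product((1, -1), repeat=len(diffs)))
    extreme = sum(
        1
        for signs in outcomes
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed
    )
    return extreme / len(outcomes)


diffs = [b - a for b, a in zip(before, after)]
p_value = sign_flip_p_value(diffs)

# Statistical conclusion: a statement about the group, not any one patient.
alpha = 0.05  # conventional threshold, chosen here for illustration
if p_value < alpha:
    print(f"p = {p_value:.4f}: reject the hypothesis of no mean change")
else:
    print(f"p = {p_value:.4f}: no evidence of a mean change")
```

The printed result is only the statistical conclusion. Deciding whether the treatment should be used also requires knowledge that lies outside the calculation, such as the clinical importance of the change, side effects, and the quality of the study design.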

Statistics has several limitations. It gives statistical and not substantive answers. The statistical conclusion refers to groups and not individuals. Statistics summarizes data but does not interpret them.


Statistics can be misused by selective presentation of desired results. Computation is not an end in itself. It is a tool that can be used well or misused. A human must have a clear idea of what is required of the computer and must instruct it accordingly. The human must also be able to interpret the output from the computer intelligently. All who tinker with computers must remember the adage ‘rubbish in, rubbish out’.





Use the internet to find more information on the following topics:

1.       History of population censuses

2.       Brief biographies and contributions of the following pioneers of biostatistics: Fisher, John Graunt, William Farr, Quetelet, Laplace, Bernoulli.

3.       Review the abstract and methodology sections of one journal article in a current issue of any medical journal and draw a table showing the frequency with which the following statistical terms are used: t-test, chi-square test, linear regression, logistic regression, analysis of variance, and p-value.

4.       List 2-3 statistical packages or programs that are used for (a) data management (b) data analysis (c) both data management and data analysis

5.       Find out and describe the following information about a computer: Random access memory, hard disk memory, speed, type of chip used, byte, bit, local area network, server



Prepare a mailed questionnaire and collect the following data on members of the class:

               Identifying information: ID (not real), gender, year of study

               Sociodemographic information: home address (urban/rural), region of origin (East Coast, West Coast & Central, North, South, Other), primary school (religious, private, public)

               Family information: number of siblings, paternal grandfather living now? (yes/no) and if dead age at death

               Wearing glasses for refractive errors (yes/no), age at which glasses were first prescribed, does the father wear glasses (yes/no), does the mother wear glasses (yes/no), does any sibling wear glasses (yes/no)

               Color preference (choose one color only)

               Desire to specialize or work as a general practitioner (yes/no)

               Ideal age for marriage

               Desired number of children

Prof Omar Hasan Kasule Sr. August 2005