BIOSTATISTICS
PHR 211
Lecture 1
Chapter 1: Data and Statistics
◦ Data
◦ Data sources
◦ Types of statistics
◦ Nature of data
◦ Levels of measurement
Definition
◦ The word statistics has two meanings. In the most common usage, statistics refers to
numerical facts. The second meaning of statistics refers to the field or discipline of study.
In this sense, statistics is a group of methods used to collect, analyze, present and
interpret data and make decisions.
Why Do We Need Statistics?
◦ The study of statistics has become more popular than ever over the past four
decades or so. The increasing availability of computers and statistical packages
has enlarged the role of statistics as a tool of empirical research.
◦ Like almost all fields of study, statistics has two aspects: theoretical and applied.
The former, also called mathematical statistics, deals with the development of
theorems, formulas, rules, and laws. The latter involves the application of those
theorems, formulas, rules and laws. The main aim of this lecture is to introduce
statistics, including the nature of data and the levels of measurement that
can be used.
Functions of Statistics
◦ Statistics provides methods for:
◦ Design: Planning and carrying out research studies.
◦ Description: Summarizing and exploring data.
◦ Inference: Making predictions and generalizations about phenomena represented
by data.
Definitions
◦ Data are the facts and figures collected, summarized, analyzed, and interpreted.
◦ The data collected in a particular study are referred to as the data set.
◦ The elements are the entities on which data are collected.
◦ A variable is a characteristic of interest for the elements.
◦ The set of measurements collected for a particular element is called an
observation.
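To make these terms concrete, here is a minimal sketch of a small data set in Python. The patients, variable names and values are hypothetical, invented only for illustration: each dictionary is one element, each key a variable, and each whole dictionary one observation.

```python
# Hypothetical data set: three patients (elements) with three variables each
data_set = [
    {"patient_id": "P1", "blood_type": "A", "age": 34, "weight_kg": 61.5},
    {"patient_id": "P2", "blood_type": "O", "age": 52, "weight_kg": 78.2},
    {"patient_id": "P3", "blood_type": "B", "age": 27, "weight_kg": 55.0},
]

# Elements: the entities on which data are collected
elements = [row["patient_id"] for row in data_set]

# Variables: the characteristics of interest recorded for every element
variables = [key for key in data_set[0] if key != "patient_id"]

# One observation: the set of measurements collected for a single element
first_observation = data_set[0]

print("Elements:", elements)
print("Variables:", variables)
print("Observation for P1:", first_observation)
print("Number of observations:", len(data_set))
```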
In statistics, we commonly use these key terms:
◦ Population is the complete collection of elements to be studied.
◦ Sample is a subcollection of elements drawn from a population.
◦ Variables: numerical or categorical.
◦ Data are the actual values of the variable. They may be numbers or words.
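The slide does not say how a sample is drawn; one common method is simple random sampling without replacement. The sketch below assumes a hypothetical population of 500 student ID numbers and uses Python's standard `random` module.

```python
import random

# Hypothetical population: ID numbers of all 500 students enrolled in a course
population = list(range(1, 501))

random.seed(42)  # fixed seed so the example is reproducible

# Simple random sample of 20 students, drawn without replacement
sample = random.sample(population, k=20)

print("Population size:", len(population))
print("Sample size:", len(sample))
print("Sample:", sorted(sample))
```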
Types of Statistics
◦ Broadly speaking, statistics can be divided into two areas: descriptive
statistics and inferential statistics.
◦ Descriptive statistics consists of methods for organizing, displaying
and describing data by using tables, graphs and summary methods.
◦ Inferential statistics consists of methods that use sample results to
help make decisions or predictions about a population.
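The difference between the two areas can be sketched with the standard `statistics` module. The blood pressure readings are hypothetical. The descriptive part only summarizes the sample; the inferential part uses it to estimate a range for the unknown population mean with a t-based 95% confidence interval, a technique not covered in this lecture and shown here only as an illustration.

```python
import math
import statistics

# Hypothetical systolic blood pressure readings (mmHg) from a sample of 10 patients
sample_bp = [118, 125, 132, 110, 140, 128, 121, 135, 117, 124]
n = len(sample_bp)

# Descriptive statistics: summarize the sample itself
mean_bp = statistics.mean(sample_bp)
median_bp = statistics.median(sample_bp)
sd_bp = statistics.stdev(sample_bp)  # sample standard deviation (n - 1 denominator)
print(f"n = {n}, mean = {mean_bp}, median = {median_bp}, sd = {sd_bp:.2f}")

# Inferential statistics: use the sample to say something about the population mean.
# 2.262 is the two-sided 95% t critical value for n - 1 = 9 degrees of freedom.
t_critical = 2.262
standard_error = sd_bp / math.sqrt(n)
lower = mean_bp - t_critical * standard_error
upper = mean_bp + t_critical * standard_error
print(f"95% CI for the population mean: {lower:.1f} to {upper:.1f} mmHg")
```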
Data Sources
Existing Sources:
◦ Within an organization – almost any department
◦ Database services – NCBI
◦ Government agencies – Bangladesh Bureau of Statistics
◦ Industry associations – Bangladesh Association of Pharmaceutical Industries
◦ Special-interest organizations – Pharmacy Council of Bangladesh
◦ Internet – more and more organizations/firms
Statistical Studies
◦ In experimental studies, the variable of interest is first identified. Then one or
more other variables are identified and controlled so that data can be obtained
about how they influence the variable of interest.
◦ In observational (non-experimental) studies, no attempt is made to control or
influence the variables of interest, e.g. a survey.
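In an experiment the researcher decides which element receives which treatment. One common way of doing this, not described on the slide, is random assignment, which on average balances uncontrolled variables between the groups. A minimal sketch, assuming 12 hypothetical participants split into a treatment group and a control group:

```python
import random

# Hypothetical participants in a trial of a new drug
participants = [f"S{i:02d}" for i in range(1, 13)]

random.seed(7)                # fixed seed so the allocation can be reproduced
shuffled = participants[:]    # copy, so the original list is left unchanged
random.shuffle(shuffled)

# The researcher controls the explanatory variable (treatment) by assigning it
half = len(shuffled) // 2
treatment_group = shuffled[:half]
control_group = shuffled[half:]

print("Treatment:", sorted(treatment_group))
print("Control:  ", sorted(control_group))
```

In an observational study no such assignment step exists: the data are simply recorded as they occur.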
Nature of Data
Two types of data can be identified: qualitative and quantitative
data.
1. Qualitative data deal with characteristics and descriptors that cannot be easily
measured.
◦ They can be separated into different categories that are distinguished by some
non-numerical characteristic.
◦ Qualitative data are the result of categorizing or describing attributes of a
population. Ethnic group, hair colour and blood type are all examples of qualitative
data. They are generally described by words or letters.
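Because qualitative data are words or letters, they are usually summarized by counting how often each category occurs. A minimal frequency table sketch using `collections.Counter`; the blood types below are hypothetical.

```python
from collections import Counter

# Hypothetical blood types recorded for 12 blood donors
blood_types = ["A", "O", "B", "O", "AB", "A", "O", "B", "O", "A", "O", "A"]

counts = Counter(blood_types)
total = len(blood_types)

# Frequency and relative frequency (percentage) for each category
print("Type  Freq  Percent")
for blood_type, freq in counts.most_common():
    print(f"{blood_type:<5} {freq:>4}  {100 * freq / total:6.1f}")
```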
(Figure: quantitative data are divided into discrete data (counts) and continuous data (measurements).)
2. Quantitative data consist of numbers representing counts and
measurements.
◦ Discrete data (counts) can take only distinct, separate values (usually whole
numbers), so the possible values form mutually exclusive categories.
◦ For example, the number of students in a class or the number of children in a
family (you can’t have 2.5 children).
◦ Continuous data (measurements) are quantitative data that are measured rather
than counted. They can take an infinite number of possible values within
a selected range.
◦ For example, age, height and weight are infinitely divisible and do not have
specific finite values.
◦ The statistical test to apply to data depends on whether the variables are
discrete or continuous.
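The practical difference shows up when the data are tabulated. In the sketch below (all values hypothetical), discrete counts are tabulated one row per value, while continuous weights, which rarely repeat exactly, are grouped into class intervals. The 5 kg interval width and 55 kg starting point are arbitrary assumptions chosen for the example, not rules from the lecture.

```python
from collections import Counter

# Hypothetical discrete data: number of children in 10 families (whole numbers only)
children = [0, 2, 1, 3, 2, 2, 1, 0, 4, 2]
print("Children per family:", sorted(Counter(children).items()))

# Hypothetical continuous data: body weights (kg), recorded to one decimal place
weights = [55.2, 61.8, 70.4, 58.9, 66.1, 73.5, 62.0, 68.7, 59.3, 64.6]

def class_interval(value, width=5.0, start=55.0):
    """Return the (lower, upper) bounds of the interval [lower, upper) containing value."""
    index = int((value - start) // width)
    lower = start + index * width
    return (lower, lower + width)

grouped = Counter(class_interval(w) for w in weights)
for (lower, upper), freq in sorted(grouped.items()):
    print(f"{lower:.0f} to <{upper:.0f} kg: {freq}")
```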
Levels of measurement of data
The way a set of data is measured is called its level of measurement.
Data can be classified into four levels of measurement. They are:
1. Nominal scale level
2. Ordinal scale level
3. Interval scale level
4. Ratio scale level
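The four levels form a hierarchy: each level permits everything the level below it permits, plus one more kind of comparison. The sketch below encodes that idea as an ordered enumeration; the class name `Level`, the `ALLOWED` table and the `allowed_operations` helper are hypothetical names used only for illustration.

```python
from enum import IntEnum

class Level(IntEnum):
    """Levels of measurement, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4

# The lowest level at which each kind of operation becomes meaningful
ALLOWED = {
    "categorize (=, !=)": Level.NOMINAL,
    "order (<, >)": Level.ORDINAL,
    "differences (+, -)": Level.INTERVAL,
    "ratios (*, /)": Level.RATIO,
}

def allowed_operations(level):
    """List the operations that are meaningful for data measured at `level`."""
    return [op for op, minimum in ALLOWED.items() if level >= minimum]

for level in Level:
    print(f"{level.name:<9}", allowed_operations(level))
```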
1. Nominal scale level:
◦ Data measured using a nominal scale are qualitative. They consist of names,
labels or categories. Nominal data commonly identify groups of two members,
e.g. male or female, left or right, young or old, yes or no, etc., although there can be
more categories (e.g. blood type). Nominal data are not ordered and cannot be used in
arithmetic calculations; they can only be counted.
2. Ordinal scale level:
◦ This scale is similar to the nominal scale, except that the data can be ordered.
For example, responses can be ordered from the most desired to the
least desired: excellent, good, satisfactory, unsatisfactory.
* Like nominal data, ordinal data cannot be used in arithmetic calculations, because
the differences between categories are not meaningful.
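Ordering is the one extra property an ordinal scale adds, so ordinal responses can be sorted and a median category found, even though they cannot be averaged. A minimal sketch using the rating categories from the example above; the individual responses are hypothetical.

```python
# The ordered categories from the example, least to most desired
scale = ["unsatisfactory", "satisfactory", "good", "excellent"]
rank = {label: position for position, label in enumerate(scale, start=1)}

# Hypothetical ratings from 7 respondents
ratings = ["good", "excellent", "satisfactory", "good", "unsatisfactory", "good", "excellent"]

# Ordering is meaningful, so responses can be sorted by rank
ordered = sorted(ratings, key=rank.get)
median_rating = ordered[len(ordered) // 2]  # middle value (odd number of responses)

print("Ordered:", ordered)
print("Median rating:", median_rating)

# Differences are not meaningful: the gap from "satisfactory" to "good" need not
# equal the gap from "good" to "excellent", even though both rank gaps are 1.
```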
3. The interval scale level
◦ Like the ordinal scale, with the additional property that meaningful amounts of
difference between data values can be determined. However, there is no natural
zero starting point. In other words, interval scale data have a definite ordering
and the differences between them can be measured, but there is no true zero
starting point.
Example: Temperature scales such as Celsius (C) use the interval scale. On such a
scale, 40 degrees is equal to 100 degrees minus 60 degrees, so differences make
sense. pH is also an example of an interval scale.
*Zero is not the absolute lowest temperature (0 °C is simply the freezing point of
water), so ratios are not meaningful: 40 °C is not twice as hot as 20 °C.
*This kind of data can be used in calculations involving addition and subtraction.
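Why ratios fail on an interval scale can be checked by changing the unit. The sketch below brings in the Fahrenheit scale and the standard conversion F = C × 9/5 + 32, which the slide does not state. Because the zero point moves, the ratio of two temperatures changes with the unit, while a difference is simply rescaled (a 20 °C difference is always a 36 °F difference).

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius temperature to Fahrenheit."""
    return c * 9 / 5 + 32

a_c, b_c = 40.0, 20.0

# A ratio of Celsius values is not meaningful: it changes when the unit changes
print("Ratio in C:", a_c / b_c)
print("Ratio in F:", celsius_to_fahrenheit(a_c) / celsius_to_fahrenheit(b_c))

# Differences are meaningful: they convert consistently between the two scales
print("Difference in C:", a_c - b_c)
print("Difference in F:", celsius_to_fahrenheit(a_c) - celsius_to_fahrenheit(b_c))
```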
4. The Ratio scale level
◦ Like the interval level but, in addition, it has a true zero point, so ratios can be
calculated. For example, the final exam scores are 18, 15, 10 and 9 (out of
20). This scale must contain a zero value that indicates that nothing exists for
the variable at the zero point.
*The data can be put in order: 9, 10, 15 and 18.
*The differences between data values have meaning:
◦ the difference between scores 18 and 9 is 9 points.
◦ Ratios can be calculated: the smallest possible score is 0 (a true zero).
◦ So, 18 is twice 9. The score of 18 is better than the score of 9.
Interval and ratio measurement levels are the most desirable because they allow
the more powerful statistical procedures based on means and standard deviations.
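The exam score example can be reproduced with Python's standard `statistics` module, covering the ratio scale operations (order, difference, ratio) and the mean and standard deviation that interval and ratio data permit. `stdev` is the sample standard deviation (n - 1 denominator); using `pstdev` for the population version would be an equally valid choice depending on context.

```python
import statistics

# Final exam scores (out of 20) from the ratio scale example
scores = [18, 15, 10, 9]

highest, lowest = max(scores), min(scores)
difference = highest - lowest   # differences are meaningful: 18 - 9 = 9 points
ratio = highest / lowest        # ratios are meaningful because 0 is a true zero

# Means and standard deviations require interval or ratio data
mean_score = statistics.mean(scores)   # (18 + 15 + 10 + 9) / 4 = 13
sd_score = statistics.stdev(scores)    # sample standard deviation, sqrt(54 / 3)

print("Ordered:", sorted(scores))
print("Difference:", difference, "Ratio:", ratio)
print(f"Mean = {mean_score}, SD = {sd_score:.2f}")
```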
Data anaylsis_introduction Data_and_Statistics.pdf
