Chapter 2
The Role of Statistics and
the Data Analysis Process
by Alexis J. Abella I, PhD
Statistics is the scientific discipline
that provides methods to help us make sense
of data.
Statistical methods are used in
business, medicine, agriculture, social
sciences, natural sciences, and applied
sciences, such as engineering.
Three Reasons to Study Statistics
Being Informed
“How do we
decide whether
claims based on
numerical
information are
reasonable?”
Making
Informed
Judgments
“Throughout
your personal
and
professional life,
you will need to
understand
statistical
information and
make informed
decisions using
this information.”
Evaluating
Decisions That
Affect Your Life
The Nature and Role of Variability
Statistics is a science whose focus is on
collecting, analyzing, and drawing conclusions
from data.
Suppose that every student took the same number of
units, spent the same amount of money on textbooks this
semester, and favored increasing student fees to support
expanding library services. For this population, there is no
variability in the number of units, amount spent on books, or
student opinion on the fee increase.
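The idea of "no variability" can be illustrated with a small sketch. The two lists below are made-up unit counts, not data from the course; the sketch only shows that a population of identical values has a variance of zero, while a population of differing values does not.

```python
from statistics import pvariance

# Hypothetical population in which every student takes the same number of units.
units_identical = [15, 15, 15, 15, 15]

# Hypothetical population in which students take different numbers of units.
units_varied = [12, 15, 18, 9, 16]

# Population variance is zero only when every value is the same.
print("identical:", pvariance(units_identical))
print("varied:   ", pvariance(units_varied))
```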
The Nature and Role of Variability
Example. Monitoring Water Quality
Statistics and the Data
Analysis Process
Methods for organizing and summarizing
data make up the branch of statistics called
descriptive statistics.
For example, the admissions director at a large
university might be interested in learning why some
applicants who were accepted for the fall 2006
term failed to enroll at the university. The population
of interest to the director consists of all accepted
applicants who did not enroll in the fall 2006 term.
Because this population is large and it may be
difficult to contact all the individuals, the director
might decide to collect data from only 300 selected
students. These 300 students constitute a sample.
Statistics and the Data
Analysis Process
The entire collection of individuals or objects
about which information is desired is called
the population of interest. A sample is a
subset of the population, selected for study
in some prescribed manner.
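As a rough sketch of selecting a sample "in some prescribed manner", the code below draws a simple random sample of 300 from a population. The population of 5,000 ID numbers and the seed are assumptions made for illustration; simple random sampling is only one of several possible selection methods.

```python
import random

# Hypothetical population: ID numbers of 5,000 accepted applicants who did not enroll.
population = list(range(1, 5001))

# A fixed seed (arbitrary choice) makes the draw reproducible.
rng = random.Random(2006)

# Simple random sample without replacement: every applicant is equally likely to be chosen.
sample = rng.sample(population, k=300)

print("sample size:", len(sample))
print("first five IDs drawn:", sample[:5])
```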
Statistics and the Data
Analysis Process
The second major branch of
statistics, inferential statistics,
involves generalizing from a
sample to the population from
which it was selected.
Sample size
Sample size is a count of the
individual samples or
observations in any statistical
setting, such as a scientific
experiment or a public opinion
survey.
In other words, it is the number
of individuals measured or
observations used in a
survey or experiment.
Calculation of Sample Size
To determine the sample
size needed for an experiment
or survey, researchers take
several factors into
account. First, the total size of
the population being studied
must be considered.
Researchers also need to
choose the margin of error,
the largest difference they are
willing to accept between the
sample estimate and the true
population value; and the
confidence level, how sure they
want to be that the true value
falls within that margin of error
(for example, 95%).
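Sample size formulas usually need the confidence level expressed as a Z value from the normal distribution. The sketch below shows one common way to obtain it with Python's standard library; the function name `z_for_confidence` is a hypothetical helper, and a two-sided interval is assumed.

```python
from statistics import NormalDist

def z_for_confidence(confidence: float) -> float:
    """Two-sided critical Z value for a confidence level such as 0.95."""
    alpha = 1.0 - confidence
    # Leave alpha/2 in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 2))
```

For a 95% confidence level this gives approximately 1.96, the value read from normal tables.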
Dangers of Small Sample Size
Sample sizes must be large
enough for a statistic to be
accurate and reliable, especially
if its findings are to be
generalized to a larger
population. Estimates based on
small samples vary widely from
one sample to the next.
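To see why small samples are risky, the sketch below computes the approximate margin of error for a sample proportion at several sample sizes. It assumes a large population, a proportion of 0.5 (maximum variability), and Z = 1.96 for 95% confidence; `margin_of_error` is a hypothetical helper name.

```python
import math

Z_95 = 1.96  # Z value for 95% confidence, taken from normal tables

def margin_of_error(n: int, p: float = 0.5, z: float = Z_95) -> float:
    """Approximate margin of error for a sample proportion (large-population case)."""
    return z * math.sqrt(p * (1 - p) / n)

# The margin of error shrinks as the sample size grows.
for n in (25, 100, 400, 1600):
    print(n, round(margin_of_error(n), 3))
```

Under these assumptions, quadrupling the sample size halves the margin of error, so very small samples leave a wide range of plausible values.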
How to Determine Sample Size
Design your experiment.
Determine the population size.
Specify the level of accuracy you
want from your research.
Calculate your ideal sample size.
Sample Size Estimation using
Yamane and Cochran
Yamane's formula states
that:
$$n = \frac{N}{1 + N e^2}$$
where $n$ is your Yamane sample size, $N$ is
your underlying population size, and $e$
is the margin of error, determined from
the confidence you are seeking from your
study. That is, if you want to be 95% sure
about the results of your study, then $e = 0.05$.
Example: Find the sample size a researcher should
include in her study if the population of
respondents is 4,750 and she wants 95% accuracy
(a 5% margin of error).
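This example can be worked with a short sketch of Yamane's formula, commonly written n = N / (1 + N e^2). The function name `yamane_sample_size` is hypothetical, and rounding up to the next whole respondent is an assumption, not something the slides specify.

```python
import math

def yamane_sample_size(population_size: int, margin_of_error: float = 0.05) -> int:
    """Yamane's formula: n = N / (1 + N * e^2), rounded up (rounding is an assumption)."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(n)

# N = 4,750 respondents, e = 0.05 (95% accuracy)
# n = 4750 / (1 + 4750 * 0.0025) = 4750 / 12.875, about 368.9
print(yamane_sample_size(4750))
```

Under these assumptions the researcher would need about 369 respondents.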
Sample Size Estimation using
Yamane and Cochran
Cochran’s formula is considered
especially appropriate in
situations with large
populations. A sample of any
given size provides more
information about a smaller
population than a larger one, so
there’s a ‘correction’ through
which the number given by
Cochran’s formula can be
reduced if the whole population
is relatively small.
Example:
Suppose we are doing a study on the inhabitants of a large town and
want to find out how many households serve breakfast in the mornings. We
don’t have much information on the subject to begin with, so we’re going to
assume that half of the families serve breakfast: this gives us maximum
variability. So, p = 0.5. Now let’s say we want 95% confidence and a precision
of plus or minus 5 percent (e = 0.05). A 95% confidence level gives us a Z value
of 1.96, per the normal tables, so we have:
Z = 1.96
p = 0.5 (half of the families serve breakfast)
q = 1 - p = 0.5
e = 0.05
$$n_0 = \frac{Z^2 p q}{e^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} = 384.16$$
Rounding up, a random sample of 385 households in our target population
should be enough to give us the confidence level we need.
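The arithmetic for the breakfast example can be checked with a minimal sketch of Cochran's formula for large populations, n0 = Z^2 p q / e^2. The function name `cochran_n0` is hypothetical, and rounding up to a whole household is an assumption.

```python
import math

def cochran_n0(z: float, p: float, e: float) -> float:
    """Cochran's sample size for a large population: n0 = z^2 * p * q / e^2."""
    q = 1 - p
    return (z ** 2) * p * q / (e ** 2)

# Values from the example: Z = 1.96, p = 0.5, e = 0.05
n0 = cochran_n0(z=1.96, p=0.5, e=0.05)

print("unrounded:", round(n0, 2))
print("rounded up:", math.ceil(n0))
```

The unrounded result is 384.16, which rounds up to 385, the value of n0 used when the formula is adjusted for a known population size.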
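Cochran’s formula with the finite population correction
(if the population size N is known):
$$n = \frac{n_0}{1 + \frac{n_0 - 1}{N}}$$
In our earlier example, if there were just 1,000
households in the target population, we would
calculate as follows.
If $n_0$ is 385 and N is equal to 1,000, then n is:
$$n = \frac{385}{1 + \frac{384}{1000}} = \frac{385}{1.384} \approx 278$$
So, for this smaller population, all we need are 278
households in our sample: a substantially smaller sample
size.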
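The correction for a known, smaller population can be sketched the same way, using n = n0 / (1 + (n0 - 1) / N). The function name `cochran_finite` is hypothetical.

```python
def cochran_finite(n0: float, population_size: int) -> float:
    """Cochran's finite population correction: n = n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1 + (n0 - 1) / population_size)

# Values from the example: n0 = 385, N = 1,000 households
n = cochran_finite(n0=385, population_size=1000)

# 385 / 1.384 is roughly 278.18, which the example reports as 278.
print(round(n, 2))
```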
