Chapter 2
The Role of Statistics and
the Data Analysis Process
by Alexis J. Abella I, PhD
2.
Statistics is the scientific discipline
that provides methods to help us make sense
of data.
Statistical methods are used in
business, medicine, agriculture, social
sciences, natural sciences, and applied
sciences, such as engineering.
3.
Three Reasons to Study Statistics
Being Informed
“How do we decide whether claims based on numerical information are reasonable?”
Making Informed Judgments
“Throughout your personal and professional life, you will need to understand statistical information and make informed decisions using this information.”
Evaluating Decisions That Affect Your Life
4.
The Nature and Role of Variability
Statistics is a science whose focus is on
collecting, analyzing, and drawing conclusions
from data.
Suppose that every student took the same number of
units, spent the same amount of money on textbooks this
semester, and favored increasing student fees to support
expanding library services. For this population, there is no
variability in the number of units, amount spent on books, or
student opinion on the fee increase.
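One way to make "no variability" concrete is to compute a population variance. The sketch below uses made-up unit loads (hypothetical numbers, not data from the text) and Python's standard `statistics.pvariance`: identical values give a variance of zero, while values that differ give a positive variance.

```python
from statistics import pvariance

# Hypothetical unit loads (made-up numbers for illustration only).
identical_units = [15, 15, 15, 15, 15]  # every student takes the same load
varied_units = [12, 15, 18, 9, 21]      # loads differ from student to student

# pvariance treats each list as an entire population.
print(pvariance(identical_units))  # zero: no variability
print(pvariance(varied_units))     # positive: some variability
```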
5.
The Nature and Role of Variability
Example: Monitoring Water Quality
6.
Statistics and the Data Analysis Process
Methods for organizing and summarizing
data make up the branch of statistics called
descriptive statistics.
For example, the admissions director at a large
university might be interested in learning why some
applicants who were accepted for the fall 2006
term failed to enroll at the university. The population
of interest to the director consists of all accepted
applicants who did not enroll in the fall 2006 term.
Because this population is large and it may be
difficult to contact all the individuals, the director
might decide to collect data from only 300 selected
students. These 300 students constitute a sample.
7.
Statistics and the Data Analysis Process
The entire collection of individuals or objects
about which information is desired is called
the population of interest. A sample is a
subset of the population, selected for study
in some prescribed manner.
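A minimal sketch of drawing a sample from a population in Python, assuming simple random sampling as the "prescribed manner". The applicant IDs and the population size of 4,000 are hypothetical; only the sample size of 300 comes from the admissions example.

```python
import random

# Hypothetical population: one ID per accepted applicant who did not enroll.
# The population size of 4,000 is an assumption made for illustration only.
population = [f"applicant-{i}" for i in range(1, 4001)]

# Simple random sampling without replacement; the fixed seed makes the draw reproducible.
rng = random.Random(2006)
sample = rng.sample(population, k=300)

print(len(population), len(sample))
```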
8.
Statistics and the Data Analysis Process
The second major branch of
statistics, inferential statistics,
involves generalizing from a
sample to the population from
which it was selected.
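The sketch below illustrates this generalization with a simulated, entirely hypothetical population in which 30% of people hold some opinion. In a real study the population value is unknown; it is built in here only so that the sample estimate can be compared with it.

```python
import random

# Hypothetical population of 10,000 people: 1 = holds the opinion, 0 = does not.
rng = random.Random(42)
population = [1] * 3_000 + [0] * 7_000
rng.shuffle(population)

true_proportion = sum(population) / len(population)  # normally unknown

# Inferential step: estimate the population proportion from a random sample.
sample = rng.sample(population, k=400)
sample_proportion = sum(sample) / len(sample)

print(f"population proportion: {true_proportion:.3f}")
print(f"sample estimate:       {sample_proportion:.3f}")
```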
9.
Sample size
Sample size is a count of the individual samples or observations in any statistical setting, such as a scientific experiment or a public opinion survey.
In other words, it measures the number of individual samples measured, or observations used, in a survey or experiment.
10.
Calculation of Sample Size
To determine the sample size needed for an experiment or survey, researchers take several factors into account. First, the total size of the population being studied must be considered.
Researchers will also need to consider the margin of error, the largest difference they are willing to accept between the sample result and the true population value; and the confidence level, how sure they want to be (for example, 95%) that the true population value lies within that margin of error.
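For the common case of estimating a proportion, the link between sample size, margin of error, and confidence level can be sketched with the normal-approximation formula E = z·√(p(1 - p)/n). This formula, the function name `margin_of_error`, and the assumption of a simple random sample from a large population (no finite population correction) are illustrative additions, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p: float, n: int, confidence: float = 0.95) -> float:
    """Margin of error for a sample proportion p based on n observations.

    Uses E = z * sqrt(p * (1 - p) / n), where z is the two-sided critical
    value of the standard normal distribution for the chosen confidence level.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * sqrt(p * (1 - p) / n)

# Larger samples shrink the margin of error; higher confidence widens it.
for n in (100, 400, 1600):
    print(n, round(margin_of_error(0.5, n), 3))
print("99% confidence, n = 400:", round(margin_of_error(0.5, 400, confidence=0.99), 3))
```

Because E shrinks with the square root of n, quadrupling the sample size only halves the margin of error.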
11.
Dangers of Small Sample Size
Small samples produce less precise estimates, so an adequately large sample is needed for a statistic to be accurate and reliable, especially if its findings are to be extrapolated to a larger population or group of data.
12.
How to Determine Sample Size
Design your experiment.
Determine the population size.
Specify the level of accuracy you want from your research.
Calculate your ideal sample size.
13.
Sample Size Estimation using Yamane and Cochran
The Yamane formula for sample size states that:
$$ n = \frac{N}{1 + N e^{2}} $$
where n is your Yamane sample size, N is your underlying population size, and e is the margin of error (the level of precision). The formula assumes a 95% confidence level, so if you want to be 95% sure that your results are within plus or minus 5%, then e = 0.05.
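A small Python sketch of Yamane's formula, n = N / (1 + N e²). The function name `yamane_sample_size` is hypothetical, and rounding up to a whole respondent is an assumption, chosen so the sample is never smaller than the formula calls for.

```python
import math

def yamane_sample_size(population_size: int, e: float = 0.05) -> int:
    """Yamane's formula n = N / (1 + N * e**2), rounded up to a whole number."""
    if population_size <= 0 or not 0 < e < 1:
        raise ValueError("population_size must be positive and 0 < e < 1")
    n = population_size / (1 + population_size * e ** 2)
    return math.ceil(n)

print(yamane_sample_size(4750))        # population of 4,750, e = 0.05
print(yamane_sample_size(1000, 0.05))  # a smaller population
```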
14.
Example: Find the sample size the researcher wants to include in her study if the population size of her respondents is 4,750, at 95% accuracy (e = 0.05).
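Assuming e = 0.05, substituting N = 4,750 into Yamane's formula gives:
$$ n = \frac{N}{1 + N e^{2}} = \frac{4750}{1 + 4750(0.05)^{2}} = \frac{4750}{1 + 11.875} = \frac{4750}{12.875} \approx 368.9 $$
Rounded up to a whole respondent, the researcher would need 369 respondents.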
15.
Sample Size Estimation using Yamane and Cochran
Cochran’s formula is considered
especially appropriate in
situations with large
populations. A sample of any
given size provides more
information about a smaller
population than a larger one, so
there’s a ‘correction’ through
which the number given by
Cochran’s formula can be
reduced if the whole population
is relatively small.
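For reference, Cochran's formula for a large population is usually written as:
$$ n_0 = \frac{Z^{2} p q}{e^{2}} $$
where n₀ is the sample size, Z is the critical value for the desired confidence level (from the normal tables), p is the estimated proportion of the population that has the attribute in question, q = 1 - p, and e is the desired level of precision (the margin of error).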
17.
Example:
Suppose we are doing a study on the inhabitants of a large town and want to find out how many households serve breakfast in the mornings. We don’t have much information on the subject to begin with, so we’re going to assume that half of the families serve breakfast: this gives us maximum variability. So, p = 0.5. Now let’s say we want 95% confidence and a precision of at least plus or minus 5 percent. A 95% confidence level gives a Z value of 1.96, per the normal tables, so we get:
Z = 1.96
p = 0.5 (half of the families serve breakfast)
q = 1 - 0.5 = 0.5
e = 0.05 (the desired precision of plus or minus 5 percent)
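Taking e = 0.05 for the plus or minus 5 percent precision and substituting these values into Cochran's formula n₀ = Z²pq/e² gives:
$$ n_0 = \frac{Z^{2} p q}{e^{2}} = \frac{(1.96)^{2}(0.5)(0.5)}{(0.05)^{2}} = \frac{0.9604}{0.0025} = 384.16 $$
That is about 384 households, or 385 if the result is rounded up to be safe.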
18.
So, a random sample of 385 households (384.16 rounded up) in our target population should be enough to give us the confidence levels we need.
19.
Cochran’s formula with correction (if the population size is known):
In our earlier example, if there were just 1,000 households in the target population, we would calculate the corrected sample size.
20.
If n₀ is 385 and N equals 1,000, then n is:
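Using the common form of the correction, n = n₀ / (1 + (n₀ - 1)/N), the arithmetic is:
$$ n = \frac{n_0}{1 + \frac{n_0 - 1}{N}} = \frac{385}{1 + \frac{384}{1000}} = \frac{385}{1.384} \approx 278.2 $$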
So, for this smaller population, all we need are 278
households in our sample: a substantially smaller sample
size.
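The whole Cochran calculation, including the small-population correction n = n₀ / (1 + (n₀ - 1)/N), can be sketched in Python as below. The function names are hypothetical. Z is computed with `statistics.NormalDist` (about 1.95996) rather than read from a table (1.96), so n₀ comes out near 384.15 instead of 384.16; both round up to 385. The rounding choices in the usage lines mirror the breakfast example.

```python
import math
from statistics import NormalDist

def cochran_n0(confidence: float = 0.95, p: float = 0.5, e: float = 0.05) -> float:
    """Cochran's sample size for a large population: n0 = Z^2 * p * q / e^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    q = 1 - p
    return z ** 2 * p * q / e ** 2

def cochran_corrected(n0: float, population_size: int) -> float:
    """Small-population correction: n = n0 / (1 + (n0 - 1) / N)."""
    return n0 / (1 + (n0 - 1) / population_size)

# Breakfast example: 95% confidence, p = 0.5, e = 0.05, then N = 1,000 households.
n0 = cochran_n0()
n0_whole = math.ceil(n0)                 # round up to a whole household
n = cochran_corrected(n0_whole, 1000)    # about 278.2
print(round(n0, 2), n0_whole, round(n))
```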