DESCRIPTIVE AND INFERENTIAL STATISTICS
BY: ROMMEL LUIS C. ISRAEL III
• Review on descriptive statistics
• Hypothesis Testing
ILO:
DEMONSTRATE ABILITY TO SOLVE PROBLEMS UTILIZING DESCRIPTIVE STATISTICS AND SAMPLING, AND HYPOTHESIS TESTING.
PREPARATION: Buzz Session (2 activities)
PRESENTATION: Teacher-made PPT and video clips
PRACTICE: Mean, median, mode, range, variance, SD
PERFORMANCE: Analyzing Research Articles (1 examination)
Appreciate the process of solving statistical problems in descriptive and inferential statistics.
PREPARATION:
• What is statistics?
• Differentiate between descriptive and inferential statistics.
• Enumerate the ways data sets are presented and summarized.
• Give a brief description of the following:
• Mean
• Median
• Mode
• Range
• Variance
• Standard Deviation
WHAT IS STATISTICS?
Statistics is the science concerned with developing and studying methods for collecting, analyzing, interpreting and presenting empirical data.
(https://www.stat.uci.edu/what-is-
Statistics is a branch of applied mathematics that involves the collection, description, analysis, and inference of conclusions from quantitative data.
(https://www.investopedia.com/terms/s/ stics.asp)
Statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data.
(https://en.wikipedia.org/wiki/Statistics)
• https://datatab.net/tutorial/descriptive-inferential-statistics
• https://www.pinterest.ph/pin/581457001869645007/
• https://medium.datadriveninvestor.com/what-is-descriptive-statistics-85890bc451ae
• https://www.makemyassignments.com/blog/what-are-descriptive-statistics-when-to-use-them-and-why/
• MEASURES OF FREQUENCY
• Measures of frequency provide the most basic kind of information: how often something occurs. Simple arithmetic such as counts and percentages is enough to build basic tables and graphs from it. Tabulating how often each value occurs gives the frequency distribution, and plotting that distribution on a graph leads to the concept of Measures of Central Tendency.
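As a rough illustration of a frequency distribution, the sketch below counts how often each answer occurs and converts the counts to percentages. The survey answers are made-up illustrative data, and the name `frequency_table` is only a label chosen for this example.

```python
from collections import Counter

# Hypothetical survey answers (illustrative data, not from the slides)
responses = ["yes", "no", "yes", "yes", "undecided", "no", "yes"]

counts = Counter(responses)      # how often each value occurs
total = sum(counts.values())     # total number of observations

# Frequency table: value -> (count, percentage of all responses)
frequency_table = {
    value: (count, round(100 * count / total, 1))
    for value, count in counts.most_common()
}

for value, (count, percent) in frequency_table.items():
    print(f"{value:<10} {count:>3} {percent:>6}%")
```

The most frequent value sits at the top of the table, which is the starting point for the mode discussed under central tendency.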
• 1. Central Tendency
• Central tendency (also called measures of location or central location) is a way to describe what is typical for a group (set) of data.
• Central tendency does not describe any individual piece of data; it gives an overview of the data set as a whole.
• It tells us what is normal or average for a given set of data. There are three key measures of central tendency: mean, median, and mode.
• MEASURES OF CENTRAL TENDENCY
• Measures of Central Tendency are descriptive statistics that let us describe a data set with a single value, generally the value at the central position of the data set. The three measures are the Mean, the Median and the Mode; each has its own importance and is used in different situations. Measures of Central Tendency are also known as Measures of Central Location.
• Mean
As the name suggests, the mean is the average of a given set of numbers. The mean is calculated in two easy steps:
1. Find the sum by adding all the data values together.
2. Divide the sum by the total number of data values.
• Median
Simply said, the median is the middle value in a data set. To find it:
- first, list the data in numerical order;
- second, locate the value in the middle of the list. If the number of values is even, the median is the mean of the two middle values.
• Mode
The mode of a set of data is the value in the set that occurs most often.
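The three measures can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the exam scores are hypothetical values chosen for illustration.

```python
import statistics

# Hypothetical exam scores (illustrative data)
scores = [70, 85, 85, 90, 100]

# Mean: add the values together, then divide by how many there are
mean_value = sum(scores) / len(scores)

# Median: middle value of the sorted data
median_value = statistics.median(scores)

# Mode: the value that occurs most often
mode_value = statistics.mode(scores)

# With an even number of values, the median is the mean of the two middle values
median_even = statistics.median([70, 85, 90, 100])

print(mean_value, median_value, mode_value, median_even)
```

Here the mean (86.0) is pulled slightly above the median and mode (both 85) by the single high score of 100.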
2. Dispersion
Central tendency tells us important information, but it does not show everything we want to know about a data set. It fails to reveal the extent to which the individual values differ from one another.
Measures of dispersion complement the averages and allow us to interpret them much better.
Dispersion in statistics describes the spread of the data values in a given dataset. In other words, it shows how the data is “dispersed” around the mean (the central value).
• MEASURES OF VARIABILITY
• These descriptive statistics help explain the spread of the data. Different Measures of Variability describe how the data is distributed, and that distribution later helps in understanding how Inferential Statistics can be used to draw conclusions about the population. The Measures of Variability include the Range, Variance and Standard Deviation, and they play an important role in almost all kinds of advanced statistics.
• The Range
The range is simply the difference between the largest and smallest value in a data set. It shows how widely the data is spread overall.
A low range tells us that all the data points lie close together, and therefore close to the mean. A high range tells us that at least the extreme values are far apart; because it depends only on those two values, the range is sensitive to outliers.
Here is the formula for calculating the range:
Range = max. value - min. value
• The Standard Deviation
The standard deviation also provides information on how much variation from the mean exists. However, it goes further than the range: it takes into account how far every value in the dataset lies from the mean. It is the square root of the variance.
As with the range, a low standard deviation tells us that the data points are close to the mean, and a high standard deviation shows the opposite.
• Variance measures how far a data set is spread out. It is mathematically defined as the average of the squared differences from the mean.
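The range, variance and standard deviation can be computed side by side. The sketch below uses the standard `statistics` module on a made-up data set. Note one assumption: the definition above (average of squared differences) is the population variance, while the sample variance divides by n - 1 instead of n; both are shown.

```python
import statistics

# Hypothetical data set (illustrative values)
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Range = max. value - min. value
data_range = max(data) - min(data)

# Population variance: average of the squared differences from the mean
population_variance = statistics.pvariance(data)

# Population standard deviation: square root of the population variance
population_sd = statistics.pstdev(data)

# Sample variance: same sum of squared differences, divided by n - 1
sample_variance = statistics.variance(data)

print(data_range, population_variance, population_sd, sample_variance)
```

For this data the mean is 5, the squared differences sum to 32, so the population variance is 32 / 8 = 4 and the standard deviation is 2.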
INFERENTIAL STATISTICS
• When we want to draw a conclusion about a whole population, it helps to know the different types of inferential statistics.
• Inferential statistics is a set of techniques used to draw conclusions and identify trends about a large population based on a sample taken from it.
• Linear Regression Analysis (included in Content 3)
• Linear regression models describe the relationship between variables with a linear equation.
• Linear regression is a statistical method for studying relationships between one or more independent variables (X) and one dependent variable (Y).
• To say it another way, it is a mathematical model that lets you predict the value of Y for different values of X.
• Simple linear regression is used when there is only one independent variable X, whose changes lead to different values of Y.
• Multiple linear regression is used to show the relationship between one dependent variable and two or more independent variables.
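To make simple linear regression concrete, here is a minimal sketch that fits Y = intercept + slope * X by ordinary least squares. The slides do not name a fitting method, so least squares is an assumption here, and the data and the helper name `fit_simple_linear_regression` are hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: X = independent variable, Y = dependent variable
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

intercept, slope = fit_simple_linear_regression(x_values, y_values)

# Predict Y for a new value of X
predicted_y = intercept + slope * 6
print(intercept, slope, predicted_y)
```

For this data the fitted line is approximately Y = 2.2 + 0.6 X, so the prediction at X = 6 is about 5.8.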
• Logistic Regression Analysis (included in Content 4)
• Logistic regression (also known as logit regression) is a regression model where the dependent variable is categorical.
• Logistic regression is conducted when the dependent variable is dichotomous (i.e. the dependent variable has only two possible values).
• Examples of dichotomous (binary) variables are: 0 and 1, Yes and No.
• Like linear regression, logistic regression is a predictive analysis. It aims to find the best-fitting model to describe the relationship between a dichotomous dependent variable and a set of independent variables.
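A logistic regression model turns a linear combination of the independent variables into a probability between 0 and 1 through the logistic (sigmoid) function. The sketch below shows only that prediction step; the coefficients are hypothetical and are not fitted from any data, and `predicted_probability` is a name chosen for this example.

```python
import math

def predicted_probability(x, intercept, slope):
    """Probability that the binary outcome equals 1 for predictor value x."""
    linear_predictor = intercept + slope * x
    # Logistic (sigmoid) function maps any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical coefficients (assumed for illustration, not estimated)
intercept, slope = -4.0, 1.0

# Example: probability of "Yes" (1) for different hours of study
for hours in (2, 4, 6):
    print(hours, round(predicted_probability(hours, intercept, slope), 3))
```

With these assumed coefficients the probability is exactly 0.5 at 4 hours and rises as the predictor increases.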
• Analysis of Variance (ANOVA) (included in Content 2)
• Analysis of Variance (ANOVA) is a popular
statistical method used to test and analyze
differences between two or more means
(averages). It searches significant differences between
• 4. Analysis of Covariance (ANCOVA)
• When a continuous covariate is included in an ANOVA, we have ANCOVA (as a reminder, a covariate is a continuous independent variable). The continuous covariates enter the model as regression variables.
• To put it another way, ANCOVA blends ANOVA and regression.
• ANCOVA is a type of inferential statistical modeling used to study differences in the mean value of a dependent variable across the levels of the controlled independent variables, while taking into account the influence of the uncontrolled independent variables (the covariates).
• 5. Statistical Significance (T-Test) (included in Content 2)
• The t-test compares two means (the averages of 2 groups) and tells us whether they are different from each other. It also tells us how significant the difference is.
• The t-test is used when comparing two groups on a given dependent variable.
• Correlation Analysis (included in Content 3)
• Correlation analysis studies the strength of a relationship between two
variables. It is useful when you want to find out if there are possible connections
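One common way to measure the strength of a linear relationship between two variables is Pearson's correlation coefficient r, which ranges from -1 to +1. The slides do not name a specific coefficient, so using Pearson's r is an assumption; the data and the helper name `pearson_r` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired observations (illustrative data)
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 79]

r = pearson_r(hours_studied, exam_score)
print(round(r, 3))
```

A value of r near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or no linear relationship.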
Upcoming topics:
Hypothesis Testing
• T-test
• ANOVA
PREPARATION:
•What is a hypothesis?
•What is hypothesis testing?
•Some terminology in hypothesis testing
•Parameter
•Null hypothesis
•Alternative hypothesis
•One-tailed test
•Two-tailed test
•Test statistic
•Steps in hypothesis testing.
HYPOTHESIS
A hypothesis is a proposition that is consistent with known data, but has been neither verified nor shown to be false.
https://mathworld.wolfram.com/Hypothesis.html
In mathematics, a hypothesis is an unproven statement which is supported by all the available data and by many weaker results.
https://planetmath.org/hypothesis
A statement that might be true, which might then be tested.
https://www.mathsisfun.com/definitions/hypothesis.html
EXAMPLES OF HYPOTHESES:
• The number of pets in a household is
unrelated to the number of people
living in it.
• If you get at least 6 hours of sleep,
you will do better on tests than if you
get less sleep.
• "Students who eat breakfast will
perform better on a math exam
than students who do not eat
breakfast."
HYPOTHESIS AND MATHEMATICS
• So where does mathematics enter into this picture? In many ways, both obvious and subtle:
• A good hypothesis needs to be clear, precisely stated and testable in some way. Creating such hypotheses requires clear general mathematical thinking.
• Hypothesis testing (https://latrobe.libguides.com/maths/hypothesis-testing)
• Hypothesis testing is a systematic procedure for deciding whether the results of a research study support a particular theory which applies to a population.
• Hypothesis testing uses sample data to evaluate a hypothesis about a population.
• A hypothesis test assesses how unusual the result is: whether it is reasonable chance variation or whether the result is too extreme to be considered chance variation.
LET’S LOOK AT THE TERMINOLOGY THAT WE SHOULD BE AWARE OF IN HYPOTHESIS TESTING
• 1. A Parameter is a summary description of a fixed characteristic or measure of the target population. A Parameter denotes the true value that would be obtained if a census rather than a sample were undertaken.
• Ex:
• Mean (μ)
• Variance (σ²)
• Standard Deviation (σ)
2. H0: The null hypothesis: It is a statement of no difference between sample means or proportions, or no difference between a sample mean or proportion and a population mean or proportion. In other words, the difference equals 0.
3. Ha: The alternative hypothesis: It is a claim about the population that is contradictory to H0, and it is what we conclude when we reject H0.
4. ONE-TAILED TEST:
A one-tailed test is a statistical hypothesis test in which the critical area of a distribution is one-sided, so that it is either greater than or less than a certain value, but not both. If the test statistic computed from the sample falls into the one-sided critical area, the null hypothesis is rejected in favor of the alternative hypothesis.
A one-tailed test is also known as a directional hypothesis or directional test.
Critical Region: The critical region is the region of values that corresponds to the rejection of the null hypothesis at some chosen probability level.
5. TWO-TAILED TEST:
A two-tailed test is a method in which the critical area of a distribution is two-sided; it tests whether a sample statistic is either greater than or less than a certain range of values. If the test statistic falls into either of the critical areas, the null hypothesis is rejected in favor of the alternative hypothesis.
By convention, two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut at 2.5%.
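The difference between one-tailed and two-tailed critical regions can be seen by computing the critical values at the 5% level. This sketch assumes the test statistic follows a standard normal distribution under the null hypothesis (as in a z-test); it uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

alpha = 0.05                 # chosen level of significance
standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed (upper) test: the whole 5% sits in one tail
one_tailed_critical = standard_normal.inv_cdf(1 - alpha)

# Two-tailed test: 2.5% is cut from each side of the distribution
two_tailed_critical = standard_normal.inv_cdf(1 - alpha / 2)

print(round(one_tailed_critical, 3), round(two_tailed_critical, 3))
```

Under this assumption, an upper one-tailed test rejects the null hypothesis when z exceeds roughly 1.645, while a two-tailed test rejects it when z is beyond roughly ±1.96.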
• 6. Test Statistic:
• The test statistic measures how close the sample has come to the null hypothesis. Its observed value changes randomly from one random sample to another. A test statistic contains information about the data that is relevant for deciding whether or not to reject the null hypothesis.
• Different hypothesis tests use different test statistics based on the probability model assumed in the null hypothesis. Common tests and their test statistics include:
TEST STATISTICS
HTTPS://WWW.ANALYTICSSTEPS.COM/BLOGS/WHAT-ARE-DIFFERENCES-BETWEEN-Z-TEST-AND-T-TEST
• Assume that you have already formulated the null and alternative hypotheses.
• Example:
• H0: μ = 2000, and
• H1: μ > 2000.
• Here the observed mean is greater than 2000, and the expected population mean is 2000.
• The next step is to compute a test statistic that compares the two means.
• The test statistic is a numerical summary of the data which is compared to what would be expected under the null hypothesis.
• It can take many forms, such as a t-test (usually used when the dataset is small), a z-test (preferred when the dataset is large), or an ANOVA test.
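The example hypotheses above (H0: μ = 2000 against H1: μ > 2000) can be tested with a one-sample, one-tailed z-test. This is a sketch under stated assumptions: the sample mean, population standard deviation and sample size below are hypothetical numbers that do not appear in the slides, and the population standard deviation is assumed to be known.

```python
import math
from statistics import NormalDist

mu_0 = 2000          # population mean under the null hypothesis (from the example)
sample_mean = 2050   # hypothetical observed sample mean
sigma = 200          # hypothetical known population standard deviation
n = 64               # hypothetical sample size
alpha = 0.05         # level of significance

# Test statistic: how many standard errors the sample mean lies above mu_0
standard_error = sigma / math.sqrt(n)
z = (sample_mean - mu_0) / standard_error

# One-tailed p-value: probability of a z at least this large if H0 is true
p_value = 1 - NormalDist().cdf(z)

reject_h0 = p_value <= alpha
print(z, round(p_value, 4), reject_h0)
```

With these assumed numbers z = 2.0 and the p-value is about 0.023, so the null hypothesis would be rejected at the 5% level.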
• The level of significance is the probability of rejecting the null hypothesis when it is actually true. It is denoted by α (alpha). In general, alpha is taken as 1%, 5% or 10%.
• Confidence level: (1 - α) is the confidence level, the probability of not rejecting the null hypothesis when it is true.
• For instance, with a level of significance of 0.05, a p-value of 0.05 or less (p ≤ 0.05) leads to rejecting the null hypothesis, because the data provide substantial evidence against it. This is evidence, not proof, that the null hypothesis is false.
• If the p-value is greater than 0.05, we fail to reject the null hypothesis: the evidence for the alternative hypothesis is too weak. This does not prove that the null hypothesis is true.
• The p-value is only one piece of information about whether the null hypothesis is plausible.
• The following rules of thumb are commonly used in deciding whether to retain or reject the null hypothesis:
• If p > 0.10: the observed difference is “not significant”
• If 0.05 < p ≤ 0.10: the observed difference is “marginally significant”
• If p ≤ 0.05: the observed difference is “significant”
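The three rules of thumb listed above can be written as a small helper. This is only a sketch of those thresholds; the function name `describe_p_value` is chosen for this example.

```python
def describe_p_value(p):
    """Label an observed difference using the 0.05 and 0.10 thresholds."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p <= 0.05:
        return "significant"
    if p <= 0.10:
        return "marginally significant"
    return "not significant"

for p in (0.03, 0.08, 0.25):
    print(p, describe_p_value(p))
```

The order of the checks matters: the strictest threshold is tested first so that a p-value of 0.03 is labelled "significant" rather than "marginally significant".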
WHAT IS Z-TEST?
• The z-test is a statistical test used to analyze whether two population means are different when the variances are known and the sample size is large.
• The test statistic is assumed to follow a normal distribution, and the population standard deviation must be known to perform an accurate z-test.
• A z-statistic, or z-score, is a number representing a value’s relationship to the mean of a group of values. It is measured in units of the population standard deviation and is used to test a hypothesis.
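A two-sample z-test compares two means when both population standard deviations are known. The sketch below is a minimal illustration under that assumption; all the numbers are hypothetical, and the test is two-tailed (H0: the two population means are equal).

```python
import math
from statistics import NormalDist

# Hypothetical summary data for two large samples
mean_1, sigma_1, n_1 = 52.0, 4.0, 64   # group 1: mean, known population SD, size
mean_2, sigma_2, n_2 = 50.0, 3.0, 36   # group 2: mean, known population SD, size

# Standard error of the difference between the two sample means
standard_error = math.sqrt(sigma_1 ** 2 / n_1 + sigma_2 ** 2 / n_2)

# z-statistic: observed difference measured in standard errors
z = (mean_1 - mean_2) / standard_error

# Two-tailed p-value: probability of a difference at least this extreme under H0
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z, 3), round(p_value, 4))
```

With these assumed values z is about 2.83, which lies beyond the ±1.96 cut-off for a two-tailed test at the 5% level.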
WHAT IS T-TEST?
The t-test is used to find out how significant the difference between two groups is. Basically, it tells us whether the difference (measured in means) between two separate groups could have occurred by chance.
The test assumes that the data are normally distributed; its test statistic follows the t-distribution, and it is used when population parameters such as the mean or standard deviation are unknown.
The ratio of the difference between the two groups to the difference within the groups is known as the t-score. The greater the t-score, the more difference there is between the groups; the smaller the t-score, the more similar the groups are.
For example, a t-score of 2 indicates that the groups are twice as different from each other as they are within each other.
• Also, a larger t-value makes it more likely that the result is repeatable, such that
• A larger t-score indicates that the groups are different.
• A smaller t-score indicates that the groups are similar.
• Mainly, there are three types of t-test:
• An Independent Samples t-test compares the means of two groups.
• A Paired Sample t-test compares means from the same group at different times, such as six months apart.
• A One Sample t-test tests the mean of a group against a known mean.
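The three types of t-test differ mainly in how the t-score is formed. The sketch below computes only the t-statistics, using sample standard deviations from the `statistics` module. Several assumptions apply: the independent-samples version uses the unequal-variance (Welch) form, which the slides do not specify; the data and function names are hypothetical; and turning a t-score into a p-value needs the t-distribution, which is not in the Python standard library.

```python
import math
import statistics

def one_sample_t(sample, known_mean):
    """t-score for a sample mean against a known (hypothesized) mean."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - known_mean) / standard_error

def independent_t(group_a, group_b):
    """t-score for two independent groups (Welch form, unequal variances)."""
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error

def paired_t(before, after):
    """t-score for the same group measured twice (mean difference vs. 0)."""
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0)

# Hypothetical scores (illustrative data)
group_a = [12, 15, 14, 10, 13]
group_b = [9, 11, 10, 8, 12]

print(independent_t(group_a, group_b))
print(paired_t(group_b, group_a))
print(one_sample_t(group_a, 10))
```

In each case the numerator is a difference in means and the denominator is the variation within the data, which matches the ratio described for the t-score.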
https://www.wallstreetmojo.com/z-test-vs-t-test/

DESCRIPTIVE AND INFERENTIAL STATISTICS

  • 1.
    DESCRIPTIVE AND INFERENTIAL STATISTICS • BY:ROMMEL LUIS C. ISRAEL III BY: ROMMEL LUIS C. ISRAEL III 1
  • 2.
  • 3.
    ILO: DEMONSTRATE ABILITY TOSOLVE PROBLEMS UTILIZING DESCRIPTIVE STATISTICS AND SAMPLING, AND HYPOTHESIS TESTING. PREPARATION: Buzz Session (2 activities) PRESENTATION: Teacher-made PPT and video clips PRACTICE: Mean, media, mode, range, variance, SD PERFORMANCE: Analyzing Research Articles (1 examination) Appreciate the process of solving statistical problems in descriptive and inferential statistics. BY: ROMMEL LUIS C. ISRAEL III 3
  • 4.
    PREPARATION: • What isstatistics? • Differentiate between descriptive and inferential statistics. • Enumerate the ways data sets are presented and summarize. • Give a brief description of the following: • Mean • Median • Mode • Range • Variance • Standard Deviation BY: ROMMEL LUIS C. ISRAEL III 4
  • 5.
    WHAT IS STATISTICS? Statistics isthe science concerned with developing and studying methods for collecting, analyzing, interpreting and presenting empirical data. (https://www.stat.uci.edu/what-is- Statistics is a branch of applied that involves the collection, description, analysis, and inference of conclusions quantitative data. (https://www.investopedia.com/terms/s/ stics.asp) Statistics is the discipline that concerns collection, organization, analysis, interpretation, and presentation of data. (https://en.wikipedia.org/wiki/Statistics) BY: ROMMEL LUIS C. ISRAEL III 5
  • 6.
    • https://datatab • .net/tutorial/de scriptive-inferential- statistics BY: ROMMEL LUIS C. ISRAEL III 6
  • 7.
  • 8.
  • 9.
    BY: ROMMEL LUISC. ISRAEL III 9
  • 10.
  • 11.
    • MEASURES OFFREQUENCY • Measures of Frequency provide us with the most basic kind of information which is how often something occurs. This most basic kind of information helps us in forming very simple tables and graphs by using very simple arithmetic calculations such as count, percentage etc. Thus we get to know about the frequency of values in the data that provide us with frequency distribution and when this distribution is plotted on a graph, it leads us to the concepts of Measures of Central Tendency. BY: ROMMEL LUIS C. ISRAEL III 11
  • 12.
    • 1. CentralTendency • Central tendency (also called measures of location or central location) is a method to • describe what’s typical for a group (set) of data. • It means central tendency doesn’t show us what is typical about each one piece of data, • but it gives us an overview of the whole picture of the entire data set. • It tells us what is normal or average for a given set of data. There are three key methods to show central tendency: mean, mode, and median. BY: ROMMEL LUIS C. ISRAEL III 12
  • 13.
    • MEASURES OFCENTRAL TENDENCY • Measures of Central Tendency is that kind of descriptive statistics that allow us to describe our data with a single value. This value is generally the number that acquires the central positions in the data set. This value can be calculated by using Mean, Median and Mode which form the different Measures of Central Tendency and each of these measures have their own importance and are used in different situations. Also, it is important to remember, that Measures of Central Tendency is also known as Measures of Central Location. BY: ROMMEL LUIS C. ISRAEL III 13
  • 14.
    •Mean AS THE NAMESUGGESTS, MEAN IS THE AVERAGE OF A GIVEN SET OF NUMBERS. THE MEAN IS CALCULATED IN TWO VERY EASY STEPS: 1.Find the whole sum as add the data together 2.Divide the sum by the total number of data •Median Simply said, the median is the middle value in a data set. As you might guess, in order to calculate the middle, you need: – first listing the data in a numerical order –second, locating the value in the middle of the list. •Mode The mode of a set of data is the number in the set that occurs most often. BY: ROMMEL LUIS C. ISRAEL III 14
  • 15.
    Measures of dispersiondo a lot more – they complement the averages and allow us to interpret them much better. Dispersion in statistics describes the spread of the data values in a given dataset. In other words, it shows how the data is “dispersed” around the mean (the central value). 2. Dispersion Central tendency tells us important information but it doesn’t show everything we want to know about average values. Central tendency fails to reveal the extent to which the values of the individual items differ in a data set. BY: ROMMEL LUIS C. ISRAEL III 15
  • 16.
    • MEASURES OFVARIABILITY • These descriptive statistics help in explaining the spread of the data. Different Measures of Variability are used to explain how the data is distributed. This distribution will later help in understanding how Inferential Statistics can be used to draw various conclusions about the population. The Measures of Variability are Range, Variance, Standard Deviation etc. and these play an important role in almost all kind of advanced statistics. BY: ROMMEL LUIS C. ISRAEL III 16
  • 17.
    THE RANGE ISSIMPLY THE DIFFERENCE BETWEEN THE LARGEST AND SMALLEST VALUE IN A DATA SET. IT SHOWS HOW MUCH VARIATION FROM THE AVERAGE EXISTS. You might guess that low range tells us that the data points are very close to the mean. And a high range shows the opposite. Here is the formula for calculating the range: Range = max. value – min. value •The Range BY: ROMMEL LUIS C. ISRAEL III 17
  • 18.
    •The StandardDeviation STANDARD DEVIATIONALSO PROVIDES INFORMATION ON HOW MUCH VARIATION FROM THE MEAN EXISTS. HOWEVER, THE STANDARD DEVIATION GOES FURTHER THAN RANGE AND SHOWS HOW EACH VALUE IN A DATASET VARIES FROM THE MEAN. As in the Range, a low standard deviation tells us that the data points are very close to the mean. And a high standard deviation shows the opposite. BY: ROMMEL LUIS C. ISRAEL III 18
  • 19.
    •Variance measures howfar a data set is spread out. It is mathematically defined as the average of the squared differences from the mean. BY: ROMMEL LUIS C. ISRAEL III 19
  • 20.
    INFERENTIAL STATISTICS • Whenwe want to draw a conclusion about the whole population, it is a great deal to know what are the different types of calculation of inferential statistics. • Inferential statistics is a technique used to draw conclusions and trends about a large population based on a sample taken from it • Linear Regression Analysis (included in Content 3) • Linear regression models show a relationship between two variables with a linear algorithm. • Linear regression is a statistical method for studying relationships between one or • more independent variables (X) and one dependent variable (Y). • To say it another way, it is a mathematical modeling which lets you make predictions for the value of Y depending on the different values of X. • Simple linear regression – when there is only one independent variable X which changes lead to different values for Y. • Multiple linear regression is used to show the relationship between one dependent variable and two or more independent variables. BY: ROMMEL LUIS C. ISRAEL III 20
  • 21.
    • Logistic RegressionAnalysis (included in Content 4) • Logistic regression (also known as logit regression) is a regression model where the dependent variable is categorical. • Logistic regression is conducted when the dependent variable is dichotomous (i.e the • dependent variable has only two possible values). • Examples of dichotomous (binary) variables are: 0 and 1, Yes and No. • As the other linear regression models, the logistic regression is a predictive analysis. It aims to find the best fitting model to describe the relationship between the dichotomous characteristics of a dependent variable and a set of independent variables. • Analysis of Variance (ANOVA) (included in Content 2) • Analysis of Variance (ANOVA) is a popular statistical method used to test and analyze differences between two or more means (averages). It searches significant differences between BY: ROMMEL LUIS C. ISRAEL III 21
  • 22.
    • 4. Analysisof Covariance (ANCOVA) • When a continuous covariate is included in an ANOVA we have ANCOVA (just to remind that a covariate is a continuous independent variable). The continuous covariates enter the model as regression variables. • To put in another way, ANCOVA blends ANOVA and regression. • ANCOVA is a type of inferential statistics modeling used in studying the differences in the mean values of the dependent variables. Those dependent variables relate to the impact of the controlled independent variables while taking into consideration the influence of the uncontrolled independent variables. 5. Statistical Significance (T-Test) (included in Content 2) The t-test compares two means (averages of 2 groups) and tells us if they are different from each other. The t-test also tells us how significant the differences are. The t-test is used when comparing two groups on a given dependent variable. BY: ROMMEL LUIS C. ISRAEL III 22
  • 23.
    • Correlation Analysis(included in Content 3) • Correlation analysis studies the strength of a relationship between two variables. It is useful when you want to find out if there are possible connections BY: ROMMEL LUIS C. ISRAEL III 23
  • 24.
    Upcoming topics: HypothesisTesting • T-test •ANOVA BY: ROMMEL LUIS C. ISRAEL III 24
  • 25.
    PREPARATION: •What is ahypothesis? •What is hypothesis testing? •Some terminologies in hypothesis testing •Parameter •Null hypothesis •Alternative hypothesis •One-tailed test •Two-tailed test •Test statistics •Steps in hypothesis testing. BY: ROMMEL LUIS C. ISRAEL III 25
  • 26.
    HYPOTHESIS A hypothesis isa proposition that is consistent with known data, but has been neither verified nor shown to be false. https://mathworld.wolfram.com/Hypothesis.html In mathematics, a hypothesis is an unproven statement which is supported by all the available data and by many weaker results. https://planetmath.org/hypothesis A statement that might be true, which might then be tested. https://www.mathsisfun.com/definitions/hypothesis.html BY: ROMMEL LUIS C. ISRAEL III 26
  • 27.
    EXAMPLES OF HYPOTHESES: •The number of pets in a household is unrelated to the number of people living in it. • If you get at least 6 hours of sleep, you will do better on tests than if you get less sleep. • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast." BY: ROMMEL LUIS C. ISRAEL III 27
  • 28.
    HYPOTHESIS AND MATHEMATICS • Sowhere does mathematics enter into this picture? In many ways, both obvious and subtle: • A good hypothesis needs to be clear, precisely stated and testable in some way. Creation of these clear hypotheses requires clear general mathematical thinking. • Hypothesistestinghttps://latrobe.libguides.com/maths/hypothesis-testing • Hypothesis testingis a systematic procedure for deciding whether the results of a research study support a particular theory which applies to a population. • Hypothesis testing uses sample data to evaluateahypothesisabouta population. • A hypothesis test assesses how unusual the result is, whether it is reasonable chance variation or whether the result is too extreme to be considered chance variation. BY: ROMMEL LUIS C. ISRAEL III 28
  • 29.
    LET’S LOOK ATTHE TERMINOLOGY THAT WE SHOULD BE AWARE OF IN HYPOTHESIS TESTING • 1. A Parameter is a summary description of a fixed characteristic or measure of the target population. A Parameter denotes the true value that would be obtained if a census rather than a sample were undertaken • Ex: • Mean (μ), • Variance (σ²), • Standard Deviation (σ), BY: ROMMEL LUIS C. ISRAEL III 29
  • 30.
    2. H0: Thenull hypothesis: It is a statement of no difference between sample means or proportions or no difference between a sample mean or proportion and a population mean or proportion. In other words, the difference equals 0. BY: ROMMEL LUIS C. ISRAEL III 30
  • 31.
    3. Ha: Thealternative hypothesis: It is a claim about the population that is contradictory to H0 and what we conclude when we reject H0. BY: ROMMEL LUIS C. ISRAEL III 31
  • 32.
    BY: ROMMEL LUISC. ISRAEL III 32
  • 33.
    BY: ROMMEL LUISC. ISRAEL III 33
  • 34.
    BY: ROMMEL LUISC. ISRAEL III 34
  • 35.
    BY: ROMMEL LUISC. ISRAEL III 35
  • 36.
    BY: ROMMEL LUISC. ISRAEL III 36
  • 37.
    4. ONE-TAILED TEST: A one-tailedtest is a statistical hypothesis test in which the critical area of a distribution is one- sided so that it is either greater than or less than a certain value, but not both. If the sample being tested falls into the one-sided critical area, the alternative hypothesis will be accepted instead of the null hypothesis. A one-tailed test is also known as a directional hypothesis or directional test. Critical Region: The critical region is the region of values that corresponds to the rejection of the null hypothesis at some chosen probability level. BY: ROMMEL LUIS C. ISRAEL III 37
  • 38.
    5. TWO-TAILED TEST: Atwo-tailed test is a method in which the critical area of a distribution is two- sided and tests whether a sample is greater than or less than a certain range of values. If the sample being tested falls into either of the critical areas, the alternative hypothesis is accepted instead of the null hypothesis. By convention, two-tailed tests are used to determine significance at the 5% level, meaning each side of the distribution is cut at 2.5% BY: ROMMEL LUIS C. ISRAEL III 38
  • 39.
    BY: ROMMEL LUISC. ISRAEL III 39
  • 40.
    • 6. TestStatistic: • The test statistic measures how close the sample has come to the null hypothesis. Its observed value changes randomly from one random sample to a different sample. A test statistic contains information about the data that is relevant for deciding whether to reject the null hypothesis or not. • Different hypothesis tests use different test statistics based on the probability model assumed in the null hypothesis. Common tests and their test statistics include: BY: ROMMEL LUIS C. ISRAEL III 40
  • 41.
    TEST STATISTICS HTTPS://WWW.ANALYTICSSTEPS.COM/BLOGS/WHAT-ARE-DIFFERENCES-BETWEEN-Z-TEST-AND-T-TEST
    • Assume that you have already formulated the null and alternative hypotheses.
    • Example:
    • H0: μ = 2000, and
    • H1: μ > 2000.
    • Here the observed sample mean is greater than 2000, and the population mean expected under H0 is 2000.
    • The next step is to compute a test statistic that compares the two means.
    • The test statistic is a numerical summary of the data that is compared with what would be expected under the null hypothesis.
    • It can take many forms, such as a t-test (usually used when the dataset is small), a z-test (preferred when the dataset is large), or an ANOVA test.
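    The example hypotheses above (H0: μ = 2000, H1: μ > 2000) can be worked through as a right-tailed one-sample z-test. This is only a sketch: the sample mean of 2050, the population standard deviation of 200, and the sample size of 64 are hypothetical numbers chosen for illustration and do not appear in the slides.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_right_tailed(sample_mean, mu0, sigma, n):
    """Return (z, p_value) for H0: mu = mu0 versus H1: mu > mu0."""
    standard_error = sigma / sqrt(n)           # spread of the sample mean
    z = (sample_mean - mu0) / standard_error   # distance from mu0 in standard errors
    p_value = 1 - NormalDist().cdf(z)          # area in the upper tail beyond z
    return z, p_value


# Hypothetical data: observed mean 2050, known sigma 200, n = 64.
z, p_value = one_sample_z_right_tailed(sample_mean=2050, mu0=2000, sigma=200, n=64)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

    With these assumed numbers the standard error is 25, so z = 2 and the p-value is roughly 0.023, which is below 0.05.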
  • 42.
    • Level of significance: the probability of rejecting the null hypothesis when it is true. It is denoted by α (alpha). In general, alpha is taken as 1%, 5%, or 10%.
    • Confidence level: (1-α) is the probability of not rejecting the null hypothesis when it is true.
    • For instance, with a level of significance of 0.05, a p-value of p ≤ 0.05 leads to rejecting the null hypothesis, because so small a p-value is substantial evidence against it. It does not prove that the null hypothesis is false.
    • If the p-value is greater than 0.05, we fail to reject the null hypothesis, because the evidence for the alternative hypothesis is too weak. This is not proof that the null hypothesis is true. The p-value is only one piece of information about how compatible the data are with the null hypothesis.
    The following rules of thumb are often used to describe the result:
    • If p > 0.10: the observed difference is “not significant”
    • If 0.05 < p ≤ 0.10: the observed difference is “marginally significant”
    • If p ≤ 0.05: the observed difference is “significant”
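    The p-value rules can be expressed as a small helper. This is a sketch only; the function names `describe_p_value` and `decide` are hypothetical, and the thresholds are the 0.05 and 0.10 cut-offs quoted on the slide.

```python
def describe_p_value(p_value):
    """Label a p-value using the 0.05 / 0.10 rules of thumb."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= 0.05:
        return "significant"
    if p_value <= 0.10:
        return "marginally significant"
    return "not significant"


def decide(p_value, alpha=0.05):
    """Compare the p-value with the chosen level of significance."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


for p in (0.03, 0.07, 0.25):
    print(p, describe_p_value(p), "->", decide(p))
```

    The checks run from the strictest threshold downward, so a p-value of 0.03 is labelled "significant" rather than "marginally significant".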
  • 43.
    WHAT IS Z-TEST?
    The z-test is a statistical test
    • used to analyze whether two population means differ when the variances are known and the sample size is large.
    • The test statistic is assumed to follow a normal distribution, and the population standard deviation must be known to perform an accurate z-test.
    • A z-statistic, or z-score, is a number that expresses how far a value lies from the mean of a group of values, in units of standard deviation. It is computed with population parameters such as the population standard deviation and is used to test a hypothesis.
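    For reference, the standard textbook forms of the z-statistic are written out here. The symbols are assumptions introduced for this sketch: \bar{x} is a sample mean, \mu_0 the mean claimed by H0, \sigma a known population standard deviation, and n a sample size, with subscripts 1 and 2 marking the two groups.

```latex
% One-sample z-statistic: a sample mean compared with the hypothesized mean.
\[
  z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}
\]

% Two-sample z-statistic for H0: \mu_1 = \mu_2, with both variances known.
\[
  z = \frac{\bar{x}_1 - \bar{x}_2}
           {\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}
\]
```

    In both forms the numerator is the observed difference and the denominator is its standard error, so z counts how many standard errors the observation lies from what H0 predicts.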
  • 46.
    WHAT IS T-TEST?
    A t-test is used to judge how significant the difference between two groups is. It indicates how likely it is that a difference (measured in means) between two separate groups occurred by chance. The test assumes that the data are normally distributed and is based on the t-distribution; it is used when population parameters such as the mean or standard deviation are unknown. The t-score is the ratio of the difference between the two groups to the difference within the groups. The greater the t-score, the greater the difference between the groups; the smaller the t-score, the more similar the groups are. For example, a t-score of 2 indicates that the groups are twice as different from each other as they are within each other.
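    The ratio described above can be computed directly. The sketch below uses Welch's form of the two-sample t-statistic, which does not assume equal variances; that choice, the function name `welch_t_statistic`, and the two small data sets are assumptions made for illustration. Turning the t-score into a p-value needs the t-distribution, from a table or a statistics library, and is left out here.

```python
from math import sqrt
from statistics import mean, variance


def welch_t_statistic(group_a, group_b):
    """Difference between group means divided by the spread within the groups."""
    difference_between = mean(group_a) - mean(group_b)
    # variance() is the sample variance (n - 1 in the denominator).
    spread_within = sqrt(variance(group_a) / len(group_a)
                         + variance(group_b) / len(group_b))
    return difference_between / spread_within


# Hypothetical scores for two separate groups.
group_a = [12, 15, 14, 10, 13]
group_b = [8, 9, 11, 7, 10]

t_score = welch_t_statistic(group_a, group_b)
print(f"t = {t_score:.2f}")
```

    Swapping the two groups only flips the sign of the t-score; its size is what reflects how far apart the groups are.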
  • 47.
    • Also, the larger the t-value obtained from a t-test, the more likely it is that the result is repeatable:
    • A larger t-score indicates that the groups are different.
    • A smaller t-score indicates that the groups are similar.
    • There are three main types of t-test:
    • An Independent Samples t-test compares the means of two groups.
    • A Paired Sample t-test compares means from the same group at different times, such as six months apart.
    • A One Sample t-test tests the mean of a single group against a known mean.
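    The one-sample and paired variants are closely related, which a short sketch can show: a paired t-test is a one-sample t-test on the within-subject differences, tested against a mean of 0. The function names and the before/after scores are hypothetical, and only the t-statistic is computed, not the p-value.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, known_mean):
    """t-statistic for one group's mean against a known mean."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - known_mean) / standard_error


def paired_t(before, after):
    """t-statistic for the same group measured at two different times."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for a, b in zip(after, before)]
    return one_sample_t(differences, 0)


# Hypothetical scores for five people, measured six months apart.
before = [70, 68, 75, 80, 72]
after = [74, 70, 78, 85, 73]

print(f"paired t = {paired_t(before, after):.2f}")
print(f"one-sample t (after vs. known mean 72) = {one_sample_t(after, 72):.2f}")
```

    Pairing removes the person-to-person variation, so the paired statistic depends only on how consistent the individual changes are.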