Beni-Suef University
Probability and Statistics for Engineers
Lecture 1
Introduction
Data Organization
Chapter 1: Lesson 1
Definition:
• Statistics:
A collection of methods for planning
experiments, obtaining data, and then
organizing, summarizing, presenting,
analyzing, interpreting, and drawing
conclusions.
The field of statistics is divided into two parts:
1. Descriptive statistics:
Describes data that have been collected. Commonly used
descriptive statistics include frequency counts, ranges
(high and low scores or values), means, modes, medians,
and standard deviations.
2. Inferential statistics:
Generalizes from samples to populations using
probabilities. Includes performing hypothesis tests, determining
relationships between variables, and making predictions.
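As a minimal sketch of the descriptive measures listed above, the following Python snippet uses the standard-library `statistics` module. The list of scores is an assumption invented for illustration; it is not data from the lecture.

```python
import statistics

# Hypothetical exam scores, invented for illustration only.
scores = [4, 8, 6, 7, 5, 4, 10, 6, 7, 4]

summary = {
    "count": len(scores),
    "low": min(scores),
    "high": max(scores),
    "range": max(scores) - min(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "sample_std_dev": statistics.stdev(scores),  # divides by n - 1
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Each entry only describes the collected scores; none of them, taken alone, says anything about a larger population.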
Definitions:
• Data:
Observations (such as measurements,
genders, survey responses) that have been
collected.
• Variable:
A characteristic or attribute that can
assume (take) different values.
• Random variable: A variable whose values
are determined by chance.
• Population:
The complete collection of all elements
(scores, people, measurements, and so on)
to be studied.
• Sample:
A subgroup or subset of the population.
• Parameter: A characteristic or measure
obtained from a population.
• Statistic: A characteristic or measure
obtained from a sample.
The table below lists some parameters and the corresponding statistics:
Measure              Population   Sample
Size                 N            n
Mean                 µ            X̄
Variance             σ²           S²
Standard deviation   σ            S
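The distinction between parameters and statistics can be sketched in Python. The population and sample values below are hypothetical, chosen only so the arithmetic is easy to follow. The snippet relies on the standard-library `statistics` module, where `pvariance`/`pstdev` divide by N (population) and `variance`/`stdev` divide by n - 1 (sample).

```python
import statistics

# Hypothetical population of N = 8 values, invented for illustration.
population = [2, 4, 4, 4, 5, 5, 7, 9]
# A hypothetical sample of n = 4 values taken from that population.
sample = [2, 4, 5, 9]

# Parameters: computed from the whole population.
N = len(population)
mu = statistics.mean(population)
sigma_squared = statistics.pvariance(population)  # divides by N
sigma = statistics.pstdev(population)

# Statistics: computed from the sample only.
n = len(sample)
x_bar = statistics.mean(sample)
s_squared = statistics.variance(sample)  # divides by n - 1
s = statistics.stdev(sample)

print(f"Population: N={N}, mean={mu}, variance={sigma_squared}, std dev={sigma}")
print(f"Sample:     n={n}, mean={x_bar}, variance={s_squared:.3f}, std dev={s:.3f}")
```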
Populations and Samples:
• Population (some unknown parameters); N = population size.
Example: TU students (mean height).
• Sample = observations (we calculate some statistics); n = sample size.
Example: 20 students from TU (sample mean).
Let X1, X2, …, XN be the population
values (in general, they are unknown).
Let X1, X2, …, Xn be the sample values
(these values are known).
Statistics obtained from the sample
are used to estimate (approximate) the
parameters of the population.
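A small simulation can illustrate how a sample statistic approximates a population parameter. Everything numeric here is an assumption: the population is synthetic (1000 heights drawn from a normal distribution with mean 170 cm and standard deviation 8 cm), and the sample size of 20 simply mirrors the lecture's "20 students" example.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights in cm of N = 1000 students (synthetic data).
population = [random.gauss(170, 8) for _ in range(1000)]

# In practice the population values are unknown; only the sample is observed.
sample = random.sample(population, 20)  # n = 20, drawn without replacement

mu = statistics.mean(population)  # parameter (normally unknown)
x_bar = statistics.mean(sample)   # statistic (known, used as the estimate)

print(f"population mean (parameter): {mu:.1f} cm")
print(f"sample mean (statistic):     {x_bar:.1f} cm")
```

Changing the seed gives a different sample and therefore a different estimate, while the parameter of a given population stays fixed.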
Types of Data
Key Terms
• Categorical variables
• Quantity variables
• Nominal variables
• Ordinal Variables
• Binary data
• Discrete and continuous data
• Interval and ratio variables
• Qualitative and quantitative traits
(characteristics) of data
Categorical Data
• The objects being studied are
grouped into categories based
on some qualitative trait.
• The resulting data are merely
labels or categories.
Examples: Categorical Data
• Eye color: blue, brown, hazel, green, etc.
• Gender: male, female
• Smoking status: smoker, non-smoker
• Attitudes towards the death penalty: strongly disagree, disagree, neutral, agree,
strongly agree
Categorical data are classified as
nominal, ordinal, and/or binary:
• Nominal data: binary or not binary
• Ordinal data: binary or not binary
Nominal Data
• A type of categorical data in
which objects fall into unordered
categories.
Examples: Nominal Data
• Gender
– Male, female
• Nationality
– French, Japanese, Egyptian, Chinese, etc.
• Smoking status
– Smoker, non-smoker
Ordinal Data
• A type of categorical
data in which order is
important.
Examples: Ordinal Data
• Class of degree
– 1st class, 2nd class, 3rd class, fail
• Degree of illness
– None, mild, moderate, acute, chronic
• Opinion of students about stats classes
– Very unhappy, unhappy, neutral, happy,
ecstatic
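Because order matters for ordinal data, the categories can be sorted and a middle category found, which is not meaningful for nominal data. The sketch below uses the opinion scale from the example above. The list of responses is hypothetical, and the names `SCALE` and `RANK` are illustrative.

```python
# Scale order taken from the lecture's example of student opinions.
SCALE = ["very unhappy", "unhappy", "neutral", "happy", "ecstatic"]
RANK = {label: position for position, label in enumerate(SCALE)}

# Hypothetical survey responses, invented for illustration.
responses = ["happy", "neutral", "very unhappy", "ecstatic", "unhappy"]

# Sort by position on the scale rather than alphabetically.
ordered = sorted(responses, key=RANK.__getitem__)

# With an odd number of responses, the middle one is the median category.
middle = ordered[len(ordered) // 2]

print(ordered)
print("median category:", middle)
```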
Binary Data
• A type of categorical data in which there are
only two categories.
• Binary data can be either nominal or ordinal.
Examples:
• Smoking status: smoker, non-smoker
• Attendance: present, absent
• Class of mark: pass, fail
• Status of student: undergraduate,
postgraduate
Quantity Data
• The objects being studied are ‘measured’
based on some quantitative trait.
• The resulting data are a set of numbers.
Examples: Quantity Data
• Pulse rate
• Height
• Age
• Exam marks
• Time to complete a statistics test
• Family size
Quantity data can be classified as
discrete or continuous:
• Discrete
• Continuous
Discrete Data
Data are discrete if the observations can take
only specific values (typically integers).
There are gaps between the possible values.
The values do not contain fractions.
Implies counting.
Continuous Data
Data are continuous if the observations can
take on any value within a finite or infinite
interval (real numbers).
The values can contain fractions.
Implies measurement.
Discrete data: gaps between possible values (count), e.g. 0, 1, 2, 3, 4, 5, 6, 7
Continuous data: no gaps between possible values (measure), e.g. any value from 0 to 1000
Examples: Discrete Data
• Number of children in a family
• Number of students passing a stats exam
• Number of crimes reported to the police
• Number of cars sold in a day.
Generally, discrete data are counts.
We would not expect to find 2.2 children in a family,
88.5 students passing an exam, 127.2 crimes
reported to the police, or half a car sold in
one day.
Examples: Continuous data
• Weight
• Height
• Time to run 500 metres
• Age
Generally, continuous data come from
measurements
(any value within an interval is possible with a fine enough
measuring device).
Relationships between Variables
Variables
• Category
– Nominal
– Ordinal (ordered categories, ranks)
• Quantity
– Discrete (counting)
– Continuous (measuring)
Organization and Presentation
of Data
Introduction
• After the data have been collected, the main
tasks a statistician must accomplish are the
organization and presentation of the data.
The organization must be done in a meaningful
way, and the presentation should be such that an
interested reader of the study can understand
the data distribution.
Definitions:
• Raw data:
Data collected in original form (before it
has been organized).
• Example:
• The following data are raw data.
Definitions:
Class: A quantitative or qualitative category
in which the raw data are placed.
Classes must satisfy the following conditions:
1. There are usually between 5 and 20 classes (commonly between 5 and 15; for example, select 5);
2. Class interval = range / number of classes (for example, 17/6 ≈ 2.8, rounded up to 3);
3. The classes must be mutually exclusive;
4. The classes must be exhaustive.
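The class-interval rule can be sketched as a small Python function. This is an illustration under stated assumptions, not the lecture's procedure: the function name `class_limits` is hypothetical, the data are assumed to be integers, the width is rounded up, and the lowest and highest values of 1 and 18 are inferred from the range of 17 and the class limits 1-3 through 16-18 that appear in the lecture's tables.

```python
import math

def class_limits(lowest, highest, num_classes):
    """Return (width, list of (lower, upper) class limits) for integer data."""
    data_range = highest - lowest
    # Class interval = range / number of classes, rounded up to a whole number.
    width = math.ceil(data_range / num_classes)
    # If the division is exact, the last class would stop short of the highest
    # value, so widen the classes by one to keep them exhaustive.
    if lowest + width * num_classes - 1 < highest:
        width += 1
    limits = []
    lower = lowest
    for _ in range(num_classes):
        limits.append((lower, lower + width - 1))
        lower += width
    return width, limits

width, limits = class_limits(lowest=1, highest=18, num_classes=6)
print("class width:", width)
for lower, upper in limits:
    print(f"{lower}-{upper}")
```

Because each class starts one unit after the previous class ends, the resulting classes are mutually exclusive and cover every value from the lowest to the highest.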
Frequency Distribution
• The researcher organizes the raw data by
using a frequency distribution.
• The frequency is the number of values in a
specific class of data.
• A frequency distribution is the organization of
raw data in table form, using classes and
frequencies.
Frequency Distribution
• For the first data set, a frequency distribution
is shown as follows:
Class limits   Tally           Frequency
1-3            ///// /         6
4-6            ///// ///// /   11
7-9            ////            4
10-12          /               1
13-15          ////            4
16-18          ////            4
Types of Frequency Distribution
• There are three basic types of frequency
distribution:
– Categorical
– Ungrouped
– Grouped
Categorical Frequency Distribution
• The categorical frequency distribution is used
for data that can be placed in specific
categories, such as nominal or ordinal data.
• For example, data such as political affiliations,
religious affiliations, or major fields of study
would use a categorical frequency distribution.
Example
• The blood type of different students:
Example
Class   Tally        Frequency
A       /////        5
B       ///// //     7
O       ///// ////   9
AB      ////         4
Total                25
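A categorical frequency distribution like the blood-type table can be built with `collections.Counter`. The individual observations are not reproduced in this text, so the list below is rebuilt from the counts in the table (A = 5, B = 7, O = 9, AB = 4). Its order is arbitrary and is an assumption of this sketch.

```python
from collections import Counter

# Raw list rebuilt from the table's counts; the original order of the
# observations is unknown, so this arrangement is purely illustrative.
blood_types = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4

frequency = Counter(blood_types)

print(f"{'Class':<6}{'Tally':<12}{'Frequency':>9}")
for blood_type in ["A", "B", "O", "AB"]:
    count = frequency[blood_type]
    print(f"{blood_type:<6}{'/' * count:<12}{count:>9}")
print(f"{'Total':<18}{sum(frequency.values()):>9}")
```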
Ungrouped Frequency Distribution
• When the range of the data is small, the data are
grouped into classes that are no more than
one unit in width.
Example
 8  9  8  8  4
11 10  9  9  5
 8  7  8  7  7
 7  5  7  8  4
 9  8  8  5  6
Example (continued)
• The range in the example is
R = highest value – lowest value = 11 – 4 = 7
• Since the range is small, classes
consisting of a single data value can be
used.
Example (continued)
Class   Tally       Frequency
4       //          2
5       ///         3
6       /           1
7       /////       5
8       ///// ///   8
9       ////        4
10      /           1
11      /           1
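The ungrouped distribution can be checked by counting the 25 listed observations directly. This is a minimal sketch using `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# The 25 observations listed in the example.
data = [8, 9, 8, 8, 4,
        11, 10, 9, 9, 5,
        8, 7, 8, 7, 7,
        7, 5, 7, 8, 4,
        9, 8, 8, 5, 6]

data_range = max(data) - min(data)  # highest value - lowest value
frequency = Counter(data)

print("range:", data_range)
# One class per distinct value, from the lowest to the highest.
for value in range(min(data), max(data) + 1):
    count = frequency[value]
    print(f"{value:>5}  {'/' * count:<10} {count}")
print("total:", sum(frequency.values()))
```

Counting in code is a convenient cross-check on a hand-made tally, since a single miscounted stroke changes two rows of the table.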
Grouped Frequency Distribution
• When the range of the data is large, the data
must be grouped into classes that are more
than one unit in width.
In this case there are additional conditions for the
classes:
1. The class width should preferably be an odd
number;
2. The classes must be equal in width;
3. The classes must be continuous.
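The equal-width and continuity conditions can be checked mechanically. The function below is a sketch under the assumption that class limits are given as integer (lower, upper) pairs; the name `check_classes` and the second, deliberately faulty, set of limits are hypothetical.

```python
def check_classes(limits):
    """Check a list of (lower, upper) integer class limits.

    Returns a list of problems; an empty list means the checks passed.
    """
    problems = []
    # Equal width: every class must span the same number of integer values.
    widths = {upper - lower + 1 for lower, upper in limits}
    if len(widths) > 1:
        problems.append("classes are not equal in width")
    # Continuity and mutual exclusivity: each class must start exactly one
    # unit after the previous class ends.
    for (_, prev_upper), (next_lower, _) in zip(limits, limits[1:]):
        if next_lower <= prev_upper:
            problems.append(f"classes overlap at {next_lower}")
        elif next_lower > prev_upper + 1:
            problems.append(f"gap between {prev_upper} and {next_lower}")
    return problems

good = [(1, 3), (4, 6), (7, 9), (10, 12), (13, 15), (16, 18)]
bad = [(1, 4), (5, 8), (9, 12), (12, 16)]  # hypothetical faulty limits

print(check_classes(good))
print(check_classes(bad))
```

The first set passes; the second is reported both for unequal widths and for an overlap at 12, because a value of 12 would fall into two classes.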
Example
Class limits   Tally               Frequency
1-3            ///// /////         10
4-6            ///// ///// ////    14
7-9            ///// /////         10
10-12          ///// /             6
13-15          /////               5
16-18          /////               5
• In this distribution, the values 1 and 3 of the first
class are called “class limits”.
• 1 is the “lower class limit” and 3 is the “upper
class limit.”
1. Frequency Table
• The researcher organizes the raw data by using a frequency
distribution.
• The frequency is the number of values in a specific class of
data.
• The frequency of a data value is the number of times it
occurs. A frequency table shows the frequency of each
data value. If the data are divided into intervals, the
table shows the frequency of each interval.
Example 1: Making a Frequency
Table
• n: the total of the frequencies.
• The intervals must be of equal width.
• Use for qualitative and discrete data.
• The table should cover all values and categories.
Example 2: Making a Frequency Table
The numbers of students enrolled in Western
Civilization classes at a university are given below.
Use the data to make a frequency table with
intervals.
12, 22, 18, 9, 25, 31, 28, 19, 22, 27, 32, 14
Step 1 Identify the least and greatest values.
The least value is 9. The greatest value is 32.
Example 2 Continued
Step 2 Divide the data into equal intervals.
For this data set, use an
interval of 10.
Step 3 List the intervals in
the first column of the
table. Count the number of
data values in each interval
and list the count in the last
column. Give the table a
title.
Enrollment in Western Civilization Classes
Number Enrolled   Frequency
1 – 10            1
11 – 20           4
21 – 30           5
31 – 40           2
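The three steps of Example 2 can be sketched in Python. The function name `frequency_table` and its parameters are illustrative, the data are assumed to be integers, and the first interval is assumed to start at 1 with a width of 10, as in the example.

```python
# Enrollment data from Example 2.
data = [12, 22, 18, 9, 25, 31, 28, 19, 22, 27, 32, 14]

def frequency_table(values, start, width):
    """Count integer values in equal-width intervals beginning at `start`."""
    table = {}
    lower = start
    # Keep adding intervals until the greatest value is covered.
    while lower <= max(values):
        upper = lower + width - 1
        table[(lower, upper)] = sum(lower <= v <= upper for v in values)
        lower += width
    return table

# Step 1: identify the least and greatest values.
print("least:", min(data), "greatest:", max(data))

# Steps 2 and 3: equal intervals of 10, then count the values in each one.
table = frequency_table(data, start=1, width=10)
print("Enrollment in Western Civilization Classes")
for (lower, upper), count in table.items():
    print(f"{lower:>2} - {upper:<2}  {count}")
```

The same function can be applied to other integer data sets by changing `start` and `width`.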
Example 3
The number of days of Maria’s last 15 vacations are
listed below. Use the data to make a frequency table
with intervals.
4, 8, 6, 7, 5, 4, 10, 6, 7, 14, 12, 8, 10, 15, 12
Step 1 Identify the least and greatest values.
The least value is 4. The greatest value is 15.
Step 2 Divide the data into equal intervals.
For this data set use an interval of 3.
Step 3 List the intervals in the first column of the
table. Count the number of data values in each
interval and list the count in the last column. Give
the table a title.
Example 3 Continued
Number of Vacation Days
Interval   Frequency
4 – 6      5
7 – 9      4
10 – 12    4
13 – 15    2
Cumulative Frequency
• The cumulative frequency is the sum of
the frequencies accumulated up to the
upper boundary of a class in the
distribution.
• Cumulative frequencies are used to show how
many values lie below a given upper
class boundary.
Example of Cumulative Frequency
Distribution
Class   Frequency   Cumulative frequency
1-4     6           6
5-8     2           8
9-12    5           13
13-16   3           16
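The cumulative column is a running total of the frequency column. As a minimal sketch, `itertools.accumulate` from the Python standard library computes it from the four class frequencies in the example, listed in class order.

```python
from itertools import accumulate

# Class frequencies from the example, in class order.
frequencies = [6, 2, 5, 3]

# Running total: each entry is the sum of all frequencies up to that class.
cumulative = list(accumulate(frequencies))

for position, (freq, cum) in enumerate(zip(frequencies, cumulative), start=1):
    print(f"class {position}: frequency = {freq}, cumulative frequency = {cum}")
```

The last cumulative frequency always equals the total number of observations, which gives a quick check on the table.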
Homework 1
For the STAT course, the students' grades were found to be as follows:
1. What type of data is represented?
2. Calculate the range of the data.
3. Use classes to construct the frequency table.
4. What is the most common range of grades?
5. Construct the cumulative frequency table.



Editor's Notes

  • On the definition of statistics: statistics is also a set of tools used to collect, summarize, organize, present, analyze, and interpret data.
  • On populations: a census is the collection of data from every member of the population.