Basics Of Data Analysis
Presented By: Ankur Jain
Swati
Biraj Choudhary
Abhijeet
Prateek Rajpal
Data Analysis
• Turning raw data into useful information.
• The purpose is to answer the questions being asked
at a program site, or to answer research questions.
• Even the greatest amount and best quality data
mean nothing if not properly analyzed—or if not
analyzed at all.
• Analysis is looking at the data in light of the
questions you need to answer:
– How would you analyze data to determine: “Is my
program/research meeting its objectives?”
Answering Programmatic
Questions
• Question: Is my program meeting its objectives?
• Analysis: Compare program targets and actual
program performance to learn how far you are
from target.
• Interpretation: Why you have or have not
achieved the target and what this means for your
program.
• May require more information.
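The target-versus-actual comparison described above can be sketched in a few lines of Python. The indicator names and numbers below are hypothetical, invented only to illustrate the arithmetic; interpretation (why a target was or was not met) still has to come from the analyst.

```python
# Hypothetical indicators: name -> (target, actual). Values are made up.
indicators = {
    "clients_enrolled": (500, 430),
    "trainings_held": (12, 12),
    "kits_distributed": (1000, 1150),
}

def performance_gap(target, actual):
    """Return (gap from target, percent of target achieved)."""
    gap = actual - target
    pct_achieved = 100.0 * actual / target
    return gap, pct_achieved

report = {name: performance_gap(t, a) for name, (t, a) in indicators.items()}

for name, (gap, pct) in report.items():
    status = "met" if gap >= 0 else "not met"
    print(f"{name}: {pct:.1f}% of target ({gap:+d}), target {status}")
```

A negative gap only says how far the program is from target; it does not say why.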
Data Preparation Process
Prepare preliminary plan of data analysis
↓
Check questionnaires
↓
Edit
↓
Code
↓
Transcribe
↓
Clean data
↓
Select a data analysis strategy
Types of Statistical Analyses Used in
Marketing Research
• Data summarization: the process of describing a
data matrix by computing a small number of
measures that characterize the data set.
• Four functions of data summarization:
– Summarizes the data
– Applies understandable conceptualizations
– Communicates underlying patterns
– Generalizes sample findings to the population
Coding
• Coding – process of translating information gathered
from questionnaires or other sources into something
that can be analyzed.
• Involves assigning a value to the information given;
often the value is also given a label.
• Coding can make data more consistent:
– Example: Question = Sex
– Answers = Male, Female, M, or F
– Coding every answer to one agreed set of values
avoids these inconsistencies
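As a minimal sketch of the Sex example above, the mapping below turns the four answer variants into one numeric code each. The scheme 0 = Male, 1 = Female is an assumption chosen for illustration, and `code_sex` is a hypothetical helper name.

```python
# Assumed coding scheme for illustration: 0 = Male, 1 = Female.
SEX_CODES = {"male": 0, "m": 0, "female": 1, "f": 1}

def code_sex(answer):
    """Map a free-text answer to a numeric code; None if unrecognized."""
    return SEX_CODES.get(answer.strip().lower())

raw_answers = ["Male", "F", "female", "M", " m ", "unknown"]
coded = [code_sex(a) for a in raw_answers]
print(coded)
```

Unrecognized answers come back as `None` rather than being guessed, so they can be reviewed during data cleaning.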
Coding System
• Common coding systems (code and label) for variables:
– 0=No 1=Yes
(1 = value assigned, Yes= label of value)
– OR: 1=No 2=Yes
• When you assign a value you must also make it clear what
that value means.
– In first example above, 1=Yes but in second example 1=No
– As long as it is clear how the data are coded, either is fine
• You can make it clear by creating a data dictionary to
accompany the dataset.
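One possible shape for a data dictionary is a plain mapping from each variable to its description and its code-to-label table. The variable name `attended_training` and the helper functions are hypothetical; only the 0=No, 1=Yes scheme comes from the slide.

```python
# A small data dictionary: variable -> description and code/label pairs.
DATA_DICTIONARY = {
    "attended_training": {  # hypothetical variable name
        "description": "Whether the respondent attended the training",
        "values": {0: "No", 1: "Yes"},
    },
}

def label_for(variable, code):
    """Look up the label that the dictionary assigns to a code."""
    return DATA_DICTIONARY[variable]["values"][code]

def undefined_codes(variable, codes):
    """Return the codes in the data that the dictionary does not define."""
    allowed = DATA_DICTIONARY[variable]["values"]
    return [c for c in codes if c not in allowed]

print(label_for("attended_training", 1))
print(undefined_codes("attended_training", [0, 1, 2, 1]))
```

Keeping the dictionary next to the dataset lets anyone check what a value means and spot codes that should not be there.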
Coding: Dummy Variable
• A “dummy” variable is any variable that is coded to
have 2 levels (yes/no, male/female, etc.)
• Dummy variables may be used to represent more
complicated variables
– Example: Number of cigarettes smoked per week.
Answers total 75 different responses, ranging from
0 cigarettes to 3 packs per week.
– Can be recoded as a dummy variable:
1=smokes (at all) 0=non-smoker
• This type of coding is useful in later stages of
analysis.
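The smoking recode above can be sketched as follows. The weekly counts are made-up sample values; the rule (any cigarettes at all = 1, none = 0) is the one given on the slide.

```python
# Hypothetical responses: cigarettes smoked per week.
cigarettes_per_week = [0, 5, 20, 0, 60, 1]

# Recode to a dummy variable: 1 = smokes (at all), 0 = non-smoker.
smokes = [1 if n > 0 else 0 for n in cigarettes_per_week]
print(smokes)
```

The original counts are kept in their own list, so the detailed variable is still available if a later analysis needs it.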
Attaching Labels to Values
• Many analysis software packages allow you to attach a label
to the variable values
Example: Label 0’s as male and 1’s as female
• Makes reading data output easier:
Without label:
Variable SEX   Frequency   Percent
0              21          60%
1              14          40%
With label:
Variable SEX   Frequency   Percent
Male           21          60%
Female         14          40%
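Statistical packages attach labels for you, but the idea can be sketched with the Python standard library. The data below reproduces the counts from the slide (21 coded 0, 14 coded 1); `frequency_table` is a hypothetical helper, not a function from any particular package.

```python
from collections import Counter

SEX_LABELS = {0: "Male", 1: "Female"}
sex = [0] * 21 + [1] * 14  # counts taken from the slide's example

def frequency_table(values, labels=None):
    """Return rows of (value or label, frequency, percent)."""
    counts = Counter(values)
    total = len(values)
    rows = []
    for code in sorted(counts):
        name = labels[code] if labels else str(code)
        rows.append((name, counts[code], round(100 * counts[code] / total)))
    return rows

for row in frequency_table(sex, SEX_LABELS):
    print(row)
```

Calling `frequency_table(sex)` without labels gives the same counts keyed by the raw codes, which is the harder-to-read form.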
Coding – Ordinal Variables
• The coding process is similar for other categorical
variables.
• Example: Variable EDUCATION, possible coding:
0 = Did not graduate from high school
1 = High school graduate
2 = Some college or post-high school education
3 = College graduate
• Could be coded in reverse order (0=college graduate,
3=did not graduate high school).
• For this ordinal categorical variable we want to be
consistent with numbering because the value of the
code assigned has significance.
• Example of bad coding:
0 = Some college or post-high school education
1 = High school graduate
2 = College graduate
3 = Did not graduate from high school
• The data have an inherent order, but the coding does
not follow that order. This is NOT appropriate coding
for an ordinal categorical variable.
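A quick way to see the difference between the good and bad EDUCATION codings is to check whether the codes rise (or fall) steadily with the education level. The check below is a sketch; `preserves_order` is a hypothetical helper written for this example.

```python
# Education levels from lowest to highest, as listed on the slide.
LEVEL_ORDER = [
    "Did not graduate from high school",
    "High school graduate",
    "Some college or post-high school education",
    "College graduate",
]

def preserves_order(coding):
    """True if codes run steadily up or steadily down the education levels."""
    ranks = [LEVEL_ORDER.index(coding[code]) for code in sorted(coding)]
    return ranks == sorted(ranks) or ranks == sorted(ranks, reverse=True)

good_coding = dict(enumerate(LEVEL_ORDER))            # 0 = lowest level
reverse_coding = dict(enumerate(reversed(LEVEL_ORDER)))  # 0 = highest level
bad_coding = {
    0: "Some college or post-high school education",
    1: "High school graduate",
    2: "College graduate",
    3: "Did not graduate from high school",
}

print(preserves_order(good_coding), preserves_order(reverse_coding),
      preserves_order(bad_coding))
```

Both the forward and the reverse coding pass, which matches the slide: the direction is a choice, but the order must be kept.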
Basic Terminology and
Concepts
• Statistical terms
– Ratio
– Mean
– Median
– Mode
– Frequency Distribution
– Standard Deviation
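The terms listed above can each be computed with the Python standard library. The sample values are hypothetical and chosen so the arithmetic is easy to follow; the ratio uses the 14 females and 21 males from the earlier SEX example. Note that `pstdev` treats the data as a whole population, while `stdev` treats it as a sample.

```python
import statistics
from collections import Counter
from fractions import Fraction

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample

mean = statistics.mean(values)        # sum of values / number of values
median = statistics.median(values)    # middle value of the sorted data
mode = statistics.mode(values)        # most frequent value
frequency = Counter(values)           # frequency distribution: value -> count
pop_sd = statistics.pstdev(values)    # population standard deviation
sample_sd = statistics.stdev(values)  # sample standard deviation (n - 1)

# Ratio of females to males, using the counts from the SEX example.
female_to_male = Fraction(14, 21)

print(mean, median, mode, dict(frequency), pop_sd, female_to_male)
```

Which standard deviation to report depends on whether the data are the full target population or a sample drawn from it.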
Conclusion
• Purpose of analysis is to provide answers to
programmatic questions.
• Data analysis describes the sample/target population.
• Analysis of data is the process of inspecting, cleaning,
transforming, and modeling data with the goal of
highlighting useful information, suggesting
conclusions, and supporting decision making.
Thank You
