DATA VISUALIZATION IN
EXPLORATORY DATA ANALYSIS
CS-E4450: EXPLORATIVE INFORMATION VISUALIZATION
Eva Durall Gazulla
Aalto University, Nov. 2018
SESSION STRUCTURE
Time            Activity
12:15 – 12:30   Introduction: Data visualization in Exploratory Data Analysis (EDA)
12:30 – 13:00   Activity 1
13:00 – 13:10   Break
13:10 – 14:00   Activity 2
- Introduction -
Data visualization in
Exploratory Data Analysis
• Statistical tradition proposed by J. Tukey
• Focus on discovering patterns to foster hypothesis development and refinement
• Complementary to Confirmatory Data Analysis
About Exploratory Data Analysis
1 INTRODUCTION
EDA can be considered as an attitude toward the data.
Emphasis on:
- General understanding of the data (What is going on?)
- Graphic representations of the data
- Tentative model building and hypothesis generation
- Iterations
- Flexibility of methods
“The role of the data analyst is to listen to the data in as many ways as
possible until a plausible ‘story’ of the data is apparent”
Behrens, 1997
Data visualization is a tool for defining relevant research questions.
Data visualization is a powerful tool because it:
- Synthesizes complex information
- Reduces cognitive load
- Offloads short-term memory
Data visualization & EDA
Perceptual hierarchy of visual cues
From generic to accurate: color hue, volume, area, color intensity, slope, angle, length, length aligned.
Source: https://paldhous.github.io/ucb/2016/dataviz/week2.html#
EDA main techniques
EXPLORING DISTRIBUTIONS
Focus on revealing the general pattern and individual deviations.
* Importance of identifying the median
INSPECTING INTERRELATIONS BETWEEN VARIABLES
Focus on revealing the general pattern and the extreme deviations by visualizing
interrelations between 2 or more variables.
Supports the recall of contextual knowledge for explaining the deviations.
Datavis for exploring distributions
The median
• The "middle" of a sorted list of numbers.
• Makes it easier to see a centre and detect extreme values.
• To find the median, place the numbers in value order and find the middle number.
Example: 1, 3, 7, 13, 17 (the median is 7)
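The rule for finding the median can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `median` is chosen here, and averaging the two middle values for an even-length list is a common convention that the slide itself does not state.

```python
def median(values):
    """Return the middle value of a list of numbers."""
    ordered = sorted(values)          # place the numbers in value order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle number
    # Even count: average the two middle numbers (assumed convention).
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 3, 7, 13, 17]))      # 7
print(median([1, 3, 7, 13, 1700]))    # 7
```

The second call suggests why the median is useful for spotting extreme values: replacing 17 with 1700 leaves the median unchanged, so the extreme value stands out against a stable centre.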
Box Plot (Box and Whisker Plot)
Visualizes the distribution of the data through its quartiles.
Helps to make the following observations:
• Key values: the average, the median, the 25th percentile, etc.
• Outliers and their values.
• Symmetry of the data.
• Whether the data is skewed and, if so, in what direction.
Source: https://datavizcatalogue.com/methods/box_plot.html
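The values a box plot displays can be computed directly, as in the sketch below. Several things here are assumptions rather than part of the slides: the sample numbers are made up, the quartiles use the "inclusive" method of Python's `statistics.quantiles`, and outliers are flagged with the common 1.5 × IQR rule (other conventions exist).

```python
import statistics


def box_plot_summary(values):
    """Compute the key values that a box plot would display."""
    ordered = sorted(values)
    # Quartiles: 25th percentile, median, 75th percentile.
    q1, med, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1                      # interquartile range (height of the box)
    low_fence = q1 - 1.5 * iqr         # assumed outlier rule: 1.5 x IQR
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "mean": statistics.mean(ordered),
        "outliers": outliers,
    }


sample = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data
summary = box_plot_summary(sample)
print(summary)
```

In this made-up sample the mean is larger than the median and the only outlier lies on the high side, which hints at a distribution skewed toward larger values.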
Datavis for inspecting interrelations
Scatter Plot
Shows whether a relationship or correlation exists between two variables.
Types of correlation that can be observed:
• Positive (values increase together)
• Negative (one value decreases as the other increases)
• Null (no correlation)
• Linear
• Exponential
• U-shaped
Correlation strength: strong, weak, none
Source: https://datavizcatalogue.com/methods/scatterplot.html
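A numeric companion to the scatter plot is the Pearson correlation coefficient, sketched below with made-up data. The function name `pearson_r` and the sample values are assumptions for illustration. The sketch also suggests why plotting matters: a U-shaped relationship can give a coefficient of zero even though the two variables are clearly related.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


xs = [-2, -1, 0, 1, 2]                 # hypothetical data
positive = [2 * x + 1 for x in xs]     # values increase together
negative = [-x for x in xs]            # one decreases as the other increases
u_shaped = [x * x for x in xs]         # related, but not linearly

print(pearson_r(xs, positive))
print(pearson_r(xs, negative))
print(pearson_r(xs, u_shaped))
```

The first two results are close to +1 and -1 (strong linear correlation), while the U-shaped case comes out at zero, a pattern that only the scatter plot would reveal.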
- The case -
SySTEM 2020: Connecting
science learning outside the
classroom map
“Obtaining a quality education is the foundation to creating sustainable development.
In addition to improving quality of life, access to inclusive education can help equip
locals with the tools required to develop innovative solutions to the world’s greatest
problems.”
https://www.un.org/sustainabledevelopment/sustainable-development-goals
2 THE CASE: EXPLORING EQUITY IN SCIENCE EDUCATION
GOAL 4: QUALITY EDUCATION
Learning about Science, Technology,
Engineering, Arts and Mathematics (STEAM) in
contexts that are outside formal education.
Such contexts can be science museums,
makerspaces, science centers, public libraries,
hacklabs…
Science Learning Outside the Classroom
Aims:
• Gain an understanding of science education in informal contexts.
• Identify calls for action to support equity.
SySTEM 2020: Connecting science
learning outside the classroom
Research and innovation project aiming to promote science
learning outside the classroom at the European level.
Source: https://laout.org/community-equity-event/
The concept of equity is strongly connected to fairness and social justice. Equity in education
means ensuring that everyone has access and opportunities to learn and perform successfully.
Indicators of equity:
• Access: the means and opportunity to enter non-formal science education contexts.
• Diversity: the representation of various identities and differences.
• Inclusion: the active engagement of the contributions and participation of all people.
Equity
- Activity 1 -
Analysis of an interactive
data visualization
SySTEM 2020 map
Open database with over 2,200 entries providing information about
organisations and activities focused on science learning outside the classroom.
Access to the .csv files:
https://form.system2020.education/apidoc
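Once a .csv file has been downloaded, a first exploratory step is often to count entries per category. The sketch below uses only Python's standard library. The column names (`name`, `country`, `type`) and the sample rows are hypothetical and are not taken from the actual SySTEM 2020 files, so they would need to be replaced with the real headers.

```python
import csv
import io
from collections import Counter


def count_by_column(csv_file, column):
    """Count how many rows fall into each value of one column."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        value = (row.get(column) or "").strip()
        counts[value or "(missing)"] += 1   # keep empty cells visible
    return counts


# Hypothetical sample standing in for a downloaded file.
SAMPLE_CSV = """name,country,type
Org A,Finland,science museum
Org B,Finland,makerspace
Org C,Spain,public library
Org D,,makerspace
"""

counts = count_by_column(io.StringIO(SAMPLE_CSV), "country")
for country, n in counts.most_common():
    print(country, n)
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by something like `open("entries.csv", newline="", encoding="utf-8")`, where the file name is again only a placeholder.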
3 ACTIVITY 1: ANALYSIS OF INTERACTIVE VISUALIZATIONS
Source: https://system2020.education/the-map
ACTIVITY 1: ANALYSIS OF INTERACTIVE VISUALIZATIONS
Activity (duration)
• Make groups of 4 people. Access https://system2020.education/the-map, explore the SySTEM 2020 map data visualizations and select a combination of filters. (5 min.)
• Analysis of the data visualization: follow the guidelines. (15 min.)
• Open discussion. (10 min.)
Guidelines to analyse the data visualization:
• What is the visualization about? (specify the parameters you have used)
• What visual cues are employed? To what extent do they support accurate or
generic understanding?
• Does the visualization help to generate new questions and research hypotheses?
• Is anything particularly good or problematic about the visualization?
• What would you do differently?
- Activity 2 -
Exploring equity in science
education outside the
classroom
ACTIVITY 2: EXPLORING EQUITY IN SCIENCE EDUCATION
OUTSIDE THE CLASSROOM
Activity (duration)
• PART 1: Generating questions based on different indicators (15 min.)
• PART 2: Creating data visualization(s) (20 min.)
• Sharing and discussing (15 min.)
Workflow for creating a data visualization:
DEFINE:
• What do you want to achieve? What is the datavis for?
FIND & COLLECT:
• What parameters are you going to visualize?
• Specify the dimensions of equity you focus on
EXPLORE & ORGANIZE:
• How do you need to prepare the data? What relevant values might be missing?
SKETCH & EXPERIMENT:
• What datavis type do you plan to use?
PRODUCE & REFINE:
• What other data visualization types would help you explore the data?
ASSESS:
• What questions or hypotheses does the data visualization raise? How would you explore these questions?
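For the EXPLORE & ORGANIZE step of the workflow, one simple check is how much of each column is actually filled in before choosing what to visualize. The sketch below is only one possible approach. The column names (`target_age`, `entry_fee`) and the rows are invented for illustration and do not come from the SySTEM 2020 dataset.

```python
def missing_share(rows, columns):
    """Share of rows (0.0 to 1.0) where each column is empty or absent."""
    total = len(rows)
    if total == 0:
        return {column: 0.0 for column in columns}
    shares = {}
    for column in columns:
        # A cell counts as missing if it is absent, None, or only whitespace.
        # Cells are assumed to be strings, as they are when read from a CSV.
        missing = sum(
            1 for row in rows if not str(row.get(column) or "").strip()
        )
        shares[column] = missing / total
    return shares


# Hypothetical rows, as they might look after loading a CSV file.
rows = [
    {"name": "Org A", "target_age": "6-12", "entry_fee": "free"},
    {"name": "Org B", "target_age": "", "entry_fee": "paid"},
    {"name": "Org C", "target_age": "13-18", "entry_fee": None},
    {"name": "Org D", "target_age": " ", "entry_fee": "free"},
]

print(missing_share(rows, ["name", "target_age", "entry_fee"]))
```

A column with a high share of missing values may be a weak basis for an equity indicator, or the gap itself may be worth turning into a question for the ASSESS step.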
Behrens, J. T. (1997). Principles and procedures of exploratory data analysis. Psychological Methods, 2(2), 131.
Jebb, A. T., Parrigon, S., & Woo, S. E. (2017). Exploratory data analysis as a foundation of inductive research. Human Resource Management Review, 27(2), 265-276.
Tukey, J. W. (1977). Exploratory data analysis. Reading, MA: Addison-Wesley.
Shneiderman, B. (1996). The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings 1996 IEEE Symposium on Visual Languages. IEEE.
Tufte, E. R., Goeler, N. H., & Benson, R. (1990). Envisioning information (Vol. 126). Cheshire, CT: Graphics Press.
Tufte, E. R., McKay, S. R., Christian, W., & Matey, J. R. (1998). Visual explanations: Images and quantities, evidence and narrative.
PRACTICAL TIPS
Top Ten Dos and Don'ts for Charts and Graphs
https://guides.library.duke.edu/datavis/topten
ADDITIONAL READINGS
Interested in exploring
this dataset further?
eva.durall@aalto.fi
