EXPLORATORY
DATA ANALYSIS
DATA VISUALIZATION - reveal different
patterns or hidden properties of the data.
Introduction
Exploratory data analysis (EDA), as the name implies, is a technique for exploring data. EDA is
an iterative process that entails:
1. Generating questions about your data.
2. Searching for answers through visualizations, transformations, and modeling.
3. Using acquired answers to generate new questions.
Many data scientists consider EDA to be an art because it does not follow a formal process
with strict rules (well, I belong to that category 😁). This group of slides is not
exhaustive, but it covers many important ideas that will help you find the most common
patterns in data using data visualization. It is a recommended read, especially for
beginners.
Histogram
Histograms show the distribution of values a
variable takes in a particular set of data.
They are particularly useful for seeing the shape
of the data distribution in some detail.
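As a minimal sketch of what a histogram computes, the snippet below counts values into equal-width bins using only the standard library. The names histogram_counts and plot_histogram, the bin settings, and the synthetic heights data are illustrative assumptions, not part of the original slides; the plotting helper additionally assumes matplotlib is installed.

```python
import random


def histogram_counts(values, bins, lo, hi):
    """Count how many values fall into each of `bins` equal-width bins on [lo, hi]."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # A value exactly equal to `hi` is placed in the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    return counts


def plot_histogram(values, bins=10):
    """Draw the histogram (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    plt.hist(values, bins=bins)
    plt.xlabel("value")
    plt.ylabel("count")
    plt.show()


random.seed(0)
# Synthetic, illustrative data: 1000 draws from a normal distribution.
heights = [random.gauss(170, 10) for _ in range(1000)]
counts = histogram_counts(heights, bins=8, lo=130, hi=210)
print(counts)
```

Calling plot_histogram(heights, bins=8) would draw the same distribution. The bell shape should show up as larger counts in the middle bins than in the outer ones.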
Box plot
Box plots show the range of values a variable can take.
They are useful for seeing where most of the data fall
and for catching outliers.
The red line in the center is the median, the edges
of the box are the 25th and 75th percentiles,
and the isolated points are outliers.
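The quantities a box plot draws can be sketched with the standard library alone. The slides do not say how outliers are chosen, so the 1.5 × IQR fence rule used here is an assumption (it is a common convention); box_plot_stats, plot_box, and the sample data are hypothetical names and values for illustration.

```python
import statistics


def box_plot_stats(values):
    """Return the quartiles and the points flagged as outliers.

    Outliers are ASSUMED to be points beyond 1.5 * IQR from the box edges.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


def plot_box(values):
    """Draw the box plot (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    plt.boxplot(values)
    plt.ylabel("value")
    plt.show()


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]  # illustrative data with one extreme value
stats = box_plot_stats(data)
print(stats)
```

Different tools interpolate percentiles slightly differently, so the box edges drawn by a plotting library may not match these numbers exactly on small samples.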
Cumulative Distribution Function (CDF)
CDFs show what fraction of the data is less than
or equal to a given value. They are useful for comparing the
data distribution to some reference distribution.
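A rough sketch of an empirical CDF follows, using the usual "less than or equal to" convention. The helper names ecdf and plot_ecdf and the sample values are hypothetical, and the normal distribution fitted to the sample is just one possible choice of reference distribution, picked here for illustration.

```python
import bisect
import statistics


def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def F(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect.bisect_right(ordered, x) / n

    return F


def plot_ecdf(values):
    """Draw the empirical CDF as a step curve (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    ordered = sorted(values)
    F = ecdf(values)
    plt.step(ordered, [F(x) for x in ordered], where="post")
    plt.xlabel("value")
    plt.ylabel("fraction of data <= value")
    plt.show()


samples = [2, 4, 4, 5, 7, 9]  # illustrative data
F = ecdf(samples)

# Assumed reference: a normal distribution with the sample's mean and spread.
reference = statistics.NormalDist(statistics.mean(samples), statistics.stdev(samples))
for x in sorted(set(samples)):
    print(x, round(F(x), 3), round(reference.cdf(x), 3))
```

Printing the two columns side by side gives a quick sense of where the data departs from the reference; plotting both curves on the same axes does the same job visually.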
Scatter plot
Scatter plots show the relationship between two variables.
They are useful when trying to find out what kind of
relationship exists between the variables.
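One way to try this out is sketched below on synthetic data with a known linear relationship. The Pearson correlation coefficient is not mentioned in the slides; it is added here only as a numeric companion to the picture, and it captures linear relationships only. The names pearson_r and plot_scatter and the generated data are assumptions for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def plot_scatter(xs, ys):
    """Draw the scatter plot (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    plt.scatter(xs, ys)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()


random.seed(1)
# Synthetic, illustrative data: y is roughly 2 * x plus some noise.
xs = [random.uniform(0, 10) for _ in range(200)]
ys = [2 * x + random.gauss(0, 1) for x in xs]
r = pearson_r(xs, ys)
print(round(r, 3))
```

A value of r near 1 or -1 suggests a strong linear trend, but a curved or clustered relationship can have r near 0, which is exactly the kind of pattern the scatter plot itself reveals.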
Bar plot
Bar plots show comparisons between discrete
categories. Thus, they are highly useful for exploring
and summarising categorical data.
One axis of the plot shows the specific categories
being compared, and the other axis represents a
measured value.
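A small sketch of preparing categorical data for a bar plot: one list holds the categories and the other holds the measured value for each, here a simple count. The fruit data and the plot_bars helper are made up for illustration, and the plotting step assumes matplotlib is installed.

```python
from collections import Counter


def plot_bars(categories, heights):
    """Draw the bar plot (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    plt.bar(categories, heights)
    plt.xlabel("category")
    plt.ylabel("count")
    plt.show()


# Illustrative categorical observations.
fruits = ["apple", "banana", "apple", "cherry", "banana", "apple"]

counts = Counter(fruits)                       # measured value: occurrences per category
categories = sorted(counts)                    # one axis: the categories being compared
heights = [counts[c] for c in categories]      # other axis: the value for each category
print(list(zip(categories, heights)))
```

Calling plot_bars(categories, heights) would draw one bar per category. Sorting the categories by height instead of by name often makes the comparison easier to read.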
Line graph
Line graphs are useful for visualizing trends over time.
The vertical axis could represent any variable, but
the horizontal axis ordinarily represents a time variable.
The continuous line implies that the horizontal quantity
increases sequentially, as time does.
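Because the line connects points in the order they are given, the observations should be sorted by time before plotting. The sketch below shows that step; the readings, the time_series helper, and the plot_line helper are hypothetical, and the plotting step assumes matplotlib is installed.

```python
from datetime import date


def time_series(readings):
    """Return (times, values) sorted by time from a {date: value} mapping."""
    times = sorted(readings)
    values = [readings[t] for t in times]
    return times, values


def plot_line(times, values):
    """Draw the line graph (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    plt.plot(times, values, marker="o")
    plt.xlabel("time")
    plt.ylabel("measured value")
    plt.show()


# Illustrative readings, deliberately stored out of order.
readings = {
    date(2023, 3, 1): 14.0,
    date(2023, 1, 1): 9.5,
    date(2023, 2, 1): 11.0,
}
times, values = time_series(readings)
print(list(zip(times, values)))
```

Plotting the unsorted data would make the line double back on itself, which hides the trend the graph is meant to show.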
Thank you for
your time.
Oluwatobi Adefami

Visualizations in Exploratory Data Analysis
