ARCOT SRI MAHALAKSHMI WOMEN’S COLLEGE
ADVANCED DATA SCIENCE USING PYTHON
NAAN MUDHALVAN
SUBJECT CODE: 23UNM40A
Presented By,
Name: SOWMIYA.R.
Reg. no: 30323U09086
Bachelor of Computer Applications
INTRODUCTION TO DATA SCIENCE
INTRODUCTION:
Data Science is a combination of multiple
disciplines that uses statistics, data analysis, and
machine learning to analyse data and to extract
knowledge and insights from it.
By using Data Science, companies are able to make:
1. Better decisions (should we choose A or B?)
2. Predictive analyses (what will happen next?)
3. Pattern discoveries (finding patterns or hidden
information in the data)
INTRODUCTION TO DATA SCIENCE
Applications:
Data Science is used in almost every industry today to
predict customer behaviour and trends and to identify new
opportunities.
Businesses can use it to make informed decisions about
product development and marketing. It is used as a tool to
detect fraud and optimize processes.
INTRODUCTION TO DATA SCIENCE
Key points:
Data science is really a progression of three steps.
We collect data, then analyse the trends within the data,
and lastly we make decisions based on the data.
Data science is a process in which the goal is to make
better choices.
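The three steps above (collect, analyse, decide) can be sketched in a few lines of Python. This is only an illustrative sketch: the sales figures and the function names collect, analyse, and decide are invented for the example and are not part of the original slides.

```python
from statistics import mean


def collect():
    # Step 1: collect data. These weekly sales figures are hypothetical.
    return {"A": [120, 135, 128, 140], "B": [110, 150, 115, 118]}


def analyse(data):
    # Step 2: analyse the data, here by taking the average of each option.
    return {name: mean(values) for name, values in data.items()}


def decide(averages):
    # Step 3: decide, here by picking the option with the highest average.
    return max(averages, key=averages.get)


averages = analyse(collect())
choice = decide(averages)
print(averages, "->", choice)
```

A real project would use far more data and a more careful analysis, but the order of the steps stays the same.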
EXPLORATORY DATA ANALYSIS
INTRODUCTION:
Exploratory Data Analysis (EDA) is an analysis approach that identifies general
patterns in the data. These patterns include outliers and features of the data that
might be unexpected.
EDA is an important first step in any data analysis.
The goal of this tutorial document is to walk through some of the common
issues encountered in the early stages of an exploratory analysis on a set of data. It
gives examples of common problem areas in:
1. reading in data
2. dealing with blanks
3. dealing with factors (categorical variables)
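The three problem areas listed above can be illustrated with a short Pandas sketch. The CSV text, the column names, and the helper load_and_clean are assumptions made for this example. Filling blanks with "Unknown" or with the column mean is just one possible choice, not a recommended rule.

```python
import io

import pandas as pd

# Hypothetical data with two blank cells, invented for illustration.
RAW_CSV = """name,city,score
Asha,Chennai,82
Bala,,75
Chitra,Vellore,
Deepa,Chennai,91
"""


def load_and_clean(csv_text):
    # 1. Reading in data: parse the CSV text into a data frame.
    df = pd.read_csv(io.StringIO(csv_text))
    # 2. Dealing with blanks: count missing values, then fill them.
    blanks = df.isna().sum()
    df["score"] = df["score"].fillna(df["score"].mean())
    # 3. Dealing with factors: store the city as a categorical column.
    df["city"] = df["city"].fillna("Unknown").astype("category")
    return df, blanks


cleaned, blanks = load_and_clean(RAW_CSV)
print(blanks)
print(cleaned)
```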
EXPLORATORY DATA ANALYSIS
Types:
There are four primary types of EDA:
• Univariate non-graphical
• Univariate graphical
• Multivariate non-graphical
• Multivariate graphical
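As a rough illustration of the non-graphical types listed above, the sketch below summarises one column (univariate) and measures the relationship between two columns (multivariate) with Pandas. The data frame and its column names hours and marks are made up for the example.

```python
import pandas as pd

# Hypothetical study data, invented for illustration.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5],
    "marks": [35, 45, 50, 65, 70],
})

# Univariate non-graphical: summary statistics of a single variable.
univariate = df["marks"].describe()

# Multivariate non-graphical: correlation between two variables.
multivariate = df["hours"].corr(df["marks"])

print(univariate)
print("correlation:", round(multivariate, 3))
```

The graphical counterparts would plot the same quantities, for example a histogram of marks (univariate) or a scatter plot of hours against marks (multivariate).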
EXPLORATORY DATA ANALYSIS
Benefits of exploratory data analysis:
1. Deeper Insights
2. Improved Data Quality
3. Better Decision-Making
4. Enhanced Communication
PYTHON FOR DATA SCIENCE
INTRODUCTION:
Python is a programming language widely used by
Data Scientists.
Python has built-in mathematical functions and libraries,
making it easier to solve mathematical problems
and to perform data analysis.
Python's Pandas library provides tools for reading
and writing data in various formats, such as CSV, Excel, and
SQL databases.
It is particularly useful for working with tabular data,
such as data in spreadsheets or databases.
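A minimal sketch of reading and writing tabular data with Pandas is shown here. It writes a small data frame to CSV and reads it back. An in-memory buffer stands in for a file on disk, and the students table and the round_trip helper are hypothetical names used only for this example.

```python
import io

import pandas as pd


def round_trip(df):
    # Write the data frame as CSV text, then read the same text back.
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)
    buffer.seek(0)
    return pd.read_csv(buffer)


# Hypothetical table, invented for illustration.
students = pd.DataFrame({"name": ["Asha", "Bala"], "marks": [82, 75]})
restored = round_trip(students)
print(restored)
```

Excel files and SQL databases are handled in a similar way with pd.read_excel and pd.read_sql, which need an extra Excel engine or a database connection respectively.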
PYTHON FOR DATA SCIENCE
Python has libraries with large collections of mathematical
functions and analytical tools.
1. Pandas - This library is used for structured data
operations, such as importing CSV files, creating data frames,
and preparing data.
2. NumPy - This is a mathematical library. It provides a powerful
N-dimensional array object along with linear algebra and
Fourier transform routines.
3. Matplotlib - This library is used for visualization of data.
4. SciPy - This library provides linear algebra modules, among
other scientific computing routines.
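The NumPy features named above (N-dimensional arrays, linear algebra, Fourier transform) can be tried out with the short sketch below. The matrix, the right-hand side, and the test signal are arbitrary values chosen for illustration.

```python
import numpy as np

# N-dimensional array object: a 2-D array (matrix) and a 1-D array (vector).
a = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

# Linear algebra: solve the system a @ x = b.
solution = np.linalg.solve(a, b)

# Fourier transform: find the dominant frequency of a sine wave
# that completes 3 cycles over 32 samples.
t = np.arange(32) / 32
signal = np.sin(2 * np.pi * 3 * t)
spectrum = np.abs(np.fft.rfft(signal))
dominant = int(spectrum.argmax())

print("solution:", solution)
print("dominant frequency bin:", dominant)
```

SciPy offers related routines in modules such as scipy.linalg, which can be used when more specialised functions are needed.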
PYTHON FOR DATA SCIENCE
Key features:
Python's key features for data analysis include
its simplicity, expressive syntax, large library ecosystem,
easy integration with other languages, and scalability.
These features enable data scientists to perform complex
tasks efficiently and effectively.
EXPLORE MACHINE LEARNING USING PYTHON
DEFINITION:
Machine learning is a branch of Artificial Intelligence
(AI) that aims to make a machine learn from experience
and perform a task automatically without being explicitly
programmed for it.
Python is well suited to machine learning because of its
platform independence and its popularity in
the programming community.
EXPLORE MACHINE LEARNING USING PYTHON
Types of machine learning:
1. Supervised learning
2. Unsupervised learning
3. Semi-supervised learning
4. Reinforcement learning
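To make the first two types in the list above concrete, here is a small sketch using only NumPy. The supervised part fits a straight line to labelled examples, and the unsupervised part groups unlabelled values into two clusters with a simple two-means loop. The data and the helper names fit_line and two_means are assumptions for this example. Real projects usually rely on a dedicated machine learning library.

```python
import numpy as np


def fit_line(x, y):
    # Supervised learning: learn slope and intercept from labelled pairs (x, y).
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def two_means(values, iterations=10):
    # Unsupervised learning: split unlabelled values into two clusters.
    values = np.asarray(values, dtype=float)
    # Start the two cluster centres at the smallest and largest value.
    centers = np.array([values.min(), values.max()])
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        # Move each centre to the mean of the values assigned to it.
        for k in range(2):
            if np.any(labels == k):
                centers[k] = values[labels == k].mean()
    return labels, centers


# Hypothetical labelled data that follows y = 2x + 1.
slope, intercept = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
prediction = slope * 6 + intercept

# Hypothetical unlabelled data with two obvious groups.
labels, centers = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 8.7])

print("prediction for x = 6:", prediction)
print("cluster labels:", labels)
```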
EXPLORE MACHINE LEARNING USING PYTHON
Advantages of machine learning:
1. Automation of repetitive tasks.
2. Wide range of applications.
3. Scope for continuous improvement.
4. Useful in education.
5. Efficient handling of data.
DATA VISUALISATION USING PYTHON
DEFINITION:
The process of finding trends and correlations in our
data by representing it pictorially is called Data
Visualization.
To perform data visualization in Python, we can use
various Python data visualization modules such as
Matplotlib, Seaborn, Plotly, etc.
DATA VISUALISATION USING PYTHON
Types of data visualisation:
1. Bar chart
2. Pie chart
3. Line chart
4. Scatter plot
5. Box plot
6. Histogram
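The six chart types listed above can all be drawn with Matplotlib. The following sketch places one of each in a single figure. The data values and the function name draw_chart_gallery are invented for illustration, and the Agg backend is selected only so that the code also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # draw off-screen so no display window is needed
import matplotlib.pyplot as plt


def draw_chart_gallery():
    # Hypothetical data, invented for illustration.
    categories = ["A", "B", "C"]
    counts = [5, 3, 8]
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 6]
    samples = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]

    fig, axes = plt.subplots(2, 3, figsize=(12, 7))
    axes[0, 0].bar(categories, counts)
    axes[0, 0].set_title("Bar chart")
    axes[0, 1].pie(counts, labels=categories)
    axes[0, 1].set_title("Pie chart")
    axes[0, 2].plot(x, y)
    axes[0, 2].set_title("Line chart")
    axes[1, 0].scatter(x, y)
    axes[1, 0].set_title("Scatter plot")
    axes[1, 1].boxplot(samples)
    axes[1, 1].set_title("Box plot")
    axes[1, 2].hist(samples, bins=5)
    axes[1, 2].set_title("Histogram")
    fig.tight_layout()
    return fig


fig = draw_chart_gallery()
```

Calling fig.savefig("chart_gallery.png") afterwards would write the figure to an image file. In a Jupyter notebook the figure can be displayed directly instead.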
DATA VISUALISATION USING PYTHON
Python visualisation libraries:
1. Matplotlib is one of the most widely used Python visualization
libraries for generating powerful yet simple visualizations.
It is a 2-D plotting library that can be used in several environments,
including Python scripts, IPython shells, and Jupyter notebooks.
2. Seaborn is a Python library for data visualization, built on
Matplotlib, which offers a variety of plot types.
It is designed to work closely with Pandas data
frames and is widely used for statistical visualization.
