INTRODUCTION
TO
MACHINE LEARNING
CHAPTER 1
Topics Covered
1.1 Introduction to Machine Learning
 Artificial Intelligence
 Machine Learning
 Application of Machine Learning
1.2 Types of Machine Learning
1.3 Supervised Machine Learning
1.3.1 Classification
1.4 Unsupervised Machine Learning and its Application
1.4.1 Difference between Supervised and Unsupervised Machine
Learning
1.5 Semi-Supervised Machine Learning
1.6 Reinforcement Machine Learning and its Application
1.7 Hypothesis Space and Inductive Bias
1.8 Underfitting and Overfitting
1.9 Evaluation and Sampling Methods
 1.9.1 Regression Metrics
 1.9.2 Classification Metrics
1.10 Training and Test Dataset and Need of
Cross Validation
1.11 Linear Regression
 1.11.1 Linear Models
1.12 Decision Trees
 1.12.1 The Decision Tree Learning Algorithm
 1.12.2 Entropy
 1.12.3 Information Gain
 1.12.4 Impurity Measures
 Exercise
Introduction to Machine Learning
 Machine learning is a branch of Artificial Intelligence (AI) and Computer Science which focuses on the use of
data and algorithms to imitate the way that humans learn, gradually improving its accuracy.
 Machine Learning is an umbrella term used to describe a variety of different tools and techniques which
allow a machine or a computer program to learn and improve over time.
 ML tools and techniques include statistical reasoning, data mining, mathematics, and programming.
 Popular ML tools and platforms: Apache Mahout, AWS Machine Learning, BigML, Colab, Google Cloud AutoML,
IBM Watson Studio, Microsoft Azure Machine Learning, OpenNN, PyTorch, Scikit-learn, Shogun, TensorFlow,
Vertex AI, Weka, XGBoost.
https://builtin.com/machine-learning/machine-learning-tools
Introduction to Machine Learning
 Machine learning gives machines/computers the ability to learn the way humans do, i.e. without explicitly telling them what to do.
 Machine learning gives computers the ability to learn without being explicitly programmed.
Arthur Samuel
 Machine learning refers to teaching devices to learn from the information in a dataset without manual human
interference.
Machine Learning (ML) is a subset of artificial intelligence (AI) that uses statistics, trial and error, and huge
amounts of data to learn a specific task without ever having to be specifically programmed to do that task.
 It involves identifying patterns in data, and then optimizing those findings through both trial and error and
feedback.
Well Posed Learning Problem
A well-posed learning problem is a task in which the Input, Output, and Learning objective are clearly defined, and there exists a
unique solution to the problem.
A well-posed learning problem has three properties:
1. Existence: The problem must have at least one solution. There must be a possible relationship between the input and output data.
2. Uniqueness: The problem must have a unique solution. There must be only one correct relationship between the input and output
data.
3. Stability: The solution to the problem must be stable with respect to small changes in the input data. The output produced by the
machine learning algorithm should not change significantly when the input data is slightly modified.
A well-posed learning problem is essential for the development of effective and reliable machine learning algorithms. Without a
well-posed problem, the algorithm may produce incorrect or unstable results, making it difficult to use in practical applications.
So it is important to carefully define the input, output, and learning objective when formulating a machine learning
problem.
Well Posed Learning Problem
A learning problem can be defined as a task in which an agent (such as a machine learning
algorithm or a human) must learn to perform a specific task or make predictions based on a set of
inputs or data.
Three features that can be identified in a learning problem are:
Input data: This refers to the set of data or information that the agent uses to learn and make
predictions. The input data can be structured or unstructured, and may come from a variety of sources
such as text, images, audio, or sensor data.
Output or prediction: This refers to the task that the agent is trying to learn or the prediction that it is
trying to make based on the input data. The output can be a single value, a set of values, or a
probability distribution over possible outcomes.
Well Posed Learning Problem
Evaluation metric / Performance measure: This refers to the measure or metric that is used to evaluate
the performance of the agent on the learning task.
The evaluation metric may vary depending on the specific learning problem and may include metrics such as
Accuracy, Precision, Recall, F1 Score, or Mean Squared Error.
Definition:-
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Tom Mitchell
Well Posed Learning Problem
Spam Email Classification.
Input: The input is an email, which can be represented as a collection of words or phrases, or even
more complex structures like the email header information.
Output: The output is a binary label indicating whether the email is spam (1) or not spam (0).
Learning Objective: The learning objective is to train a model that can accurately classify emails as
spam or not spam. This is typically achieved by minimizing a loss function, such as the cross-entropy loss
for binary classification problems.
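A minimal sketch of this spam-classification setup, written in Python with scikit-learn (one of the tools listed earlier). The tiny email list and its labels are made-up illustrative data, and logistic regression is just one possible model that minimizes the cross-entropy (log) loss mentioned above.

# Hedged sketch of the spam-classification problem described above.
# The email texts and labels are made-up illustrative data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

emails = [
    "win a free prize now",         # spam
    "meeting rescheduled to 3pm",   # not spam
    "claim your free lottery win",  # spam
    "project report attached",      # not spam
]
labels = [1, 0, 1, 0]  # output: 1 = spam, 0 = not spam

# Represent each email as a bag-of-words vector.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Logistic regression minimizes the cross-entropy (log) loss for binary classification.
model = LogisticRegression()
model.fit(X, labels)

print(model.predict(vectorizer.transform(["free prize meeting"])))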
Examples of well-posed learning problems:
1. Image classification: Given a set of labeled images,
Task:- Learn a model that can correctly classify new
images into their respective classes (a code sketch
follows these examples).
Input:- The image data
Output:- The class label
Learning objective:- Minimize the classification error.
Performance Measure :- Percentage of images
correctly classified.
Training Experience :- A database of images with
given classifications.
2. Sentiment analysis: Given a set of text documents,
Task:- Learn a model that can predict the sentiment
of new documents (e.g., positive, negative, or neutral).
Input:- The text data
Output:- The sentiment label
Learning objective:- Minimize the prediction error.
Performance Measure :- Percentage of correctly
predicted sentiments on new documents.
Training Experience :- A database of documents with
given sentiments.
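To make example 1 concrete, here is a small, hedged sketch of image classification using scikit-learn's bundled digits dataset as a stand-in for "a database of images with given classifications"; the nearest-neighbour classifier is an arbitrary model choice.

# Illustrative sketch of the image-classification problem above.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

digits = load_digits()                       # 8x8 grayscale digit images with class labels
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)  # a nearest-neighbour classifier
model.fit(X_train, y_train)

# Performance measure: percentage of images correctly classified.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))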
Examples of well-posed learning problems:
3. Fraud detection: Given a set of transaction data,
Task:- Is to learn a model that can identify fraudulent transactions.
Input:- Is the transaction data
Output:- Is a binary label (fraudulent or not),
Learning objective:- Is to minimize the false positive and false
negative rates.
Performance Measure :- The false positive and false negative rates
on new transactions.
4. Regression: Given a set of input features and corresponding target
values,
Task:- Task is to learn a model that can predict the target value for
new input data
Input:- Is the feature data
Output:- Is the target value,
Learning objective:- Is to minimize the prediction error (e.g., mean
squared error).
Performance Measure :- The prediction error itself, e.g. the mean
squared error on new data (a code sketch follows).
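A brief sketch of the regression example, assuming synthetic feature/target data; a linear model is fitted and the mean squared error serves as the performance measure.

# Hedged sketch of the regression problem: made-up features and targets,
# a linear model, and mean squared error as the performance measure.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))               # input features
y = 3.0 * X.ravel() + 2.0 + rng.normal(0, 1, 100)   # target values with noise

model = LinearRegression()
model.fit(X, y)                                      # learning objective: minimize squared error

y_pred = model.predict(X)
print("MSE:", mean_squared_error(y, y_pred))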
History of Machine Learning
 Year 1950 : Alan Turing developed the Turing Test during this year.
 Year 1957 : Perceptron - The first ever Neural Network
 Year 1966 : Joseph Weizenbaum at MIT developed ELIZA, a Natural Language Processing program that acted as a therapist.
 Year 1967 : The advent of Nearest Neighbor algorithm, very prominently used in Search and Approximation
 Year 1970 : Backpropagation takes shape. Backpropagation is a set of algorithms used extensively in Deep Learning.
 Year 1980 : Kunihiko Fukushima built the Neocognitron, an early multilayered artificial neural network (ANN).
Year 1981 : Explanation Based Learning
Year 1989 : Reinforcement Learning is finally realized. Q-Learning algorithm.
Year 2009 : ImageNet
Year 2010 : Google Brain and Facebook's DeepFace
Year 2022 : ChatGPT (Chat Generative Pre-trained Transformer)
https://www.zeolearn.com/magazine/what-is-machine-learning
Artificial Intelligence vs. Machine Learning vs. Deep Learning vs. Neural
Networks
 Machine learning, Deep learning, and Neural networks are all sub-fields of Artificial Intelligence.
 Neural networks are a sub-field of machine learning, and deep learning is a sub-field of neural networks.
 "Deep" machine learning can use labeled datasets (supervised learning) to inform its algorithm; it eliminates some of the
human intervention required and enables the use of larger data sets.
 "Non-deep" machine learning is more dependent on human intervention to learn: human experts determine
the set of features used to understand the differences between data inputs, requiring more structured data to learn.
 Neural networks, or artificial neural networks (ANNs), are composed of node layers, containing an input layer,
one or more hidden layers, and an output layer. Each node, or artificial neuron, connects to others and has an
associated weight and threshold.
 Deep learning and neural networks have accelerated progress in areas such as computer vision, natural language
processing, and speech recognition.
Artificial Intelligence vs. Machine Learning vs. Deep Learning vs.
Neural Networks
 AI refers to the software and processes that are designed to mimic the way humans think and process
information. It includes computer vision, natural language processing, robotics, autonomous vehicle operating
systems, and machine learning.
 With the help of artificial intelligence, devices are able to learn and identify information in order to solve
problems and offer key insights into various domains.
Artificial Intelligence vs. Machine Learning vs. Deep
Learning vs. Neural Networks
AI enables machines to understand data and make decisions based on patterns hidden
in data without any human intervention.
 Machines adjust their knowledge based on new inputs.
 Example, Self-driving cars , Alexa and Cortana - Conversations with us in our natural
human language
 Machine Learning:- Subset of AI
 With the help of its algorithms, machine learning can process a surplus of information and output an accurate
prediction within moments; many such systems use deep learning.
 Uses statistical models to explore, analyze and find patterns in large amounts of data.
 Performs tasks without being explicitly programmed, which allows systems to learn from experience and improve
over time without human intervention.
https://learnerjoy.com/artificial-intelligence-vs-machine-learning-vs-deep-learning-vs-data-science/
Artificial Intelligence vs. Machine Learning vs. Deep
Learning vs. Neural Networks
 Approaches:- 1. Supervised learning, 2. Unsupervised learning and 3.
Reinforcement learning.
1. Supervised learning:- Requires a human to input labelled (past labeled) data into the machine; the model
then outputs a prediction for a new sample.
 2. Unsupervised learning:- Takes unlabeled data as input, groups the data based on its similarity, and outputs
clusters of similar samples for a human to analyze further. The output classes are not known in advance.
Algorithms: K-means, Hierarchical Clustering, PCA, Neural Networks (a clustering sketch follows this list).
3. Reinforcement learning:- Learns over time through a reward or trial-and-error system that distinguishes
good actions from bad actions. (Semi-supervised learning, covered in Section 1.5, is different: it combines a
small amount of labeled data with a large amount of unlabeled data.)
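A minimal sketch of the unsupervised approach, using K-means from scikit-learn on made-up two-dimensional data; the number of clusters and the data itself are arbitrary illustrative choices.

# Minimal sketch of unsupervised learning: K-means groups unlabeled
# points into clusters for a human to analyse further.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Unlabeled data: two made-up blobs of 2-D points.
points = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)   # output: a cluster label per sample

print(kmeans.cluster_centers_)             # centres of the discovered groups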
Artificial Intelligence vs. Machine Learning vs. Deep
Learning vs. Neural Networks
 Deep Learning - Deep learning is the subset of machine learning.
 The main idea behind deep learning is for machines to learn things the way the human brain does.
 The human brain is made of multitudes of neurons that allow us to operate the way we do.
 Inspired by this collection of connected neurons, scientists create multi-layer networks that machines can use
to learn from experience and make predictions.
Techniques
Artificial Neural Networks (ANN):- input in the form of numbers
Convolutional Neural Networks (CNN):- input in the form of images
Recurrent Neural Networks (RNN):- input in the form of time-series data
Two popular frameworks used in Deep learning are
•PyTorch by Facebook
•TensorFlow by Google
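As a rough illustration rather than a definitive recipe, here is a tiny multi-layer network in PyTorch, one of the two frameworks named above: an input layer, one hidden layer, and an output layer trained with backpropagation on made-up numeric data.

# Minimal PyTorch sketch of a multi-layer (deep) neural network.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 16),   # input layer -> hidden layer
    nn.ReLU(),
    nn.Linear(16, 2),   # hidden layer -> output layer (2 classes)
)

X = torch.randn(32, 4)           # 32 made-up samples with 4 numeric features
y = torch.randint(0, 2, (32,))   # made-up class labels

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(100):             # a few gradient-descent steps (backpropagation)
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print("final loss:", loss.item())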
Artificial Intelligence vs. Machine Learning vs. Deep
Learning vs. Neural Networks
 Data Science
Data science performs exploratory analysis to better understand
the data.
It plays a huge role when building ML models: with a large amount of data
you can draw more insights and obtain more accurate results that can be
applied to business use cases.
 Statistical and mathematical tools, such as linear algebra, underpin this work.
Machine Learning Applications
 Image Recognition: used to identify objects, persons,
places, digital images, etc., e.g. automatic friend-tagging
suggestions (Facebook's DeepFace).
 Traffic prediction: Google Maps uses the real-time location of
vehicles (from the Google Maps app and sensors) and the average
time taken on past days at the same time of day.
 Product recommendations: used by various e-
commerce and entertainment companies such
as Amazon, Netflix, etc., for product recommendation to
the user.
Machine Learning Applications
 Self-driving cars: machine learning methods are used to train the car models to detect people and
objects while driving.
 Email Spam and Malware Filtering: incoming mail is automatically filtered into normal and spam, using
classifiers such as Multi-Layer Perceptron, Decision Tree, and Naïve Bayes.
 Virtual Personal Assistant: Google Assistant, Alexa, Cortana, Siri.
 Other areas: the medical sector, banking and the stock market, search engines, chatbots.
 Speech Recognition: "Search by voice", speech-to-text, computer speech recognition.
Machine learning Life cycle
 The machine learning life cycle is a cyclic process for building an
efficient machine learning project. Its main steps are:
 Gathering Data
 Data preparation
 Data Wrangling
 Analyse Data
 Train the model
 Test the model
 Deployment
 1. Gathering Data: Identify and obtain all the data related to the problem.
This step includes the below tasks:
•Identify various data sources
•Collect data
•Integrate the data obtained from different sources
2. Data preparation: Data preparation is a step where we put our data into a suitable place and
prepare it to use in our machine learning training. This step can be further divided into two processes:
•Data exploration: Understand the nature of data, understand the characteristics, format, and quality
of data.
•Data pre-processing: put the data into a form suitable for analysis (see the Data Pre-processing section below).
3. Data Wrangling: Data wrangling is the process of cleaning and converting raw data into a
useable format. It is the process of cleaning the data, selecting the variable to use, and
transforming the data in a proper format to make it more suitable for analysis in the next step.
Cleaning of data is required to address the quality issues.
collected data may have various issues, including:
•Missing Values
•Duplicate data
•Invalid data
•Noise
 4. Data Analysis
The cleaned and prepared data is passed on to the analysis step. This step involves:
• Selection of analytical techniques
• Building models
• Review the result
•Where we select the machine learning techniques such as Classification, Regression, Cluster
analysis, Association, etc. then build the model using prepared data, and evaluate the model.
 5. Train Model
Train the model to improve its performance and produce a better outcome for the problem.
Datasets are used to train the model with various machine learning algorithms; training is
required so the model learns the various patterns, rules, and features.
6. Test Model: once the model has been trained on a given dataset, it is checked against a separate test
dataset to measure its accuracy (a short sketch of steps 5-6 follows).
7. Deployment: deploy the final model in the real-world system.
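A short sketch of life-cycle steps 5 and 6, assuming scikit-learn and its bundled iris dataset as stand-ins for the gathered and prepared data; a decision tree (covered in Section 1.12) is an arbitrary model choice.

# Hedged sketch of steps 5-6: train a model on one part of the data,
# then test it on data it has not seen before.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier()
model.fit(X_train, y_train)                            # 5. Train Model
print("test accuracy:", model.score(X_test, y_test))   # 6. Test Model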
What is a dataset?
 A dataset is a collection of data in which the data is arranged in some order. A dataset can contain anything from a
series or an array to a database table.
Types of data in datasets
• Numerical data: such as house price, temperature, etc.
• Categorical data: such as Yes/No, True/False, Blue/Green, etc.
• Ordinal data: similar to categorical data, but the values can be ordered or compared.
 Types of datasets
 Image datasets
 Text datasets
 Time-series datasets
 Tabular datasets
Data Pre-processing:
Pre-processing procedures include data cleaning to remove inconsistencies or errors, normalization to scale data
within a specific range, feature scaling to ensure features have comparable ranges, and handling missing values
through imputation or removal. A short code sketch follows the train/test split below.
Datasets are divided into two parts:
•Training dataset:
•Test Dataset
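An illustrative sketch of the pre-processing steps named above and of the training/test split, assuming scikit-learn; the small array with a missing value is made-up data.

# Illustrative pre-processing sketch: impute missing values, scale features
# to a comparable range, then split into training and test datasets.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

X = np.array([[1.0, 200.0],
              [2.0, np.nan],    # a missing value
              [3.0, 240.0],
              [4.0, 260.0]])

X = SimpleImputer(strategy="mean").fit_transform(X)   # handle missing values (imputation)
X = StandardScaler().fit_transform(X)                 # scale features to a similar range

X_train, X_test = train_test_split(X, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)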
Popular sources for Machine Learning
datasets
 Kaggle Datasets
 UCI Machine Learning Repository
 Datasets via AWS
 Google's Dataset Search Engine
 Microsoft Datasets
 Scikit-learn datasets