An Introduction to Bayesian Belief Networks and Naïve Bayesian Classification
Adnan Masood
scis.nova.edu/~adnan
adnan@nova.edu
Belief Networks &
Bayesian Classification
Overview
• Probability and Uncertainty
• Bayesian Statistics
• Notation of Probability
• Axioms of Probability
• Probability Table
• Bayesian Belief Network
• Joint Probability Table
• Probability of Disjunctions
• Conditional Probability
• Conditional Independence
• Bayes' Rule
• Classification with Bayes' Rule
• Bayesian Classification
• Conclusion & Further Reading
Probability and Uncertainty
• Probability provides a way of summarizing uncertainty:
  • 60% chance of rain today
  • 85% chance of the alarm sounding in case of a burglary
• A probability can be estimated from past observations, or represent a degree of belief.
Bayesian Statistics
• Three approaches to probability:
  • Axiomatic: probability by definition and properties
  • Relative frequency: repeated trials
  • Degree of belief (subjective): personal measure of uncertainty
• Examples:
  • The chance that a meteor strikes Earth is 1%
  • The probability of rain today is 30%
  • The chance of getting an A on the exam is 50%
Notation of Probability
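The equation slides here did not survive extraction; a sketch of the standard notation such slides cover (the cavity value below is illustrative, not from the deck):

  • Propositions: P(A) is the probability that A is true, e.g. P(cavity) = 0.1
  • Random variables: P(Weather = sunny) = 5/14; P(Weather) is the distribution over all of Weather's values, e.g. {5/14, 4/14, 5/14}
  • Joint probability: P(A, B) = P(A ∧ B); negation is written ~A, with P(~A) = 1 − P(A)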
Axioms of Probability
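The equations on this slide were also lost; presumably the standard Kolmogorov axioms:

  1. 0 ≤ P(A) ≤ 1 for every proposition A
  2. P(true) = 1 and P(false) = 0
  3. P(A ∨ B) = P(A) + P(B) − P(A ∧ B)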
Probability Table
• P(Weather = sunny) = P(sunny) = 5/14
• P(Weather) = {5/14, 4/14, 5/14}
• Calculate probabilities from data:

  Outlook    sunny   overcast   rainy
             5/14    4/14       5/14
An expert-built belief network over the weather dataset (Mitchell; Witten & Frank). Bayesian inference can help answer questions such as the probability of game play if:
a. Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong
b. Outlook=overcast, Temperature=cool, Humidity=high, Wind=strong
Bayesian Belief Network
• A Bayesian belief network allows a subset of the variables to be conditionally independent
• A graphical model of causal relationships
• Several cases of learning Bayesian belief networks:
  • Given both the network structure and all the variables: easy
  • Given the network structure but only some of the variables
  • When the network structure is not known in advance
Bayesian Belief Network
(network figure: nodes Family History, Smoker, Lung Cancer, Emphysema, Positive X-Ray, Dyspnea)

The conditional probability table for the variable Lung Cancer (LC), given its parents Family History (FH) and Smoker (S):

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
  LC     0.8       0.5        0.7        0.1
  ~LC    0.2       0.5        0.3        0.9
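A sketch of the point this figure usually makes (the general formula is standard, not recovered from the slide): the network encodes the full joint distribution as a product of each node's conditional probability table given its parents:

  P(x1, ..., xn) = ∏i P(xi | Parents(Xi))

Reading the table above, P(LC | FH, S) = 0.8; the probability of any complete assignment to all six variables is obtained by multiplying one entry from each node's table.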
A Hypothesis for playing tennis
Joint Probability Table

          sunny   overcast   rainy
  hot      2/14    2/14       0/14
  mild     2/14    1/14       3/14
  cool     1/14    1/14       2/14

  (rows: Temperature; columns: Outlook)
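A worked check, not on the original slide: marginal probabilities come from summing rows or columns of the joint table, e.g.

  P(Outlook = sunny) = 2/14 + 2/14 + 1/14 = 5/14
  P(Temperature = hot) = 2/14 + 2/14 + 0/14 = 4/14

which matches the Outlook table given earlier.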
Example: Calculating Global Probabilistic Beliefs
 P(PlayTennis) = 9/14 = 0.64
 P(~PlayTennis) = 5/14 = 0.36
 P(Outlook=sunny|PlayTennis) = 2/9 = 0.22
 P(Outlook=sunny|~PlayTennis) = 3/5 = 0.60
 P(Temperature=cool|PlayTennis) = 3/9 = 0.33
 P(Temperature=cool|~PlayTennis) = 1/5 = 0.20
 P(Humidity=high|PlayTennis) = 3/9 = 0.33
 P(Humidity=high|~PlayTennis) = 4/5 = 0.80
 P(Wind=strong|PlayTennis) = 3/9 = 0.33
 P(Wind=strong|~PlayTennis) = 3/5 = 0.60
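As a worked check (not on the original slide), plugging these estimates into the naive Bayes product for query (a) from the belief-network slide, X = (sunny, cool, high, strong):

  P(PlayTennis) P(sunny|PlayTennis) P(cool|PlayTennis) P(high|PlayTennis) P(strong|PlayTennis)
    = 9/14 × 2/9 × 3/9 × 3/9 × 3/9 ≈ 0.0053
  P(~PlayTennis) P(sunny|~PlayTennis) P(cool|~PlayTennis) P(high|~PlayTennis) P(strong|~PlayTennis)
    = 5/14 × 3/5 × 1/5 × 4/5 × 3/5 ≈ 0.0206

Since 0.0206 > 0.0053, the classifier predicts ~PlayTennis.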
Probability of Disjunctions
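The equation slide did not survive extraction; the inclusion-exclusion rule it presumably showed, with a worked example from the joint table above:

  P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
  P(sunny ∨ hot) = 5/14 + 4/14 − 2/14 = 7/14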
Conditional Probability
• The probabilities discussed so far are called prior probabilities or unconditional probabilities
• They depend only on the data, not on any other variable
• But what if you have some evidence or knowledge about the situation? You now have a toothache: what is the probability of having a cavity?
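The two equation slides that followed did not survive extraction; the standard definition they presumably contained:

  P(A|B) = P(A, B) / P(B), for P(B) > 0   (equivalently, the product rule: P(A, B) = P(A|B) P(B))

For the example above: P(cavity | toothache) = P(cavity ∧ toothache) / P(toothache).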
Conditional Independence
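A sketch of the standard definition (the slide's equations were lost): X and Y are conditionally independent given Z iff

  P(X, Y | Z) = P(X | Z) P(Y | Z), equivalently P(X | Y, Z) = P(X | Z)

For example, in the lung-cancer network above, Positive X-Ray is plausibly independent of Family History once Lung Cancer is known.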
The independence hypothesis…
• … makes computation possible
• … yields optimal classifiers when satisfied
• … but is seldom satisfied in practice, as attributes (variables) are often correlated
• Attempts to overcome this limitation:
  • Bayesian networks, which combine Bayesian reasoning with causal relationships between attributes
  • Decision trees, which reason on one attribute at a time, considering the most important attributes first
Bayes’ Rule
• Recall conditional probabilities:
  P(A|B) = P(A,B)/P(B), so P(B)P(A|B) = P(A,B)
  P(B|A) = P(B,A)/P(A), so P(A)P(B|A) = P(B,A)
• Since P(B,A) = P(A,B), it follows that P(B)P(A|B) = P(A)P(B|A)
• Bayes’ Rule: P(A|B) = P(B|A)P(A)/P(B)
Bayes’ Rule
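This equation slide's content was lost; in hypothesis/evidence form, Bayes' rule is usually presented as

  P(H|E) = P(E|H) P(H) / P(E)
  posterior = likelihood × prior / evidence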
Classification with Bayes’ Rule
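A sketch of the standard formulation this slide covers: choose the class with the maximum a-posteriori probability; since P(X) is the same for every class, it can be dropped from the comparison:

  c_MAP = argmax_c P(c|X) = argmax_c P(X|c) P(c)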
Naïve Bayes Classifier
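Presumably this slide stated the naive Bayes model: assume the attributes x1, ..., xn are conditionally independent given the class, so the class-conditional likelihood factors into per-attribute terms:

  c_NB = argmax_c P(c) ∏i P(xi | c)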
Bayesian Classification: Why?
• Probabilistic learning: computes explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
• Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
• Probabilistic prediction: predicts multiple hypotheses, weighted by their probabilities
• Benchmark: even when Bayesian methods are computationally intractable, they can provide a benchmark for other algorithms
Classification with Bayes’ Rule
(figure courtesy of Simafore: http://www.simafore.com/blog/bid/100934/Beware-of-2-facts-when-using-Naive-Bayes-classification-for-analytics)
Issues with Naïve Bayes
• Change in classifier data (on the fly, during classification)
• The conditional independence assumption is violated:
  • Consider the task of classifying whether or not a certain word is a corporation name, e.g. “Google,” “Microsoft,” “IBM,” and “ACME”
  • Two useful features we might want to use are capitalized and all-capitals
  • Naïve Bayes will assume that these two features are independent given the class, but this clearly isn’t the case (things that are all-caps must also be capitalized)!
• However, naïve Bayes often works well in practice even when this assumption is violated
Naive Bayesian Classifier
• Given a training set, we can compute the probabilities:

  Outlook       P    N      Humidity    P    N
  sunny        2/9  3/5     high       3/9  4/5
  overcast     4/9   0      normal     6/9  1/5
  rain         3/9  2/5
                            Windy       P    N
  Temperature   P    N      true       3/9  3/5
  hot          2/9  2/5     false      6/9  2/5
  mild         4/9  2/5
  cool         3/9  1/5

  (P = PlayTennis, N = ~PlayTennis)
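A minimal runnable sketch (mine, not from the deck; the dictionary layout, function name, and class labels "p"/"n" are my own, with values copied from the tables above) showing how these estimates drive a naive Bayes prediction:

  # Naive Bayes from precomputed tables (play-tennis data).
  # P(value | class) for each attribute, taken from the slide above.
  likelihoods = {
      "p": {"outlook": {"sunny": 2/9, "overcast": 4/9, "rain": 3/9},
            "temperature": {"hot": 2/9, "mild": 4/9, "cool": 3/9},
            "humidity": {"high": 3/9, "normal": 6/9},
            "windy": {"true": 3/9, "false": 6/9}},
      "n": {"outlook": {"sunny": 3/5, "overcast": 0.0, "rain": 2/5},
            "temperature": {"hot": 2/5, "mild": 2/5, "cool": 1/5},
            "humidity": {"high": 4/5, "normal": 1/5},
            "windy": {"true": 3/5, "false": 2/5}},
  }
  priors = {"p": 9/14, "n": 5/14}  # P(p), P(n)

  def classify(instance):
      """Return (best_class, scores), maximizing P(c) * prod_i P(x_i | c)."""
      scores = {}
      for c, prior in priors.items():
          score = prior
          for attribute, value in instance.items():
              score *= likelihoods[c][attribute][value]
          scores[c] = score
      return max(scores, key=scores.get), scores

  # Query (a) from the earlier slide: sunny, cool, high humidity, strong wind.
  label, scores = classify({"outlook": "sunny", "temperature": "cool",
                            "humidity": "high", "windy": "true"})
  print(label, scores)  # -> n, with scores approx {'p': 0.0053, 'n': 0.0206}

This reproduces the hand calculation from the "Global Probabilistic Beliefs" slide: roughly 0.0206 for n beats 0.0053 for p, so the classifier predicts no play.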
Estimating a-posteriori probabilities
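The equations here were lost; the standard estimation step this slide presumably covered: P(C) is estimated as the relative frequency of class C in the training set, and each P(xi|C) as a conditional relative frequency:

  P(C) ≈ count(C) / N, e.g. P(p) = 9/14
  P(xi | C) ≈ count(xi ∧ C) / count(C), e.g. P(sunny | p) = 2/9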
Naïve Bayesian Classification
• P(p) = 9/14, P(n) = 5/14
• Outlook:     P(sunny|p) = 2/9      P(sunny|n) = 3/5
               P(overcast|p) = 4/9   P(overcast|n) = 0
               P(rain|p) = 3/9       P(rain|n) = 2/5
• Temperature: P(hot|p) = 2/9        P(hot|n) = 2/5
               P(mild|p) = 4/9       P(mild|n) = 2/5
               P(cool|p) = 3/9       P(cool|n) = 1/5
• Humidity:    P(high|p) = 3/9       P(high|n) = 4/5
               P(normal|p) = 6/9     P(normal|n) = 1/5
• Windy:       P(true|p) = 3/9       P(true|n) = 3/5
               P(false|p) = 6/9      P(false|n) = 2/5
Play Tennis example
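The worked example on this slide did not survive extraction; a sketch using query (b) from earlier, X = (overcast, cool, high, strong):

  For n: 5/14 × 0 × 1/5 × 4/5 × 3/5 = 0   (since P(overcast|n) = 0)
  For p: 9/14 × 4/9 × 3/9 × 3/9 × 3/9 ≈ 0.0106

So the instance is classified p (play). Note how a single zero estimate forces the whole product to zero; in practice this is typically handled with a Laplace or m-estimate correction.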
Conclusion & Further Reading
• Probabilities
• Joint probabilities
• Conditional probabilities
• Independence, conditional independence
• Naïve Bayes classifier
References
• J. Han and M. Kamber, Data Mining: Concepts and Techniques. Morgan Kaufmann: San Francisco, CA.
• T. Mitchell, Machine Learning. McGraw-Hill, 1997.
• I. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann.
• E. Charniak, "Bayesian Networks without Tears," AI Magazine. http://www.aaai.org/ojs/index.php/aimagazine/article/view/918
• A. Darwiche, Bayesian Networks. Automated Reasoning Group, UCLA.