Introduction to Probability
Last Updated: 20 March 2015
Slideshare: http://www.slideshare.net/marinasantini1/introduction-to-probability-theory
Mathematics for Language Technology
http://stp.lingfil.uu.se/~matsd/uv/uv15/mfst/
Marina Santini
santinim@stp.lingfil.uu.se
Department of Linguistics and Philology
Uppsala University, Uppsala, Sweden
Spring 2015
Acknowledgements
  Several slides borrowed from Prof Joakim Nivre.
  Practical Activity by Mats Dahllöf
  Required Reading:
  E&G (2013): Ch. 5 (pp. 105-110)
  Compendium (4): 9.1
  E&G (2013): Ch. 5.2-5.3 (self-study)
  Recommended Reading:
  Sections 1-3 in Goldsmith J. (2007) Probability for
Linguists. The University of Chicago. The Department of
Linguistics:
•  http://hum.uchicago.edu/~jagoldsm/Papers/probability.pdf
Outline
• The Notion of Probability
• Events
• Axioms and Theorems of Probability
• Addition Rule
Why study probability and statistics?
Developments in NLP have led to the
exploitation of language corpora to refine
and develop computational models of
language.
Many of these models exploit basic axioms,
theorems and approximations from the field
of probability theory and statistical
inference.
Deterministic vs Non-Deterministic
Generally speaking, a deterministic system is a system in which no
randomness is involved in the development of future states of the
system. That is, a deterministic model will always produce the
same behaviour from a given state.
In automata theory, a deterministic finite automaton (DFA) is a
finite state machine that accepts/rejects finite strings of symbols
and only produces a unique computation (or run) of the automaton
for each input string.
A nondeterministic finite automaton (NFA), or nondeterministic
finite state machine, needn't obey these restrictions.
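As a minimal sketch of the deterministic case, the following Python snippet simulates a DFA with a transition table. The language it accepts (a "b", then two or more "a", then "!") and the state numbering are assumptions chosen for illustration; they are not taken from the slides.

```python
# Hypothetical DFA: accepts strings of the form b a a+ ! (for example "baaaa!").
# Each (state, symbol) pair maps to exactly one next state, so every input
# string has a single possible run.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,
    (3, "!"): 4,
}
START_STATE = 0
ACCEPTING_STATES = {4}


def accepts(string):
    """Return True if the DFA ends in an accepting state after reading string."""
    state = START_STATE
    for symbol in string:
        key = (state, symbol)
        if key not in TRANSITIONS:
            # No transition defined: the string is rejected.
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING_STATES


print(accepts("baaaa!"))
print(accepts("abababbbab"))
```

Because the table is a dictionary keyed on (state, symbol), two different next states for the same pair cannot be expressed, which is what makes the machine deterministic. An NFA would need a set of next states per key.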
Deterministic vs Non-Deterministic
Input string:
baaaa!
Input string:
abababbbab
Probability Theory
• Probability theory is the branch of mathematics concerned with probability, i.e. the analysis of random/non-deterministic phenomena.
Statistics
• Statistics is the study of the collection, analysis, interpretation, presentation, and organization of data.
Probability Theory and Statistics
We use probability theory to build models of uncertainty, and we can use statistics to ground these models in empirical data.
Probability, Event and Sample Space
Ex 1: we have a sample space consisting of words. An event in that sample space can be the set of NOUNS, i.e. all the words that belong to the category NOUN. One way of describing this subset is to say that the property PartOfSpeech has the value NOUN.
Ex 2: we have a sample space of sentences and we are interested in the length of these sentences. A relevant event would be the set of all sentences that contain exactly 8 words. Again, we can describe this set as the set of outcomes for which the variable "numberOfWords" takes the value 8.
Operators
∈ = is an element of
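The idea of an event as a subset of the sample space, picked out by a property, can be sketched in Python as follows. The toy word list, the helper names `event` and `probability`, and the assumption that all outcomes are equally likely are illustrative only.

```python
from fractions import Fraction

# Hypothetical toy sample space: (word, part of speech) pairs.
SAMPLE_SPACE = [
    ("dog", "NOUN"),
    ("runs", "VERB"),
    ("house", "NOUN"),
    ("blue", "ADJ"),
]


def event(sample_space, predicate):
    """Return the subset of outcomes for which the predicate holds."""
    return {outcome for outcome in sample_space if predicate(outcome)}


def probability(event_set, sample_space):
    """Probability of an event, assuming all outcomes are equally likely."""
    return Fraction(len(event_set), len(sample_space))


# The event "PartOfSpeech has the value NOUN".
nouns = event(SAMPLE_SPACE, lambda outcome: outcome[1] == "NOUN")

print(("dog", "NOUN") in nouns)   # membership: "is an element of"
print(probability(nouns, SAMPLE_SPACE))
```

Python's `in` operator plays the role of the "is an element of" relation here.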
Venn diagrams or Set diagrams
to represent logical relations
Axioms (Statements that are
always accepted as true)
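A common way to state the axioms of probability, assuming that Ω denotes the sample space and that A and B are events (this notation is an assumption, not taken from the slide):

```latex
\begin{align*}
& P(A) \ge 0 \quad \text{for every event } A \subseteq \Omega \\
& P(\Omega) = 1 \\
& P(A \cup B) = P(A) + P(B) \quad \text{if } A \cap B = \emptyset
\end{align*}
```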
Formula & Calculations
If A is an event, and x1 to xn are its individual outcomes, then the probability of A can be computed by summing the probabilities of the outcomes, because they are disjoint (mutually exclusive):
P(A) = \sum_{i=1}^{n} P(x_i)
Read as: "sum from i=1 to n" or "sum over all the elements of the set".
Example: the probability of a 3-vowel string among all three-letter strings.
There are 26 ways of choosing the first letter, 26 ways of choosing the second letter and 26 ways of choosing the third letter, i.e. 26*26*26 = 26^3 strings in total. But there are only 6 ways of choosing the first vowel, 6 ways of choosing the second and 6 ways of choosing the third, i.e. 6*6*6 = 6^3 strings.
Since we assume that all strings are equally probable, the probability of each string is simply 1 over the total number of strings.
In order to get the probability of the 3-vowel string, we can simply add the probabilities of the strings that contain exactly 3 vowels. So 6 to the power of 3 divided by 26 to the power of 3 gives us approximately 0.012.
Calculations: 6x6x6 = 216; 26x26x26 = 17576; 216/17576 = 0.01228949
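The calculation can be checked by brute-force enumeration. This sketch assumes the lowercase English alphabet and treats a, e, i, o, u and y as the six vowels; the choice of y as the sixth vowel is an assumption.

```python
from fractions import Fraction
from itertools import product
import string

LETTERS = string.ascii_lowercase   # 26 letters
VOWELS = set("aeiouy")             # six vowels (counting y is an assumption)

# The sample space: every three-letter string, each equally probable.
all_strings = list(product(LETTERS, repeat=3))
p_single = Fraction(1, len(all_strings))

# The event: strings made up of vowels only.
vowel_strings = [s for s in all_strings if all(ch in VOWELS for ch in s)]

# Sum the probabilities of the individual outcomes in the event.
p_event = sum(p_single for _ in vowel_strings)

print(len(vowel_strings), len(all_strings))
print(float(p_event))
```

Summing 1/17576 once per all-vowel string gives the same value as dividing 216 by 17576 directly.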
In sum
• The probability of an event is the SUM of the probabilities of its outcomes
• An event can be described by a variable taking a particular value
Theorems
A theorem is a statement that has been proven on the basis of previously established statements, such as axioms.
Addition Rule:
A method for finding the probability that either or both of two events occur.
In other words:
If events A and B are mutually exclusive (disjoint), then:
P(A or B) = P(A) + P(B)
Otherwise:
P(A or B) = P(A) + P(B) - P(A and B)
Say that A is the set of people who have glasses and B is the set of people who are blond. We are interested in the set of people who are blond OR have glasses. If we simply add the probabilities of the two simple events, we count blond people with glasses twice. Therefore, in order to get the correct probability, we have to subtract the probability of the intersection, P(A and B).
Think of the axiom about disjoint events as a special case where the intersection is empty. Therefore nothing is counted twice in the first place, and nothing has to be subtracted.
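A small sketch of the addition rule using Python sets. The names and group memberships are invented for illustration, and every person is assumed to be equally likely to be picked.

```python
from fractions import Fraction

# Hypothetical population and events.
PEOPLE = {"Ann", "Bo", "Cy", "Di", "Ed", "Flo"}
GLASSES = {"Ann", "Bo", "Cy"}   # event A: has glasses
BLOND = {"Cy", "Di"}            # event B: is blond


def p(event_set):
    """Probability of an event, assuming each person is equally likely."""
    return Fraction(len(event_set), len(PEOPLE))


# Naive sum counts Cy (blond with glasses) twice.
naive_sum = p(GLASSES) + p(BLOND)

# Addition rule: subtract the probability of the intersection.
p_union = p(GLASSES) + p(BLOND) - p(GLASSES & BLOND)

print(naive_sum, p_union, p(GLASSES | BLOND))
```

The value obtained from the addition rule matches the probability computed directly from the union of the two sets, while the naive sum overshoots by P(A and B).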
Quiz 1: only one answer is correct
Quiz 1: Solution
1. 0.01 - incorrect. The probability of an event and
its complement must sum to 1.
2. 0.99 - correct. The complement of A has
probability 1 - P(A).
3. Impossible to tell - incorrect. The complement of
A must have probability 1 - P(A).
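The complement rule behind this solution can be sketched as follows, assuming P(A) = 0.01 as the solution implies. The function name `complement_probability` is hypothetical.

```python
from fractions import Fraction


def complement_probability(p_event):
    """Return P(not A) = 1 - P(A) for a valid probability P(A)."""
    if not 0 <= p_event <= 1:
        raise ValueError("a probability must lie between 0 and 1")
    return 1 - p_event


p_a = Fraction(1, 100)            # assumed value of P(A)
p_not_a = complement_probability(p_a)

print(p_not_a, float(p_not_a))
```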
Quiz 2: more than one answer can be correct
Quiz 2: Solutions
1.  P(A or B) < P(A and B) - incorrect. Since the union includes the intersection, it can never have lower probability.
2.  P(A or B) = P(A and B) - correct. This is possible as a limiting case, for example, when A = B.
3.  P(A or B) > P(A and B) - correct. This holds as soon as there is some outcome with a positive probability in A or B that is not in the intersection.
Practical Activity
We have a regular die. We cast the die twice and we get a two and a four.
Therefore, A = {2,4}.
Calculate:
1.  The probability of the event A = {2,4}
2.  The probability that the first number is a 6
3.  The probability that the second number is a 5 or a 6
4.  The probability that the first and the second number are the same
5.  The probability that the first number is an odd number
6.  The probability that the first and the second number are both odd numbers
Practical Activity: Solutions
1.  The probability of the event A = {2,4} [1/36 ≈ 0.028 if the order is fixed (first a 2, then a 4); 2/36 ≈ 0.056 if either order counts]
2.  The probability that the first number is a 6 [1/6 ≈ 0.167]
3.  The probability that the second number is a 5 or a 6 [1/3 ≈ 0.33]
4.  The probability that the first and the second number are the same [1/6 ≈ 0.167]
5.  The probability that the first number is an odd number [1/2 = 0.5]
6.  The probability that the first and the second number are both odd numbers [1/4 = 0.25]
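These answers can be checked by enumerating all 36 ordered outcomes of two casts of a fair die. The helper `prob` and the variable names are illustrative. Question 1 is computed both for a fixed order and for either order, since the notation A = {2,4} leaves this open.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs (first cast, second cast), equally likely.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(predicate):
    """Probability of the event containing the outcomes that satisfy predicate."""
    favourable = [o for o in OUTCOMES if predicate(o)]
    return Fraction(len(favourable), len(OUTCOMES))


p_two_then_four = prob(lambda o: o == (2, 4))
p_two_and_four_any_order = prob(lambda o: set(o) == {2, 4})
p_first_is_six = prob(lambda o: o[0] == 6)
p_second_is_five_or_six = prob(lambda o: o[1] in (5, 6))
p_same = prob(lambda o: o[0] == o[1])
p_first_odd = prob(lambda o: o[0] % 2 == 1)
p_both_odd = prob(lambda o: o[0] % 2 == 1 and o[1] % 2 == 1)

for name, value in [
    ("2 then 4", p_two_then_four),
    ("2 and 4, any order", p_two_and_four_any_order),
    ("first is 6", p_first_is_six),
    ("second is 5 or 6", p_second_is_five_or_six),
    ("both the same", p_same),
    ("first is odd", p_first_odd),
    ("both odd", p_both_odd),
]:
    print(name, value, round(float(value), 3))
```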
The End
