Randomized Algorithms
Dr. C.V. Suresh Babu
A short list of categories


Algorithm types we will consider include:
Simple recursive algorithms
Backtracking algorithms
Divide and conquer algorithms
Dynamic programming algorithms
Greedy algorithms
Branch and bound algorithms
Brute force algorithms
Randomized algorithms
Also called probabilistic algorithms or stochastic methods (Monte Carlo and Las Vegas algorithms are the two main kinds)
Why use randomness?
Avoid worst-case behavior: randomness can (probabilistically) guarantee average-case behavior
Efficient approximate solutions to intractable problems
Randomized algorithms
A randomized algorithm is just one that depends on random numbers for its operation.
These are randomized algorithms:
Using random numbers to help find a solution to a problem
Using random numbers to improve a solution to a problem
These are related topics:
Getting or generating “random” numbers
Generating random data for testing (or other) purposes
Randomized algorithms
In a randomized algorithm (probabilistic algorithm), we make some random choices.
Two types of randomized algorithms:
1. For an optimization problem, a randomized algorithm gives an optimal solution. The average-case time complexity is more important than the worst-case time complexity.
2. For a decision problem, a randomized algorithm may make mistakes. The probability of producing a wrong solution is very small.
Quick Sort
Select: pick an arbitrary element x in S to be the pivot.
Partition: rearrange elements so that elements with value less than x go to list L, to the left of x, and elements with value greater than x go to list R, to the right of x.
Recursion: recursively sort the lists L and R.
Worst Case Partitioning of Quick Sort
Best Case Partitioning of Quick Sort
Average Case of Quick Sort
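The figures on the three slides above did not survive extraction. As a reminder, assuming Partition costs Θ(n) on n elements, the standard recurrences those slides illustrate are:

```latex
\begin{aligned}
&\text{Worst case (every split is } 0 : n-1\text{):} && T(n) = T(n-1) + \Theta(n) = \Theta(n^2)\\
&\text{Best case (every split is even):} && T(n) = 2T(n/2) + \Theta(n) = \Theta(n \log n)\\
&\text{Average case (all input orders equally likely):} && \mathbb{E}[T(n)] = \Theta(n \log n)
\end{aligned}
```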
Randomized Quick Sort
Randomized-Partition(A, p, r)
1. i ← Random(p, r)
2. exchange A[r] ↔ A[i]
3. return Partition(A, p, r)

Randomized-Quicksort(A, p, r)
1. if p < r
2.    then q ← Randomized-Partition(A, p, r)
3.         Randomized-Quicksort(A, p, q − 1)
4.         Randomized-Quicksort(A, q + 1, r)
Randomized Quick Sort
Before calling Partition, exchange A[r] with an element chosen at random from A[p…r].
The pivot element is then equally likely to be any of the input elements.
For any given input, the behavior of Randomized Quick Sort is determined not only by the input but also by the random choices of the pivot.
We add randomization to Quick Sort so that, for any input, the expected performance of the algorithm is good.
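A minimal Python sketch of the pseudocode above, for illustration only. It assumes Partition is the Lomuto scheme with pivot A[r] (which the "exchange A[r]" step implies) and models Random(p, r) as `rng.randint(p, r)`, uniform on p..r inclusive; indices are 0-based and the function names are illustrative.

```python
import random


def partition(a, p, r):
    """Lomuto partition of a[p..r] around the pivot x = a[r].

    Afterwards a[p..q-1] <= x, a[q] == x and a[q+1..r] > x; returns q.
    """
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_partition(a, p, r, rng=random):
    """Swap a uniformly random element of a[p..r] into position r, then partition."""
    i = rng.randint(p, r)          # Random(p, r): inclusive on both ends
    a[r], a[i] = a[i], a[r]
    return partition(a, p, r)


def randomized_quicksort(a, p=0, r=None, rng=random):
    """Sort the list a in place between indices p and r (inclusive)."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = randomized_partition(a, p, r, rng)
        randomized_quicksort(a, p, q - 1, rng)
        randomized_quicksort(a, q + 1, r, rng)


data = [5, 2, 9, 1, 5, 6]
randomized_quicksort(data, rng=random.Random(42))
print(data)  # [1, 2, 5, 5, 6, 9]
```

The result is sorted for every sequence of random choices; only the running time depends on them. One caveat of this Lomuto-based sketch: many equal keys still push it toward quadratic time, which a three-way partition would avoid.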
Deterministic Algorithms
INPUT → ALGORITHM → OUTPUT
Goal: Prove that for all input instances the algorithm solves the problem correctly and the number of steps is bounded by a polynomial in the size of the input.
Randomized Algorithms
INPUT → ALGORITHM → OUTPUT, with RANDOM NUMBERS as an additional input to the ALGORITHM
In addition to the input, the algorithm takes a source of random numbers and makes random choices during execution.
Its behavior can vary even on a fixed input.
Las Vegas Randomized Algorithms
INPUT → ALGORITHM → OUTPUT, with RANDOM NUMBERS as an additional input to the ALGORITHM
Goal: Prove that for all input instances the algorithm solves the problem correctly and the expected number of steps is bounded by a polynomial in the input size.
Note: The expectation is over the random choices made by the algorithm.
Probabilistic Analysis of Algorithms
RANDOM INPUT (drawn from a DISTRIBUTION) → ALGORITHM → OUTPUT
The input is assumed to come from a probability distribution.
Goal: Show that for all inputs the algorithm works correctly and for most inputs the number of steps is bounded by a polynomial in the size of the input.
The closest pair problem
This problem can be solved by the divide-and-conquer approach in O(n log n) time.
The randomized algorithm partitions the points into several clusters:
(Figure: points X1 to X7 grouped into clusters.)
We only calculate distances among points within the same cluster.
This is similar to the divide-and-conquer strategy: there is a dividing process, but no merging process.
A randomized algorithm for closest pair finding
Input: A set S consisting of n elements $x_1, x_2, \ldots, x_n$, where $S \subseteq \mathbb{R}^2$.
Output: The closest pair in S.
Step 1: Randomly choose a set $S_1 = \{x_{i_1}, x_{i_2}, \ldots, x_{i_m}\}$, where $m = n^{2/3}$. Find the closest pair of $S_1$ and let the distance between this pair of points be denoted as δ.
Step 2: Construct a set of squares T with mesh-size δ.
Step 3: Construct four sets of squares T1, T2, T3 and T4 derived from T by doubling the mesh-size to 2δ.
Step 4: For each Ti, find the induced decomposition $S = S_1^{(i)} \cup S_2^{(i)} \cup \cdots \cup S_j^{(i)}$, $1 \le i \le 4$, where $S_j^{(i)}$ is a non-empty intersection of S with a square of Ti.
Step 5: For each $x_p, x_q \in S_j^{(i)}$, compute $d(x_p, x_q)$. Let $x_a$ and $x_b$ be the pair of points with the shortest distance among these pairs. Return $x_a$ and $x_b$ as the closest pair.
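A Python sketch of Steps 1 to 5, under stated simplifications. The sample's closest pair is found by brute force rather than recursively (so Step 1 costs $O(n^{4/3})$ here, not the O(n) claimed below), and T1 to T4 are assumed to be the mesh-2δ grids shifted by 0 or δ along each axis. Because the true closest distance is at most δ, the closest pair differs by at most δ in each coordinate and so shares a square in at least one of the four grids; the answer is therefore exact (up to floating-point rounding at square boundaries), and randomness affects only the running time. Function names are illustrative.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def brute_force_closest(points):
    """(distance, p, q) of the closest pair, checking all pairs; inf if < 2 points."""
    best = (math.inf, None, None)
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        if d < best[0]:
            best = (d, p, q)
    return best


def randomized_closest_pair(points, rng=random):
    """Closest pair of a list of (x, y) tuples via random sampling and four grids."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Step 1: sample m = n^(2/3) points; delta = their closest distance.
    m = max(2, round(n ** (2 / 3)))
    best = brute_force_closest(rng.sample(points, m))
    delta = best[0]
    if delta == 0:
        return best                      # duplicate points: distance 0 is optimal
    size = 2 * delta                     # Step 3: mesh-size 2 * delta
    # Steps 3-4: grids T1..T4, shifted by 0 or delta along each axis.
    for ox in (0.0, delta):
        for oy in (0.0, delta):
            squares = defaultdict(list)
            for pt in points:
                key = (math.floor((pt[0] - ox) / size),
                       math.floor((pt[1] - oy) / size))
                squares[key].append(pt)
            # Step 5: distances only between points in the same square.
            for members in squares.values():
                cand = brute_force_closest(members)
                if cand[0] < best[0]:
                    best = cand
    return best


pts = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (6.0, 1.0)]
print(randomized_closest_pair(pts, random.Random(1))[0])  # 1.4142135623730951
```

Seeding the result with the sampled pair guarantees the returned distance never exceeds δ, even if rounding separated that pair in every grid.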
An example
27 points. S1 = {x1, x2, …, x9}, δ = d(x1, x2).
(Figures: the 27 points on the grid T of mesh-size δ, with axes marked δ, 2δ, …, 6δ and the sample points X1 to X9 labeled; then the same points under each of the four grids T1, T2, T3 and T4 of mesh-size 2δ.)
Time complexity
Time complexity: O(n) on average.
Step 1: O(n). Method: recursively apply the algorithm once, i.e., randomly choose $(n^{2/3})^{2/3} = n^{4/9}$ points from the $n^{2/3}$ points, then solve it with a straightforward method for the $n^{4/9}$ points: $O((n^{4/9})^2) = O(n^{8/9})$.
Steps 2 to 4: O(n).
Step 5: O(n) with probability $1 - 2e^{-cn^{1/6}}$.
Analysis of Step 5
How many distance computations are there in step 5?
δ: mesh-size in step 5
T: partition in step 5
N(T): number of distance computations in partition T
Fact: There exists a particular partition $T_0$, whose mesh-size is $\delta_0$, such that
(1) $N(T_0) \le c_0 n$, and
(2) the probability that $\delta \le 2\delta_0$ is $1 - 2e^{-cn^{1/6}}$.
Construct $T_1, T_2, \ldots, T_{16}$ with mesh-size $4\delta_0$.
(Figure: a square of T with side δ, and a square of $T_i$ with side $4\delta_0$ made up of 16 squares of $T_0$ with side $\delta_0$.)
The probability that each square in T falls into at least one square of $T_i$, $i = 1, 2, \ldots, 16$, is $1 - 2e^{-cn^{1/6}}$.
The probability that $N(T) \le \sum_{i=1}^{16} N(T_i)$ is $1 - 2e^{-cn^{1/6}}$.
Let the square in $T_0$ with the largest number of elements among the 16 squares have k elements. That square needs $k(k-1)/2 = O(k^2)$ distance computations, and the corresponding square of $T_i$, which holds at most 16k elements, needs at most $16k(16k-1)/2 = O(k^2)$.
Hence $N(T_0) \le c_0 n \Rightarrow N(T_i) \le c_i n$, and
$N(T) \le \sum_{i=1}^{16} N(T_i) = O(n)$ with probability $1 - 2e^{-cn^{1/6}}$.
A randomized algorithm to test whether a number is prime.
This problem was long considered very difficult: no deterministic polynomial-time algorithm was known until the AKS algorithm of 2002.
Traditional method: use 2, 3, …, √N to test whether N is prime.
Input size of N: $B = \log_2 N$ (binary representation).
$\sqrt{N} = 2^{B/2}$, an exponential function of B.
Thus √N cannot be viewed as a polynomial function of the input size.
Randomized prime number testing algorithm
Input: A positive number N, and a parameter m.
Output: Whether N is a prime or not, with probability of being correct at least $1 - \varepsilon = 1 - 2^{-m}$.
Step 1: Randomly choose m numbers $b_1, b_2, \ldots, b_m$ with $1 \le b_1, b_2, \ldots, b_m < N$, where $m \ge \log_2(1/\varepsilon)$.
Step 2: For each $b_i$, test whether $W(b_i)$ holds, where $W(b_i)$ is defined as follows:
(1) $b_i^{N-1} \not\equiv 1 \pmod N$, or
(2) $\exists j$ such that $\frac{N-1}{2^j} = k$ is an integer and the greatest common divisor of $b_i^k - 1$ and N is neither 1 nor N.
If any $W(b_i)$ holds, then return N as a composite number; otherwise, return N as a prime.
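A direct Python transcription of Steps 1 and 2, offered as a sketch: `w_holds` checks the two conditions of W(b) literally, trying j = 1, 2, … while (N − 1)/2^j is an integer, and the function names are illustrative. Python's three-argument `pow` performs the modular exponentiation, and $\gcd(b^k - 1, N)$ is computed from $b^k \bmod N$, which gives the same gcd. A "composite" answer is always correct; by the number-theory theorem that follows, a "prime" answer is wrong with probability at most $2^{-m}$.

```python
import math
import random


def w_holds(b, N):
    """The witness condition W(b) from the slides (True means N is composite).

    (1) b^(N-1) is not 1 mod N, or
    (2) for some j >= 1 with k = (N-1)/2^j an integer,
        gcd(b^k - 1, N) is neither 1 nor N.
    """
    if pow(b, N - 1, N) != 1:
        return True
    k = N - 1
    while k % 2 == 0:                 # j = 1, 2, ... while (N-1)/2^j is an integer
        k //= 2
        # gcd(b^k - 1, N) equals gcd((b^k mod N) - 1, N); math.gcd ignores signs
        g = math.gcd(pow(b, k, N) - 1, N)
        if g != 1 and g != N:
            return True
    return False


def randomized_prime_test(N, m, rng=random):
    """True means "prime" (wrong with probability <= 2^-m); False means composite."""
    if N < 2:
        return False                  # 0 and 1 are not prime
    for _ in range(m):                # Step 1: m random b with 1 <= b < N
        b = rng.randrange(1, N)
        if w_holds(b, N):             # Step 2: any witness proves compositeness
            return False
    return True


print(w_holds(2, 12))                          # True: 2^11 mod 12 = 8
print([w_holds(b, 11) for b in (2, 5, 7)])     # [False, False, False]
print(randomized_prime_test(11, 3))            # True (primes are never rejected)
```

The two printed W checks reproduce Examples 1 and 2 below.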
Examples for randomized prime number testing
Example 1: N = 12
Randomly choose 2, 3, 7.
$2^{12-1} = 2048 \not\equiv 1 \pmod{12}$
⇒ 12 is a composite number.
Example 2: N = 11
Randomly choose 2, 5, 7.
(1) $2^{11-1} = 1024 \equiv 1 \pmod{11}$; j = 1, $(N-1)/2^j = 5$; $\gcd(2^5 - 1, 11) = 1$. W(2) does not hold.
(2) $5^{11-1} = 9765625 \equiv 1 \pmod{11}$; $\gcd(5^5 - 1, 11) = 11$. W(5) does not hold.
(3) $7^{11-1} = 282475249 \equiv 1 \pmod{11}$; $\gcd(7^5 - 1, 11) = 1$. W(7) does not hold.
Thus, 11 is a prime number with the probability of correctness being at least $1 - 2^{-3} = 7/8$.
Theorem for number theory
Theorem:
If W(b) holds for some b with 1 ≤ b < N, then N is a composite number.
If N is composite, then $(N-1)/2 \le |\{ b \mid 1 \le b < N,\ W(b) \text{ holds} \}|$.
Pattern matching
Pattern string: X, length n
Text string: Y, length m, m ≥ n
Goal: find the first occurrence of X as a consecutive substring of Y.
Assume that X and Y are binary strings.
e.g. X = 01001, Y = 1010100111 (X first occurs starting at position 4 of Y)
Straightforward method: O(mn)
Knuth-Morris-Pratt's algorithm: O(m)
The randomized algorithm: O(mk), with a small probability of a mistake (k: number of tests)
Binary representation
$X = x_1 x_2 \ldots x_n \in \{0,1\}^n$
$Y = y_1 y_2 \ldots y_m \in \{0,1\}^m$
Let $Y(i) = y_i y_{i+1} \ldots y_{i+n-1}$.
A match occurs if X = Y(i) for some i.
Binary values of X and Y(i):
$B(X) = x_1 2^{n-1} + x_2 2^{n-2} + \cdots + x_n$
$B(Y(i)) = y_i 2^{n-1} + y_{i+1} 2^{n-2} + \cdots + y_{i+n-1}$, $1 \le i \le m-n+1$
Fingerprints of binary strings
Let p be a randomly chosen prime number in $\{1, 2, \ldots, nt^2\}$, where t = m − n + 1.
Notation: $(x_i)_p = x_i \bmod p$
Fingerprints of X and Y(i):
$B_p(X) = (\cdots((((x_1 \cdot 2)_p + x_2)_p \cdot 2)_p + x_3)_p \cdots + x_n)_p$
$B_p(Y(i)) = (\cdots((((y_i \cdot 2)_p + y_{i+1})_p \cdot 2)_p + y_{i+2})_p \cdots + y_{i+n-1})_p$
$\Rightarrow B_p(Y(i+1)) = ((B_p(Y(i)) - 2^{n-1} y_i) \cdot 2 + y_{i+n})_p = (((B_p(Y(i)) - ((2^{n-1})_p\, y_i)_p)_p \cdot 2)_p + y_{i+n})_p$
If X = Y(i), then $B_p(X) = B_p(Y(i))$, but not vice versa.
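A small Python sketch of the fingerprint recurrence, assuming bits are given as lists of 0/1 with 0-based indices (an illustration choice): `fingerprint` applies Horner's rule mod p, and `window_fingerprints` slides the window with the $B_p(Y(i+1))$ update, so all m − n + 1 fingerprints cost O(m) arithmetic operations. It reproduces the first example that follows (X = 10110, Y = 110110, p = 3).

```python
def fingerprint(bits, p):
    """B_p of a list of 0/1 bits: Horner's rule, reducing mod p at every step."""
    fp = 0
    for b in bits:
        fp = (fp * 2 + b) % p
    return fp


def window_fingerprints(y, n, p):
    """[B_p(Y(1)), ..., B_p(Y(m-n+1))] for the bit list y, in O(m) steps.

    Uses B_p(Y(i+1)) = ((B_p(Y(i)) - 2^(n-1) * y_i) * 2 + y_(i+n)) mod p.
    """
    top = pow(2, n - 1, p)             # (2^(n-1))_p, computed once
    fp = fingerprint(y[:n], p)
    result = [fp]
    for i in range(len(y) - n):        # 0-based: drop y[i], bring in y[i+n]
        fp = ((fp - top * y[i]) * 2 + y[i + n]) % p
        result.append(fp)
    return result


X = [1, 0, 1, 1, 0]       # 10110 = 22
Y = [1, 1, 0, 1, 1, 0]    # 110110
print(fingerprint(X, 3), window_fingerprints(Y, len(X), 3))  # 1 [0, 1]
```

Python's `%` always returns a value in 0..p−1 for positive p, so the subtraction in the update never leaves a negative fingerprint.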
Examples for using fingerprints
Example: X = 10110, Y = 110110
n = 5, m = 6, t = m − n + 1 = 2
Suppose p = 3.
$B_3(X) = (22)_3 = 1$
$B_3(Y(1)) = (27)_3 = 0$ ⇒ X ≠ Y(1)
$B_3(Y(2)) = ((0 - (2^4)_3) \cdot 2 + 0)_3 = 1$ ⇒ X = Y(2) (the fingerprints agree, and here Y(2) = 10110 really is a match)
e.g. X = 10110, Y = 10011, p = 3
$B_3(X) = (22)_3 = 1$
$B_3(Y(1)) = (19)_3 = 1$
⇒ X = Y(1)? WRONG! The fingerprints agree, but 10110 ≠ 10011 (a false match).
If $B_p(X) \ne B_p(Y(i))$, then X ≠ Y(i).
If $B_p(X) = B_p(Y(i))$, we may do a bit-by-bit check, or compute k different fingerprints by using k different prime numbers in $\{1, 2, \ldots, nt^2\}$.
A randomized algorithm for pattern matching
Input: A pattern $X = x_1 x_2 \ldots x_n$, a text $Y = y_1 y_2 \ldots y_m$ and a parameter k.
Output:
(1) No, there is no consecutive substring in Y which matches with X.
(2) Yes, $Y(i) = y_i y_{i+1} \ldots y_{i+n-1}$ matches with X, and it is the first occurrence.
If the answer is “No”, there is no mistake.
If the answer is “Yes”, there is some probability that a mistake is made.
Step 1: Randomly choose k prime numbers $p_1, p_2, \ldots, p_k$ from $\{1, 2, \ldots, nt^2\}$, where t = m − n + 1.
Step 2: i = 1.
Step 3: j = 1.
Step 4: If $(B(X))_{p_j} \ne (B(Y(i)))_{p_j}$, then go to Step 5.
If j = k, return Y(i) as the answer.
j = j + 1.
Go to Step 4.
Step 5: If i = t, return “No, there is no consecutive substring in Y which matches with X.”
i = i + 1.
Go to Step 3.
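A Python sketch of Steps 1 to 5, under loudly stated assumptions: the k primes are drawn uniformly without replacement from the primes ≤ nt² found by a sieve (fine for modest nt²), X and Y are given as '0'/'1' strings, and the fingerprints of every window are precomputed per prime rather than interleaved as in Step 4, which returns the same answer in O(mk) operations. The names `primes_up_to` and `randomized_match` are hypothetical.

```python
import math
import random


def primes_up_to(limit):
    """Sieve of Eratosthenes: all primes <= limit."""
    if limit < 2:
        return []
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for q in range(2, math.isqrt(limit) + 1):
        if sieve[q]:
            sieve[q * q::q] = bytearray(len(range(q * q, limit + 1, q)))
    return [q for q, flag in enumerate(sieve) if flag]


def randomized_match(x, y, k, rng=random):
    """Return the 1-based i of the first accepted window Y(i), or None ("No").

    x, y: strings over '0'/'1'. None is always correct; a returned i can be
    a false match with small probability.
    """
    X, Y = [int(c) for c in x], [int(c) for c in y]
    n, m = len(X), len(Y)
    if not 1 <= n <= m:
        raise ValueError("need 1 <= len(x) <= len(y)")
    t = m - n + 1
    primes = primes_up_to(n * t * t)
    if len(primes) < k:
        raise ValueError("fewer than k primes in {1, ..., n*t^2}")
    ps = rng.sample(primes, k)                 # Step 1: k distinct random primes

    fx, fy = [], []                            # fingerprints, one row per prime
    for p in ps:
        top = pow(2, n - 1, p)                 # (2^(n-1))_p
        bx = by = 0
        for j in range(n):                     # Horner's rule mod p
            bx = (bx * 2 + X[j]) % p
            by = (by * 2 + Y[j]) % p
        row = [by]
        for i in range(t - 1):                 # slide: drop Y[i], bring in Y[i+n]
            by = ((by - top * Y[i]) * 2 + Y[i + n]) % p
            row.append(by)
        fx.append(bx)
        fy.append(row)

    for i in range(t):                         # Steps 2-5
        if all(fx[j] == fy[j][i] for j in range(k)):
            return i + 1                       # all k fingerprints agree: "Yes"
    return None                                # "No": never a mistake


print(randomized_match("01001", "1010100111", 3))  # 4
```

In the printed call the earlier windows differ from X by 12, 1 and 11, none of which has three distinct prime divisors, so no choice of three primes can produce a false match before position 4.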
An example for the algorithm
X = 10110, Y = 100111, p1 = 3, p2 = 5
$B_3(X) = (22)_3 = 1$, $B_5(X) = (22)_5 = 2$
$B_3(Y(2)) = (7)_3 = 1$, $B_5(Y(2)) = (7)_5 = 2$
Both fingerprints agree, so choose one more prime number, p3 = 7:
$B_7(X) = (22)_7 = 1$
$B_7(Y(2)) = (7)_7 = 0$ ⇒ X ≠ Y(2)
How often does a mistake occur?
If a mistake occurs in X and Y(i), then B(X) − B(Y(i)) ≠ 0, and $p_j$ divides |B(X) − B(Y(i))| for all $p_j$'s.
Let $Q = \prod_{i \,:\, p_j \mid B(X) - B(Y(i))} |B(X) - B(Y(i))|$.
$Q < 2^{n(m-n+1)}$
Reason: $B(X) < 2^n$, so each factor $|B(X) - B(Y(i))|$ is less than $2^n$, and there are at most (m − n + 1) factors: $2^n \cdot 2^n \cdots 2^n$ (m − n + 1 times).
Theorem for number theory
Theorem: If u ≥ 29 and $a < 2^u$, then a has fewer than π(u) different prime divisors, where π(u) is the number of prime numbers smaller than u.
Assume nt ≥ 29.
$Q < 2^{n(m-n+1)} = 2^{nt}$
⇒ Q has fewer than π(nt) different prime divisors.
If $p_j$ is a prime number selected from {1, 2, …, M}, the probability that $p_j$ divides Q is less than $\pi(nt)/\pi(M)$.
If k different prime numbers are selected from $\{1, 2, \ldots, nt^2\}$, the probability that a mistake occurs is less than $\left(\frac{\pi(nt)}{\pi(nt^2)}\right)^k$, provided nt ≥ 29.
An example for mistake probability
How do we estimate $\left(\frac{\pi(nt)}{\pi(nt^2)}\right)^k$?
Theorem: For all u ≥ 17, $\frac{u}{\ln u} \le \pi(u) \le 1.25506 \frac{u}{\ln u}$.
$\frac{\pi(nt)}{\pi(nt^2)} \le 1.25506 \cdot \frac{nt}{\ln(nt)} \cdot \frac{\ln(nt^2)}{nt^2} = \frac{1.25506}{t}\left(1 + \frac{\ln t}{\ln(nt)}\right)$
Example: n = 10, m = 100, t = m − n + 1 = 91
$\frac{\pi(nt)}{\pi(nt^2)} \le 0.0229$
Let k = 4:
$(0.0229)^4 \approx 2.75 \times 10^{-7}$ // very small
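The estimate can be checked numerically. This Python sketch evaluates the bound above for n = 10, m = 100, k = 4 and, for comparison, the exact ratio π(nt)/π(nt²) obtained by sieving, with π(u) counting primes smaller than u as defined earlier. The function names are illustrative.

```python
import math


def prime_count_below(u):
    """pi(u) as defined on the slides: the number of primes smaller than u."""
    if u <= 2:
        return 0
    sieve = bytearray([1]) * u                     # indices 0 .. u-1
    sieve[0] = sieve[1] = 0
    for q in range(2, math.isqrt(u - 1) + 1):
        if sieve[q]:
            sieve[q * q::q] = bytearray(len(range(q * q, u, q)))
    return sum(sieve)


def ratio_bound(n, m):
    """(1.25506 / t) * (1 + ln t / ln(nt)) with t = m - n + 1 (valid for nt >= 17)."""
    t = m - n + 1
    return 1.25506 / t * (1 + math.log(t) / math.log(n * t))


n, m, k = 10, 100, 4
t = m - n + 1
bound = ratio_bound(n, m)
exact = prime_count_below(n * t) / prime_count_below(n * t * t)
print(f"bound on pi(nt)/pi(nt^2): {bound:.4f}, exact ratio: {exact:.4f}")
print(f"mistake probability with k = {k}: below {bound ** k:.3g}")
```

Without rounding the ratio to 0.0229 first, the fourth power comes out slightly above 2.75 × 10⁻⁷ (about 2.76 × 10⁻⁷); the exact ratio sits below the bound, as the theorem guarantees.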
Interactive proofs: method I
Two persons: A: a spy
B: the boss of A
When A wants to talk to B, how does B know that A is the real A, and not an enemy imitating A?
Method I: a trivial method
B may ask the name of A’s mother (a private secret).
Disadvantage:
The enemy can collect the information and imitate A the next time.
Interactive proofs: method II
Method II:
B may send a Boolean formula to A and ask A to determine its satisfiability (an NP-complete problem).
It is assumed that A is a smart person and knows how to solve this NP-complete problem.
B can check the answer and know whether A is the real A or not.
Disadvantage:
The enemy can study methods of mechanical theorem proving, and sooner or later he can imitate A.
In Methods I and II, A and B have revealed too much.
A randomized algorithm for interactive proofs
Method III:
B can ask A to solve a quadratic nonresidue problem, in which the data can be sent back and forth without revealing much information.
Definition:
Let GCD(x, y) = 1. y is a quadratic residue mod x if $z^2 \equiv y \pmod x$ for some z with 0 < z < x and GCD(x, z) = 1; otherwise, y is a quadratic nonresidue mod x.
(See the example on the next page.)
An example for quadratic residue/nonresidue
Let
QR = {(x, y) | y is a quadratic residue mod x}
QNR = {(x, y) | y is a quadratic nonresidue mod x}
Try to test x = 9, y = 7:
$1^2 \equiv 1 \pmod 9$
$2^2 \equiv 4 \pmod 9$
$3^2 \equiv 0 \pmod 9$
$4^2 \equiv 7 \pmod 9$
$5^2 \equiv 7 \pmod 9$
$6^2 \equiv 0 \pmod 9$
$7^2 \equiv 4 \pmod 9$
$8^2 \equiv 1 \pmod 9$
(z = 3 and z = 6 do not count, since GCD(9, z) ≠ 1.)
We have (9, 1), (9, 4), (9, 7) ∈ QR,
but (9, 2), (9, 5), (9, 8) ∈ QNR.
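For small x the two sets can be found by brute force. This Python sketch enumerates z with GCD(x, z) = 1, as the definition requires, and classifies every y with GCD(x, y) = 1; it is only practical for small x, and the function name is illustrative.

```python
import math


def is_quadratic_residue(x, y):
    """True if z^2 = y (mod x) for some 0 < z < x with GCD(x, z) = 1."""
    return any(z * z % x == y % x for z in range(1, x) if math.gcd(x, z) == 1)


x = 9
units = [y for y in range(1, x) if math.gcd(x, y) == 1]   # y with GCD(x, y) = 1
qr = [y for y in units if is_quadratic_residue(x, y)]
qnr = [y for y in units if not is_quadratic_residue(x, y)]
print("QR:", qr, "QNR:", qnr)  # QR: [1, 4, 7] QNR: [2, 5, 8]
```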
Detailed method for interactive proofs
1) A and B know x and keep x confidential.
2) B knows y.
Action of B:
Step 1: Randomly choose m bits $b_1, b_2, \ldots, b_m$, where m is the length of the binary representation of x.
Step 2: Find $z_1, z_2, \ldots, z_m$ such that $\gcd(z_i, x) = 1$ for all i.
Step 3: Compute $w_1, w_2, \ldots, w_m$:
$w_i \leftarrow z_i^2 \bmod x$ if $b_i = 0$ // $(x, w_i) \in$ QR
$w_i \leftarrow (z_i^2 y) \bmod x$ if $b_i = 1$ // $(x, w_i) \in$ QNR
Step 4: Send $w_1, w_2, \ldots, w_m$ to A.
3) Action of A:
Step 1: Receive $w_1, w_2, \ldots, w_m$ from B.
Step 2: Compute $c_1, c_2, \ldots, c_m$:
$c_i \leftarrow 0$ if $(x, w_i) \in$ QR
$c_i \leftarrow 1$ if $(x, w_i) \in$ QNR
Send $c_1, c_2, \ldots, c_m$ to B.
4) Action of B:
Step 1: Receive $c_1, c_2, \ldots, c_m$ from A.
Step 2: If (x, y) ∈ QNR and $b_i = c_i$ for all i, then A is the real A (with probability $1 - 2^{-m}$).
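A toy Python simulation of one run of Method III, under loud assumptions: A's ability to recognize quadratic residues is modeled by brute force (practical only for tiny x such as 9 or 15), each $z_i$ is drawn at random among numbers coprime to x, and the names `run_round`, `real_a` and `impostor` are hypothetical. The real A always answers correctly, because $z^2$ is a residue and $z^2 y$ is a nonresidue whenever y is; an impostor who guesses each $c_i$ passes with probability $2^{-m}$.

```python
import math
import random


def is_qr(x, w):
    """Brute-force test: is w a quadratic residue mod x? (Stands in for A's skill.)"""
    return any(z * z % x == w % x for z in range(1, x) if math.gcd(x, z) == 1)


def random_unit(x, rng):
    """A random z with 0 < z < x and GCD(z, x) = 1."""
    while True:
        z = rng.randrange(1, x)
        if math.gcd(z, x) == 1:
            return z


def run_round(x, y, answer, rng=random):
    """One run of the protocol; returns True if B accepts.

    `answer` is A's strategy: it maps the list w to a list of bits c.
    """
    m = x.bit_length()                                     # B, Step 1
    bits = [rng.randrange(2) for _ in range(m)]
    w = []
    for b in bits:
        z = random_unit(x, rng)                            # B, Step 2
        w.append(z * z % x if b == 0 else z * z * y % x)   # B, Step 3
    c = answer(w)                                          # A, Steps 1-2
    return (not is_qr(x, y)) and c == bits                 # B, final Step 2


def real_a(x):
    """The genuine A: classifies each w_i as QR (0) or QNR (1)."""
    return lambda w: [0 if is_qr(x, wi) else 1 for wi in w]


def impostor(rng):
    """An impostor who cannot tell QR from QNR and guesses each bit."""
    return lambda w: [rng.randrange(2) for _ in w]


rng = random.Random(0)
x, y = 9, 5                                                # (9, 5) is in QNR
print(all(run_round(x, y, real_a(x), rng) for _ in range(20)))  # True
passed = sum(run_round(x, y, impostor(rng), rng) for _ in range(1600))
print(passed, "of 1600 impostor runs accepted")
```

With x = 9 the binary length is m = 4, so a guessing impostor should be accepted in roughly one run out of 16; a real deployment would need an x far too large for the brute-force check.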

Randomized algorithms ver 1.0
