Unit IV - KMP String Matching Algorithm
Knuth–Morris–Pratt
• The Knuth–Morris–Pratt string-searching algorithm (or KMP algorithm) searches for occurrences of
a pattern P within a text T. When a mismatch occurs, the part of P already matched carries enough
information to determine where the next potential match could begin. This avoids re-comparing
text characters that have already been matched and brings the time complexity down to linear.
• Knuth, Morris and Pratt introduced this linear-time algorithm for the string matching problem.
• It checks the characters from left to right. The saving comes when a prefix of the pattern appears
again later inside the pattern: after a mismatch, the pattern is shifted so that the repeated prefix
lines up with the text already matched.
Applications
Some of the applications are text editors, database queries, bioinformatics and
cheminformatics, two-dimensional mesh, network intrusion detection systems,
wide-window pattern matching (large string matching), music content retrieval,
language syntax checkers, the MS Word spell checker, matching DNA sequences,
digital libraries and search engines.
Components of the KMP Algorithm:
1. The Prefix Function (Π): For a string s of length n, the prefix function is an array π
of length n, where π[i] is the length of the longest proper prefix of the substring s[0…i]
which is also a suffix of this substring. A proper prefix of a string is a prefix that is not
equal to the string itself. By definition, π[0] = 0. (This definition indexes from 0; the
pseudocode in this unit indexes from 1, so there Π[1] = 0.)
2. The KMP Matcher: With text 'T', pattern 'P' and prefix function 'Π' as inputs, it finds
the occurrences of 'P' in 'T' and reports the shift of 'P' at which each occurrence
is found.
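The definition of π in item 1 can be checked directly with a brute-force sketch. The function name `prefix_function_naive` and the sample string are illustrative assumptions, not part of the original slides, and this version is deliberately slow (roughly cubic time); it only restates the definition, using 0-based indices.

```python
def prefix_function_naive(s: str) -> list[int]:
    """Compute pi straight from the definition (0-indexed), without any KMP tricks."""
    n = len(s)
    pi = [0] * n
    for i in range(n):
        sub = s[: i + 1]  # the substring s[0..i]
        # Try proper prefix lengths from longest to shortest.
        for length in range(i, 0, -1):
            if sub[:length] == sub[-length:]:
                pi[i] = length
                break
    return pi


print(prefix_function_naive("abcabcd"))  # [0, 0, 0, 1, 2, 3, 0]
```

For "abcabcd", the value π[5] = 3 says that "abc" is both a proper prefix and a suffix of "abcabc".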
The Prefix Function (Π)
The following pseudocode computes the prefix function Π:
COMPUTE-PREFIX-FUNCTION (P)
1. m ← length [P]          // 'P' is the pattern to be matched
2. Π [1] ← 0
3. k ← 0
4. for q ← 2 to m
5.     do while k > 0 and P [k + 1] ≠ P [q]
6.         do k ← Π [k]
7.     if P [k + 1] = P [q]
8.         then k ← k + 1
9.     Π [q] ← k
10. return Π
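As a minimal sketch, the pseudocode above could be written in Python as follows. Python strings are 0-indexed, so P[k + 1] becomes `p[k]` and the fallback k ← Π[k] becomes `k = pi[k - 1]`; the function name `compute_prefix_function` is simply a transliteration of the pseudocode's name.

```python
def compute_prefix_function(p: str) -> list[int]:
    """0-indexed version of COMPUTE-PREFIX-FUNCTION: pi[q] is the length of the
    longest proper prefix of p[0..q] that is also a suffix of p[0..q]."""
    m = len(p)
    pi = [0] * m
    k = 0  # length of the currently matched prefix
    for q in range(1, m):
        # Fall back to shorter borders until p[q] extends one of them.
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("abcabcd"))  # [0, 0, 0, 1, 2, 3, 0]
```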
Example: Compute Π for the pattern 'p' below:
Initially: m = length [p] = 7
Π [1] = 0
k = 0
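The steps of such an example can be traced with a short script. The pattern "ababaca" used here is a hypothetical 7-character pattern chosen only for illustration; it is an assumption and not necessarily the pattern used in the original example. The printed q values are 1-based to match the pseudocode.

```python
def trace_prefix_function(p: str) -> list[int]:
    """Print one line per value of q (1-based, as in the pseudocode) and return pi."""
    m = len(p)
    if m == 0:
        return []
    pi = [0] * m
    k = 0
    print(f"q=1  P[q]={p[0]!r}  k=0  Π[1]=0")
    for q in range(1, m):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]  # fall back to a shorter border
        if p[k] == p[q]:
            k += 1
        pi[q] = k
        print(f"q={q + 1}  P[q]={p[q]!r}  k={k}  Π[{q + 1}]={k}")
    return pi


# Hypothetical pattern with m = 7 (an assumption, not taken from the slides).
trace_prefix_function("ababaca")
```

For this assumed pattern the resulting array is 0, 0, 1, 2, 3, 0, 1.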
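Running Time Analysis:
For calculating the prefix function, the for loop in steps 4 to 9 runs m - 1 times,
and steps 1 to 3 take constant time. The inner while loop does not change the bound:
k increases by at most 1 per iteration of the for loop (step 8) and every execution of
step 6 decreases k, so step 6 runs fewer than m times in total. Hence the running time
of computing the prefix function is O (m).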
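The O(m) bound can be checked empirically by counting how often the inner while loop falls back. This is a sketch under the assumption that counting fallbacks is a fair proxy for the extra work of the while loop; the function name and the sample patterns are illustrative.

```python
def prefix_function_with_count(p: str) -> tuple[list[int], int]:
    """Return (pi, fallbacks), where fallbacks counts executions of the
    while-loop body across the whole run."""
    pi = [0] * len(p)
    k = 0
    fallbacks = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
            fallbacks += 1
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi, fallbacks


for pattern in ["aaaaaaab", "abababab", "abcabcabd"]:
    pi, fallbacks = prefix_function_with_count(pattern)
    # k grows at most len(pattern) - 1 times, so it can shrink at most that often.
    assert fallbacks < len(pattern)
    print(pattern, pi, fallbacks)
```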
The KMP Matcher:
The KMP Matcher, with the pattern 'P', the text 'T' and the prefix function 'Π' as input, finds
the matches of P in T.
The following pseudocode computes the matching component of the KMP algorithm:
KMP-MATCHER (T, P)
1. n ← length [T]
2. m ← length [P]
3. Π ← COMPUTE-PREFIX-FUNCTION (P)
4. q ← 0                   // number of characters matched
5. for i ← 1 to n          // scan T from left to right
6.     do while q > 0 and P [q + 1] ≠ T [i]
7.         do q ← Π [q]    // next character does not match
8.     if P [q + 1] = T [i]
9.         then q ← q + 1  // next character matches
10.    if q = m            // is all of P matched?
11.        then print "Pattern occurs with shift" i - m
12.            q ← Π [q]   // look for the next match
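The matcher above might be sketched in Python as follows. Indices are 0-based, so the shift i - m of the pseudocode becomes `i - m + 1`, which is the same value. Collecting the shifts in a list instead of printing them, and returning an empty list for an empty pattern, are assumptions made for this sketch.

```python
def compute_prefix_function(p: str) -> list[int]:
    """0-indexed prefix function of p."""
    pi = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(t: str, p: str) -> list[int]:
    """Return every 0-based shift at which p occurs in t."""
    m = len(p)
    if m == 0:
        return []  # assumption: an empty pattern reports no matches
    pi = compute_prefix_function(p)
    shifts = []
    q = 0  # number of pattern characters matched so far
    for i, ch in enumerate(t):
        while q > 0 and p[q] != ch:
            q = pi[q - 1]  # next character does not match
        if p[q] == ch:
            q += 1  # next character matches
        if q == m:  # all of p matched
            shifts.append(i - m + 1)
            q = pi[q - 1]  # look for the next (possibly overlapping) match
    return shifts


print(kmp_matcher("abababcabab", "abab"))  # [0, 2, 7]
```

Note that overlapping occurrences (shifts 0 and 2 here) are both reported, because after a full match q falls back to Π[m] rather than to 0.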
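Running Time Analysis:
The for loop beginning in step 5 runs n times, once per character of the text 'T'.
Steps 1, 2 and 4 take constant time, and step 3 takes O (m) time to compute the
prefix function. Within the loop, q increases by at most 1 per iteration (step 9) and
every execution of step 7 decreases q, so step 7 runs at most n times in total. Thus
the running time of the matching loop is O (n), and the whole algorithm runs in
O (m + n).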
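To make the linear bound concrete, the sketch below counts character comparisons between text and pattern for a naive matcher and for a KMP matching loop on an unfavourable input. The matching loop is restructured slightly so that each comparison is counted exactly once; the helper names and the test input are assumptions chosen for illustration, and the pattern is assumed to be non-empty.

```python
def prefix_function(p: str) -> list[int]:
    pi = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = pi[k - 1]
        if p[k] == p[q]:
            k += 1
        pi[q] = k
    return pi


def naive_comparisons(t: str, p: str) -> int:
    """Comparisons made by trying every shift and restarting from scratch."""
    count = 0
    for shift in range(len(t) - len(p) + 1):
        for j in range(len(p)):
            count += 1
            if t[shift + j] != p[j]:
                break
    return count


def kmp_comparisons(t: str, p: str) -> int:
    """Text-vs-pattern comparisons made by the KMP matching loop (p non-empty)."""
    pi = prefix_function(p)
    count = 0
    q = 0
    for ch in t:
        while True:
            count += 1
            if p[q] == ch:
                q += 1
                break
            if q == 0:
                break
            q = pi[q - 1]  # mismatch: fall back and compare again
        if q == len(p):
            q = pi[q - 1]
    return count


text = "a" * 1000
pattern = "a" * 99 + "b"
print("naive:", naive_comparisons(text, pattern))
print("kmp:  ", kmp_comparisons(text, pattern))
assert kmp_comparisons(text, pattern) <= 2 * len(text)
```

The factor 2 in the final check reflects the analysis above: at most one successful or final comparison per text character, plus at most n fallbacks overall.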
