Mining Sequential Patterns
Agrawal, Rakesh, and Ramakrishnan Srikant.
"Mining sequential patterns." In Proceedings of the Eleventh International
Conference on Data Engineering (ICDE), pp. 3-14. IEEE, 1995.
Presenter: Shaina Raza (PhD student)
Instructor: Dr. Cherie Ding
Outline
• Introduction to Sequential Mining
• Problem Definition
• Algorithm
• Performance
• Conclusion
Sequence
• A sequence is an ordered list of elements (transactions): s = <e1 e2 e3 …>
• Each element contains a collection of events (items): ei = {i1, i2, …, ik}
• Each element is attributed to a specific time or location.
• Length of a sequence: |s| = the number of elements in the sequence.
• A k-sequence is a sequence containing k events (items).
[Figure: a sequence drawn as a timeline t1, t2, …, tn of elements (transactions), each element being a set of events (items) such as E1, E2, E3, E4.]
Subsequence
• A sequence <a1 a2 … an> is contained in another sequence <b1 b2 … bm> (m ≥ n) if there exist integers i1 < i2 < … < in such that a1 ⊆ b_i1, a2 ⊆ b_i2, …, an ⊆ b_in.

Data sequence            Subsequence     Contained?
<{2,4} {3,5,6} {8}>      <{2} {3,5}>     Yes
<{1,2} {3,4}>            <{1} {2}>       No
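
A minimal Python sketch of this containment check (an illustration, not the authors' code); sequences are represented as lists of item sets:

    def is_contained(sub, seq):
        """True if every element of `sub` is a subset of a distinct element of `seq`,
        with the order of elements preserved."""
        i = 0  # position in seq
        for a in sub:
            # advance through seq until an element containing `a` is found
            while i < len(seq) and not set(a) <= set(seq[i]):
                i += 1
            if i == len(seq):
                return False
            i += 1  # the next element of sub must match a strictly later element
        return True

    # The two rows of the table above:
    print(is_contained([{2}, {3, 5}], [{2, 4}, {3, 5, 6}, {8}]))  # True
    print(is_contained([{1}, {2}], [{1, 2}, {3, 4}]))             # False

Greedy matching is sufficient here: if any valid mapping exists, mapping each element to the earliest possible position also succeeds.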
Sequential Pattern Mining
• Sequential pattern mining is the mining of frequently occurring ordered events or subsequences as patterns.
• Examples: web access patterns, weather prediction, telephone calling patterns, DNA sequences and gene structures.
Example mapping (customer purchase data):
• Sequence database: the customers
• Sequence: the purchase history of a given customer
• Element (transaction): the set of items bought by a customer at time t
• Event (item): books, grocery, CDs, etc.
Sequential Pattern Mining: Example
A sequence database:

SID   Sequence
1     <a(abc)(ac)d(cf)>
2     <(ad)c(bc)(ae)>
3     <(ef)(ab)(df)cb>
4     <eg(af)cbc>

• The support of the subsequence <(ab)c> is 2 (it is contained in sequences 1 and 3).
• Given a support threshold min_sup = 2, <(ab)c> is frequent and is therefore a sequential pattern.
• The length of the first sequence is 9 events; 'a' appears multiple times in it, but it contributes only once to the support of <a>.
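
Support counting over this small database could then look as follows (a sketch reusing the containment check from the previous example; the string notation is expanded by hand into lists of item sets):

    def is_contained(sub, seq):
        i = 0
        for a in sub:
            while i < len(seq) and not set(a) <= set(seq[i]):
                i += 1
            if i == len(seq):
                return False
            i += 1
        return True

    # The four data sequences above, written out as lists of item sets.
    database = {
        1: [{'a'}, {'a', 'b', 'c'}, {'a', 'c'}, {'d'}, {'c', 'f'}],
        2: [{'a', 'd'}, {'c'}, {'b', 'c'}, {'a', 'e'}],
        3: [{'e', 'f'}, {'a', 'b'}, {'d', 'f'}, {'c'}, {'b'}],
        4: [{'e'}, {'g'}, {'a', 'f'}, {'c'}, {'b'}, {'c'}],
    }

    def support(pattern, database):
        """Number of data sequences containing `pattern` (each sequence counts at most once)."""
        return sum(1 for seq in database.values() if is_contained(pattern, seq))

    print(support([{'a', 'b'}, {'c'}], database))  # 2, so <(ab)c> is a pattern for min_sup = 2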
APPROACHES TO SEQUENTIAL PATTERN MINING ALGORITHMS
• Apriori-like (concept introduction and an initial solution)
  • Agrawal & Srikant. Mining sequential patterns, ICDE'95
• Apriori-based
  • GSP (Generalized Sequential Patterns: Srikant & Agrawal @ EDBT'96)
  • SPADE (Zaki @ Machine Learning'00)
• Pattern-growth-based
  • FreeSpan & PrefixSpan (Han et al. @ KDD'00; Pei et al. @ ICDE'01)
• Constraint-based
  • SPIRIT (Garofalakis, Rastogi, Shim @ VLDB'99; Pei, Han, Wang @ CIKM'02)
• Mining closed sequential patterns
  • CloSpan (Yan, Han & Afshar @ SDM'03)
Outline
• Introduction to Sequential Mining
• Problem Definition
• Algorithm
• Performance
• Conclusion
1. Problem Definition
• Given:
  • a database of customer sequences
  • a user-specified minimum support threshold, minsup
• Task:
  • find, among all sequences with support ≥ minsup, the maximal sequences
• Support for a sequence is defined as the fraction of total customers whose sequences contain it.
• Maximal sequence: a sequence that is not contained in any other sequence.
Outline
• Introduction to Sequential Mining
• Problem Definition
• Algorithm
• Performance
• Conclusion
2. The Algorithm (Phases)
1. Sort
2. Litemset
3. Transformation
4. Sequence (AprioriAll, AprioriSome, DynamicSome)
5. Maximal
1. Sort Phase
• Sort the database with Customer ID as the major key and Transaction Time as the minor key.
• Convert the original transaction DB into a customer-sequence DB.

Original database:
Customer ID   Transaction Time   Items Bought
1             June 25 '93        30
1             June 30 '93        90
2             June 10 '93        10, 20
2             June 15 '93        30
2             June 20 '93        40, 60, 70
3             June 25 '93        30, 50, 70
4             June 25 '93        30
4             June 30 '93        40, 70
4             July 25 '93        90
5             June 12 '93        90
Customer-sequence version of the database:
Customer ID   Customer Sequence
1             <(30) (90)>
2             <(10 20) (30) (40 60 70)>
3             <(30 50 70)>
4             <(30) (40 70) (90)>
5             <(90)>

• With a minimum support of 40% (i.e., at least 2 of the 5 customers), the maximal sequential patterns are:
  <(30) (90)>
  <(30) (40 70)>
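
A minimal sketch of the sort phase on this example (not the authors' code; dates are simplified to ISO strings so that sorting them lexicographically matches chronological order):

    from collections import defaultdict

    # (customer_id, transaction_time, items) rows of the original database
    transactions = [
        (1, '1993-06-25', {30}), (1, '1993-06-30', {90}),
        (2, '1993-06-10', {10, 20}), (2, '1993-06-15', {30}), (2, '1993-06-20', {40, 60, 70}),
        (3, '1993-06-25', {30, 50, 70}),
        (4, '1993-06-25', {30}), (4, '1993-06-30', {40, 70}), (4, '1993-07-25', {90}),
        (5, '1993-06-12', {90}),
    ]

    def sort_phase(transactions):
        """Sort by (customer id, transaction time) and group the itemsets per customer,
        yielding one customer sequence (a time-ordered list of itemsets) per customer."""
        sequences = defaultdict(list)
        for cust, _, items in sorted(transactions, key=lambda t: (t[0], t[1])):
            sequences[cust].append(items)
        return dict(sequences)

    for cust, seq in sort_phase(transactions).items():
        print(cust, seq)   # e.g. 2 [{10, 20}, {30}, {40, 60, 70}]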
2. Litemset Phase
• Litemset (large itemset): an itemset with minimum support, where support is counted over customers (a customer supports an itemset if some single transaction of that customer contains it).
• Applying minsup = 2 customers to the original database (see the Sort Phase) gives the following large itemsets and their mapping to integers:

Large Itemset   Mapped To
(30)            1
(40)            2
(70)            3
(40 70)         4
(90)            5

• Reason for the mapping: litemsets can be treated as single entities, so
  • two litemsets can be compared in constant time, and
  • the time needed to check whether a sequence is contained in a customer sequence is reduced.
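
A brute-force sketch of the litemset phase on the example data (the paper uses its Apriori itemset algorithm for this step; the exhaustive enumeration, the max_size cap and the integer numbering below are simplifications made for illustration):

    from itertools import combinations

    # customer sequences produced by the sort phase
    sequences = {
        1: [{30}, {90}],
        2: [{10, 20}, {30}, {40, 60, 70}],
        3: [{30, 50, 70}],
        4: [{30}, {40, 70}, {90}],
        5: [{90}],
    }
    MINSUP = 2  # 40% of 5 customers

    def litemset_phase(sequences, minsup, max_size=3):
        """Support of an itemset = number of customers who bought all its items
        in a single transaction; keep the itemsets with support >= minsup and
        map each one to a small integer."""
        support = {}
        for seq in sequences.values():
            seen = set()
            for transaction in seq:
                for size in range(1, min(len(transaction), max_size) + 1):
                    seen.update(combinations(sorted(transaction), size))
            for itemset in seen:              # each customer is counted once per itemset
                support[itemset] = support.get(itemset, 0) + 1
        large = sorted((s for s, c in support.items() if c >= minsup),
                       key=lambda s: (len(s), s))
        return {itemset: i + 1 for i, itemset in enumerate(large)}

    print(litemset_phase(sequences, MINSUP))
    # {(30,): 1, (40,): 2, (70,): 3, (90,): 4, (40, 70): 5}  (numbering may differ from the slide)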
3. Transformation Phase
• Replace each transaction with the set of all litemsets contained in that transaction.
• Transactions that contain no litemset are dropped.

Cust ID   Original Customer Sequence    Transformed Customer Sequence            After Mapping
1         <(30) (90)>                   <{(30)} {(90)}>                          <{1} {5}>
2         <(10 20) (30) (40 60 70)>     <{(30)} {(40), (70), (40 70)}>           <{1} {2, 3, 4}>
3         <(30 50 70)>                  <{(30), (70)}>                           <{1, 3}>
4         <(30) (40 70) (90)>           <{(30)} {(40), (70), (40 70)} {(90)}>    <{1} {2, 3, 4} {5}>
5         <(90)>                        <{(90)}>                                 <{5}>

Note: (10 20) is dropped for lack of support; (40 60 70) is replaced with the set of litemsets {(40), (70), (40 70)}, since 60 does not have minimum support.
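
A small sketch of the transformation step, assuming the litemset-to-integer mapping of the previous slides:

    # litemsets and their integer ids from the litemset phase (numbering as on the slide)
    litemset_ids = {
        frozenset({30}): 1, frozenset({40}): 2, frozenset({70}): 3,
        frozenset({40, 70}): 4, frozenset({90}): 5,
    }

    def transform(sequence, litemset_ids):
        """Replace every transaction with the ids of all litemsets it contains;
        transactions containing no litemset are dropped."""
        transformed = []
        for transaction in sequence:
            ids = {lid for lset, lid in litemset_ids.items() if lset <= transaction}
            if ids:                     # e.g. (10 20) contains no litemset and is dropped
                transformed.append(ids)
        return transformed

    print(transform([{10, 20}, {30}, {40, 60, 70}], litemset_ids))  # [{1}, {2, 3, 4}]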
4. Sequence Phase
Two types of algorithms:
• Count-all: counts all large sequences, including non-maximal sequences.
  • AprioriAll
• Count-some: tries to avoid counting non-maximal sequences by counting longer sequences first.
  • AprioriSome
  • DynamicSome
a) AprioriAll
• Based on the standard Apriori algorithm; counts all the large sequences, including non-maximal ones.
• Steps:
  • Candidate generation: join L(k-1) with itself to form Ck

        insert into Ck
        select p.litemset_1, …, p.litemset_(k-1), q.litemset_(k-1)
        from L(k-1) p, L(k-1) q
        where p.litemset_1 = q.litemset_1, …, p.litemset_(k-2) = q.litemset_(k-2)

  • Prune: delete all sequences c in Ck such that some (k-1)-subsequence of c is not in L(k-1).
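
A Python sketch of this candidate-generation step (an illustration, not the paper's code). Sequences are tuples of litemset ids; candidates that would repeat the litemset at the join position (e.g. <1 2 3 3>) are not generated, which is a simplifying assumption:

    from itertools import combinations

    def generate_candidates(large_prev):
        """Join L(k-1) with itself on the first k-2 litemsets, then prune every
        candidate that has a (k-1)-subsequence which is not itself large."""
        large_prev = set(large_prev)
        k = len(next(iter(large_prev))) + 1
        candidates = set()
        for p in large_prev:
            for q in large_prev:
                if p != q and p[:k - 2] == q[:k - 2]:   # join condition
                    candidates.add(p + (q[-1],))        # p.litemset_1 .. p.litemset_(k-1), q.litemset_(k-1)
        def all_subsequences_large(c):
            return all(tuple(c[i] for i in idx) in large_prev
                       for idx in combinations(range(k), k - 1))
        return {c for c in candidates if all_subsequences_large(c)}

    L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
    print(generate_candidates(L3))  # {(1, 2, 3, 4)}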
AprioriAll (Example), minsup = 40% (2 customers)

Customer sequences (after transformation):
<{1 5} {2} {3} {4}>
<{1} {3} {4} {3 5}>
<{1} {2} {3} {4}>
<{1} {3} {5}>
<{4} {5}>

Large k-sequences Lk (sequence : support):
L1:  <1>:4  <2>:2  <3>:4  <4>:4  <5>:4
L2:  <1 2>:2  <1 3>:4  <1 4>:3  <1 5>:3  <2 3>:2  <2 4>:2  <3 4>:3  <3 5>:2  <4 5>:2
L3:  <1 2 3>:2  <1 2 4>:2  <1 3 4>:3  <1 3 5>:2  <2 3 4>:2
L4:  <1 2 3 4>:2

Candidates that do not reach minimum support, e.g. <3 4 5> (support 1) and <2 5> (support 0), are discarded.

Maximal large sequences (the answer):
Sequence    Support
<1 2 3 4>   2
<1 3 5>     2
<4 5>       2
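
Putting the pieces together, a count-all driver for this example could look like the following (reusing generate_candidates from the sketch above; contains checks whether the litemset ids of a candidate occur, in order, in the elements of a transformed customer sequence; candidates with repeated litemsets, such as <1 1>, are not considered, as noted earlier):

    def contains(pattern, seq):
        """True if the ids in `pattern` occur, in order, in distinct elements of `seq`."""
        i = 0
        for x in pattern:
            while i < len(seq) and x not in seq[i]:
                i += 1
            if i == len(seq):
                return False
            i += 1
        return True

    def apriori_all(sequences, minsup):
        """Count-all sketch: start from the large 1-sequences, then alternate
        candidate generation and support counting until no candidates remain."""
        items = sorted({x for seq in sequences for e in seq for x in e})
        Lk = {(x,) for x in items if sum(contains((x,), s) for s in sequences) >= minsup}
        large = []
        while Lk:
            large.extend(Lk)
            Ck = generate_candidates(Lk)
            Lk = {c for c in Ck if sum(contains(c, s) for s in sequences) >= minsup}
        return large

    sequences = [
        [{1, 5}, {2}, {3}, {4}],
        [{1}, {3}, {4}, {3, 5}],
        [{1}, {2}, {3}, {4}],
        [{1}, {3}, {5}],
        [{4}, {5}],
    ]
    all_large = apriori_all(sequences, minsup=2)
    # the maximal phase would then keep only <1 2 3 4>, <1 3 5> and <4 5>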
b) AprioriSome
• Avoids counting non-maximal sequences by counting the longer sequences first.
• A function next(), applied to the length of the sequences counted in the last pass, decides which length to count next.
• next(k) = k + 1 is the extreme case in which AprioriSome degenerates into AprioriAll.
• The choice of next() balances the time wasted counting non-maximal sequences against the time wasted counting extensions of small candidate sequences.
AprioriSome Phases

Forward phase:
• Find all large sequences of certain lengths only, e.g. lengths 1, 2, 4 and 6.
• Candidates for all lengths are still generated:
  • if the large sequences L(k-1) were counted, generate the candidates Ck from L(k-1);
  • if L(k-1) was not counted, generate Ck from the candidates C(k-1).

Backward phase:
• Find the remaining large sequences of the lengths skipped in the forward phase, e.g. count sequences of lengths 3 and 5.
• Delete the large sequences found in the forward phase that are not maximal.
AprioriSome Forward (Example)
next(k) = 2k, minsup = 2

Forward phase (lengths 1, 2 and 4 are counted; length 3 is skipped):
L2 (counted):                  <1 2>:2  <1 3>:4  <1 4>:3  <1 5>:3  <2 3>:2  <2 4>:2  <3 4>:3  <3 5>:2  <4 5>:2
C3 (generated, not counted):   <1 2 3>  <1 2 4>  <1 3 4>  <1 3 5>  <2 3 4>  <3 4 5>
C4:                            <1 2 3 4>
L4 (counted):                  <1 2 3 4>:2

(L3, shown on the slide for reference: <1 2 3>:2  <1 2 4>:2  <1 3 4>:3  <1 3 5>:2  <2 3 4>:2 — it is not actually counted in the forward phase.)
AprioriSome Backward (Example)
Backward phase: the skipped 3-sequences in C3 are handled. Candidates contained in the large 4-sequence <1 2 3 4> (i.e., <1 2 3>, <1 2 4>, <1 3 4>, <2 3 4>) are deleted without being counted; the remaining candidates are counted, and <1 3 5> turns out to be large while <3 4 5> does not.
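
A compact and heavily simplified AprioriSome sketch (reusing generate_candidates and contains from the earlier sketches; next_len, the bookkeeping of counted lengths, and the handling of skipped lengths are illustrative assumptions rather than the paper's exact procedure; non-maximal sequences counted in the forward phase are left for the maximal phase to remove):

    def is_subseq(small, big):
        """True if the tuple `small` is an order-preserving subsequence of the tuple `big`."""
        it = iter(big)
        return all(x in it for x in small)

    def apriori_some(sequences, minsup, next_len=lambda k: 2 * k):
        """Forward phase: generate candidates for every length, but only count support
        for the lengths chosen by next_len. Backward phase: count the skipped lengths,
        longest first, skipping candidates already contained in a known large sequence."""
        items = sorted({x for seq in sequences for e in seq for x in e})

        def count(cands):
            return {c for c in cands if sum(contains(c, s) for s in sequences) >= minsup}

        C = {1: {(x,) for x in items}}
        L = {1: count(C[1])}
        counted, last, k = {1}, 1, 2
        while True:                                         # forward phase
            prev = L[k - 1] if (k - 1) in counted else C[k - 1]
            cand = generate_candidates(prev) if prev else set()
            if not cand:
                break
            C[k] = cand
            if k == next_len(last):
                L[k], counted, last = count(cand), counted | {k}, k
            k += 1
        for j in sorted(C, reverse=True):                   # backward phase
            if j not in counted:
                known = [s for lk in L.values() for s in lk]
                L[j] = count({c for c in C[j]
                              if not any(is_subseq(c, b) for b in known)})
        return [s for lk in L.values() for s in lk]

    # on the AprioriAll example data, the result again includes <1 2 3 4>, <1 3 5> and <4 5>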
c) DynamicSome
• Similar to AprioriSome, but:
  • AprioriSome generates Ck from C(k-1);
  • DynamicSome generates candidates "on the fly" from the customer sequences.
• A variable step decides how to jump between lengths.
• On-the-fly generation function:
  • otf(Lk, Lj, c) → the (k+j)-sequences formed by joining Lk with Lj that are contained in c
  • where Lk is the set of large k-sequences, Lj the set of large j-sequences, and c a customer sequence.
• Example: otf(L2, L2, <{1} {2} {3 7} {4}>) yields <1 2 3 4>, obtained by joining the large 2-sequences <1 2> and <3 4>, both contained in c = <e1 e2 e3 e4> with e1 = {1}, e2 = {2}, e3 = {3 7}, e4 = {4}.
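
A sketch of what an on-the-fly generation function could look like (the flat tuple representation and the earliest-occurrence matching are illustrative assumptions; the paper's otf-generate additionally keeps track of the start and end positions of subsequence occurrences):

    def otf(Lk, Lj, c):
        """Join a large k-sequence with a large j-sequence, both contained in the
        customer sequence c, such that the second occurs after the first ends;
        return the resulting (k+j)-sequences."""
        def end_of_match(pattern, seq, start=0):
            """Index just past the earliest occurrence of `pattern` in seq[start:], or None."""
            i = start
            for x in pattern:
                while i < len(seq) and x not in seq[i]:
                    i += 1
                if i == len(seq):
                    return None
                i += 1
            return i

        result = set()
        for xk in Lk:
            end = end_of_match(xk, c)
            if end is None:
                continue
            for xj in Lj:
                if end_of_match(xj, c, start=end) is not None:
                    result.add(xk + xj)
        return result

    L2 = {(1, 2), (3, 4), (1, 3)}                    # some large 2-sequences (illustrative)
    print(otf(L2, L2, [{1}, {2}, {3, 7}, {4}]))      # {(1, 2, 3, 4)}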
5. Maximal Phase
• Find the maximal sequences among the set of large sequences.
  • k-sequence: a sequence of length k
  • S: the set of all large sequences (n = length of the longest large sequence)

    for (k = n; k > 1; k--) do
        for each large k-sequence sk do
            delete from S all subsequences of sk

• In the examples above, this step was already performed when the maximal large sequences were reported.
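
A sketch of the maximal phase (using the is_subseq helper defined in the AprioriSome sketch above):

    def maximal_phase(large):
        """Keep only the maximal large sequences: drop every sequence that is a
        subsequence of some other large sequence."""
        S = set(large)
        for s in sorted(S, key=len, reverse=True):   # longest sequences first
            if s in S:                               # s may already have been deleted
                S -= {t for t in S if t != s and is_subseq(t, s)}
        return S

    large = {(1, 2, 3, 4), (1, 2, 3), (1, 3, 5), (1, 3), (3, 4), (4, 5)}
    print(maximal_phase(large))  # {(1, 2, 3, 4), (1, 3, 5), (4, 5)}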
Outline
• Introduction to Sequential Mining
• Problem Definition
• Finding Sequential Patterns (The Main Algorithm)
  • Sequence Phase (AprioriAll, AprioriSome, DynamicSome)
• Performance
• Conclusion
Performance
• DynamicSome generates too many candidates.
• AprioriSome does a little better than AprioriAll, since it avoids counting many non-maximal sequences.
• However, AprioriSome generates more candidates, which must remain memory-resident.
Outline
• Introduction to Sequential Mining
• Problem Definition
• Finding Sequential Patterns (The Main Algorithm)
  • Sequence Phase (AprioriAll, AprioriSome, DynamicSome)
• Performance
• Conclusion
Conclusion
• The authors proposed an algorithm for finding sequential patterns in a database of customer transactions.
• They proposed three alternative algorithms for the sequence phase:
  • AprioriAll
  • AprioriSome
  • DynamicSome
References: Mining Sequential Patterns
• R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. EDBT'96.
• H. Mannila, H. Toivonen, and A. I. Verkamo. Discovery of frequent episodes in event sequences. DAMI, 1997.
• M. Zaki. SPADE: An efficient algorithm for mining frequent sequences. Machine Learning, 2001.
• J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. ICDE'01 (TKDE'04).
• J. Pei, J. Han, and W. Wang. Constraint-based sequential pattern mining in large databases. CIKM'02.
• X. Yan, J. Han, and R. Afshar. CloSpan: Mining closed sequential patterns in large datasets. SDM'03.
• J. Wang and J. Han. BIDE: Efficient mining of frequent closed sequences. ICDE'04.
• H. Cheng, X. Yan, and J. Han. IncSpan: Incremental mining of sequential patterns in large database. KDD'04.
• J. Han, G. Dong, and Y. Yin. Efficient mining of partial periodic patterns in time series database. ICDE'99.
• J. Yang, W. Wang, and P. S. Yu. Mining asynchronous periodic patterns in time series data. KDD'00.
Questions?