Description

The PrefixSpan algorithm mines frequent sequential patterns. It is described in J. Pei et al., "Mining Sequential Patterns by Pattern-Growth: The PrefixSpan Approach."

Parameters

Name               Description                       Type     Required?  Default Value
itemsCol           Column name of transaction items  String   Yes
minSupportCount    Minimum support count             Integer  No         -1
minSupportPercent  Minimum support percent           Double   No         0.02
minConfidence      Minimum confidence                Double   No         0.05
maxPatternLength   Maximum frequent pattern length   Integer  No         10

Script Example

Code

    import numpy as np
    import pandas as pd
    from pyalink.alink import *  # provides dataframeToOperator and PrefixSpanBatchOp

    # Each row holds one sequence as a single string.
    data = np.array([
        ["a;a,b,c;a,c;d;c,f"],
        ["a,d;c;b,c;a,e"],
        ["e,f;a,b;d,f;c;b"],
        ["e;g;a,f;c;b;c"],
    ])
    df_data = pd.DataFrame({
        "sequence": data[:, 0],
    })
    data = dataframeToOperator(df_data, schemaStr='sequence string', op_type='batch')

    prefixSpan = PrefixSpanBatchOp() \
        .setItemsCol("sequence") \
        .setMinSupportCount(3)
    prefixSpan.linkFrom(data)

    # Main output: frequent sequential patterns
    prefixSpan.print()
    # Side output 0: rules derived from the patterns
    prefixSpan.getSideOutput(0).print()

Input format: a sequence consists of multiple elements separated by semicolons; an element consists of multiple items separated by commas.
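To make the input format concrete, here is a minimal sketch in plain Python that splits one sequence string into its elements and items. The helper name `parse_sequence` is hypothetical and is not part of Alink; it only illustrates the delimiter convention described above.

```python
def parse_sequence(text):
    """Split a sequence string into a list of elements, each a set of items.

    Hypothetical helper for illustration only (not an Alink API).
    Elements are separated by ';', items within an element by ','.
    """
    elements = []
    for element in text.split(";"):
        items = {item.strip() for item in element.split(",") if item.strip()}
        if items:
            elements.append(items)
    return elements


parsed = parse_sequence("a;a,b,c;a,c;d;c,f")
# Print each element with its items sorted for a stable display.
print([sorted(element) for element in parsed])
```

Under this reading, the first example sequence has five elements, and the second element contains the three items a, b and c.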

Results

Output

        itemset  supportcount  itemcount
    0   a        4             1
    1   a;c      4             2
    2   a;c;c    3             3
    3   a;c;b    3             3
    4   a;b      4             2
    5   b        4             1
    6   b;c      3             2
    7   c        4             1
    8   c;c      3             2
    9   c;b      3             2
    10  d        3             1
    11  d;c      3             2
    12  e        3             1
    13  f        3             1
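The support counts in this table can be checked by hand. The sketch below is a brute-force containment test in plain Python, not the PrefixSpan algorithm itself and not Alink code; the function names are hypothetical. It assumes a pattern is supported by a sequence when each pattern element is a subset of some sequence element, in order.

```python
SEQUENCES = [
    "a;a,b,c;a,c;d;c,f",
    "a,d;c;b,c;a,e",
    "e,f;a,b;d,f;c;b",
    "e;g;a,f;c;b;c",
]


def parse_sequence(text):
    """Turn 'a;b,c' into [{'a'}, {'b', 'c'}]."""
    return [set(element.split(",")) for element in text.split(";")]


def contains(sequence, pattern):
    """Greedy in-order match: each pattern element must be a subset of a
    sequence element that comes strictly after the previous match."""
    pos = 0
    for wanted in pattern:
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later element
    return True


def support_count(sequences, pattern_text):
    """Number of sequences that contain the pattern."""
    pattern = parse_sequence(pattern_text)
    return sum(contains(parse_sequence(s), pattern) for s in sequences)


for pattern in ["a", "a;c", "a;c;b", "d;c"]:
    print(pattern, support_count(SEQUENCES, pattern))
```

For the four example sequences, this yields the same counts as the table for the patterns tried, for example 3 for `a;c;b`, because the first sequence has no `b` after an `a` followed by a `c`.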

Output

       rule     chain_length  support  confidence  transaction_count
    0  a=>c     2             1.00     1.00        4
    1  a;c=>c   3             0.75     0.75        3
    2  a;c=>b   3             0.75     0.75        3
    3  a=>b     2             1.00     1.00        4
    4  b=>c     2             0.75     0.75        3
    5  c=>c     2             0.75     0.75        3
    6  c=>b     2             0.75     0.75        3
    7  d=>c     2             0.75     1.00        3
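The rule columns appear to follow from the pattern support counts. The sketch below reproduces them under an assumption inferred from the numbers shown here, not from the Alink source: support is the pattern's count divided by the number of sequences (4), and confidence is the pattern's count divided by the count of the pattern without its last element. The function `rule_metrics` is hypothetical.

```python
TOTAL_SEQUENCES = 4

# Support counts copied from the frequent pattern output.
SUPPORT_COUNTS = {
    "a": 4, "a;c": 4, "a;c;c": 3, "a;c;b": 3, "a;b": 4,
    "b": 4, "b;c": 3, "c": 4, "c;c": 3, "c;b": 3,
    "d": 3, "d;c": 3, "e": 3, "f": 3,
}


def rule_metrics(pattern):
    """Derive rule columns for a pattern with at least two elements.

    Assumption: the last element is the consequent and everything
    before it is the antecedent.
    """
    antecedent, consequent = pattern.rsplit(";", 1)
    count = SUPPORT_COUNTS[pattern]
    return {
        "rule": f"{antecedent}=>{consequent}",
        "chain_length": pattern.count(";") + 1,
        "support": count / TOTAL_SEQUENCES,
        "confidence": count / SUPPORT_COUNTS[antecedent],
        "transaction_count": count,
    }


for pattern in ["a;c", "a;c;c", "d;c"]:
    print(rule_metrics(pattern))
```

This explains why `d=>c` has confidence 1.00 with support 0.75: `d` occurs in three sequences, and all three also contain a later `c`.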