Description

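FpGrowthBatchOp computes frequent itemsets from a set of transactions using the FP-Growth algorithm, described in Han et al., "Mining Frequent Patterns without Candidate Generation" (SIGMOD 2000). It also derives association rules from those itemsets and emits them as a side output.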
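The pattern-growth idea from the paper can be sketched in plain Python. This is a minimal, single-machine illustration of FP-Growth, not Alink's distributed implementation. It builds a prefix tree of transactions with items sorted by descending frequency, then recurses on each item's conditional pattern base (the prefix paths above that item's nodes). The names `Node`, `build_tree` and `fp_growth` are hypothetical. On the five example transactions with a count threshold of 2, it yields the same 19 itemsets and support counts as the frequent-itemset results.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple


class Node:
    """One FP-tree node: an item, its count, and links to parent/children."""

    def __init__(self, item: Optional[str], parent: Optional["Node"]) -> None:
        self.item = item
        self.count = 0
        self.parent = parent
        self.children: Dict[str, "Node"] = {}


def build_tree(weighted: List[Tuple[List[str], int]], min_count: int):
    """Build an FP-tree from (items, weight) pairs; return item supports and a header table."""
    support: Dict[str, int] = defaultdict(int)
    for items, weight in weighted:
        for item in set(items):
            support[item] += weight
    frequent = {item: c for item, c in support.items() if c >= min_count}

    root = Node(None, None)
    header: Dict[str, List[Node]] = defaultdict(list)  # item -> every node holding it
    for items, weight in weighted:
        # Insert only frequent items, most frequent first, so shared prefixes merge.
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += weight
            node = child
    return frequent, header


def fp_growth(transactions: List[List[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset with support >= min_count, mapped to its support count."""
    result: Dict[FrozenSet[str], int] = {}

    def mine(weighted: List[Tuple[List[str], int]], suffix: FrozenSet[str]) -> None:
        frequent, header = build_tree(weighted, min_count)
        for item, count in frequent.items():
            itemset = suffix | {item}
            result[itemset] = count
            # Conditional pattern base: the prefix path above each node of `item`.
            base = []
            for node in header[item]:
                path, p = [], node.parent
                while p is not None and p.item is not None:
                    path.append(p.item)
                    p = p.parent
                if path:
                    base.append((path, node.count))
            if base:
                mine(base, itemset)

    mine([(t, 1) for t in transactions], frozenset())
    return result


transactions = [line.split(",") for line in
                ["A,B,C,D", "B,C,E", "A,B,C,E", "B,D,E", "A,B,C,D"]]
itemsets = fp_growth(transactions, min_count=2)
```

Each itemset is generated exactly once because a conditional base only contains items that sit above the current item in its tree.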

Parameters

Name                 Description                                     Type     Required?  Default Value
itemsCol             Column holding each transaction's items         String   Yes        (none)
minSupportCount      Minimum support as a transaction count          Integer  No         -1
minSupportPercent    Minimum support as a fraction (0.02 = 2%)       Double   No         0.02
minConfidence        Minimum rule confidence, as a fraction          Double   No         0.05
maxPatternLength     Maximum number of items in an itemset           Integer  No         10
maxConsequentLength  Maximum number of items in a rule's consequent  Integer  No         1
minLift              Minimum rule lift                               Double   No         1.0
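The confidence and lift thresholds refer to the usual association-rule measures. Assuming the standard definitions (support = share of transactions containing every item of the rule, confidence = count(X and Y) / count(X), lift = confidence / support of Y), the hypothetical helpers below reproduce the rule values in the results, e.g. C,D=>A with support 0.4, confidence 1.0 and lift 1.666667. They also suggest why D=>C is absent from the rules table even though its confidence (2/3) exceeds 0.6: its lift is 0.833, below the default minLift of 1.0.

```python
from typing import List, Set, Tuple

# The five transactions from the script example, one set of items each.
TRANSACTIONS: List[Set[str]] = [set(s.split(",")) for s in
                                ["A,B,C,D", "B,C,E", "A,B,C,E", "B,D,E", "A,B,C,D"]]


def count(itemset: Set[str]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def rule_metrics(antecedent: Set[str], consequent: Set[str]) -> Tuple[int, float, float, float]:
    """Return (transaction count, support, confidence, lift) of antecedent => consequent."""
    n = len(TRANSACTIONS)
    both = count(antecedent | consequent)
    support = both / n                            # share of transactions with the whole rule
    confidence = both / count(antecedent)         # estimate of P(consequent | antecedent)
    lift = confidence / (count(consequent) / n)   # confidence relative to consequent's base rate
    return both, support, confidence, lift


def keep(antecedent: Set[str], consequent: Set[str],
         min_confidence: float = 0.6, min_lift: float = 1.0) -> bool:
    """Apply the example's confidence threshold and the default minLift."""
    _, _, confidence, lift = rule_metrics(antecedent, consequent)
    return confidence >= min_confidence and lift >= min_lift
```

For instance, `rule_metrics({"C", "D"}, {"A"})` gives a transaction count of 2 and a lift of 5/3, while `keep({"D"}, {"C"})` is false.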

Script Example

Code

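    import numpy as np
    import pandas as pd
    from pyalink.alink import *

    useLocalEnv(1)  # start a local Alink environment with parallelism 1

    # Each row is one transaction; its items are a comma-separated string.
    data = np.array([
        ["A,B,C,D"],
        ["B,C,E"],
        ["A,B,C,E"],
        ["B,D,E"],
        ["A,B,C,D"],
    ])
    df_data = pd.DataFrame({"items": data[:, 0]})
    source = dataframeToOperator(df_data, schemaStr='items string', op_type='batch')

    # Keep itemsets in at least 40% of transactions and rules with confidence >= 0.6.
    fpGrowth = FpGrowthBatchOp() \
        .setItemsCol("items") \
        .setMinSupportPercent(0.4) \
        .setMinConfidence(0.6)
    fpGrowth.linkFrom(source)
    fpGrowth.print()                   # main output: frequent itemsets
    fpGrowth.getSideOutput(0).print()  # side output 0: association rules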

Results

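Output (frequent itemsets: main output)

The supportcount column is the number of transactions that contain the itemset, and itemcount is the number of items in it.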

        itemset   supportcount  itemcount
    0   E         3             1
    1   B,E       3             2
    2   C,E       2             2
    3   B,C,E     2             3
    4   D         3             1
    5   B,D       3             2
    6   C,D       2             2
    7   B,C,D     2             3
    8   A,D       2             2
    9   B,A,D     2             3
    10  C,A,D     2             3
    11  B,C,A,D   2             4
    12  A         3             1
    13  B,A       3             2
    14  C,A       3             2
    15  B,C,A     3             3
    16  C         4             1
    17  B,C       4             2
    18  B         5             1
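To check this table independently of FP-Growth, a naive count can enumerate every subset of every transaction. This is a different, exponential-time technique, only sensible for toy data like this, and `brute_force_itemsets` is a hypothetical helper, not an Alink API. The count threshold of 2 corresponds to minSupportPercent 0.4 of the 5 transactions. It finds the same 19 itemsets and support counts as the table above.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List

TRANSACTIONS: List[str] = ["A,B,C,D", "B,C,E", "A,B,C,E", "B,D,E", "A,B,C,D"]


def brute_force_itemsets(transactions: List[str], min_count: int,
                         max_len: int = 10) -> Dict[FrozenSet[str], int]:
    """Count every subset of every transaction and keep those seen >= min_count times.

    Exponential in transaction length, so only usable as a check on tiny data.
    `max_len` mirrors the maxPatternLength parameter (default 10).
    """
    counts: Counter = Counter()
    for line in transactions:
        items = sorted(set(line.split(",")))
        for k in range(1, min(len(items), max_len) + 1):
            for subset in combinations(items, k):
                counts[frozenset(subset)] += 1
    return {s: c for s, c in counts.items() if c >= min_count}


frequent = brute_force_itemsets(TRANSACTIONS, min_count=2)  # 0.4 * 5 transactions = 2
```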

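Output (association rules: side output 0)

Each rule reads antecedent=>consequent. itemcount counts all items in the rule, transaction_count is the number of transactions containing all of them, and support_percent and confidence_percent are fractions (0.6 means 60%).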

        rule       itemcount  lift      support_percent  confidence_percent  transaction_count
    0   B=>A       2          1.000000  0.6              0.600000            3
    1   B=>D       2          1.000000  0.6              0.600000            3
    2   B=>C       2          1.000000  0.8              0.800000            4
    3   B=>E       2          1.000000  0.6              0.600000            3
    4   C=>A       2          1.250000  0.6              0.750000            3
    5   C=>B       2          1.000000  0.8              1.000000            4
    6   B,C=>A     3          1.250000  0.6              0.750000            3
    7   A=>C       2          1.250000  0.6              1.000000            3
    8   A=>B       2          1.000000  0.6              1.000000            3
    9   A=>D       2          1.111111  0.4              0.666667            2
    10  B,A=>D     3          1.111111  0.4              0.666667            2
    11  C,A=>B     3          1.000000  0.6              1.000000            3
    12  B,A=>C     3          1.250000  0.6              1.000000            3
    13  C,A=>D     3          1.111111  0.4              0.666667            2
    14  B,C,A=>D   4          1.111111  0.4              0.666667            2
    15  D=>A       2          1.111111  0.4              0.666667            2
    16  D=>B       2          1.000000  0.6              1.000000            3
    17  A,D=>B     3          1.000000  0.4              1.000000            2
    18  B,D=>A     3          1.111111  0.4              0.666667            2
    19  C,D=>B     3          1.000000  0.4              1.000000            2
    20  A,D=>C     3          1.250000  0.4              1.000000            2
    21  C,D=>A     3          1.666667  0.4              1.000000            2
    22  C,A,D=>B   4          1.000000  0.4              1.000000            2
    23  B,A,D=>C   4          1.250000  0.4              1.000000            2
    24  B,C,D=>A   4          1.666667  0.4              1.000000            2
    25  E=>B       2          1.000000  0.6              1.000000            3
    26  C,E=>B     3          1.000000  0.4              1.000000            2
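With maxConsequentLength at its default of 1, every rule has a single item on the right-hand side, so rules like these can be derived from the frequent-itemset counts alone. The sketch below is one straightforward way to do that; `single_consequent_rules` is a hypothetical helper, and Alink's internal rule generation may be organised differently. Using the support counts from the frequent-itemset table, minConfidence 0.6 and the default minLift 1.0, it produces the same 27 rules as the table above, with antecedent items sorted alphabetically rather than in Alink's order.

```python
from typing import Dict, FrozenSet, List, Tuple

N_TRANSACTIONS = 5
# Support counts copied from the frequent-itemset table (main output).
SUPPORT: Dict[FrozenSet[str], int] = {
    frozenset(itemset.split(",")): count
    for itemset, count in [
        ("E", 3), ("B,E", 3), ("C,E", 2), ("B,C,E", 2), ("D", 3), ("B,D", 3),
        ("C,D", 2), ("B,C,D", 2), ("A,D", 2), ("B,A,D", 2), ("C,A,D", 2),
        ("B,C,A,D", 2), ("A", 3), ("B,A", 3), ("C,A", 3), ("B,C,A", 3),
        ("C", 4), ("B,C", 4), ("B", 5),
    ]
}


def single_consequent_rules(support: Dict[FrozenSet[str], int], n: int,
                            min_confidence: float, min_lift: float
                            ) -> List[Tuple[str, str, float, float, float]]:
    """Return (antecedent, consequent, support, confidence, lift) for one-item consequents."""
    rules: List[Tuple[str, str, float, float, float]] = []
    for itemset, count in support.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent
        for consequent in sorted(itemset):
            antecedent = itemset - {consequent}
            # Subsets of frequent itemsets are frequent, so both lookups succeed.
            confidence = count / support[antecedent]
            lift = confidence / (support[frozenset([consequent])] / n)
            if confidence >= min_confidence and lift >= min_lift:
                rules.append((",".join(sorted(antecedent)), consequent,
                              count / n, confidence, lift))
    return rules


rules = single_consequent_rules(SUPPORT, N_TRANSACTIONS, min_confidence=0.6, min_lift=1.0)
```

Working from the itemset counts avoids a second pass over the transactions, since every count a rule needs is already in the frequent-itemset output.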