Description

Matrix factorization using the Alternating Least Squares (ALS) method.

ALS approximates a matrix R by a product of two low-rank matrices, R ≈ X Y^T, where Y^T is the transpose of Y. X and Y are called factor matrices. R is usually a sparse matrix of ratings given by users to items. ALS looks for the X and Y that minimize || R - X Y^T ||^2, and it does so iteratively: at each step X is held fixed while Y is solved for, then Y is held fixed while X is solved for. With one factor fixed, each of these subproblems is an ordinary least-squares problem.
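The alternating scheme described above can be sketched in plain NumPy. This is a minimal illustration and not the implementation used by this component. It assumes a dense array R with a boolean mask marking the observed ratings, a plain ridge penalty, and lam > 0. The names `als_sketch` and `mask` are hypothetical.

```python
import numpy as np

def als_sketch(R, mask, rank=2, lam=0.1, num_iter=10, seed=0):
    """Factor R ~= X @ Y.T using only the entries where mask is True.

    Illustrative only: dense input, plain ridge penalty, lam > 0 assumed.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    X = rng.normal(scale=0.1, size=(n_users, rank))
    Y = rng.normal(scale=0.1, size=(n_items, rank))
    reg = lam * np.eye(rank)
    for _ in range(num_iter):
        # X is fixed: solve one small ridge regression per item row of Y.
        for i in range(n_items):
            obs = mask[:, i]
            if obs.any():
                A = X[obs].T @ X[obs] + reg
                b = X[obs].T @ R[obs, i]
                Y[i] = np.linalg.solve(A, b)
        # Y is fixed: solve one small ridge regression per user row of X.
        for u in range(n_users):
            obs = mask[u]
            if obs.any():
                A = Y[obs].T @ Y[obs] + reg
                b = Y[obs].T @ R[u, obs]
                X[u] = np.linalg.solve(A, b)
    return X, Y

# Toy ratings: rows are users, columns are items, 0.0 marks "not rated".
R = np.array([[0.6, 0.0, 0.0],
              [0.0, 0.8, 0.6],
              [0.6, 0.3, 0.4]])
mask = R > 0
X, Y = als_sketch(R, mask, rank=2, lam=0.01, num_iter=50)
# Squared error measured on the observed entries only.
sse = float((((X @ Y.T) - R)[mask] ** 2).sum())
```

Each inner solve involves only a rank-by-rank system, which is why the method scales to large sparse rating matrices when the rows are distributed across workers.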

The algorithm is described in “Large-scale Parallel Collaborative Filtering for the Netflix Prize, 2007”.

The implicit preference model described in “Collaborative Filtering for Implicit Feedback Datasets, 2008” is also supported.
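As a rough illustration of the implicit preference model in the cited paper, raw feedback r is turned into a binary preference p = 1 if r > 0 else 0 and a confidence c = 1 + alpha * r, and each factor row is then obtained from a confidence-weighted least-squares solve. The sketch below follows that formulation. Whether this component uses exactly these formulas is an assumption, and the helper names `implicit_targets` and `solve_user` are hypothetical.

```python
import numpy as np

def implicit_targets(R, alpha=40.0):
    """Return (preference, confidence) matrices for raw feedback R >= 0."""
    P = (R > 0).astype(float)  # p_ui: 1 where any feedback was observed
    C = 1.0 + alpha * R        # c_ui: confidence grows with the feedback
    return P, C

def solve_user(Y, p_u, c_u, lam):
    """Confidence-weighted ridge solve for one user's factor vector."""
    A = Y.T @ (c_u[:, None] * Y) + lam * np.eye(Y.shape[1])
    b = Y.T @ (c_u * p_u)
    return np.linalg.solve(A, b)

# Toy feedback counts: 2 users x 2 items.
R = np.array([[0.0, 2.0],
              [1.0, 0.0]])
P, C = implicit_targets(R, alpha=40.0)

# Fixed item factors (rank 2), chosen arbitrarily for the illustration.
Y = np.array([[1.0, 0.0],
              [0.0, 1.0]])
x0 = solve_user(Y, P[0], C[0], lam=0.1)
```

Unlike the explicit model, every user-item pair contributes to the solve here: unobserved pairs act as weak evidence of zero preference, with confidence 1.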

Parameters

| Name | Description | Type | Required? | Default Value |
| --- | --- | --- | --- | --- |
| rank | Rank of the factorization (> 0). | Integer | | 10 |
| lambda | Regularization parameter (>= 0). | Double | | 0.1 |
| nonnegative | Whether to enforce the non-negativity constraint. | Boolean | | false |
| implicitPrefs | Whether to use the implicit preference model. | Boolean | | false |
| alpha | The alpha parameter of the implicit preference model. | Double | | 40.0 |
| numBlocks | Number of blocks used when running ALS. This is a performance parameter. | Integer | | 1 |
| userCol | User column name. | String | | |
| itemCol | Item column name. | String | | |
| rateCol | Rating column name. | String | | |
| numIter | Number of iterations. | Integer | | 10 |
| predictionCol | Name of the prediction column. | String | | |

Script Example

Code

    # Assumes PyAlink is installed; the wildcard import provides
    # dataframeToOperator and ALS.
    import numpy as np
    import pandas as pd
    from pyalink.alink import *

    # Each row is a (user, item, rating) triple.
    data = np.array([
        [1, 1, 0.6],
        [2, 2, 0.8],
        [2, 3, 0.6],
        [4, 1, 0.6],
        [4, 2, 0.3],
        [4, 3, 0.4],
    ])
    df_data = pd.DataFrame({
        "user": data[:, 0],
        "item": data[:, 1],
        "rating": data[:, 2],
    })
    # The NumPy array is all floats, so cast the id columns back to integers.
    df_data["user"] = df_data["user"].astype('int')
    df_data["item"] = df_data["item"].astype('int')
    data = dataframeToOperator(df_data, schemaStr='user bigint, item bigint, rating double', op_type='batch')

    # Configure ALS, fit it on the ratings, and predict on the same rows.
    als = ALS().setUserCol("user").setItemCol("item").setRateCol("rating") \
        .setNumIter(10).setRank(10).setLambda(0.01).setPredictionCol("pred_rating")
    pred = als.fit(data).transform(data)
    pred.print()

Results

       user  item  rating  pred_rating
    0     1     1     0.6     0.579622
    1     2     2     0.8     0.766851
    2     2     3     0.6     0.581079
    3     4     1     0.6     0.574481
    4     4     2     0.3     0.298500
    5     4     3     0.4     0.382157