Description
Matrix factorization using the alternating least squares (ALS) method.
ALS approximates a matrix R by the product of two low-rank factor matrices, R ≈ X Yt, where Yt denotes the transpose of Y. R is usually a sparse matrix of ratings given by users to items. ALS looks for the X and Y that minimize || R - X Yt ||^2 over the observed entries, and does so iteratively: at each step, X is held fixed and Y is solved for, then Y is held fixed and X is solved for.
The algorithm is described in “Large-scale Parallel Collaborative Filtering for the Netflix Prize, 2007”.
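The alternating updates can be sketched in plain NumPy. This is a toy, dense, single-machine illustration and not Alink's implementation: the function names `als_explicit` and `objective`, the 0-based user and item indices, and the plain (unweighted) ridge penalty `lam` are all assumptions made for the example.

```python
import numpy as np

def als_explicit(ratings, num_users, num_items, rank=2, lam=0.1, num_iter=10, seed=0):
    """Toy ALS over observed (user, item, rating) triples with 0-based ids."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(num_users, rank))  # user factors
    Y = rng.normal(scale=0.1, size=(num_items, rank))  # item factors

    # Dense rating matrix plus a mask marking which entries were observed.
    R = np.zeros((num_users, num_items))
    mask = np.zeros((num_users, num_items), dtype=bool)
    for u, i, r in ratings:
        R[u, i] = r
        mask[u, i] = True

    def solve_side(fixed, R_side, mask_side):
        # With the other side fixed, each row is a small ridge regression
        # over that row's observed entries only.
        out = np.zeros((R_side.shape[0], rank))
        for k in range(R_side.shape[0]):
            obs = mask_side[k]
            if not obs.any():
                continue  # no ratings for this row: leave its factors at zero
            F = fixed[obs]
            A = F.T @ F + lam * np.eye(rank)
            b = F.T @ R_side[k, obs]
            out[k] = np.linalg.solve(A, b)
        return out

    for _ in range(num_iter):
        Y = solve_side(X, R.T, mask.T)  # X fixed, solve for Y
        X = solve_side(Y, R, mask)      # Y fixed, solve for X
    return X, Y

def objective(ratings, X, Y, lam):
    """Squared error on observed entries plus the ridge penalty."""
    err = sum((r - X[u] @ Y[i]) ** 2 for u, i, r in ratings)
    return err + lam * (np.sum(X ** 2) + np.sum(Y ** 2))

# Hypothetical ratings with ids remapped to 0-based indices.
ratings = [(0, 0, 0.6), (1, 1, 0.8), (1, 2, 0.6), (2, 0, 0.6), (2, 1, 0.3), (2, 2, 0.4)]
X, Y = als_explicit(ratings, num_users=3, num_items=3)
print(objective(ratings, X, Y, lam=0.1))
```

Because each half-step solves its subproblem exactly, the regularized objective in this sketch never increases from one iteration to the next.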
The implicit preference model described in “Collaborative Filtering for Implicit Feedback Datasets, 2008” is also supported.
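For orientation, the implicit model in that paper replaces each raw value r with a binary preference p (1 if r > 0, else 0) and a confidence c = 1 + alpha * r, then minimizes the confidence-weighted squared error. The sketch below shows one user-side update under that formulation. It is a dense NumPy illustration, not Alink's code; `implicit_user_update` and `weighted_loss` are hypothetical names, and `R` is assumed to hold non-negative values.

```python
import numpy as np

def implicit_user_update(R, Y, alpha=40.0, lam=0.1):
    """Solve all user factors with item factors Y held fixed."""
    P = (R > 0).astype(float)   # binary preferences
    C = 1.0 + alpha * R         # confidence weights
    rank = Y.shape[1]
    X = np.zeros((R.shape[0], rank))
    for u in range(R.shape[0]):
        Cu = np.diag(C[u])
        # Normal equations of the weighted ridge regression for user u.
        A = Y.T @ Cu @ Y + lam * np.eye(rank)
        b = Y.T @ Cu @ P[u]
        X[u] = np.linalg.solve(A, b)
    return X

def weighted_loss(R, X, Y, alpha=40.0, lam=0.1):
    """Confidence-weighted squared error plus the penalty on X (Y is fixed)."""
    P = (R > 0).astype(float)
    C = 1.0 + alpha * R
    return np.sum(C * (P - X @ Y.T) ** 2) + lam * np.sum(X ** 2)

# Hypothetical interaction counts (rows: users, columns: items).
R = np.array([[3.0, 0.0, 1.0],
              [0.0, 2.0, 0.0]])
Y = np.random.default_rng(0).normal(size=(3, 2))
X = implicit_user_update(R, Y)
```

Note that, unlike the explicit case, every entry of R contributes here: unobserved entries count as preference 0 with the minimum confidence of 1.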
Parameters
Name | Description | Type | Required? | Default Value |
---|---|---|---|---|
rank | Rank of the factorization (> 0). | Integer | | 10 |
lambda | Regularization parameter (>= 0). | Double | | 0.1 |
nonnegative | Whether to enforce the non-negative constraint. | Boolean | | false |
implicitPrefs | Whether to use the implicit preference model. | Boolean | | false |
alpha | The alpha in the implicit preference model. | Double | | 40.0 |
numBlocks | Number of blocks used when running ALS. This is a performance parameter. | Integer | | 1 |
userCol | User column name. | String | ✓ | |
itemCol | Item column name. | String | ✓ | |
rateCol | Rating column name. | String | ✓ | |
numIter | Number of iterations. | Integer | | 10 |
Script Example
Code
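# Assumes PyAlink is installed and an execution environment is already set up.
from pyalink.alink import *
import numpy as np
import pandas as pd

# Each row is (user id, item id, rating).
data = np.array([
    [1, 1, 0.6],
    [2, 2, 0.8],
    [2, 3, 0.6],
    [4, 1, 0.6],
    [4, 2, 0.3],
    [4, 3, 0.4],
])
df_data = pd.DataFrame({
    "user": data[:, 0],
    "item": data[:, 1],
    "rating": data[:, 2],
})
# The NumPy array is float-valued, so cast the id columns back to integers.
df_data["user"] = df_data["user"].astype('int')
df_data["item"] = df_data["item"].astype('int')
# Convert the pandas DataFrame into an Alink batch operator.
data = dataframeToOperator(df_data, schemaStr='user bigint, item bigint, rating double', op_type='batch')
# Train ALS with 10 iterations, rank 10, and regularization 0.01.
als = AlsTrainBatchOp().setUserCol("user").setItemCol("item").setRateCol("rating") \
    .setNumIter(10).setRank(10).setLambda(0.01)
model = als.linkFrom(data)
# Print the trained model: one row of factors per user and per item.
model.print()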
Results
user item factors
0 1.0 NaN -0.06586061 -0.034223076 0.069877796 0.0920446...
1 2.0 NaN 0.30718762 0.16972417 0.008185322 0.0386066 0....
2 4.0 NaN -0.06712866 -0.034935225 0.069463015 0.0913517...
3 NaN 1.0 -0.15275586 -0.07944428 0.15982738 0.21034132 ...
4 NaN 2.0 0.5041202 0.27869284 0.01877524 0.07083873 0.3...
5 NaN 3.0 0.23072533 0.12939966 0.06971352 0.118020855 0...
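Each row of the model holds the factor vector of one user or one item as a space-separated string, and under the factorization R ≈ X Yt a predicted rating is the dot product of a user vector and an item vector. The sketch below shows that arithmetic only; `parse_factors` and `predict` are hypothetical helpers, and the rank-3 vectors are made up for illustration rather than taken from the truncated output above. For real use, the library's own prediction operators are the intended route.

```python
import numpy as np

def parse_factors(text):
    """Turn a space-separated factor string into a NumPy vector."""
    return np.array([float(v) for v in text.split()])

def predict(user_factors, item_factors):
    """Predicted rating = dot product of the two factor vectors."""
    return float(parse_factors(user_factors) @ parse_factors(item_factors))

# Made-up rank-3 factor strings, for illustration only.
user_vec = "0.5 0.25 1.0"
item_vec = "2.0 4.0 0.5"
score = predict(user_vec, item_vec)
print(score)
```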