als explain method bug #701

Open
eostendarp opened this issue Oct 29, 2023 · 1 comment

I'm encountering an issue when running the explain method. I'm unsure of what is going wrong, but it seems like the dimensions of some matrix are being unintentionally flipped at some point.

The only userid that appears to dodge the issue is 0, but there is no such user in the training data I'm working with.

Below is code and output. Any feedback would be greatly appreciated!

import threadpoolctl
import numpy as np
from scipy import sparse
import implicit

threadpoolctl.threadpool_limits(1, 'blas')

matrix = np.loadtxt('./favs-2023-10-24.csv', dtype=np.uintc, delimiter=',')
user_post = sparse.csr_matrix((np.ones(matrix.shape[0], dtype=np.bool_), (matrix[:, 1], matrix[:, 0])))

model = implicit.als.AlternatingLeastSquares(factors=256, regularization=0.01, alpha=40, dtype=np.float32, iterations=50, calculate_training_loss=True)
model.fit(user_post)

user_id = 23
ids, scores = model.recommend(user_id, user_post[user_id], N=10, filter_already_liked_items=False)
print({'post_id': ids, 'score': scores, 'already_fav\'d': np.in1d(ids, user_post[user_id].indices)})

{'post_id': array([127664, 105085, 160655, 205782, 187429, 185678, 188119, 265365,
177336, 220538], dtype=int32), 'score': array([0.38005394, 0.3619479 , 0.35976228, 0.3480073 , 0.34047693,
0.34022546, 0.33973548, 0.33651435, 0.33490524, 0.3335871 ],
dtype=float32), "already_fav'd": array([False, False, False, True, True, True, True, True, True,
False])}

cpu_model = model.to_cpu()

user_id = 23
ids, scores = cpu_model.recommend(user_id, user_post[user_id], N=10, filter_already_liked_items=False)
print({'post_id': ids, 'score': scores, 'already_fav\'d': np.in1d(ids, user_post[user_id].indices)})

{'post_id': array([127664, 105085, 160655, 205782, 187429, 185678, 188119, 265365,
177336, 220538], dtype=int32), 'score': array([0.38005394, 0.36194786, 0.35976222, 0.3480073 , 0.34047693,
0.34022546, 0.33973545, 0.33651435, 0.33490527, 0.3335871 ],
dtype=float32), "already_fav'd": array([False, False, False, True, True, True, True, True, True,
False])}

cpu_model.explain(user_id, user_post[user_id], 127664)

IndexError Traceback (most recent call last)
/home/eostendarp/workspace/e621/notebook.ipynb Cell 10 line 1
----> 1 cpu_model.explain(23, user_post[user_id], 127664)

File ~/workspace/e621/venv/lib/python3.10/site-packages/implicit/cpu/als.py:386, in AlternatingLeastSquares.explain(self, userid, user_items, itemid, user_weights, N)
383 # user_weights = Cholesky decomposition of Wu^-1
384 # from section 5 of the paper CF for Implicit Feedback Datasets
385 if user_weights is None:
--> 386 A, _ = user_linear_equation(
387 self.item_factors, self.YtY, user_items, userid, self.regularization, self.factors
388 )
389 user_weights = scipy.linalg.cho_factor(A)
390 seed_item = self.item_factors[itemid]

File ~/workspace/e621/venv/lib/python3.10/site-packages/implicit/cpu/als.py:503, in user_linear_equation(Y, YtY, Cui, u, regularization, n_factors)
500 # accumulate YtCuPu in b
501 b = np.zeros(n_factors)
--> 503 for i, confidence in nonzeros(Cui, u):
504 factor = Y[i]
506 if confidence > 0:

File ~/workspace/e621/venv/lib/python3.10/site-packages/implicit/utils.py:11, in nonzeros(m, row)
9 def nonzeros(m, row):
10 """returns the non zeroes of a row in csr_matrix"""
---> 11 for index in range(m.indptr[row], m.indptr[row + 1]):
12 yield m.indices[index], m.data[index]

IndexError: index 23 is out of bounds for axis 0 with size 2

gtfuhr commented Aug 7, 2024

I ran into the same issue, @eostendarp.
Here's the fix in case anyone else hits this in the future.
Change the line:
cpu_model.explain(user_id, user_post[user_id], 127664)

to:
cpu_model.explain(user_id, user_post, 127664)

explain uses user_id internally as a row index into the matrix you pass in (the nonzeros(Cui, u) call in the traceback above), so it needs the full user_post matrix rather than a single-row slice. The slice user_post[user_id] has only one row, which is why its indptr has size 2 and why 0 is the only userid that doesn't raise. The library's creator mentions this in another GitHub issue.
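To make the indexing problem concrete, here is a minimal sketch that needs only NumPy and SciPy. The `nonzeros` helper is copied from the traceback in the original report; the 30x5 `user_post` matrix is a made-up toy stand-in, not the real data. It shows that a single-row slice keeps only row 0, so looking up row 23 in it fails, while row 0 "works" only by accident.

```python
import numpy as np
from scipy import sparse


def nonzeros(m, row):
    """Yield (column index, value) pairs for one row of a CSR matrix."""
    for index in range(m.indptr[row], m.indptr[row + 1]):
        yield m.indices[index], m.data[index]


# Hypothetical toy data: 30 users x 5 items; user 23 liked items 1 and 3.
rows = np.array([0, 23, 23])
cols = np.array([2, 1, 3])
vals = np.ones(3, dtype=np.float32)
user_post = sparse.csr_matrix((vals, (rows, cols)), shape=(30, 5))

user_id = 23
single_row = user_post[user_id]  # shape (1, 5): user 23's data now lives in row 0

# Full matrix: row 23 exists, so the lookup succeeds.
full_result = [(int(i), float(v)) for i, v in nonzeros(user_post, user_id)]

# Single-row slice: indptr has only 2 entries, so row 23 is out of bounds.
try:
    list(nonzeros(single_row, user_id))
    slice_error = None
except IndexError as exc:
    slice_error = exc

# Row 0 of the slice does not raise, but it silently returns user 23's items.
row0_result = [(int(i), float(v)) for i, v in nonzeros(single_row, 0)]

print("indptr sizes:", user_post.indptr.shape, single_row.indptr.shape)
print("full matrix, row 23:", full_result)
print("slice, row 23:", repr(slice_error))
print("slice, row 0:", row0_result)
```

This also suggests why userid 0 appeared to dodge the error in the original report: with a one-row slice, row 0 is the only valid index, whichever user the slice actually came from.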
