Create a sparse matrix (interactions matrix) of user -> movie rating from the Netflix data set #6

Closed
audiodude opened this issue Aug 16, 2024 · 4 comments

@audiodude
Collaborator

LightFM requires an interactions matrix in which the rows are users and the columns are movies. A 1 in a cell means that the user liked that movie.

@audiodude
Collaborator Author

We should consider whether we can or need to serialize this matrix, or if we can just recreate it each time.
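
One possible answer to the serialization question, assuming the matrix ends up as a SciPy sparse matrix, is `scipy.sparse.save_npz` / `scipy.sparse.load_npz`. The sketch below is only an illustration: the tiny matrix and the file name `interactions.npz` are made up for the example, and whether saving is worth it compared with rebuilding depends on how long a rebuild actually takes.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Toy stand-in for the real interactions matrix (3 users x 2 movies).
interactions = sparse.coo_matrix(
    np.array([[0, 1], [1, 0], [1, 1]], dtype=np.int32)
)

# Hypothetical output location; any writable path ending in .npz works.
path = os.path.join(tempfile.mkdtemp(), "interactions.npz")

sparse.save_npz(path, interactions)   # write the sparse matrix to disk
restored = sparse.load_npz(path)      # read it back in the same sparse format

# Compare via dense arrays, which is fine for a toy matrix this small.
round_trip_ok = np.array_equal(interactions.toarray(), restored.toarray())
```

If rebuilding from the raw files turns out to be fast enough, the save step can simply be skipped.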

@audiodude audiodude changed the title Create a sparse matrix of user -> movie rating from the Netflix data set Create a sparse matrix (interactions matrix) of user -> movie rating from the Netflix data set Aug 16, 2024
@audiodude
Collaborator Author

Basic algorithm:

  • Create a numpy matrix with the right number of rows (users) and columns (movies)
  • For each file (mv_00000X.txt):
    • read the movie id
    • For each line of the file:
      • Remap the user id
      • Update the matrix (skip any movie ratings that aren't 4 or 5)

Finally, save the matrix to disk (pickle).

Some sample code:

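# Initialize once, before looping over the files.
remap = {}  # original user id -> row index
n = 0       # next unused row index

# For each rating line (id_ is the original user id):
if id_ not in remap:
  remap[id_] = n
  n += 1

data[remap[id_]][movie_id] = 1  # 1 = liked (rating of 4 or 5)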
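
Putting the basic algorithm together, a minimal end-to-end sketch might look like the following. It rests on several assumptions that are not stated in this issue: each `mv_*.txt` file starts with a `<movie_id>:` line followed by `user_id,rating,date` lines, movie ids start at 1 (so the column index is `movie_id - 1`), and the number of users and movies is known up front. The function name `build_interactions` and the in-memory sample files are hypothetical.

```python
import io
import pickle

import numpy as np


def build_interactions(files, n_users, n_movies):
    """Build a dense 0/1 matrix from open text files in the assumed layout."""
    data = np.zeros((n_users, n_movies), dtype=np.int8)
    remap = {}  # original user id -> row index

    for f in files:
        # Assumed header line: "<movie_id>:"
        movie_id = int(f.readline().strip().rstrip(":"))
        for line in f:
            line = line.strip()
            if not line:
                continue
            user_id, rating, _date = line.split(",")
            # Remap the user id to the next unused row on first sight.
            row = remap.setdefault(int(user_id), len(remap))
            # Only ratings of 4 or 5 count as "liked".
            if int(rating) >= 4:
                data[row, movie_id - 1] = 1
    return data, remap


# Two tiny in-memory stand-ins for mv_*.txt files (made-up sample rows).
sample = [
    io.StringIO("1:\n1488844,3,2005-09-06\n822109,5,2005-05-13\n"),
    io.StringIO("2:\n822109,4,2005-09-05\n2059652,4,2005-09-05\n"),
]
data, remap = build_interactions(sample, n_users=3, n_movies=2)

# pickle.dump(data, fh) would write to an open file; dumps keeps this in memory.
blob = pickle.dumps(data)
```

A dense array keeps the sketch short, but at the full data set's size it would likely be very large, which is one reason to collect coordinates for a sparse matrix instead.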

@audiodude
Collaborator Author

As discussed a couple of weeks ago, the interactions matrix needs to be a coo_matrix: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html
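
As a rough illustration of what that could look like, a `coo_matrix` can be built directly from three parallel sequences (values, row indices, column indices) without ever allocating the dense matrix. The index lists below are made-up examples, not real data; in practice they would be appended to inside the file-reading loop.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Hypothetical coordinates of "liked" cells:
# rows are remapped user indices, cols are movie column indices.
rows = [1, 1, 2]
cols = [0, 1, 1]
vals = np.ones(len(rows), dtype=np.int32)  # a 1 marks a liked movie

# shape is (number of users, number of movies); 3 x 2 for this toy example.
interactions = coo_matrix((vals, (rows, cols)), shape=(3, 2))

dense_view = interactions.toarray()  # only sensible for small matrices
```

Passing `shape` explicitly matters here, because otherwise the shape is inferred from the largest indices and any trailing users or movies with no liked ratings would be dropped.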

@audiodude
Collaborator Author

Fixed by #10

2 participants