Interaction matrix of user #10

siddz415 · 2024-08-23T03:55:38Z

No description provided.

audiodude

This looks good.

I think you might want to think about writing the code that "glues" this all together. Instead of just writing the library functions, what does it look like to call these functions and produce the final interaction matrix?

data/movie_titles.txt

data/mv_0000001.txt

data/mv_0000002.txt

data_processing/load_data.py

data_processing/process_data.py

audiodude · 2024-10-25T03:24:49Z

Hey @siddz415, just wanted to check in: are you able to respond to the comments in this PR? If you're too busy with other stuff, that's totally cool. We were thinking we would re-assign the PR if there isn't any motion before the meeting on Nov 7. Let us know if you're still interested in working on it.

data_processing/process_data.py

jhanley634

Using latin-1 encoding in load_movie_titles() seems odd, but perhaps we're loading an ancient file which is not in utf8. It's all commented code in any case, so we should probably just delete those lines before merging to main.

The split() on , comma suggests that we might prefer to import csv.

In lightfm_recommendation.py and main.py we have a lot of code up at module level. Recommend burying it within def main(): or whatever. That way it will be safe for some future unit test to import it without side effects. The current code, which lacks a __main__ guard, is innocuous enough. But I'm concerned it will encourage folks to add stuff to this code base which is hard for a "second caller" (such as a unit test) to invoke. Such issues could be addressed in the current PR prior to a merge, or in a subsequent PR.

data_processing/process_data.py

audiodude · 2024-10-26T03:52:46Z

> Using latin-1 encoding in load_movie_titles() seems odd, but perhaps we're loading an ancient file which is not in utf8. It's all commented code in any case, so we should probably just delete those lines before merging to main.

The file is from 2007, so it's completely possible that it's not UTF-8. It's also possible that it doesn't have any non-ASCII characters anyway, so it might not matter.
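
One way to avoid guessing the encoding is to try UTF-8 first and fall back to Latin-1 only when decoding fails. The following is a minimal sketch under that assumption; `read_text_with_fallback` is a hypothetical helper and is not part of this PR.

```python
from pathlib import Path


def read_text_with_fallback(path):
    """Read a text file as UTF-8, falling back to Latin-1 if decoding fails.

    Hypothetical helper for illustration; not the project's loader.
    """
    raw = Path(path).read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Latin-1 maps every possible byte to a code point, so this
        # second attempt cannot raise UnicodeDecodeError.
        return raw.decode("latin-1")
```

If the file turns out to be pure ASCII, both decoders give the same result, so the fallback costs nothing.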

> The split() on , comma suggests that we might prefer to import csv.

These text files are not CSV, they are only CSV-like. Individual fields can have commas in them. Specifically movie_titles.txt often looks like:

1023, 1994, Forrest Gump
1024, 2001, Planes, Trains and Automobiles
...
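
Since only the last field can contain commas, one option is to cap the number of splits so that everything after the second comma stays in the title. This is a sketch rather than the code in this PR: `parse_movie_title_line` is a hypothetical name, and the year is kept as a string because the thread does not say whether it is always numeric.

```python
def parse_movie_title_line(line):
    """Split 'ID, YEAR, TITLE' into its three fields.

    maxsplit=2 means at most two splits, so commas inside the
    title are left alone. Hypothetical helper for illustration.
    """
    movie_id, year, title = line.strip().split(",", 2)
    return int(movie_id), year.strip(), title.strip()


print(parse_movie_title_line("1024, 2001, Planes, Trains and Automobiles"))
```

A plain `split(",")` on the same line would produce four fields and break a three-way unpack.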

> In lightfm_recommendation.py and main.py we have a lot of code up at module level. Recommend burying it within def main(): or whatever. That way it will be safe for some future unit test to import it without side effects. The current code, which lacks a __main__ guard, is innocuous enough. But I'm concerned it will encourage folks to add stuff to this code base which is hard for a "second caller" (such as a unit test) to invoke. Such issues could be addressed in the current PR prior to a merge, or in a subsequent PR.

I agree. Better to put things in a main function now and figure out how to call it later. We can always rename main() to create_matrix() or whatever. I think a lot of this comes from the fact that we haven't really discussed how to organize files/modules/libraries/functions, so things are sort of in random places right now.
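
For reference, the pattern being discussed looks roughly like the sketch below. `build_report` and the default `"data"` directory are placeholders invented for illustration; only the `main()` plus `__main__` guard structure is the point.

```python
import sys


def build_report(ratings_dir):
    """Placeholder for the real work, e.g. building the matrix."""
    return f"would build interaction matrix from {ratings_dir}"


def main(argv=None):
    """Entry point. Importing this module does not call it."""
    args = sys.argv[1:] if argv is None else argv
    ratings_dir = args[0] if args else "data"  # assumed default
    print(build_report(ratings_dir))
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed as a script,
    # not when a unit test imports it.
    main()
```

A test can then call `main(["some_dir"])` or `build_report(...)` directly, with no work happening at import time.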

jhanley634 · 2024-11-09T04:50:15Z

This PR is drifting toward being moribund. It is still mergeable at this point, though it proposes adding more than 4 million lines of CSV data, which for git is not quite a Best Practice.

It might be helpful to split Data from Code, merge the code, and move on to other issues and feature requests.

audiodude · 2024-11-09T05:27:22Z

Thanks for updating the branch @jhanley634.

We discussed early on in our in-person meetings that our strategy should not be to commit the Netflix prize data. That's the reason we're exploring the other PR that automatically downloads it.

This has already been addressed in this comment: #10 (comment)

cocomittens · 2024-11-23T18:25:10Z

OK, so I updated it to use a sparse matrix and removed the extra variable. Right now it gets the files by calling list_rating_files in create_interaction_matrix, but it could also get the files in main, as it does currently, and take that list as the parameter instead of the directory.

But I'm not pushing the changes yet because I want to make sure I'm not breaking anything (it throws an error, but it also doesn't seem to work, for different reasons, without my changes). So just to make sure:
The create_interaction_matrix function is supposed to go through both the mv_01 and mv_02 files, right?

But then when going through the file, this line
user_id, rating, _ = line.strip().split(",")
throws an error because it expects 3 comma-separated values, and those files only have 1 or 2 per line (none of them seem to be in the form of an id and a rating). Is this intended to be the movie_titles file? (I would assume not, because that doesn't have the ratings.) Is there supposed to be another file somewhere that contains the rating information like this, or is that supposed to be created somehow from the information in these files?

It looks like these are currently the files being used in that function:
movie_data = list_rating_files(movie_titles_file)
I don't believe that would work, if I'm understanding it correctly: the argument to that function is a directory, not a file, and it seems intended to find and return the mv_01 and mv_02 files. Or am I missing something?
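
To make the directory-versus-file point concrete, here is a hedged sketch of what a function like list_rating_files could look like. The name comes from the thread, but the body and the `mv_*.txt` glob are assumptions, not the implementation that was merged.

```python
from pathlib import Path


def list_rating_files(data_dir):
    """Return the mv_*.txt files directly inside data_dir, sorted by name.

    Assumed behaviour for illustration only.
    """
    directory = Path(data_dir)
    if not directory.is_dir():
        # Passing a single file (such as movie_titles.txt) is rejected
        # loudly instead of silently returning nothing.
        raise NotADirectoryError(f"expected a directory, got: {directory}")
    return sorted(directory.glob("mv_*.txt"))
```

With a check like this, passing movie_titles_file fails immediately with a clear message.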

cocomittens · 2024-11-23T18:31:45Z

I guess my greater question is, what are in these mv files exactly?
The first one looks to be a movie_id: , followed by a bunch of user ids? (Presumably those that rated that movie?)
Then the second one is movie_id: , then user ids and dates?
Where is the rating (presumably on a 1-5 scale?) supposed to come from?

main.py

audiodude · 2024-11-23T18:43:20Z

> I guess my greater question is, what are in these mv files exactly? The first one looks to be a movie_id: , followed by a bunch of user ids? (Presumably those that rated that movie?) Then the second one is movie_id: , then user ids and dates? Where is the rating (presumably on a 1-5 scale?) supposed to come from?

The mv_*.txt files are in the form:

MOVIE_ID:
USER_ID,RATING,DATE_OF_RATING
USER_ID_2,RATING,DATE_OF_RATING
USER_ID_3,RATING,DATE_OF_RATING

There is a file for each movie id, and the id in the file name corresponds to the id of the movie that is referenced.
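
Given that layout, a parser has to treat the `MOVIE_ID:` header line differently from the rating lines, which is why a blanket three-way unpack fails on the first line. The sketch below shows one way to do that and to collect the results as (row, column, value) triplets, the coordinate form that sparse-matrix constructors such as `scipy.sparse.coo_matrix` accept. Both function names are hypothetical, the sample ratings are made up, and SciPy itself is deliberately not imported here.

```python
def parse_rating_lines(lines):
    """Yield (user_id, movie_id, rating) from lines in the mv_*.txt layout."""
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            # Header line such as "1:" names the movie for what follows.
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating line seen before a MOVIE_ID: header")
        user_id, rating, _date = line.split(",")
        yield int(user_id), movie_id, int(rating)


def to_coo_triplets(ratings):
    """Map raw ids to dense indices; return (rows, cols, data, shape)."""
    user_index, movie_index = {}, {}
    rows, cols, data = [], [], []
    for user_id, movie_id, rating in ratings:
        # setdefault assigns the next free index the first time an id is seen.
        rows.append(user_index.setdefault(user_id, len(user_index)))
        cols.append(movie_index.setdefault(movie_id, len(movie_index)))
        data.append(rating)
    return rows, cols, data, (len(user_index), len(movie_index))


# Made-up sample in the documented layout.
sample = ["1:", "1488844,3,2005-09-06", "822109,5,2005-05-13"]
rows, cols, data, shape = to_coo_triplets(parse_rating_lines(sample))
```

If SciPy is installed, `coo_matrix((data, (rows, cols)), shape=shape)` would turn these lists into a sparse users-by-movies matrix.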

cocomittens · 2024-11-23T18:49:31Z

> > I guess my greater question is, what are in these mv files exactly? The first one looks to be a movie_id: , followed by a bunch of user ids? (Presumably those that rated that movie?) Then the second one is movie_id: , then user ids and dates? Where is the rating (presumably on a 1-5 scale?) supposed to come from?
>
> The mv_*.txt files are in the form:
>
> MOVIE_ID:
> USER_ID,RATING,DATE_OF_RATING
> USER_ID_2,RATING,DATE_OF_RATING
> USER_ID_3,RATING,DATE_OF_RATING
>
> There is a file for each movie id, and the id in the file name corresponds to the id of the movie that is referenced.

OK, that's good to know. TBH I will push my changes for now then, because it should theoretically work if that's true.
However, the files that I deleted from this PR sadly don't follow that format for some reason; not sure where the rest of it went.
~~But where can I actually get this data?~~
I see that it is in the same place as the other data.

audiodude · 2024-11-23T18:59:32Z

Okay we're clearly working on this at the exact same moment, which probably isn't a great idea. Please pull. I consolidated all of the files and put them in a sane place.

cocomittens · 2024-11-23T19:03:19Z

> Okay we're clearly working on this at the exact same moment, which probably isn't a great idea. Please pull. I consolidated all of the files and put them in a sane place.

Will do! Hopefully I didn't create some kind of merge conflict, but yeah, I will make sure to avoid any future interactions.

main.py

siddz415 added 2 commits August 22, 2024 19:16

added interaction matrix

15ac13d

restructured folders

05e8dc5

audiodude requested changes Aug 24, 2024

audiodude reviewed Aug 30, 2024

made changes to the files

d908840

audiodude reviewed Sep 1, 2024

added lightfm_recommendation.py

b5143b0

Merge branch 'main' into interaction-matrix-of-user

3f6bc40

audiodude reviewed Oct 25, 2024

jhanley634 approved these changes Oct 26, 2024

audiodude reviewed Oct 26, 2024

jhanley634 and others added 4 commits November 8, 2024 20:37

Merge branch 'main' of https://github.com/noisebridge/MediaBridge into interaction-matrix-of-user

1bc61e7

Merge branch 'main' into interaction-matrix-of-user

08459c1

Merge branch 'interaction-matrix-of-user' of https://github.com/noisebridge/MediaBridge into interaction-matrix-of-user

4daa1f2

isort

4a174de

cocomittens added 7 commits November 22, 2024 21:57

Merge branch 'main' into interaction-matrix-of-user

a5f952f

Merge branch 'main' of https://github.com/noisebridge/MediaBridge into interaction-matrix-of-user

3c72e1a

Remove movie titles

69bd478

Merge branch 'interaction-matrix-of-user' of https://github.com/noisebridge/MediaBridge into interaction-matrix-of-user

c84efab

Remove mv-01

05f41d0

Remove mv-02

25f83dd

Remove extra else statement

038ad44

audiodude reviewed Nov 23, 2024

cocomittens added 2 commits November 23, 2024 10:45

Add sparse matrix, fix files

432df32

Comment unused code

f04ba20

cocomittens and others added 2 commits November 23, 2024 10:52

Remove unused import

6d197bd

Consolidate all files and functions for interaction matrix into a more reasonable location

e6d4548

cocomittens and others added 3 commits November 23, 2024 11:16

Update matrix

27d303e

Remove errant main file

26fde84

Refactor file listing code

14d136f

audiodude approved these changes Dec 6, 2024

Merge branch 'main' into interaction-matrix-of-user

3655b26

cocomittens approved these changes Dec 6, 2024

Ruff format

713639a

audiodude requested a review from cocomittens December 6, 2024 03:07

Update interaction_matrix.py

29346a6

cocomittens approved these changes Dec 6, 2024

audiodude approved these changes Dec 6, 2024

audiodude merged commit 35709da into main Dec 6, 2024
4 checks passed

audiodude deleted the interaction-matrix-of-user branch December 6, 2024 03:11

audiodude mentioned this pull request Dec 13, 2024

Create a sparse matrix (interactions matrix) of user -> movie rating from the Netflix data set #6

Closed
