-
Notifications
You must be signed in to change notification settings - Fork 118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Copy MovieLens and Criteo example notebooks from NVTabular #166
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
@@ -72,7 +72,7 @@ | |||
"# External dependencies\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to update the title and probably talk about how this functionality has moved.
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to leave this until a future PR and just merge this that's fine as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'm trying to do a quick fix for issues raised on this repo, but I agree that the examples do need some attention and rework.
@@ -72,7 +72,7 @@ | |||
"# External dependencies\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #5. import nvtabular as nvt
Do we want to update this to come from core as well?
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to get us set up to import from merlin.transforms
instead of nvtabular
, but I'm not sure it's going to fully land until after the upcoming release.
@@ -1,472 +1,485 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #1. movies = df_lib.read_csv(os.path.join(INPUT_DATA_DIR, "ml-25m/movies.csv"))
"ml-25m", "movies.csv"
or skip os.path.join
Reply via ReviewNB
@@ -2,7 +2,7 @@ | |||
"cells": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -2,7 +2,7 @@ | |||
"cells": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -2,7 +2,7 @@ | |||
"cells": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -3,7 +3,7 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #1. MODEL_DIR = os.path.join(INPUT_DATA_DIR, "model/movielens_hugectr/")
"model", "movielens_hugectr"
or skip os.path.join
Reply via ReviewNB
@@ -3,7 +3,7 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -3,7 +3,7 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -3,7 +3,7 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -3,7 +3,7 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #103. model.graph_to_json(graph_config_file=MODEL_DIR + "1/movielens.json")
use os.path.join
Reply via ReviewNB
@@ -3,7 +3,7 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #1. !mv *.model {MODEL_DIR}
this is not os independent. you'll need glob
and shutil
Reply via ReviewNB
@karlhigley are nvtabular, hugectr and triton expected to run natively on macos or windows? if not, there is inconsistent use of os independent code in the notebook that should be removed |
@@ -1,304 +1,304 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1,304 +1,304 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #6. BASE_DIR = os.environ.get("BASE_DIR", "/raid/data/criteo")
it is unlikely that everyome has a /raid
directory. instead of falling through, this should prompt the user to set BASE_DIR
to a location that satisfies the requirements.
Reply via ReviewNB
@@ -110,6 +110,7 @@ | |||
"\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -113,7 +113,7 @@ | |||
"BASE_DIR = os.environ.get(\"BASE_DIR\", \"/raid/data/criteo\")\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #7. input_path = os.environ.get("INPUT_DATA_DIR", os.path.join(BASE_DIR, "test_dask/output"))
uses os.path.join
and assumes /
Reply via ReviewNB
@@ -311,9 +311,12 @@ | |||
" top_names=[\"loss\"],\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line #19. source=["/raid/data/criteo/test_dask/output/train/_file_list.txt"],
inconsistent w/ use of os.path.join
Reply via ReviewNB
@@ -311,9 +311,12 @@ | |||
" top_names=[\"loss\"],\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of writing model.py
the code should just be run directly in the cell, and instead of using import time
just use %%time
Reply via ReviewNB
I'm not the author of these notebooks, nor do I have the bandwidth to address all the review comments on a copy/paste from another repo, so closing this PR and I'll let folks who work on the examples take over from here. |
@karlhigley if you're still willing, it's better to have this merged as is and then refined than not merged at all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's get these changes in and then we can iterate.
Can we merge this and close this PR @karlhigley @benfred @EvenOldridge |
No description provided.