Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy MovieLens and Criteo example notebooks from NVTabular #166

Merged
merged 1 commit into from
Mar 31, 2022

Conversation

karlhigley
Copy link
Contributor

No description provided.

@karlhigley karlhigley added the documentation Improvements or additions to documentation label Mar 26, 2022
@karlhigley karlhigley added this to the Merlin 22.04 milestone Mar 26, 2022
@karlhigley karlhigley requested a review from bschifferer March 26, 2022 17:34
@karlhigley karlhigley self-assigned this Mar 26, 2022
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@@ -72,7 +72,7 @@
"# External dependencies\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update the title and probably talk about how this functionality has moved.


Reply via ReviewNB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to leave this until a future PR and just merge this that's fine as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm trying to do a quick fix for issues raised on this repo, but I agree that the examples do need some attention and rework.

@@ -72,7 +72,7 @@
"# External dependencies\n",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #5.    import nvtabular as nvt

Do we want to update this to come from core as well?


Reply via ReviewNB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to get us set up to import from merlin.transforms instead of nvtabular, but I'm not sure it's going to fully land until after the upcoming release.

@@ -1,472 +1,485 @@
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #1.    movies = df_lib.read_csv(os.path.join(INPUT_DATA_DIR, "ml-25m/movies.csv"))

"ml-25m", "movies.csv" or skip os.path.join


Reply via ReviewNB

@@ -2,7 +2,7 @@
"cells": [
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this fixes https://github.com//issues/163


Reply via ReviewNB

@@ -2,7 +2,7 @@
"cells": [
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #2.    %time

use %%time


Reply via ReviewNB

@@ -2,7 +2,7 @@
"cells": [
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #2.    %time

use %%time


Reply via ReviewNB

@@ -3,7 +3,7 @@
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #1.    MODEL_DIR = os.path.join(INPUT_DATA_DIR, "model/movielens_hugectr/")

"model", "movielens_hugectr" or skip os.path.join


Reply via ReviewNB

@@ -3,7 +3,7 @@
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #3.    !mkdir -p {MODEL_DIR}"1"

use os.mkdir


Reply via ReviewNB

@@ -3,7 +3,7 @@
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #16.        source=[INPUT_DATA_DIR + "train/_file_list.txt"],

use os.path.join


Reply via ReviewNB

@@ -3,7 +3,7 @@
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #17.        eval_source=INPUT_DATA_DIR + "valid/_file_list.txt",

use os.path.join


Reply via ReviewNB

@@ -3,7 +3,7 @@
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #103.    model.graph_to_json(graph_config_file=MODEL_DIR + "1/movielens.json")

use os.path.join


Reply via ReviewNB

@@ -3,7 +3,7 @@
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #1.    !mv *.model {MODEL_DIR}

this is not os independent. you'll need glob and shutil 


Reply via ReviewNB

@mattf
Copy link

mattf commented Mar 29, 2022

@karlhigley are nvtabular, hugectr and triton expected to run natively on macos or windows?

if not, there is inconsistent use of os independent code in the notebook that should be removed

@@ -1,304 +1,304 @@
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's a practical value for XXX?


Reply via ReviewNB

@@ -1,304 +1,304 @@
{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #6.    BASE_DIR = os.environ.get("BASE_DIR", "/raid/data/criteo")

it is unlikely that everyome has a /raid directory. instead of falling through, this should prompt the user to set BASE_DIR to a location that satisfies the requirements.


Reply via ReviewNB

@@ -110,6 +110,7 @@
"\n",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code uses os.path.join and assumes the pathsep is /


Reply via ReviewNB

@@ -113,7 +113,7 @@
"BASE_DIR = os.environ.get(\"BASE_DIR\", \"/raid/data/criteo\")\n",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #7.    input_path = os.environ.get("INPUT_DATA_DIR", os.path.join(BASE_DIR, "test_dask/output"))

uses os.path.join and assumes /


Reply via ReviewNB

@@ -311,9 +311,12 @@
" top_names=[\"loss\"],\n",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line #19.        source=["/raid/data/criteo/test_dask/output/train/_file_list.txt"],

inconsistent w/ use of os.path.join


Reply via ReviewNB

@@ -311,9 +311,12 @@
" top_names=[\"loss\"],\n",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of writing model.py the code should just be run directly in the cell, and instead of using import time just use %%time


Reply via ReviewNB

@karlhigley
Copy link
Contributor Author

I'm not the author of these notebooks, nor do I have the bandwidth to address all the review comments on a copy/paste from another repo, so closing this PR and I'll let folks who work on the examples take over from here.

cc @rnyak @bschifferer

@karlhigley karlhigley closed this Mar 29, 2022
@mattf
Copy link

mattf commented Mar 29, 2022

@karlhigley if you're still willing, it's better to have this merged as is and then refined than not merged at all

@EvenOldridge EvenOldridge self-requested a review March 29, 2022 20:54
@EvenOldridge EvenOldridge reopened this Mar 29, 2022
Copy link
Member

@EvenOldridge EvenOldridge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get these changes in and then we can iterate.

@viswa-nvidia
Copy link

Can we merge this and close this PR @karlhigley @benfred @EvenOldridge

@karlhigley karlhigley merged commit 0bf6c17 into main Mar 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation
Projects
None yet
4 participants