remove upstream dependencies that have no outputs #107
CI Results
GitHub pull request #107 of commit c5ffc695caf8ddcb2b9123386c51897534dfe8e2, no merge conflicts.
Running as SYSTEM
Setting status of c5ffc695caf8ddcb2b9123386c51897534dfe8e2 to PENDING with url https://10.20.13.93:8080/job/merlin_core/82/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/107/*:refs/remotes/origin/pr/107/* # timeout=10
 > git rev-parse c5ffc695caf8ddcb2b9123386c51897534dfe8e2^{commit} # timeout=10
Checking out Revision c5ffc695caf8ddcb2b9123386c51897534dfe8e2 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c5ffc695caf8ddcb2b9123386c51897534dfe8e2 # timeout=10
Commit message: "remove upstream dependencies that have no outputs"
 > git rev-list --no-walk d74519e495606eeac3237ec814db4c9df31c80c6 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins2969909161860032254.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 342 items / 1 skipped
  if node.input_schema and len(node.input_schema):
      output_columns_to_remove = node.remove_inputs(columns_to_remove)

  for child in node.children:
-     nodes_to_process.append((child, to_remove + output_columns_to_remove))
+     nodes_to_process.append(
+         (child, list(set(to_remove + output_columns_to_remove)))
This line's change is just to de-duplicate the list of output columns. Without it, we would end up with something like ["target", "target"], which would still technically work but made less sense when debugging this.
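As a small illustration of the de-duplication described in this comment, here is a sketch using made-up column lists. The values are hypothetical and are not taken from the PR or its tests. Note that `set` does not guarantee ordering; `dict.fromkeys` is shown only as an order-preserving alternative, not as what the PR does.

```python
# Hypothetical column lists standing in for the values seen while walking the graph.
to_remove = ["target"]
output_columns_to_remove = ["target", "user_id"]

# Plain concatenation keeps the duplicate entry.
combined = to_remove + output_columns_to_remove

# The approach in the diff: de-duplicate through a set (ordering is not guaranteed).
deduped = list(set(combined))

# An order-preserving alternative (an assumption for illustration, not the PR's code).
deduped_ordered = list(dict.fromkeys(combined))
```

Either form removes the repeated "target" entry; the difference only matters if the order of the columns is relevant when reading debug output.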
LGTM once it has a test.
CI Results
GitHub pull request #107 of commit 74969c31bbe8f94331db230748737996b4814547, no merge conflicts.
Running as SYSTEM
Setting status of 74969c31bbe8f94331db230748737996b4814547 to PENDING with url https://10.20.13.93:8080/job/merlin_core/83/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/107/*:refs/remotes/origin/pr/107/* # timeout=10
 > git rev-parse 74969c31bbe8f94331db230748737996b4814547^{commit} # timeout=10
Checking out Revision 74969c31bbe8f94331db230748737996b4814547 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 74969c31bbe8f94331db230748737996b4814547 # timeout=10
Commit message: "add regression test for Graph dependencies"
 > git rev-list --no-walk c5ffc695caf8ddcb2b9123386c51897534dfe8e2 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins11609524086250452249.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 343 items / 1 skipped
I just noticed this PR after I had already submitted a fix of my own, #108 =(
@benfred @nv-alaiacano @karlhigley I would like to thank you all for this super timely fix! 🙂
This resolves the bug reported in NVIDIA-Merlin/NVTabular#1632.
The issue was that we were successfully removing parents when calling `remove_inputs`, but the removed node still showed up in the `dependencies` of the downstream nodes. We add logic to remove any dependency that does not have an `output_schema`.
I'll try adding a regression test as well.
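The idea behind the fix can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: `SketchNode` and `prune_empty_dependencies` are hypothetical names, and the schema is modelled as a plain list of column names rather than the library's schema type.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SketchNode:
    """Hypothetical stand-in for a graph node; not the library's Node class."""

    name: str
    output_schema: List[str] = field(default_factory=list)
    dependencies: List["SketchNode"] = field(default_factory=list)


def prune_empty_dependencies(node: SketchNode) -> None:
    """Drop any dependency whose output schema is empty (hypothetical helper)."""
    node.dependencies = [dep for dep in node.dependencies if dep.output_schema]


# A dependency whose columns were all removed ends up with no outputs.
target = SketchNode("target", output_schema=[])
features = SketchNode("features", output_schema=["user_id"])
downstream = SketchNode(
    "downstream", output_schema=["user_id"], dependencies=[target, features]
)

prune_empty_dependencies(downstream)
remaining = [dep.name for dep in downstream.dependencies]
```

In this sketch only the dependency that still produces columns is kept, which mirrors the behaviour described above: a node with no outputs should no longer appear among the dependencies of downstream nodes.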