Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Graph.remove_inputs #108

Closed
wants to merge 3 commits into from
Closed

Fix Graph.remove_inputs #108

wants to merge 3 commits into from

Conversation

benfred
Copy link
Member

@benfred benfred commented Jul 23, 2022

The remove_inputs call on the graph wasn't working - since it was hanging on
to a reference to the removed columns via the dependencies list. Fix and
add a unittest that would have caught this.

Fixes NVIDIA-Merlin/NVTabular#1632

The `remove_inputs` call on the graph wasn't working - since it was hanging on
to a reference to the removed columns via the `dependencies` list. Fix and
add a unittest that would have caught this.

Fixes NVIDIA-Merlin/NVTabular#1632
@benfred benfred added the bug Something isn't working label Jul 23, 2022
@nvidia-merlin-bot
Copy link

Click to view CI Results
GitHub pull request #108 of commit 255be9063edf1548aac12819359313f3f5dd8146, no merge conflicts.
Running as SYSTEM
Setting status of 255be9063edf1548aac12819359313f3f5dd8146 to PENDING with url https://10.20.13.93:8080/job/merlin_core/84/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/108/*:refs/remotes/origin/pr/108/* # timeout=10
 > git rev-parse 255be9063edf1548aac12819359313f3f5dd8146^{commit} # timeout=10
Checking out Revision 255be9063edf1548aac12819359313f3f5dd8146 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 255be9063edf1548aac12819359313f3f5dd8146 # timeout=10
Commit message: "Fix graph.remove_inputs"
 > git rev-list --no-walk 74969c31bbe8f94331db230748737996b4814547 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins1550920915489169753.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 343 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ............................................... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 97%]
tests/unit/schema/test_schema_io.py .. [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46245 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43291 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35631 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44409 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33645 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44521 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 343 passed, 1 skipped, 82 warnings in 50.95s =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins6922567919154791407.sh

@nvidia-merlin-bot
Copy link

Click to view CI Results
GitHub pull request #108 of commit d4757a8d4ff3482b00a1afb34bf22d018e4680ae, no merge conflicts.
Running as SYSTEM
Setting status of d4757a8d4ff3482b00a1afb34bf22d018e4680ae to PENDING with url https://10.20.13.93:8080/job/merlin_core/85/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/108/*:refs/remotes/origin/pr/108/* # timeout=10
 > git rev-parse d4757a8d4ff3482b00a1afb34bf22d018e4680ae^{commit} # timeout=10
Checking out Revision d4757a8d4ff3482b00a1afb34bf22d018e4680ae (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d4757a8d4ff3482b00a1afb34bf22d018e4680ae # timeout=10
Commit message: "spelling"
 > git rev-list --no-walk 255be9063edf1548aac12819359313f3f5dd8146 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins2869847055435311762.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 343 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ............................................... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 97%]
tests/unit/schema/test_schema_io.py .. [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35315 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39389 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36279 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40397 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34251 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41037 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================= 343 passed, 1 skipped, 82 warnings in 50.56s =================
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins1900167118721291153.sh

@github-actions
Copy link

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-108

@benfred benfred changed the title Fix graph.remove_inputs Fix Graph.remove_inputs Jul 23, 2022
@nv-alaiacano
Copy link
Contributor

It seems like we were all working on this at the same time. Either PR is fine by me. #107

@karlhigley
Copy link
Contributor

I'm just noticing this PR after merging the other one, so...why not both? 😆

@nv-alaiacano
Copy link
Contributor

Closing this because #107 was merged and we don't want to merge both!

@karlhigley
Copy link
Contributor

I actually don't think it's problematic to merge both here, so long as the tests pass. There may be subsequent clean-up we want to do, but this fix is really nice.

@karlhigley karlhigley deleted the fix_remove_inputs branch January 13, 2023 16:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Workflow#remove_inputs doesn't work
4 participants