Make it possible to use schema-awareness outside of operators and transform dictionary-like objects #139

Closed
wants to merge 39 commits into from

karlhigley
Contributor

This should make it easier to use Merlin schemas outside the context of operators and DAGs, and without requiring dataframes.

Included changes:

  • Creates a ComputeSchemaMixin that contains only the schema computation parts of BaseOperator
  • Creates DictionaryLike, SeriesLike, DataFrameLike, and Transformable Python protocols
  • Applies the new protocols throughout the merlin.dag and merlin.dispatch packages (replacing DataFrameType and SeriesType)
  • Adds DictArray and Column classes that conform to the DataFrameLike and SeriesLike protocols (respectively)
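
As a rough illustration of the protocol-based approach listed above, here is a minimal sketch using `typing.Protocol`. The method sets, the `select_columns` helper, and the simplified `Column` and `DictArray` classes are assumptions made for this example; they are not the actual definitions in `merlin.core.protocols` or `merlin.dag.dictarray`.

```python
from typing import Any, Dict, Iterable, Protocol, runtime_checkable


@runtime_checkable
class SeriesLike(Protocol):
    """Hypothetical protocol: anything sized and indexable counts as a column."""

    def __len__(self) -> int: ...

    def __getitem__(self, index: Any) -> Any: ...


@runtime_checkable
class DictionaryLike(Protocol):
    """Hypothetical protocol: the minimal mapping surface a transform needs."""

    def keys(self) -> Iterable[str]: ...

    def __getitem__(self, key: str) -> Any: ...

    def __setitem__(self, key: str, value: Any) -> None: ...


class Column:
    """Simplified stand-in for a column class that satisfies SeriesLike."""

    def __init__(self, values: Iterable[Any]) -> None:
        self.values = list(values)

    def __len__(self) -> int:
        return len(self.values)

    def __getitem__(self, index: Any) -> Any:
        return self.values[index]


class DictArray:
    """Simplified stand-in for a dict-of-columns that satisfies DictionaryLike."""

    def __init__(self, columns: Dict[str, Iterable[Any]]) -> None:
        self._columns = {name: Column(vals) for name, vals in columns.items()}

    def keys(self) -> Iterable[str]:
        return self._columns.keys()

    def __getitem__(self, key: str) -> Column:
        return self._columns[key]

    def __setitem__(self, key: str, value: Any) -> None:
        # Wrap raw values so every stored column is a Column
        self._columns[key] = value if isinstance(value, Column) else Column(value)


def select_columns(data: DictionaryLike, names: Iterable[str]) -> Dict[str, Any]:
    """Hypothetical transform that relies only on the protocol, not on a dataframe."""
    return {name: data[name] for name in names}


arr = DictArray({"a": [1, 2], "b": [3, 4]})
selected = select_columns(arr, ["a"])
```

Because the protocols are structural, a plain `dict` passes the same `isinstance(obj, DictionaryLike)` check as `DictArray` without inheriting from anything, which is the property that lets code accept dictionary-like inputs alongside dataframes.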

@karlhigley karlhigley added the enhancement New feature or request label Sep 13, 2022
@karlhigley karlhigley added this to the Merlin 22.10 milestone Sep 13, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #139 of commit 8600b240837f6be6b2cd57b2769b77ba9f2c3a58, no merge conflicts.
Running as SYSTEM
Setting status of 8600b240837f6be6b2cd57b2769b77ba9f2c3a58 to PENDING with url https://10.20.13.93:8080/job/merlin_core/199/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/139/*:refs/remotes/origin/pr/139/* # timeout=10
 > git rev-parse 8600b240837f6be6b2cd57b2769b77ba9f2c3a58^{commit} # timeout=10
Checking out Revision 8600b240837f6be6b2cd57b2769b77ba9f2c3a58 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8600b240837f6be6b2cd57b2769b77ba9f2c3a58 # timeout=10
Commit message: "Use protocols in dispatch"
 > git rev-list --no-walk ec99496883329a91de59fed300c77da0c1bc030c # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins3928575102079743059.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu inst-nodeps: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.6.0+11.g8600b24.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.7,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,attrs==22.1.0,awscli==1.25.72,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.51,botocore==1.27.71,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,clang==5.0,click==8.1.3,cloudpickle==2.1.0,colorama==0.4.4,coverage==6.4.4,cuda-python==11.7.1,cudf==22.4.0,cupy-cuda116==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dask-cuda==22.4.0,dask-cudf==22.4.0,dbus-python==1.2.16,debugpy==1.6.2,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.3.0,distro==1.7.0,dm-tree==0.1.7,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==0.10.0,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.82.0,fastavro==1.6.0,fastcore==1.5.24,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.0,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.0,google-auth==2.11.0,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.2,grpcio==1.41.0,grpcio-channelz==1.47.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.4.0,hugectr2onnx==0.0.0,huggingface-hub==0.8.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.0,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.1,ipython==8.4.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.1.0,json5==0.9.9,jsonschema==4.9.1,jupyter-cache==0.4.3,jupyter-client==7.3.4,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server
-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyterlab==3.4.5,jupyterlab-pygments==0.2.2,jupyterlab-server==2.15.0,jupyterlab-widgets==1.1.0,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.0,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.5.3,matplotlib-inline==0.1.3,mdit-py-plugins==0.2.8,merlin-core==0.6.0+11.g8600b24,merlin-models==0.6.0+45.g5a345d9c1,merlin-systems==0+untagged.105.gf89cc51,mistune==0.8.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.6,nbconvert==6.5.3,nbdime==3.1.1,nbformat==5.4.0,nest-asyncio==1.5.5,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.0,numpy==1.21.5,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.3.3+15.g16e4e34e9),-e /usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.0,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.30,proto-plus==1.19.6,protobuf==3.19.4,psutil==5.9.1,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==6.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.12.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.2,pytest-cov==3.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.7,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==23.2.1,regex==2022.7.25,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rmm==21.12.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.
6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.0,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.4,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.2.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.1.1,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.4.0,starlette==0.19.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.26.0,tensorflow-metadata==1.9.0,termcolor==1.1.0,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.25.1,tqdm==4.64.0,traitlets==5.3.0,transformers==4.12.0,transformers4rec==0.1.11+10.g21a2a836a,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.22.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.11,uvicorn==0.18.3,uvloop==0.16.0,versioneer==0.20,virtualenv==20.16.4,wandb==0.13.1,watchfiles==0.16.1,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.3.3,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='2715396807'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 364 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/core/test_protocols.py ........... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .......................... [ 12%]
tests/unit/dag/test_dictarray.py .... [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py ... [ 15%]
tests/unit/io/test_io.py ............................................... [ 28%]
................................................................ [ 45%]
tests/unit/schema/test_column_schemas.py ....... [ 47%]
tests/unit/schema/test_schema.py ............ [ 50%]
tests/unit/schema/test_schema_io.py .................................... [ 60%]
........................................................................ [ 80%]
........................................................ [ 95%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35821 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35113 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45433 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38853 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41957 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42017 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 9 4 56%
merlin/core/dispatch.py 342 209 39%
merlin/core/protocols.py 49 16 67%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 44 8 82%
merlin/dag/dictarray.py 43 16 63%
merlin/dag/graph.py 99 39 61%
merlin/dag/node.py 342 183 46%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 18 5 72%
merlin/dag/ops/selection.py 20 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/schema_mixin.py 89 18 80%
merlin/dag/selector.py 88 10 89%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 346 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 191 19 90%
merlin/schema/schema.py 205 41 80%
merlin/schema/tags.py 82 1 99%

TOTAL 4438 1415 68%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 364 passed, 1 skipped, 89 warnings in 63.08s (0:01:03) ============
/usr/local/lib/python3.8/dist-packages/coverage/report.py:81: CoverageWarning: Couldn't parse Python file '/var/jenkins_home/workspace/merlin_core/core/merlin/dag/executors.py' (couldnt-parse)
coverage._warn(msg, slug="couldnt-parse")
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins11933323730849429339.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #139 of commit cb7d3f8298817d9d94d6c6fe23e3e44b8980345a, no merge conflicts.
Running as SYSTEM
Setting status of cb7d3f8298817d9d94d6c6fe23e3e44b8980345a to PENDING with url https://10.20.13.93:8080/job/merlin_core/206/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/139/*:refs/remotes/origin/pr/139/* # timeout=10
 > git rev-parse cb7d3f8298817d9d94d6c6fe23e3e44b8980345a^{commit} # timeout=10
Checking out Revision cb7d3f8298817d9d94d6c6fe23e3e44b8980345a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f cb7d3f8298817d9d94d6c6fe23e3e44b8980345a # timeout=10
Commit message: "Adjust method signatures in DAG operators"
 > git rev-list --no-walk 9ebe07cfac0b9837c9168efde2548a7186469e8f # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins1786803776840566357.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu inst-nodeps: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.6.0+35.gcb7d3f8.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.7,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,attrs==22.1.0,awscli==1.25.73,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.51,botocore==1.27.72,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,clang==5.0,click==8.1.3,cloudpickle==2.1.0,colorama==0.4.4,coverage==6.4.4,cuda-python==11.7.1,cudf==22.4.0,cupy-cuda116==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dask-cuda==22.4.0,dask-cudf==22.4.0,dbus-python==1.2.16,debugpy==1.6.2,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.3.0,distro==1.7.0,dm-tree==0.1.7,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==0.10.0,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.82.0,fastavro==1.6.0,fastcore==1.5.24,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.0,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.0,google-auth==2.11.0,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.2,grpcio==1.41.0,grpcio-channelz==1.47.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.4.0,hugectr2onnx==0.0.0,huggingface-hub==0.8.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.0,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.1,ipython==8.4.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.1.0,json5==0.9.9,jsonschema==4.9.1,jupyter-cache==0.4.3,jupyter-client==7.3.4,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server
-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyterlab==3.4.5,jupyterlab-pygments==0.2.2,jupyterlab-server==2.15.0,jupyterlab-widgets==1.1.0,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.0,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.5.3,matplotlib-inline==0.1.3,mdit-py-plugins==0.2.8,merlin-core==0.6.0+35.gcb7d3f8,merlin-models==0.6.0+45.g5a345d9c1,merlin-systems==0+untagged.105.gf89cc51,mistune==0.8.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.6,nbconvert==6.5.3,nbdime==3.1.1,nbformat==5.4.0,nest-asyncio==1.5.5,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.0,numpy==1.21.5,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.3.3+15.g16e4e34e9),-e /usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.0,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.30,proto-plus==1.19.6,protobuf==3.19.4,psutil==5.9.1,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==6.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.12.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.2,pytest-cov==3.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.7,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==23.2.1,regex==2022.7.25,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rmm==21.12.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.
6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.0,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.4,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.2.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.1.1,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.4.0,starlette==0.19.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.26.0,tensorflow-metadata==1.9.0,termcolor==1.1.0,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.25.1,tqdm==4.64.0,traitlets==5.3.0,transformers==4.12.0,transformers4rec==0.1.11+10.g21a2a836a,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.22.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.11,uvicorn==0.18.3,uvloop==0.16.0,versioneer==0.20,virtualenv==20.16.4,wandb==0.13.1,watchfiles==0.16.1,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.3.3,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='287177704'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 367 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/core/test_protocols.py ......... [ 2%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py ........................... [ 11%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .... [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py ... [ 15%]
tests/unit/io/test_io.py ............................................... [ 28%]
................................................................ [ 45%]
tests/unit/schema/test_column_schemas.py ....... [ 47%]
tests/unit/schema/test_schema.py ............. [ 51%]
tests/unit/schema/test_schema_io.py .................................... [ 61%]
........................................................................ [ 80%]
........................................................ [ 95%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41615 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41069 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36617 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36359 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43375 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43231 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 9 4 56%
merlin/core/dispatch.py 342 207 39%
merlin/core/protocols.py 68 29 57%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 6 0 100%
merlin/dag/base_operator.py 44 5 89%
merlin/dag/dictarray.py 54 15 72%
merlin/dag/executors.py 129 55 57%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 342 160 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 18 5 72%
merlin/dag/ops/selection.py 20 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/schema_mixin.py 89 18 80%
merlin/dag/selector.py 101 9 91%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 346 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 191 19 90%
merlin/schema/schema.py 209 33 84%
merlin/schema/tags.py 82 1 99%

TOTAL 4615 1441 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 367 passed, 1 skipped, 89 warnings in 64.83s (0:01:04) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins11816507181570373560.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #139 of commit b3c910211a7ef8562be9443f05a17321bb4c49f8, no merge conflicts.
Running as SYSTEM
Setting status of b3c910211a7ef8562be9443f05a17321bb4c49f8 to PENDING with url https://10.20.13.93:8080/job/merlin_core/207/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/139/*:refs/remotes/origin/pr/139/* # timeout=10
 > git rev-parse b3c910211a7ef8562be9443f05a17321bb4c49f8^{commit} # timeout=10
Checking out Revision b3c910211a7ef8562be9443f05a17321bb4c49f8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f b3c910211a7ef8562be9443f05a17321bb4c49f8 # timeout=10
Commit message: "Add `merlin/core` to interrogate ignores"
 > git rev-list --no-walk cb7d3f8298817d9d94d6c6fe23e3e44b8980345a # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins12183421768119601733.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu inst-nodeps: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.6.0+37.gb3c9102.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.7,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,attrs==22.1.0,awscli==1.25.73,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.51,botocore==1.27.72,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,clang==5.0,click==8.1.3,cloudpickle==2.1.0,colorama==0.4.4,coverage==6.4.4,cuda-python==11.7.1,cudf==22.4.0,cupy-cuda116==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dask-cuda==22.4.0,dask-cudf==22.4.0,dbus-python==1.2.16,debugpy==1.6.2,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.3.0,distro==1.7.0,dm-tree==0.1.7,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==0.10.0,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.82.0,fastavro==1.6.0,fastcore==1.5.24,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.0,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.0,google-auth==2.11.0,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.2,grpcio==1.41.0,grpcio-channelz==1.47.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.4.0,hugectr2onnx==0.0.0,huggingface-hub==0.8.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.0,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.1,ipython==8.4.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.1.0,json5==0.9.9,jsonschema==4.9.1,jupyter-cache==0.4.3,jupyter-client==7.3.4,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server
-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyterlab==3.4.5,jupyterlab-pygments==0.2.2,jupyterlab-server==2.15.0,jupyterlab-widgets==1.1.0,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.0,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.5.3,matplotlib-inline==0.1.3,mdit-py-plugins==0.2.8,merlin-core==0.6.0+37.gb3c9102,merlin-models==0.6.0+45.g5a345d9c1,merlin-systems==0+untagged.105.gf89cc51,mistune==0.8.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.6,nbconvert==6.5.3,nbdime==3.1.1,nbformat==5.4.0,nest-asyncio==1.5.5,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.0,numpy==1.21.5,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.3.3+15.g16e4e34e9),-e /usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.0,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.30,proto-plus==1.19.6,protobuf==3.19.4,psutil==5.9.1,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==6.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.12.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.2,pytest-cov==3.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.7,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==23.2.1,regex==2022.7.25,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rmm==21.12.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.
6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.0,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.4,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.2.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.1.1,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.4.0,starlette==0.19.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.26.0,tensorflow-metadata==1.9.0,termcolor==1.1.0,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.25.1,tqdm==4.64.0,traitlets==5.3.0,transformers==4.12.0,transformers4rec==0.1.11+10.g21a2a836a,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.22.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.11,uvicorn==0.18.3,uvloop==0.16.0,versioneer==0.20,virtualenv==20.16.4,wandb==0.13.1,watchfiles==0.16.1,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.3.3,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='518487231'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 367 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/core/test_protocols.py ......... [ 2%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py ........................... [ 11%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .... [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py ... [ 15%]
tests/unit/io/test_io.py ............................................... [ 28%]
................................................................ [ 45%]
tests/unit/schema/test_column_schemas.py ....... [ 47%]
tests/unit/schema/test_schema.py ............. [ 51%]
tests/unit/schema/test_schema_io.py .................................... [ 61%]
........................................................................ [ 80%]
........................................................ [ 95%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41095 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43915 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34833 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33777 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40643 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43765 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 9 4 56%
merlin/core/dispatch.py 342 207 39%
merlin/core/protocols.py 68 29 57%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 6 0 100%
merlin/dag/base_operator.py 44 5 89%
merlin/dag/dictarray.py 54 15 72%
merlin/dag/executors.py 129 55 57%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 342 160 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 18 5 72%
merlin/dag/ops/selection.py 20 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/schema_mixin.py 89 18 80%
merlin/dag/selector.py 101 9 91%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 346 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 191 19 90%
merlin/schema/schema.py 209 33 84%
merlin/schema/tags.py 82 1 99%

TOTAL 4615 1441 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 367 passed, 1 skipped, 89 warnings in 64.62s (0:01:04) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins13209246532250637734.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #139 of commit 865b4d3f4de02ac39ff7adab6ee823e45a077743, no merge conflicts.
Running as SYSTEM
Setting status of 865b4d3f4de02ac39ff7adab6ee823e45a077743 to PENDING with url https://10.20.13.93:8080/job/merlin_core/208/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/139/*:refs/remotes/origin/pr/139/* # timeout=10
 > git rev-parse 865b4d3f4de02ac39ff7adab6ee823e45a077743^{commit} # timeout=10
Checking out Revision 865b4d3f4de02ac39ff7adab6ee823e45a077743 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 865b4d3f4de02ac39ff7adab6ee823e45a077743 # timeout=10
Commit message: "Make the executor tests CPU/pandas compatible"
 > git rev-list --no-walk b3c910211a7ef8562be9443f05a17321bb4c49f8 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins6955275658770030891.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu inst-nodeps: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.6.0+38.g865b4d3.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.7,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,attrs==22.1.0,awscli==1.25.73,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.51,botocore==1.27.72,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,clang==5.0,click==8.1.3,cloudpickle==2.1.0,colorama==0.4.4,coverage==6.4.4,cuda-python==11.7.1,cudf==22.4.0,cupy-cuda116==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dask-cuda==22.4.0,dask-cudf==22.4.0,dbus-python==1.2.16,debugpy==1.6.2,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.3.0,distro==1.7.0,dm-tree==0.1.7,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==0.10.0,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.82.0,fastavro==1.6.0,fastcore==1.5.24,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.0,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.0,google-auth==2.11.0,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.2,grpcio==1.41.0,grpcio-channelz==1.47.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.4.0,hugectr2onnx==0.0.0,huggingface-hub==0.8.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.0,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.1,ipython==8.4.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.1.0,json5==0.9.9,jsonschema==4.9.1,jupyter-cache==0.4.3,jupyter-client==7.3.4,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server
-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyterlab==3.4.5,jupyterlab-pygments==0.2.2,jupyterlab-server==2.15.0,jupyterlab-widgets==1.1.0,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.0,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.5.3,matplotlib-inline==0.1.3,mdit-py-plugins==0.2.8,merlin-core==0.6.0+38.g865b4d3,merlin-models==0.6.0+45.g5a345d9c1,merlin-systems==0+untagged.105.gf89cc51,mistune==0.8.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.6,nbconvert==6.5.3,nbdime==3.1.1,nbformat==5.4.0,nest-asyncio==1.5.5,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.0,numpy==1.21.5,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.3.3+15.g16e4e34e9),-e /usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.0,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.30,proto-plus==1.19.6,protobuf==3.19.4,psutil==5.9.1,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==6.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.12.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.2,pytest-cov==3.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.7,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==23.2.1,regex==2022.7.25,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rmm==21.12.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.
6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.0,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.4,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.2.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.1.1,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.4.0,starlette==0.19.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.26.0,tensorflow-metadata==1.9.0,termcolor==1.1.0,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.25.1,tqdm==4.64.0,traitlets==5.3.0,transformers==4.12.0,transformers4rec==0.1.11+10.g21a2a836a,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.22.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.11,uvicorn==0.18.3,uvloop==0.16.0,versioneer==0.20,virtualenv==20.16.4,wandb==0.13.1,watchfiles==0.16.1,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.3.3,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='55312805'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 367 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/core/test_protocols.py ......... [ 2%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py ........................... [ 11%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .... [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py ... [ 15%]
tests/unit/io/test_io.py ............................................... [ 28%]
................................................................ [ 45%]
tests/unit/schema/test_column_schemas.py ....... [ 47%]
tests/unit/schema/test_schema.py ............. [ 51%]
tests/unit/schema/test_schema_io.py .................................... [ 61%]
........................................................................ [ 80%]
........................................................ [ 95%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37987 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41427 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36833 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40593 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36249 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45313 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 9 4 56%
merlin/core/dispatch.py 342 207 39%
merlin/core/protocols.py 68 29 57%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 6 0 100%
merlin/dag/base_operator.py 44 5 89%
merlin/dag/dictarray.py 54 15 72%
merlin/dag/executors.py 129 55 57%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 342 160 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 18 5 72%
merlin/dag/ops/selection.py 20 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/schema_mixin.py 89 18 80%
merlin/dag/selector.py 101 9 91%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 346 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 191 19 90%
merlin/schema/schema.py 209 33 84%
merlin/schema/tags.py 82 1 99%

TOTAL 4615 1441 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 367 passed, 1 skipped, 89 warnings in 65.29s (0:01:05) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins12864883319846534834.sh

else:
    DataFrameType = pd.DataFrame  # type: ignore
    SeriesType = pd.Series  # type: ignore

Contributor Author

These types have been a pain because they don't play nicely with Mypy, which doesn't like dynamically defined types like these. This PR replaces them with Python protocols that can be used as static types and also checked at runtime (with isinstance).

Member

We could keep this code around for a follow-up PR to remove once the references to these in the other repos have been removed. That way we might be able to merge this without breaking tests in the other repos as a result, making the transition smoother?

Contributor Author

Yeah, that's a good idea 👍🏻

@runtime_checkable
class DictLike(Protocol):
    def __iter__(self):
        return iter([])
Contributor Author

The actual implementations of these methods don't matter: anything that explicitly implements the protocol overrides them, and otherwise they are only used to check that matching methods are available on whatever is compared with isinstance(). These fake implementations make the linters happy though, since they insist that __iter__() must return an iterator and __len__() must return a non-negative integer.
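As a rough illustration of the point above, here is a minimal sketch of a runtime-checkable protocol with placeholder method bodies. Only the `DictLike` name and the `__iter__` body come from the diff; the `__len__` and `__getitem__` members and the `Batch` class are assumptions made up for this example, not the PR's actual code.

```python
from typing import Iterator, Protocol, runtime_checkable


@runtime_checkable
class DictLike(Protocol):
    # Placeholder bodies: they exist only so linters accept the signatures.
    def __iter__(self) -> Iterator:
        return iter([])

    def __len__(self) -> int:
        return 0

    def __getitem__(self, key):
        ...


class Batch:
    """Hypothetical container that never mentions DictLike."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def __iter__(self):
        return iter(self._columns)

    def __len__(self):
        return len(self._columns)

    def __getitem__(self, key):
        return self._columns[key]


batch = Batch({"a": [1, 2], "b": [3, 4]})
is_dict_like = isinstance(batch, DictLike)    # structural match, no inheritance
is_int_dict_like = isinstance(42, DictLike)   # int has none of these methods
```

Note that the runtime check only looks for the presence of the named members; it does not compare argument or return types.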


return results

# TODO: Replace `nodes` with `graph` here?
def transform(
Contributor Author

This method has been long and hard to read for as long as it has existed (since well before Executors were a thing). Here it's just broken down into smaller methods whose names explain what's happening, which makes it easier to read and think about.

@@ -91,7 +91,7 @@ def compute_input_schema(
         """
         return parents_schema + deps_schema

-    def transform(self, col_selector: ColumnSelector, df: DataFrameType) -> DataFrameType:
+    def transform(self, col_selector: ColumnSelector, data: Transformable) -> Transformable:
Contributor Author

@karlhigley karlhigley Sep 15, 2022

In order to make the Merlin DAG friendly to transforming dictionary-like objects (as we'd like to do in Systems, Models, and the dataloaders), the typing here needs to be flexible enough to accommodate non-dataframe objects. The Transformable protocol expects anything passed here to be dictionary-like, in the sense that you can fetch columns with transformable[col_name] (which dataframes support), and to have a few handy dataframe-like methods (e.g. .columns). The DictArray class defined above satisfies both requirements.
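To make the two requirements concrete, the sketch below defines a cut-down protocol and a dictionary-backed class that satisfies it. The member list (`columns`, `__getitem__`, `__setitem__`) is an assumed subset chosen for illustration, and `SimpleDictArray` and `double_columns` are hypothetical names, not the real `Transformable` or `DictArray` from this PR.

```python
from typing import Dict, List, Protocol, runtime_checkable


@runtime_checkable
class Transformable(Protocol):
    # Assumed subset of the real protocol: column lookup plus `.columns`.
    @property
    def columns(self) -> List[str]:
        ...

    def __getitem__(self, key):
        ...

    def __setitem__(self, key, value) -> None:
        ...


class SimpleDictArray:
    """Illustrative stand-in for DictArray, backed by a plain dict of lists."""

    def __init__(self, values: Dict[str, list]):
        self._values = dict(values)

    @property
    def columns(self) -> List[str]:
        return list(self._values.keys())

    def __getitem__(self, key):
        return self._values[key]

    def __setitem__(self, key, value) -> None:
        self._values[key] = value


def double_columns(col_names: List[str], data: Transformable) -> Transformable:
    # Works on anything that offers column lookup and assignment.
    for name in col_names:
        data[name] = [2 * v for v in data[name]]
    return data


result = double_columns(["x"], SimpleDictArray({"x": [1, 2, 3], "y": [4, 5, 6]}))
```

A plain `dict` would fail the `isinstance` check against this sketch because it has no `columns` attribute, which is the gap a wrapper class fills.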

GPU_DICT_ARRAY = auto()


class ComputeSchemaMixin:
Contributor Author

This mixin allows non-operator classes to use the same schema computation machinery as operators without having to become operators and implement the full operator interface.
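The general shape of that idea might look like the sketch below. Everything in it is a simplified assumption: schemas are plain dicts rather than Merlin `Schema` objects, and `SchemaMixinSketch` and `ModelBlock` are hypothetical names rather than the PR's `ComputeSchemaMixin`.

```python
from typing import Dict, List, Optional

Schema = Dict[str, str]  # hypothetical stand-in: column name -> dtype name


class SchemaMixinSketch:
    """Schema computation only; no transform(), no operator interface."""

    def compute_input_schema(self, parents_schema: Schema, deps_schema: Schema) -> Schema:
        # Mirrors `parents_schema + deps_schema` from the diff, using dict merging.
        return {**parents_schema, **deps_schema}

    def compute_output_schema(
        self, input_schema: Schema, selected: Optional[List[str]] = None
    ) -> Schema:
        # Default in this sketch: pass the selected columns through unchanged.
        names = list(input_schema) if selected is None else selected
        return {name: input_schema[name] for name in names}


class ModelBlock(SchemaMixinSketch):
    """Hypothetical non-operator class that overrides only what it needs."""

    def compute_output_schema(self, input_schema, selected=None):
        passed = super().compute_output_schema(input_schema, selected)
        return {f"{name}_embedding": "float32" for name in passed}


block = ModelBlock()
input_schema = block.compute_input_schema({"user_id": "int64"}, {"item_id": "int64"})
output_schema = block.compute_output_schema(input_schema)
```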

@@ -46,6 +46,12 @@ def __init__(
         self._tags = tags if tags is not None else []
         self.subgroups = subgroups if subgroups is not None else []

+        self.all = names == "*"
Contributor Author

To make it possible to select all columns by default, which keeps the number of required parameters on the ComputeSchemaMixin methods as small as possible, we introduce the wildcard selector "*", which is used in one of the following ways:

ColumnSelector("*")
"*" >> SomeOperator()

Contributor Author

This is also useful for defining operator graphs that can be run on different input data by the same executor, since it allows late binding to all provided columns, much as tags allow late binding to a subset of the provided columns.
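A minimal sketch of how a wildcard selector can defer column resolution until the input is known. Only the `names == "*"` check is taken from the diff; `WildcardSelector` and its `resolve` method are hypothetical and much simpler than the real `ColumnSelector`.

```python
from typing import List, Union


class WildcardSelector:
    """Hypothetical, much-reduced selector; not the real ColumnSelector."""

    def __init__(self, names: Union[str, List[str]]):
        # Same check as `self.all = names == "*"` in the diff.
        self.all = names == "*"
        if self.all:
            self.names: List[str] = []
        elif isinstance(names, str):
            self.names = [names]
        else:
            self.names = list(names)

    def resolve(self, available: List[str]) -> List[str]:
        # Late binding: "*" is expanded only once the input columns are known.
        if self.all:
            return list(available)
        return [name for name in self.names if name in available]


everything = WildcardSelector("*")
clicks = everything.resolve(["user_id", "item_id", "click"])
sessions = everything.resolve(["session_id", "ts"])
subset = WildcardSelector(["item_id"]).resolve(["user_id", "item_id", "click"])
```

The same `everything` selector yields different column lists for the two inputs, which is what lets one graph definition run over differently shaped data.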

@@ -32,7 +32,7 @@ ignore-nested-functions = true
 ignore-semiprivate = true
 ignore-setters = true
 fail-under = 70
-exclude = ["build", "docs", "merlin/io", "tests", "setup.py", "versioneer.py"]
+exclude = ["build", "docs", "merlin/core", "merlin/io", "tests", "setup.py", "versioneer.py"]
Contributor Author

There's not really much reason to add docstrings to the newly defined protocols here, since classes that implement the protocols will have their own docstrings.

from merlin.dag.executors import LocalExecutor
from merlin.schema.schema import ColumnSchema, Schema


Contributor Author

The tests in this file demonstrate the capabilities we found we needed in order to process the kinds of data we deal with after loading it from disk and before passing it to models, such as operating on dictionary-like objects and transforming tuples of data (e.g. (X, y)) with the same transformation graph.
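The tuple case can be pictured with the sketch below, which applies one list of steps to each element of an `(X, y)` pair. It is an assumption-laden toy: `run_steps`, `scale`, and `rename` are hypothetical helpers operating on plain dicts, and it does not use `LocalExecutor` or a Merlin graph.

```python
from typing import Callable, Dict, List, Tuple, Union

Columns = Dict[str, List[float]]
Step = Callable[[Columns], Columns]


def run_steps(steps: List[Step], data: Union[Columns, Tuple[Columns, ...]]):
    # A tuple such as (X, y) is handled element by element with the same steps.
    if isinstance(data, tuple):
        return tuple(run_steps(steps, part) for part in data)
    for step in steps:
        data = step(data)
    return data


def scale(data: Columns) -> Columns:
    return {name: [v / 10 for v in values] for name, values in data.items()}


def rename(data: Columns) -> Columns:
    return {f"{name}_scaled": values for name, values in data.items()}


X = {"price": [10.0, 20.0]}
y = {"rating": [30.0]}
X_out, y_out = run_steps([scale, rename], (X, y))
```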

@karlhigley
Contributor Author

karlhigley commented Sep 15, 2022

The downstream tests are broken by the removal of DataFrameType, but it's straightforward to replace it with the DataFrameLike protocol.

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #139 of commit 4cb6455824f5a56f16ccfa2e424248e668bcb260, no merge conflicts.
Running as SYSTEM
Setting status of 4cb6455824f5a56f16ccfa2e424248e668bcb260 to PENDING with url https://10.20.13.93:8080/job/merlin_core/209/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/139/*:refs/remotes/origin/pr/139/* # timeout=10
 > git rev-parse 4cb6455824f5a56f16ccfa2e424248e668bcb260^{commit} # timeout=10
Checking out Revision 4cb6455824f5a56f16ccfa2e424248e668bcb260 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4cb6455824f5a56f16ccfa2e424248e668bcb260 # timeout=10
Commit message: "Add additional tests for wildcard selectors"
 > git rev-list --no-walk 865b4d3f4de02ac39ff7adab6ee823e45a077743 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins9828698646634708686.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu inst-nodeps: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.6.0+39.g4cb6455.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.7,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,attrs==22.1.0,awscli==1.25.73,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.51,botocore==1.27.72,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,clang==5.0,click==8.1.3,cloudpickle==2.1.0,colorama==0.4.4,coverage==6.4.4,cuda-python==11.7.1,cudf==22.4.0,cupy-cuda116==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dask-cuda==22.4.0,dask-cudf==22.4.0,dbus-python==1.2.16,debugpy==1.6.2,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.3.0,distro==1.7.0,dm-tree==0.1.7,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==0.10.0,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.82.0,fastavro==1.6.0,fastcore==1.5.24,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.0,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.0,google-auth==2.11.0,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.2,grpcio==1.41.0,grpcio-channelz==1.47.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.4.0,hugectr2onnx==0.0.0,huggingface-hub==0.8.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.0,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.1,ipython==8.4.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.1.0,json5==0.9.9,jsonschema==4.9.1,jupyter-cache==0.4.3,jupyter-client==7.3.4,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server
-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyterlab==3.4.5,jupyterlab-pygments==0.2.2,jupyterlab-server==2.15.0,jupyterlab-widgets==1.1.0,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.0,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.5.3,matplotlib-inline==0.1.3,mdit-py-plugins==0.2.8,merlin-core==0.6.0+39.g4cb6455,merlin-models==0.6.0+45.g5a345d9c1,merlin-systems==0+untagged.105.gf89cc51,mistune==0.8.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.6,nbconvert==6.5.3,nbdime==3.1.1,nbformat==5.4.0,nest-asyncio==1.5.5,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.0,numpy==1.21.5,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.3.3+15.g16e4e34e9),-e /usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.0,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.30,proto-plus==1.19.6,protobuf==3.19.4,psutil==5.9.1,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==6.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.12.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.2,pytest-cov==3.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.7,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==23.2.1,regex==2022.7.25,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rmm==21.12.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.
6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.0,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.4,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.2.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.1.1,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.4.0,starlette==0.19.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.26.0,tensorflow-metadata==1.9.0,termcolor==1.1.0,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.25.1,tqdm==4.64.0,traitlets==5.3.0,transformers==4.12.0,transformers4rec==0.1.11+10.g21a2a836a,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.22.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.11,uvicorn==0.18.3,uvloop==0.16.0,versioneer==0.20,virtualenv==20.16.4,wandb==0.13.1,watchfiles==0.16.1,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.3.3,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='3709598691'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 370 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/core/test_protocols.py ......... [ 2%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .... [ 14%]
tests/unit/dag/test_graph.py .... [ 15%]
tests/unit/dag/ops/test_selection.py ... [ 16%]
tests/unit/io/test_io.py ............................................... [ 28%]
................................................................ [ 46%]
tests/unit/schema/test_column_schemas.py ....... [ 48%]
tests/unit/schema/test_schema.py ............. [ 51%]
tests/unit/schema/test_schema_io.py .................................... [ 61%]
........................................................................ [ 80%]
........................................................ [ 95%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38575 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41487 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34477 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34753 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46499 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 32935 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                     Stmts   Miss  Cover
------------------------------------------------------------
merlin/core/__init__.py                      2      0   100%
merlin/core/_version.py                    354    205    42%
merlin/core/compat.py                        9      4    56%
merlin/core/dispatch.py                    342    207    39%
merlin/core/protocols.py                    68     29    57%
merlin/core/utils.py                       195     56    71%
merlin/dag/__init__.py                       6      0   100%
merlin/dag/base_operator.py                 44      5    89%
merlin/dag/dictarray.py                     54     15    72%
merlin/dag/executors.py                    129     55    57%
merlin/dag/graph.py                         99     35    65%
merlin/dag/node.py                         342    160    53%
merlin/dag/ops/__init__.py                   4      0   100%
merlin/dag/ops/concat_columns.py            18      5    72%
merlin/dag/ops/selection.py                 20      0   100%
merlin/dag/ops/subset_columns.py            12      4    67%
merlin/dag/ops/subtraction.py               21     11    48%
merlin/dag/schema_mixin.py                  89     18    80%
merlin/dag/selector.py                     101      6    94%
merlin/io/__init__.py                        4      0   100%
merlin/io/avro.py                           88     88     0%
merlin/io/csv.py                            57      6    89%
merlin/io/dask.py                          181     53    71%
merlin/io/dataframe_engine.py               61      5    92%
merlin/io/dataframe_iter.py                 21      1    95%
merlin/io/dataset.py                       346     54    84%
merlin/io/dataset_engine.py                 37      8    78%
merlin/io/fsspec_utils.py                  127    108    15%
merlin/io/hugectr.py                        45     35    22%
merlin/io/parquet.py                       603     69    89%
merlin/io/shuffle.py                        38     12    68%
merlin/io/worker.py                         80     66    18%
merlin/io/writer.py                        190     52    73%
merlin/io/writer_factory.py                 18      4    78%
merlin/schema/__init__.py                    2      0   100%
merlin/schema/io/__init__.py                 0      0   100%
merlin/schema/io/proto_utils.py             20      4    80%
merlin/schema/io/schema_bp.py              306      5    98%
merlin/schema/io/tensorflow_metadata.py    191     19    90%
merlin/schema/schema.py                    209     33    84%
merlin/schema/tags.py                       82      1    99%
------------------------------------------------------------
TOTAL                                     4615   1438    69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 370 passed, 1 skipped, 89 warnings in 64.76s (0:01:04) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins5986064118779814632.sh

@karlhigley
Contributor Author

@oliverholworthy @nv-alaiacano Thoughts on this? Downstream tests don't pass due to a breaking change related to DataFrameType, but this is ready for review when you get a chance.

----------
node : Node
Output node of the graph to execute
data : DataFrameType
Member

Is the input type here now Transformable? And is the return type the same, or something else?



@runtime_checkable
class Transformable(DictLike, Protocol):
Member

If we only use the three methods defined here in practice, would it continue to be valid if we dropped DictLike here?

Contributor Author

I think we already use more than these three methods in practice (e.g. __setitem__), and part of what we're trying to do is make it possible to use dictionary-like objects, so even if we're not using the methods yet, this is laying the groundwork for being able to treat dictionaries and dataframes interchangeably.
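To show what keeping `DictLike` as a base changes at runtime, here is a small sketch comparing a protocol that inherits from `DictLike` with a narrower one that does not. The member lists are assumptions picked for illustration, and `WithColumns`, `ColumnsOnly`, and `Full` are hypothetical names.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class DictLike(Protocol):
    def __getitem__(self, key): ...
    def __setitem__(self, key, value): ...
    def __len__(self) -> int:
        return 0


@runtime_checkable
class WithColumns(Protocol):
    # Hypothetical narrow protocol: only the dataframe-style member.
    @property
    def columns(self): ...


@runtime_checkable
class Transformable(DictLike, Protocol):
    # Inherits the DictLike members and adds the dataframe-style one.
    @property
    def columns(self): ...


class ColumnsOnly:
    columns = ["a"]


class Full(dict):
    @property
    def columns(self):
        return list(self)


narrow_accepts_columns_only = isinstance(ColumnsOnly(), WithColumns)
full_rejects_columns_only = isinstance(ColumnsOnly(), Transformable)
full_accepts_full = isinstance(Full(a=[1]), Transformable)
```

With the `DictLike` base in place, an object that only exposes `columns` is rejected, so code behind the type hint can rely on item access and assignment being available.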

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #139 of commit 41826018322df2a0a1247a7ef84fbe0dab9958bb, no merge conflicts.
Running as SYSTEM
Setting status of 41826018322df2a0a1247a7ef84fbe0dab9958bb to PENDING with url https://10.20.13.93:8080/job/merlin_core/210/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/139/*:refs/remotes/origin/pr/139/* # timeout=10
 > git rev-parse 41826018322df2a0a1247a7ef84fbe0dab9958bb^{commit} # timeout=10
Checking out Revision 41826018322df2a0a1247a7ef84fbe0dab9958bb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 41826018322df2a0a1247a7ef84fbe0dab9958bb # timeout=10
Commit message: "Keep `DataFrameType` and `SeriesType` for now to smooth the transition"
 > git rev-list --no-walk 4cb6455824f5a56f16ccfa2e424248e668bcb260 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins14909425134326493424.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.7.0+39.g4182601.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,awscli==1.25.82,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.27.81,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cloudpickle==2.2.0,cmake==3.24.1.1,colorama==0.4.4,contourpy==1.0.5,coverage==6.4.4,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupy
ter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.7.0+39.g4182601,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==3.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.2.1,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ 
git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.5.0,starlette==0.20.4,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1558946721'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 370 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/core/test_protocols.py ......... [ 2%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .... [ 14%]
tests/unit/dag/test_graph.py .... [ 15%]
tests/unit/dag/ops/test_selection.py ... [ 16%]
tests/unit/io/test_io.py ............................................... [ 28%]
................................................................ [ 46%]
tests/unit/schema/test_column_schemas.py ....... [ 48%]
tests/unit/schema/test_schema.py ............. [ 51%]
tests/unit/schema/test_schema_io.py .................................... [ 61%]
........................................................................ [ 80%]
........................................................ [ 95%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35853 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41779 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41943 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44687 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43205 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33087 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 9 4 56%
merlin/core/dispatch.py 347 209 40%
merlin/core/protocols.py 68 29 57%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 6 0 100%
merlin/dag/base_operator.py 44 5 89%
merlin/dag/dictarray.py 54 15 72%
merlin/dag/executors.py 129 55 57%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 342 160 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 18 5 72%
merlin/dag/ops/selection.py 20 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/schema_mixin.py 89 18 80%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 346 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 191 19 90%
merlin/schema/schema.py 209 33 84%
merlin/schema/tags.py 82 1 99%

TOTAL 4620 1440 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 370 passed, 1 skipped, 14 warnings in 63.88s (0:01:03) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins16199113426067099050.sh

@karlhigley
Contributor Author

Superseded by the PRs listed above

@karlhigley karlhigley closed this Oct 5, 2022
karlhigley added a commit to karlhigley/systems that referenced this pull request Oct 6, 2022
Depends on NVIDIA-Merlin/core#139 and NVIDIA-Merlin/core#146

We've had the concept of an `InferenceDataFrame` for a while, but it's really just a wrapper around a dictionary of arrays. That data structure is useful in a bunch of places, so this swaps out `InferenceDataFrame` for a similar class from Merlin Core called `DictArray`.
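
To make "a wrapper around a dictionary of arrays" concrete, here is a minimal sketch of that idea checked against a structural protocol. The names `DictLike` and `ArrayDict` are hypothetical and chosen for illustration only; this is not the `DictArray` implementation or the protocol definitions from Merlin Core, and plain Python lists stand in for real array types.

```python
from typing import Dict, Iterator, List, Protocol, Sequence, runtime_checkable


@runtime_checkable
class DictLike(Protocol):
    """Hypothetical protocol: anything offering dict-style column access."""

    def __getitem__(self, key: str): ...

    def __iter__(self) -> Iterator[str]: ...

    def __len__(self) -> int: ...


class ArrayDict:
    """Illustrative dictionary-of-arrays wrapper (an assumption, not Merlin's DictArray)."""

    def __init__(self, columns: Dict[str, Sequence]):
        # Copy each column so later changes to the caller's data do not leak in.
        self._columns = {name: list(values) for name, values in columns.items()}

    @property
    def columns(self) -> List[str]:
        # Column names in insertion order.
        return list(self._columns)

    def __getitem__(self, key: str) -> list:
        return self._columns[key]

    def __iter__(self) -> Iterator[str]:
        return iter(self._columns)

    def __len__(self) -> int:
        # Number of columns, matching how len() behaves on a dict.
        return len(self._columns)


batch = ArrayDict({"user_id": [1, 2, 3], "item_id": [10, 20, 30]})

# The class never inherits from DictLike; it conforms structurally.
conforms = isinstance(batch, DictLike)
```

Because the check is structural, any class with the same three methods would satisfy `DictLike`, which is the general appeal of protocols over a shared base class.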
jperez999 added a commit to NVIDIA-Merlin/systems that referenced this pull request Oct 21, 2022
…Merlin Core (#204)

* Rework operators to use `DictArray` and `LocalExecutor` from Merlin Core

Depends on NVIDIA-Merlin/core#139 and NVIDIA-Merlin/core#146

We've had the concept of an `InferenceDataFrame` for a while, but it's really just a wrapper around a dictionary of arrays. That data structure is useful in a bunch of places, so this swaps out `InferenceDataFrame` for a similar class from Merlin Core called `DictArray`.

* Use `np.array([1])` to make `pandas` happy

* Mark FAISS tests with Triton `importorskip`s

* Install current version-under-test in Tox environments

When APIs/internals change, it's important that Triton is running the same version of the code that we're testing, since our tests run Triton.

* Skip FAISS executor test if Triton executable is not found

* Skip the Triton executor model test when Triton isn't available

* Use the `_parse_model_repository` fn in executor model

* Make some fixes to the Implicit op and tests

* Update the FIL op's export method for `executor` mode

* Fix the artifact path in the executor model

* Sort out dtypes in Implicit op and tests

Co-authored-by: Julio Perez <37191411+jperez999@users.noreply.github.com>
Labels
enhancement New feature or request
4 participants