Save schema with consistent dtype when dtypes is used #182

@oliverholworthy (Member) commented Dec 14, 2022

When `Dataset.to_parquet` is called with `dtypes`, ensure the schema we save has dtypes that match the dtypes written to the parquet files.
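
A minimal sketch of the intended behaviour, using plain dictionaries rather than the real Merlin classes. The helper name `schema_after_cast` and the use of dtype names as strings are assumptions made for illustration; they are not part of the `merlin.io` API.

```python
from typing import Dict, Mapping


def schema_after_cast(
    schema: Mapping[str, str], dtypes: Mapping[str, str]
) -> Dict[str, str]:
    """Return the column -> dtype mapping to save alongside the files.

    Hypothetical helper: `schema` stands in for the dataset schema and
    `dtypes` for the overrides passed when writing the files.
    """
    # Copy the in-memory schema so the caller's mapping is left untouched.
    saved = dict(schema)
    # Columns cast on write must be recorded with the dtype that was written.
    saved.update(dtypes)
    return saved


in_memory = {"a": "int32", "b": "float64"}
saved = schema_after_cast(in_memory, {"a": "float32"})
print(saved)  # {'a': 'float32', 'b': 'float64'}
```

The point of the sketch is only that the saved schema follows the cast: column `a` is recorded as `float32` because that is what ends up in the file, while columns without an override keep their original dtype.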

Motivation

I noticed this issue in the Merlin notebooks after we updated them to use the merlin-dataloader package, which returns data with dtypes that match the parquet file.

That was fixed with careful dtype matching here. However, this dtype PR would have made that commit unnecessary.

@oliverholworthy added the bug label Dec 14, 2022
@oliverholworthy added this to the Merlin 22.12 milestone Dec 14, 2022
@oliverholworthy self-assigned this Dec 14, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #182 of commit 89922a088a9d33a46ad23fd13e20f2772d69760e, no merge conflicts.
Running as SYSTEM
Setting status of 89922a088a9d33a46ad23fd13e20f2772d69760e to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/335/ and message: 'Pending'
Using context: Jenkins
Building on the built-in node in workspace /var/jenkins_home/jobs/merlin_core/workspace
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/182/*:refs/remotes/origin/pr/182/* # timeout=10
 > git rev-parse 89922a088a9d33a46ad23fd13e20f2772d69760e^{commit} # timeout=10
Checking out Revision 89922a088a9d33a46ad23fd13e20f2772d69760e (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 89922a088a9d33a46ad23fd13e20f2772d69760e # timeout=10
Commit message: "Save schema with consistent dtype when `dtypes` is passed to `Dataset.to_parquet`"
 > git rev-list --no-walk 294867b8796bff27771092070bc1eef61dadcd61 # timeout=10
[workspace] $ /bin/bash /tmp/jenkins5151540935050955964.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.9.0+16.g89922a0.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.29,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.29,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.1.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets=
=7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.9.0+16.g89922a0,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.4,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.1.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ 
git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.45,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1520247086'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, cov-4.0.0, xdist-3.1.0
collected 399 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................F [ 43%]
tests/unit/schema/test_column_schemas.py ............................... [ 51%]
[ 51%]
tests/unit/schema/test_schema.py ............. [ 54%]
tests/unit/schema/test_schema_io.py .................................... [ 63%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
________________________ test_to_parquet_dtypes_schema _________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_to_parquet_dtypes_schema0')

    def test_to_parquet_dtypes_schema(tmpdir):
        df = dispatch.make_df({"a": np.array([1, 2, 3], dtype=np.int32)})
        dataset = merlin.io.Dataset(df)
        dataset.to_parquet(output_path=str(tmpdir), dtypes={"a": np.float32})
>       assert dataset.schema["a"].dtype == np.int32

E       AssertionError: assert dtype('float32') == <class 'numpy.int32'>
E        +  where dtype('float32') = ColumnSchema(name='a', tags=set(), properties={}, dtype=dtype('float32'), is_list=False, is_ragged=False).dtype
E        +  and   <class 'numpy.int32'> = np.int32

tests/unit/io/test_io.py:801: AssertionError
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:579: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/io/test_io.py::test_parquet_aggregate_files[True]
tests/unit/io/test_io.py::test_parquet_aggregate_files[True]
tests/unit/io/test_io.py::test_parquet_aggregate_files[False]
tests/unit/io/test_io.py::test_parquet_aggregate_files[False]
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:245: FutureWarning: gather_statistics is now deprecated and will be ignored.
warnings.warn(

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35957 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39671 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42601 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37027 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46301 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42583 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                     Stmts   Miss  Cover
------------------------------------------------------------
merlin/core/__init__.py                      2      0   100%
merlin/core/_version.py                    354    205    42%
merlin/core/compat.py                       10      4    60%
merlin/core/dispatch.py                    360    214    41%
merlin/core/protocols.py                    99     45    55%
merlin/core/utils.py                       195     56    71%
merlin/dag/__init__.py                       5      0   100%
merlin/dag/base_operator.py                122     20    84%
merlin/dag/dictarray.py                     55     15    73%
merlin/dag/executors.py                    141     68    52%
merlin/dag/graph.py                         99     35    65%
merlin/dag/node.py                         344    161    53%
merlin/dag/ops/__init__.py                   4      0   100%
merlin/dag/ops/concat_columns.py            17      4    76%
merlin/dag/ops/selection.py                 22      0   100%
merlin/dag/ops/subset_columns.py            12      4    67%
merlin/dag/ops/subtraction.py               21     11    48%
merlin/dag/selector.py                     101      6    94%
merlin/io/__init__.py                        4      0   100%
merlin/io/avro.py                           88     88     0%
merlin/io/csv.py                            57      6    89%
merlin/io/dask.py                          181     53    71%
merlin/io/dataframe_engine.py               61      5    92%
merlin/io/dataframe_iter.py                 21      1    95%
merlin/io/dataset.py                       350     52    85%
merlin/io/dataset_engine.py                 37      8    78%
merlin/io/fsspec_utils.py                  127    108    15%
merlin/io/hugectr.py                        45     35    22%
merlin/io/parquet.py                       624     70    89%
merlin/io/shuffle.py                        38     12    68%
merlin/io/worker.py                         80     66    18%
merlin/io/writer.py                        190     52    73%
merlin/io/writer_factory.py                 18      4    78%
merlin/schema/__init__.py                    2      0   100%
merlin/schema/io/__init__.py                 0      0   100%
merlin/schema/io/proto_utils.py             20      4    80%
merlin/schema/io/schema_bp.py              306      5    98%
merlin/schema/io/tensorflow_metadata.py    190     17    91%
merlin/schema/schema.py                    229     30    87%
merlin/schema/tags.py                       82      0   100%
------------------------------------------------------------
TOTAL                                     4713   1464    69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
======= 1 failed, 398 passed, 1 skipped, 29 warnings in 65.39s (0:01:05) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[workspace] $ /bin/bash /tmp/jenkins6421781007141690666.sh

karlhigley previously approved these changes Dec 14, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #182 of commit f7309448adab8ebaff55e5b0ee2ea5a66b598856, no merge conflicts.
Running as SYSTEM
Setting status of f7309448adab8ebaff55e5b0ee2ea5a66b598856 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/342/ and message: 'Pending'
Using context: Jenkins
Building on the built-in node in workspace /var/jenkins_home/jobs/merlin_core/workspace
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/182/*:refs/remotes/origin/pr/182/* # timeout=10
 > git rev-parse f7309448adab8ebaff55e5b0ee2ea5a66b598856^{commit} # timeout=10
Checking out Revision f7309448adab8ebaff55e5b0ee2ea5a66b598856 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f7309448adab8ebaff55e5b0ee2ea5a66b598856 # timeout=10
Commit message: "Remove blank line at end of test_io.py"
 > git rev-list --no-walk c3aec0c44cb27832b1756cdce3d03e90741bd683 # timeout=10
[workspace] $ /bin/bash /tmp/jenkins13207562247775202134.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.9.0+17.gf730944.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.30,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.30,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.1.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets=
=7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.9.0+17.gf730944,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.4,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.1.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ 
git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.45,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='2024043601'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, cov-4.0.0, xdist-3.1.0
collected 399 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................F [ 43%]
tests/unit/schema/test_column_schemas.py ............................... [ 51%]
[ 51%]
tests/unit/schema/test_schema.py ............. [ 54%]
tests/unit/schema/test_schema_io.py .................................... [ 63%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
________________________ test_to_parquet_dtypes_schema _________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_to_parquet_dtypes_schema0')

    def test_to_parquet_dtypes_schema(tmpdir):
        df = dispatch.make_df({"a": np.array([1, 2, 3], dtype=np.int32)})
        dataset = merlin.io.Dataset(df)
        dataset.to_parquet(output_path=str(tmpdir), dtypes={"a": np.float32})
>       assert dataset.schema["a"].dtype == np.int32

E       AssertionError: assert dtype('float32') == <class 'numpy.int32'>
E        +  where dtype('float32') = ColumnSchema(name='a', tags=set(), properties={}, dtype=dtype('float32'), is_list=False, is_ragged=False).dtype
E        +  and   <class 'numpy.int32'> = np.int32

tests/unit/io/test_io.py:801: AssertionError
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:579: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/io/test_io.py::test_parquet_aggregate_files[True]
tests/unit/io/test_io.py::test_parquet_aggregate_files[True]
tests/unit/io/test_io.py::test_parquet_aggregate_files[False]
tests/unit/io/test_io.py::test_parquet_aggregate_files[False]
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:245: FutureWarning: gather_statistics is now deprecated and will be ignored.
warnings.warn(

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39755 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44425 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34195 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35661 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36361 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41039 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                     Stmts   Miss  Cover
------------------------------------------------------------
merlin/core/__init__.py                      2      0   100%
merlin/core/_version.py                    354    205    42%
merlin/core/compat.py                       10      4    60%
merlin/core/dispatch.py                    360    214    41%
merlin/core/protocols.py                    99     45    55%
merlin/core/utils.py                       195     56    71%
merlin/dag/__init__.py                       5      0   100%
merlin/dag/base_operator.py                122     20    84%
merlin/dag/dictarray.py                     55     15    73%
merlin/dag/executors.py                    141     68    52%
merlin/dag/graph.py                         99     35    65%
merlin/dag/node.py                         344    161    53%
merlin/dag/ops/__init__.py                   4      0   100%
merlin/dag/ops/concat_columns.py            17      4    76%
merlin/dag/ops/selection.py                 22      0   100%
merlin/dag/ops/subset_columns.py            12      4    67%
merlin/dag/ops/subtraction.py               21     11    48%
merlin/dag/selector.py                     101      6    94%
merlin/io/__init__.py                        4      0   100%
merlin/io/avro.py                           88     88     0%
merlin/io/csv.py                            57      6    89%
merlin/io/dask.py                          181     53    71%
merlin/io/dataframe_engine.py               61      5    92%
merlin/io/dataframe_iter.py                 21      1    95%
merlin/io/dataset.py                       350     52    85%
merlin/io/dataset_engine.py                 37      8    78%
merlin/io/fsspec_utils.py                  127    108    15%
merlin/io/hugectr.py                        45     35    22%
merlin/io/parquet.py                       624     70    89%
merlin/io/shuffle.py                        38     12    68%
merlin/io/worker.py                         80     66    18%
merlin/io/writer.py                        190     52    73%
merlin/io/writer_factory.py                 18      4    78%
merlin/schema/__init__.py                    2      0   100%
merlin/schema/io/__init__.py                 0      0   100%
merlin/schema/io/proto_utils.py             20      4    80%
merlin/schema/io/schema_bp.py              306      5    98%
merlin/schema/io/tensorflow_metadata.py    190     17    91%
merlin/schema/schema.py                    229     30    87%
merlin/schema/tags.py                       82      0   100%
------------------------------------------------------------
TOTAL                                     4713   1464    69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
======= 1 failed, 398 passed, 1 skipped, 29 warnings in 64.47s (0:01:04) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[workspace] $ /bin/bash /tmp/jenkins4115696962255728139.sh
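
The single failure in this run comes down to a NumPy dtype comparison: the schema reports `float32` for column `a` (the dtype passed to `to_parquet` via `dtypes`), while the assertion as run compares it with `np.int32`. A minimal NumPy-only sketch of that comparison is below. It does not use merlin, and the variable names are illustrative assumptions, not code from this PR.

```python
import numpy as np

# dtype the schema reported for column "a" in the failing run
schema_dtype = np.dtype("float32")

# A np.dtype instance compares equal to the matching scalar type class...
matches_requested = schema_dtype == np.float32
# ...and unequal to a different one, which is the comparison that failed in CI.
matches_original = schema_dtype == np.int32

print(matches_requested, matches_original)
```

Under this reading, the reported schema dtype is consistent with the requested `dtypes`, and only the expected value in the assertion differs.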

@github-actions

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-182

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #182 of commit 7681c7fff808dab881f3938a284a12e1ff4a39bf, no merge conflicts.
Running as SYSTEM
Setting status of 7681c7fff808dab881f3938a284a12e1ff4a39bf to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/346/ and message: 'Pending'
Using context: Jenkins
Building on the built-in node in workspace /var/jenkins_home/jobs/merlin_core/workspace
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/182/*:refs/remotes/origin/pr/182/* # timeout=10
 > git rev-parse 7681c7fff808dab881f3938a284a12e1ff4a39bf^{commit} # timeout=10
Checking out Revision 7681c7fff808dab881f3938a284a12e1ff4a39bf (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 7681c7fff808dab881f3938a284a12e1ff4a39bf # timeout=10
Commit message: "Merge branch 'main' into dataset-to-parquet-set-schema-dtypes"
 > git rev-list --no-walk 4b3cd885f56a38f02378749cacf6f98adef11df0 # timeout=10
[workspace] $ /bin/bash /tmp/jenkins4167300837511587823.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.9.0+20.g7681c7f.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.30,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.30,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.1.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets=
=7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.9.0+20.g7681c7f,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.4,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.1.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ 
git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.45,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='4251265829'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
ImportError while loading conftest '/var/jenkins_home/workspace/merlin_core/core/tests/conftest.py'.
tests/conftest.py:26: in <module>
    import cudf
/usr/local/lib/python3.8/dist-packages/cudf/__init__.py:5: in <module>
    validate_setup()
/usr/local/lib/python3.8/dist-packages/cudf/utils/gpu_utils.py:55: in validate_setup
    raise e
/usr/local/lib/python3.8/dist-packages/cudf/utils/gpu_utils.py:52: in validate_setup
    gpus_count = getDeviceCount()
/usr/local/lib/python3.8/dist-packages/rmm/_cuda/gpu.py:101: in getDeviceCount
    raise CUDARuntimeError(status)
E   rmm._cuda.gpu.CUDARuntimeError: cudaErrorInitializationError: initialization error
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 4)
___________________________________ summary ____________________________________
ERROR:   test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script  : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log" 
[workspace] $ /bin/bash /tmp/jenkins2089468983162585276.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #182 of commit 7caf59150e2d73075cd301258ae27b2a3a43ec70, no merge conflicts.
Running as SYSTEM
Setting status of 7caf59150e2d73075cd301258ae27b2a3a43ec70 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/347/ and message: 'Pending'
Using context: Jenkins
Building on the built-in node in workspace /var/jenkins_home/jobs/merlin_core/workspace
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/core +refs/pull/182/*:refs/remotes/origin/pr/182/* # timeout=10
 > git rev-parse 7caf59150e2d73075cd301258ae27b2a3a43ec70^{commit} # timeout=10
Checking out Revision 7caf59150e2d73075cd301258ae27b2a3a43ec70 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 7caf59150e2d73075cd301258ae27b2a3a43ec70 # timeout=10
Commit message: "Merge branch 'main' into dataset-to-parquet-set-schema-dtypes"
 > git rev-list --no-walk 7681c7fff808dab881f3938a284a12e1ff4a39bf # timeout=10
[workspace] $ /bin/bash /tmp/jenkins18384283862898491118.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.9.0+22.g7caf591.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.30,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.30,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.1.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets=
=7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.9.0+22.g7caf591,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.4,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.1.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ 
git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.45,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='624033682'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, cov-4.0.0, xdist-3.1.0
collected 399 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................F [ 43%]
tests/unit/schema/test_column_schemas.py ............................... [ 51%]
[ 51%]
tests/unit/schema/test_schema.py ............. [ 54%]
tests/unit/schema/test_schema_io.py .................................... [ 63%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
________________________ test_to_parquet_dtypes_schema _________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-13/test_to_parquet_dtypes_schema0')

def test_to_parquet_dtypes_schema(tmpdir):
    df = dispatch.make_df({"a": np.array([1, 2, 3], dtype=np.int32)})
    dataset = merlin.io.Dataset(df)
    dataset.to_parquet(output_path=str(tmpdir), dtypes={"a": np.float32})
>   assert dataset.schema["a"].dtype == np.int32

E AssertionError: assert dtype('float32') == <class 'numpy.int32'>
E + where dtype('float32') = ColumnSchema(name='a', tags=set(), properties={}, dtype=dtype('float32'), is_list=False, is_ragged=False).dtype
E + and <class 'numpy.int32'> = np.int32

tests/unit/io/test_io.py:801: AssertionError
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:866: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:579: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/io/test_io.py::test_parquet_aggregate_files[True]
tests/unit/io/test_io.py::test_parquet_aggregate_files[True]
tests/unit/io/test_io.py::test_parquet_aggregate_files[False]
tests/unit/io/test_io.py::test_parquet_aggregate_files[False]
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:245: FutureWarning: gather_statistics is now deprecated and will be ignored.
warnings.warn(

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39161 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35389 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35187 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44295 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38825 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46391 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                     Stmts   Miss  Cover
------------------------------------------------------------
merlin/core/__init__.py                      2      0   100%
merlin/core/_version.py                    354    205    42%
merlin/core/compat.py                       10      4    60%
merlin/core/dispatch.py                    365    218    40%
merlin/core/protocols.py                    99     45    55%
merlin/core/utils.py                       197     56    72%
merlin/dag/__init__.py                       5      0   100%
merlin/dag/base_operator.py                121     20    83%
merlin/dag/dictarray.py                     55     15    73%
merlin/dag/executors.py                    141     68    52%
merlin/dag/graph.py                         99     35    65%
merlin/dag/node.py                         344    161    53%
merlin/dag/ops/__init__.py                   4      0   100%
merlin/dag/ops/concat_columns.py            17      4    76%
merlin/dag/ops/selection.py                 22      0   100%
merlin/dag/ops/subset_columns.py            12      4    67%
merlin/dag/ops/subtraction.py               21     11    48%
merlin/dag/selector.py                     101      6    94%
merlin/io/__init__.py                        4      0   100%
merlin/io/avro.py                           88     88     0%
merlin/io/csv.py                            57      6    89%
merlin/io/dask.py                          181     53    71%
merlin/io/dataframe_engine.py               61      5    92%
merlin/io/dataframe_iter.py                 21      1    95%
merlin/io/dataset.py                       350     52    85%
merlin/io/dataset_engine.py                 37      8    78%
merlin/io/fsspec_utils.py                  127    108    15%
merlin/io/hugectr.py                        45     35    22%
merlin/io/parquet.py                       624     70    89%
merlin/io/shuffle.py                        38     12    68%
merlin/io/worker.py                         80     66    18%
merlin/io/writer.py                        190     52    73%
merlin/io/writer_factory.py                 18      4    78%
merlin/schema/__init__.py                    2      0   100%
merlin/schema/io/__init__.py                 0      0   100%
merlin/schema/io/proto_utils.py             20      4    80%
merlin/schema/io/schema_bp.py              306      5    98%
merlin/schema/io/tensorflow_metadata.py    190     17    91%
merlin/schema/schema.py                    229     30    87%
merlin/schema/tags.py                       82      0   100%
------------------------------------------------------------
TOTAL                                     4719   1468    69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
======= 1 failed, 398 passed, 1 skipped, 29 warnings in 81.94s (0:01:21) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[workspace] $ /bin/bash /tmp/jenkins1545941498244280134.sh

@karlhigley
Contributor

@oliverholworthy It looks like this change may cause some dtype issues in NVTabular

@karlhigley karlhigley self-requested a review February 6, 2023 14:32
@karlhigley karlhigley dismissed their stale review February 6, 2023 14:33

CI build surfaced downstream errors in NVT

Review thread on merlin/core/dispatch.py (outdated, resolved)
Review thread on merlin/io/dataset.py (outdated, resolved)
@oliverholworthy oliverholworthy removed this from the Merlin 22.12 milestone Feb 6, 2023
@oliverholworthy oliverholworthy merged commit 23da628 into NVIDIA-Merlin:main Feb 13, 2023