-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix joint Categorify with list columns #1685
Conversation
cc @bschifferer |
Click to view CI ResultsGitHub pull request #1685 of commit 1b5cea3c9cba60b99496df70bfd6797c4552dbb0, no merge conflicts. Running as SYSTEM Setting status of 1b5cea3c9cba60b99496df70bfd6797c4552dbb0 to PENDING with url http://10.20.17.181:8080/job/nvtabular_tests/4734/ and message: 'Build started for merge commit.' Using context: Jenkins Unit Test Run Building on master in workspace /var/jenkins_home/workspace/nvtabular_tests using credential nvidia-merlin-bot Cloning the remote Git repository Cloning repository https://github.com/NVIDIA-Merlin/NVTabular.git > git init /var/jenkins_home/workspace/nvtabular_tests/nvtabular # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 > git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10 > git config remote.origin.url https://github.com/NVIDIA-Merlin/NVTabular.git # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/NVTabular.git using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/NVTabular.git +refs/pull/1685/*:refs/remotes/origin/pr/1685/* # timeout=10 > git rev-parse 1b5cea3c9cba60b99496df70bfd6797c4552dbb0^{commit} # timeout=10 Checking out Revision 1b5cea3c9cba60b99496df70bfd6797c4552dbb0 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 1b5cea3c9cba60b99496df70bfd6797c4552dbb0 # timeout=10 Commit message: "fix column concatenation to support list with normal column" > git rev-list --no-walk 513d1e6404a26273ffd5c3f0d61076f008121b50 # timeout=10 First time build. Skipping changelog. [nvtabular_tests] $ /bin/bash /tmp/jenkins6549111999443179106.sh GLOB sdist-make: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/setup.py test-gpu create: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu test-gpu installdeps: pytest, pytest-cov WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration. test-gpu inst: /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/.tmp/package/1/nvtabular-1.5.0+3.g1b5cea3c9.zip WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration. test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,awscli==1.25.84,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.27.83,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cloudpickle==2.2.0,cmake==3.24.1.1,colorama==0.4.4,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.7.1,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,-e git+https://github.com/NVIDIA-Merlin/NVTabular.git@1b5cea3c9cba60b99496df70bfd6797c4552dbb0#egg=nvtabular,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,pluggy==1.0.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-forked==1.4.0,pytest-xdist==2.5.0,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.9.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.2.2,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.36,stack-data==0.5.0,starlette==0.20.4,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.6.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0 test-gpu run-test-pre: PYTHONHASHSEED='1036075765' test-gpu run-test: commands[0] | python -m pip install --upgrade git+https://github.com/NVIDIA-Merlin/core.git Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Collecting git+https://github.com/NVIDIA-Merlin/core.git Cloning https://github.com/NVIDIA-Merlin/core.git to /tmp/pip-req-build-dqha5r3v Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/core.git /tmp/pip-req-build-dqha5r3v Resolved https://github.com/NVIDIA-Merlin/core.git to commit 98cd36067d5ad9bb952aa2dbfac55eb059bb7bc4 Installing build dependencies: started Installing build dependencies: finished with status 'done' Getting requirements to build wheel: started Getting requirements to build wheel: finished with status 'done' Preparing metadata (pyproject.toml): started Preparing metadata (pyproject.toml): finished with status 'done' Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (1.3.5) Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (1.10.0) Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (1.2.5) Requirement already satisfied: numba>=0.54 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (0.55.1) Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (4.64.1) Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (21.3) Requirement already satisfied: distributed>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.3.0) Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (3.19.5) Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core==0.7.0+1.g98cd360) (7.0.0) Requirement already satisfied: fsspec==2022.5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.5.0) Requirement already satisfied: dask>=2022.3.0 in /var/jenkins_home/.local/lib/python3.8/site-packages (from merlin-core==0.7.0+1.g98cd360) (2022.3.0) Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.7.0+1.g98cd360) (1.2.0) Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core==0.7.0+1.g98cd360) (0.4.3) Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.2.0) Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (0.12.0) Requirement already satisfied: partd>=0.3.10 in /var/jenkins_home/.local/lib/python3.8/site-packages/partd-1.2.0-py3.8.egg (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.2.0) Requirement already satisfied: pyyaml>=5.3.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/PyYAML-5.4.1-py3.8-linux-x86_64.egg (from dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (5.4.1) Requirement already satisfied: tblib>=1.6.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/tblib-1.7.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.7.0) Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /var/jenkins_home/.local/lib/python3.8/site-packages/sortedcontainers-2.4.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.4.0) Requirement already satisfied: tornado>=6.0.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/tornado-6.1-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (6.1) Requirement already satisfied: psutil>=5.0 in /var/jenkins_home/.local/lib/python3.8/site-packages/psutil-5.8.0-py3.8-linux-x86_64.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (5.8.0) Requirement already satisfied: zict>=0.1.3 in /var/jenkins_home/.local/lib/python3.8/site-packages/zict-2.0.0-py3.8.egg (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.0.0) Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (8.1.3) Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (3.1.2) Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.0.4) Requirement already satisfied: numpy<1.22,>=1.18 in /var/jenkins_home/.local/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (1.20.3) Requirement already satisfied: llvmlite<0.39,>=0.38.0rc1 in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (0.38.1) Requirement already satisfied: setuptools in ./.tox/test-gpu/lib/python3.8/site-packages (from numba>=0.54->merlin-core==0.7.0+1.g98cd360) (65.3.0) Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core==0.7.0+1.g98cd360) (3.0.9) Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.7.0+1.g98cd360) (2022.2.1) Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core==0.7.0+1.g98cd360) (2.8.2) Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.52.0) Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.2.0) Requirement already satisfied: locket in /var/jenkins_home/.local/lib/python3.8/site-packages/locket-0.2.1-py3.8.egg (from partd>=0.3.10->dask>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (0.2.1) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->merlin-core==0.7.0+1.g98cd360) (1.15.0) Requirement already satisfied: heapdict in /var/jenkins_home/.local/lib/python3.8/site-packages/HeapDict-1.0.1-py3.8.egg (from zict>=0.1.3->distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (1.0.1) Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.7.0+1.g98cd360) (4.1.0) Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core==0.7.0+1.g98cd360) (6.0.2) Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core==0.7.0+1.g98cd360) (2.1.1) Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.7.0+1.g98cd360) (4.0.0) Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core==0.7.0+1.g98cd360) (6.0.1) Building wheels for collected packages: merlin-core Building wheel for merlin-core (pyproject.toml): started Building wheel for merlin-core (pyproject.toml): finished with status 'done' Created wheel for merlin-core: filename=merlin_core-0.7.0+1.g98cd360-py3-none-any.whl size=114014 sha256=dc2cd5c2247afb4edfaf223b010292751bd912a8098fa3f690b7599d0c31d3bd Stored in directory: /tmp/pip-ephem-wheel-cache-428d5_k1/wheels/c8/38/16/a6968787eafcec5fa772148af8408b089562f71af0752e8e84 Successfully built merlin-core Installing collected packages: merlin-core Attempting uninstall: merlin-core Found existing installation: merlin-core 0.3.0+12.g78ecddd Not uninstalling merlin-core at /var/jenkins_home/.local/lib/python3.8/site-packages, outside environment /var/jenkins_home/workspace/nvtabular_tests/nvtabular/.tox/test-gpu Can't uninstall 'merlin-core'. No files were found to uninstall. Successfully installed merlin-core-0.7.0+1.g98cd360 test-gpu run-test: commands[1] | python -m pytest --cov-report term --cov merlin -rxs tests/unit ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0 cachedir: .tox/test-gpu/.pytest_cache rootdir: /var/jenkins_home/workspace/nvtabular_tests/nvtabular, configfile: pyproject.toml plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-4.0.0 collected 1433 items / 1 skipped |
Documentation preview |
Closes #1676
The intial concatenation step within the logic to jointly-encode columns was not accounting for the possibility that a subset of the columns may be list dtypes. This PR itrodces a simple fix to flatten list columns before concatenation.