-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Decoupling of InputBlock(v2) and Embedding Tables, support to Ragged Embeddings Lookup and AverageEmbeddingsByWeightFeature #593
Conversation
Click to view CI ResultsGitHub pull request #593 of commit 6d8ce8a80d32e4e7844634aaf2b28bc9604f7d59, no merge conflicts. Running as SYSTEM Setting status of 6d8ce8a80d32e4e7844634aaf2b28bc9604f7d59 to PENDING with url https://10.20.13.93:8080/job/merlin_models/749/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/593/*:refs/remotes/origin/pr/593/* # timeout=10 > git rev-parse 6d8ce8a80d32e4e7844634aaf2b28bc9604f7d59^{commit} # timeout=10 Checking out Revision 6d8ce8a80d32e4e7844634aaf2b28bc9604f7d59 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 6d8ce8a80d32e4e7844634aaf2b28bc9604f7d59 # timeout=10 Commit message: "Refactoring InputBlockv2, EmbeddingsFromSchema and creating AverageEmbeddingsByWeightFeature" > git rev-list --no-walk e8326aaa5e5929e5003929e46868e866bed33796 # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins7205061555137100419.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /usr/local/lib/python3.8/dist-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /usr/local/lib/python3.8/dist-packages (from testbook) (0.6.5) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.1) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 484 items / 2 skipped |
… lists are also possible
…beddingsByWeightFeature
6d8ce8a
to
4e60d4f
Compare
Documentation preview |
Click to view CI ResultsGitHub pull request #593 of commit 4e60d4f866337b9f74746380fb9252a4cabf4eb8, no merge conflicts. Running as SYSTEM Setting status of 4e60d4f866337b9f74746380fb9252a4cabf4eb8 to PENDING with url https://10.20.13.93:8080/job/merlin_models/750/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/593/*:refs/remotes/origin/pr/593/* # timeout=10 > git rev-parse 4e60d4f866337b9f74746380fb9252a4cabf4eb8^{commit} # timeout=10 Checking out Revision 4e60d4f866337b9f74746380fb9252a4cabf4eb8 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 4e60d4f866337b9f74746380fb9252a4cabf4eb8 # timeout=10 Commit message: "Fixed tests" > git rev-list --no-walk 6d8ce8a80d32e4e7844634aaf2b28bc9604f7d59 # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins8007419558442995416.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /usr/local/lib/python3.8/dist-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /usr/local/lib/python3.8/dist-packages (from testbook) (0.6.5) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.1) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 499 items / 2 skipped |
Apply classes renaming suggested by @marcromeyn Co-authored-by: Marc Romeyn <marcromeyn@gmail.com>
Click to view CI ResultsGitHub pull request #593 of commit 5212307036e6db6b5afaa74b8afc36d31ec8f461, no merge conflicts. Running as SYSTEM Setting status of 5212307036e6db6b5afaa74b8afc36d31ec8f461 to PENDING with url https://10.20.13.93:8080/job/merlin_models/760/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/593/*:refs/remotes/origin/pr/593/* # timeout=10 > git rev-parse 5212307036e6db6b5afaa74b8afc36d31ec8f461^{commit} # timeout=10 Checking out Revision 5212307036e6db6b5afaa74b8afc36d31ec8f461 (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 5212307036e6db6b5afaa74b8afc36d31ec8f461 # timeout=10 Commit message: "Apply suggestions from code review" > git rev-list --no-walk 07911209e5e0836ec266740304058ae9176c7b40 # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins3536002096638362811.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /usr/local/lib/python3.8/dist-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /usr/local/lib/python3.8/dist-packages (from testbook) (0.6.5) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.1) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 150 items / 36 errors / 2 skipped |
Click to view CI ResultsGitHub pull request #593 of commit 8ed069527b4322a63a038a60edc4b239815b49bc, no merge conflicts. Running as SYSTEM Setting status of 8ed069527b4322a63a038a60edc4b239815b49bc to PENDING with url https://10.20.13.93:8080/job/merlin_models/764/console and message: 'Pending' Using context: Jenkins Building on master in workspace /var/jenkins_home/workspace/merlin_models using credential nvidia-merlin-bot > git rev-parse --is-inside-work-tree # timeout=10 Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/NVIDIA-Merlin/models/ # timeout=10 Fetching upstream changes from https://github.com/NVIDIA-Merlin/models/ > git --version # timeout=10 using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/models/ +refs/pull/593/*:refs/remotes/origin/pr/593/* # timeout=10 > git rev-parse 8ed069527b4322a63a038a60edc4b239815b49bc^{commit} # timeout=10 Checking out Revision 8ed069527b4322a63a038a60edc4b239815b49bc (detached) > git config core.sparsecheckout # timeout=10 > git checkout -f 8ed069527b4322a63a038a60edc4b239815b49bc # timeout=10 Commit message: "Exposing EmbeddingTable kwargs in Embeddings and fixed tests" > git rev-list --no-walk 440e1dfb404acc4cc77ec2696f1e50f0b4daabb9 # timeout=10 [merlin_models] $ /bin/bash /tmp/jenkins16148962659745911063.sh Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com Requirement already satisfied: testbook in /usr/local/lib/python3.8/dist-packages (0.4.2) Requirement already satisfied: nbformat>=5.0.4 in /usr/local/lib/python3.8/dist-packages (from testbook) (5.4.0) Requirement already satisfied: nbclient>=0.4.0 in /usr/local/lib/python3.8/dist-packages (from testbook) (0.6.5) Requirement already satisfied: traitlets>=5.1 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (5.3.0) Requirement already satisfied: jsonschema>=2.6 in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.6.1) Requirement already satisfied: jupyter-core in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (4.10.0) Requirement already satisfied: fastjsonschema in /usr/local/lib/python3.8/dist-packages (from nbformat>=5.0.4->testbook) (2.15.3) Requirement already satisfied: jupyter-client>=6.1.5 in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (7.3.4) Requirement already satisfied: nest-asyncio in /usr/local/lib/python3.8/dist-packages (from nbclient>=0.4.0->testbook) (1.5.5) Requirement already satisfied: attrs>=17.4.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (21.4.0) Requirement already satisfied: importlib-resources>=1.4.0; python_version < "3.9" in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (5.8.0) Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /usr/local/lib/python3.8/dist-packages (from jsonschema>=2.6->nbformat>=5.0.4->testbook) (0.18.1) Requirement already satisfied: entrypoints in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (0.4) Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (2.8.2) Requirement already satisfied: pyzmq>=23.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (23.2.0) Requirement already satisfied: tornado>=6.0 in /usr/local/lib/python3.8/dist-packages (from jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (6.1) Requirement already satisfied: zipp>=3.1.0; python_version < "3.10" in /usr/local/lib/python3.8/dist-packages (from importlib-resources>=1.4.0; python_version < "3.9"->jsonschema>=2.6->nbformat>=5.0.4->testbook) (3.8.0) Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.8.2->jupyter-client>=6.1.5->nbclient>=0.4.0->testbook) (1.15.0) ============================= test session starts ============================== platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 rootdir: /var/jenkins_home/workspace/merlin_models/models, configfile: pyproject.toml plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0 collected 499 items / 2 skipped |
Fixes #580 Fixes #579
Goals ⚽
This PR is based on the prototypes for
InputBlock
andInferEmbeddings
added by @marcromeyn inNVIDIA-Merlin/Merlin#404
InputBlockv2
(that will eventually replace the previousInputBlock
), which decouples the embedding tables creation fromInputBlock
, giving more flexibility for the users to define their own embeddings and simplifying the API (e.g. removing theEmbeddingOptions
in the InputBlock). The newEmbeddingsFromSchema
provides a default implementation that infers the embedding tables that need to be created from the schema.Implementation Details 🚧
EmbeddingsFromSchema
function creates theEmbeddingTable(s)
for categorical features like the logic that was before inEmbeddingFeatures
.EmbeddingTable
now supports both non-sequential and sequential features (RaggedTensor, SparseTensor) for embedding lookup, accepting either a single combiner (str or layer), a dict of combiners ({"feature_name": combiner}) or no combiner (returning the Ragged or Sparse tensor in this case)AverageEmbeddingsByWeightFeature
covers the use case where we want to do weighed average of a sequence of embeddings (3D tensor), using as weight another (sequential) continuous feature). The weights could be for example the frequency of a category in user history, or be based on the time since the user interacted with the itemInputBlockv2
.Testing Details 🔍
test_tabular.py
to test the new classes