
Move PoC unit test to unit folder and fix POC nb #364

Closed
wants to merge 2 commits

Conversation

rnyak
Contributor

@rnyak rnyak commented Jun 2, 2022

This PR

  • moves POC unit tests to unit directory
  • edits the output column name in the ranking model so that it matches the ranking model's output_names

This requires this PR to be merged first.
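The second bullet (matching the prediction column name to the model's output_names) can be illustrated with a small sketch. This is a hypothetical illustration, not the actual PR diff; the function name `align_output_columns` and the example output name are made up for this sketch:

```python
# Hypothetical sketch: rename a single-output prediction key so that it
# matches the ranking model's output_names, letting downstream code that
# looks columns up by model.output_names find the prediction column.

def align_output_columns(predictions: dict, output_names: list) -> dict:
    """Rename a lone prediction key to the model's declared output name."""
    if len(output_names) == 1 and len(predictions) == 1:
        (old_key,) = predictions.keys()
        return {output_names[0]: predictions[old_key]}
    # Ambiguous (multi-output) case: leave the mapping untouched.
    return predictions

# Example: model.output_names == ["click/binary_classification_task"]
preds = {"output_1": [0.1, 0.7, 0.3]}
aligned = align_output_columns(preds, ["click/binary_classification_task"])
```

With a single output, the prediction key is rewritten in place; multi-output mappings are passed through unchanged rather than guessed at.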


@rnyak rnyak requested a review from jperez999 June 2, 2022 22:18
@nvidia-merlin-bot
Contributor

Click to view CI Results
GitHub pull request #364 of commit 2757e798a23cff3108a976a56d1e0fd233daf6cb, no merge conflicts.
Running as SYSTEM
Setting status of 2757e798a23cff3108a976a56d1e0fd233daf6cb to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/132/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/364/*:refs/remotes/origin/pr/364/* # timeout=10
 > git rev-parse 2757e798a23cff3108a976a56d1e0fd233daf6cb^{commit} # timeout=10
Checking out Revision 2757e798a23cff3108a976a56d1e0fd233daf6cb (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 2757e798a23cff3108a976a56d1e0fd233daf6cb # timeout=10
Commit message: "delete categories"
 > git rev-list --no-walk 8c5e1345a56d7052d28631a55d3056f4e0877ffc # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins6351894645920927990.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 2 items

tests/unit/test_version.py . [ 50%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py F [100%]

=================================== FAILURES ===================================
__________________________________ test_func ___________________________________

tb1 = <testbook.client.TestbookNotebookClient object at 0x7fceefdc02b0>

@testbook(
    REPO_ROOT
    / "examples/Building-and-deploying-multi-stage-RecSys/01-Building-Recommender-Systems-with-Merlin.ipynb",
    execute=False,
)
def test_func(tb1):
    tb1.inject(
        """
        import os
        os.environ["DATA_FOLDER"] = "/tmp/data/"
        os.environ["NUM_ROWS"] = "10000"
        os.system("mkdir -p /tmp/examples")
        os.environ["BASE_DIR"] = "/tmp/examples/"
        """
    )
  tb1.execute()

tests/unit/examples/test_building_deploying_multi_stage_RecSys.py:23:


../../../.local/lib/python3.8/site-packages/testbook/client.py:147: in execute
    super().execute_cell(cell, index)
../../../.local/lib/python3.8/site-packages/nbclient/util.py:84: in wrapped
    return just_run(coro(*args, **kwargs))
../../../.local/lib/python3.8/site-packages/nbclient/util.py:62: in just_run
    return loop.run_until_complete(coro)
/usr/lib/python3.8/asyncio/base_events.py:616: in run_until_complete
    return future.result()
../../../.local/lib/python3.8/site-packages/nbclient/client.py:965: in async_execute_cell
    await self._check_raise_for_error(cell, cell_index, exec_reply)


self = <testbook.client.TestbookNotebookClient object at 0x7fceefdc02b0>
cell = {'cell_type': 'code', 'execution_count': 13, 'id': 'bfe2aa9e', 'metadata': {'execution': {'iopub.status.busy': '2022-0...m', run_eagerly=False, metrics=[tf.keras.metrics.AUC()])\nmodel.fit(train, validation_data=valid, batch_size=16*1024)"}
cell_index = 29
exec_reply = {'buffers': [], 'content': {'ename': 'TypeError', 'engine_info': {'engine_id': -1, 'engine_uuid': '206a701d-f2c1-46fa-...e, 'engine': '206a701d-f2c1-46fa-9a98-91a18400e79b', 'started': '2022-06-02T22:19:23.591571Z', 'status': 'error'}, ...}

async def _check_raise_for_error(
    self, cell: NotebookNode, cell_index: int, exec_reply: t.Optional[t.Dict]
) -> None:

    if exec_reply is None:
        return None

    exec_reply_content = exec_reply['content']
    if exec_reply_content['status'] != 'error':
        return None

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors
        or exec_reply_content.get('ename') in self.allow_error_names
        or "raises-exception" in cell.metadata.get("tags", [])
    )
    await run_hook(self.on_cell_error, cell=cell, cell_index=cell_index)
    if not cell_allows_errors:
      raise CellExecutionError.from_cell_and_msg(cell, exec_reply_content)

E nbclient.exceptions.CellExecutionError: An error occurred while executing the following cell:
E ------------------
E model.compile(optimizer='adam', run_eagerly=False, metrics=[tf.keras.metrics.AUC()])
E model.fit(train, validation_data=valid, batch_size=16*1024)
E ------------------
E
E ---------------------------------------------------------------------------
E TypeError                                 Traceback (most recent call last)
E Input In [13], in <cell line: 1>()
E ----> 1 model.compile(optimizer='adam', run_eagerly=False, metrics=[tf.keras.metrics.AUC()])
E       2 model.fit(train, validation_data=valid, batch_size=16*1024)
E
E File /usr/local/lib/python3.8/dist-packages/merlin/models/tf/models/base.py:332, in Model.compile(self, optimizer, loss, metrics, loss_weights, weighted_metrics, run_eagerly, steps_per_execution, jit_compile, **kwargs)
E     329 if isinstance(_loss[key], str) and _loss[key] in loss_registry:
E     330     _loss[key] = loss_registry.parse(_loss[key])
E --> 332 super(Model, self).compile(
E     333     optimizer=optimizer,
E     334     loss=_loss,
E     335     metrics=_metrics,
E     336     weighted_metrics=weighted_metrics,
E     337     run_eagerly=run_eagerly,
E     338     loss_weights=loss_weights,
E     339     steps_per_execution=steps_per_execution,
E     340     jit_compile=jit_compile,
E     341     **kwargs,
E     342 )
E
E File /usr/local/lib/python3.8/dist-packages/keras/engine/training.py:555, in Model.compile(self, optimizer, loss, metrics, loss_weights, weighted_metrics, run_eagerly, steps_per_execution, **kwargs)
E     550 # When compiling from an already-serialized model, we do not want to
E     551 # reapply some processing steps (e.g. metric renaming for multi-output
E     552 # models, which have prefixes added for each corresponding output name).
E     553 from_serialized = kwargs.pop('from_serialized', False)
E --> 555 self._validate_compile(optimizer, metrics, **kwargs)
E     556 self._run_eagerly = run_eagerly
E     558 self.optimizer = self._get_optimizer(optimizer)
E
E File /usr/local/lib/python3.8/dist-packages/keras/engine/training.py:2717, in Model._validate_compile(self, optimizer, metrics, **kwargs)
E    2715 invalid_kwargs = set(kwargs) - {'sample_weight_mode'}
E    2716 if invalid_kwargs:
E -> 2717     raise TypeError('Invalid keyword argument(s) in compile: %s' %
E    2718                     (invalid_kwargs,))
E    2720 # Model must be created and compiled with the same DistStrat.
E    2721 if self.built and tf.distribute.has_strategy():
E
E TypeError: Invalid keyword argument(s) in compile: {'jit_compile'}

../../../.local/lib/python3.8/site-packages/nbclient/client.py:862: CellExecutionError
----------------------------- Captured stderr call -----------------------------
2022-06-02 22:19:19.414601: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-02 22:19:21.417467: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:222] Using CUDA malloc Async allocator for GPU: 0
2022-06-02 22:19:21.417625: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-02 22:19:21.418381: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:222] Using CUDA malloc Async allocator for GPU: 1
2022-06-02 22:19:21.418437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15157 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/__init__.py", line 2127, in shutdown
    h.close()
  File "/usr/local/lib/python3.8/dist-packages/absl/logging/__init__.py", line 951, in close
    self.stream.close()
  File "/usr/local/lib/python3.8/dist-packages/ipykernel/iostream.py", line 445, in close
    self.watch_fd_thread.join()
AttributeError: 'OutStream' object has no attribute 'watch_fd_thread'
=========================== short test summary info ============================
FAILED tests/unit/examples/test_building_deploying_multi_stage_RecSys.py::test_func
========================= 1 failed, 1 passed in 37.50s =========================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins4821672215182737569.sh
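The `TypeError` in the CI log arises because Merlin Models forwards `jit_compile` to Keras' `Model.compile`, and the Keras version installed in the test container does not accept that keyword, so `_validate_compile` rejects it. The following is a sketch of one defensive pattern (an assumption for illustration, not the fix that was actually merged): filter kwargs against the callee's declared signature before forwarding them.

```python
import inspect

def compile_compat(compile_fn, **kwargs):
    """Forward only the keyword arguments that compile_fn declares."""
    params = inspect.signature(compile_fn).parameters
    accepted = {k: v for k, v in kwargs.items() if k in params}
    return compile_fn(**accepted)

# Toy stand-in for an older compile() signature that lacks `jit_compile`:
def old_compile(optimizer=None, run_eagerly=None):
    return {"optimizer": optimizer, "run_eagerly": run_eagerly}

result = compile_compat(old_compile, optimizer="adam",
                        run_eagerly=False, jit_compile=True)
# `jit_compile` is silently dropped instead of raising TypeError.
```

Note the limitation: if the callee itself accepts `**kwargs` (as the real `keras.Model.compile` does), signature inspection cannot tell which extra keywords are safe, so in practice pinning compatible Keras and Merlin Models versions is the more robust option.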

@github-actions

github-actions bot commented Jun 2, 2022

Documentation preview

https://nvidia-merlin.github.io/Merlin/review/pr-364

@rnyak rnyak requested review from benfred and bschifferer June 2, 2022 22:37
@bschifferer
Contributor

I don't think this PR will execute the unit tests on each CI run.

@benfred @jperez999 In my understanding, the nightly CI will execute this bash script:
https://github.com/NVIDIA-Merlin/Merlin/blob/main/ci/test_container.sh

is that correct?

So we need to add a section

## Test Merlin Repository
echo "Run unit tests for Merlin"
cd /Merlin && ci/test_unit.sh $container $devices

and we need to add a test_unit.sh to the Merlin repository. @benfred @jperez999 can you confirm?

@karlhigley
Contributor

The multi-stage recommender example requires both Merlin Models and Merlin Systems, and therefore isn't well suited to a unit test (i.e. a test of a single "unit," like one library). Since it spans multiple libraries, even though the notebooks live in their most relevant repos, it would be better covered by a multi-library integration test in the Merlin repo (where we can safely assume that depending on anything in the Merlin ecosystem is fair game).

@rnyak
Contributor Author

rnyak commented Jun 7, 2022

@karlhigley Thanks for your comment. FWIW, this is a unit test, not an integration test, and it actually runs without issues in my environment (once the error in the PoC's second notebook is fixed). This unit test runs fine because the multi-stage recommender example runs entirely in one container (everything runs in the inference container), which is not the case for the Serving-Ranking-Models-With-Merlin-Systems.ipynb example.
Maybe I am missing something about the multi-library integration test point, but I think that statement applies better to PR #310 instead.

If we do not have a unit test for this release, we still won't be detecting errors in the PoC example.. :/

@jperez999
Collaborator

For this to succeed, we first need this PR merged in systems: NVIDIA-Merlin/systems#117

@rnyak
Contributor Author

rnyak commented Jun 13, 2022

closing due to #385

@rnyak rnyak closed this Jun 13, 2022
@rnyak rnyak deleted the fix_unit_test_and_POC branch June 14, 2022 10:49