
Fix unit scaling criteo inference serving #559

Merged
merged 6 commits into NVIDIA-Merlin:main, Aug 25, 2022

Conversation

jperez999 (Collaborator)

Newer versions of Triton do not allow the server to be started from within the context of a notebook, i.e., from within testbook-executed cells. This fix remedies those issues.
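
The pattern that replaces in-notebook startup is to launch `tritonserver` as a subprocess from the test process and poll it until ready, which is what `merlin.systems.triton.utils.run_triton_server` (quoted in the CI trace below) does. A minimal sketch of that pattern, assuming `tritonserver` is on PATH and reusing the model-repository path from the logs:

```python
import subprocess
import time

import tritonclient.grpc as grpcclient

# Model-repository path taken from the CI logs below; adjust as needed.
MODEL_REPO = "/tmp/output/criteo/ensemble/"

# Start Triton outside any notebook kernel, then poll until it is ready.
with subprocess.Popen(["tritonserver", "--model-repository", MODEL_REPO]) as proc:
    try:
        client = grpcclient.InferenceServerClient("localhost:8001")
        for _ in range(60):
            # If the server process died, fail fast instead of polling forever.
            if proc.poll() is not None:
                raise RuntimeError(f"tritonserver exited early (ret={proc.returncode})")
            try:
                if client.is_server_ready():
                    break
            except Exception:
                pass  # gRPC endpoint not accepting connections yet
            time.sleep(1)
        # ... run inference requests against `client` here ...
    finally:
        proc.terminate()
```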

@review-notebook-app

Check out this pull request on ReviewNB to see visual diffs and provide feedback on the Jupyter notebooks.

@jperez999 self-assigned this on Aug 25, 2022
@jperez999 added labels on Aug 25, 2022: bug (Something isn't working), chore (Infrastructure update), breaking (Breaking change), ci
@nvidia-merlin-bot (Contributor)

Click to view CI Results
GitHub pull request #559 of commit 18f8854e9c7149f93bd7447ee1150020c1faf000, no merge conflicts.
Running as SYSTEM
Setting status of 18f8854e9c7149f93bd7447ee1150020c1faf000 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/368/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/559/*:refs/remotes/origin/pr/559/* # timeout=10
 > git rev-parse 18f8854e9c7149f93bd7447ee1150020c1faf000^{commit} # timeout=10
Checking out Revision 18f8854e9c7149f93bd7447ee1150020c1faf000 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 18f8854e9c7149f93bd7447ee1150020c1faf000 # timeout=10
Commit message: "fix unit test for scaling criteo"
 > git rev-list --no-walk d8ab03429179e7dd1467123a0334d9dbf9875576 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins9875516084636874203.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py s [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py F [100%]

=================================== FAILURES ===================================
__________________________________ test_func ___________________________________

self = <testbook.client.TestbookNotebookClient object at 0x7f7e30450cd0>
cell = {'id': '426844fb', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-25T14:43:39.494692Z',...ut/criteo/day_0.parquet')\nvalid.to_ddf().compute().to_parquet('/tmp/input/criteo/day_1.parquet')\n', 'outputs': []}
cell_index = 27, execution_count = None, store_history = True

async def async_execute_cell(
    self,
    cell: NotebookNode,
    cell_index: int,
    execution_count: t.Optional[int] = None,
    store_history: bool = True,
) -> NotebookNode:
    """
    Executes a single code cell.

    To execute all cells see :meth:`execute`.

    Parameters
    ----------
    cell : nbformat.NotebookNode
        The cell which is currently being processed.
    cell_index : int
        The position of the cell within the notebook object.
    execution_count : int
        The execution count to be assigned to the cell (default: Use kernel response)
    store_history : bool
        Determines if history should be stored in the kernel (default: False).
        Specific to ipython kernels, which can store command histories.

    Returns
    -------
    output : dict
        The execution output payload (or None for no output).

    Raises
    ------
    CellExecutionError
        If execution failed and should raise an exception, this will be raised
        with defaults about the failure.

    Returns
    -------
    cell : NotebookNode
        The cell which was just processed.
    """
    assert self.kc is not None

    await run_hook(self.on_cell_start, cell=cell, cell_index=cell_index)

    if cell.cell_type != 'code' or not cell.source.strip():
        self.log.debug("Skipping non-executing cell %s", cell_index)
        return cell

    if self.skip_cells_with_tag in cell.metadata.get("tags", []):
        self.log.debug("Skipping tagged cell %s", cell_index)
        return cell

    if self.record_timing:  # clear execution metadata prior to execution
        cell['metadata']['execution'] = {}

    self.log.debug("Executing cell:\n%s", cell.source)

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors or "raises-exception" in cell.metadata.get("tags", [])
    )

    await run_hook(self.on_cell_execute, cell=cell, cell_index=cell_index)
    parent_msg_id = await ensure_async(
        self.kc.execute(
            cell.source, store_history=store_history, stop_on_error=not cell_allows_errors
        )
    )
    await run_hook(self.on_cell_complete, cell=cell, cell_index=cell_index)
    # We launched a code cell to execute
    self.code_cells_executed += 1
    exec_timeout = self._get_timeout(cell)

    cell.outputs = []
    self.clear_before_next_output = False

    task_poll_kernel_alive = asyncio.ensure_future(self._async_poll_kernel_alive())
    task_poll_output_msg = asyncio.ensure_future(
        self._async_poll_output_msg(parent_msg_id, cell, cell_index)
    )
    self.task_poll_for_reply = asyncio.ensure_future(
        self._async_poll_for_reply(
            parent_msg_id, cell, exec_timeout, task_poll_output_msg, task_poll_kernel_alive
        )
    )
    try:
>       exec_reply = await self.task_poll_for_reply

E asyncio.exceptions.CancelledError

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:1006: CancelledError

During handling of the above exception, another exception occurred:

def test_func():
    with testbook(
        REPO_ROOT / "examples" / "scaling-criteo" / "02-ETL-with-NVTabular.ipynb",
        execute=False,
        timeout=180,
    ) as tb1:
        tb1.inject(
            """
            import os
            os.environ["BASE_DIR"] = "/tmp/input/criteo/"
            os.environ["INPUT_DATA_DIR"] = "/tmp/input/criteo/"
            os.environ["OUTPUT_DATA_DIR"] = "/tmp/output/criteo/"
            os.system("mkdir -p /tmp/input/criteo")
            os.system("mkdir -p /tmp/output/criteo")

            from merlin.datasets.synthetic import generate_data

            train, valid = generate_data("criteo", int(1000000), set_sizes=(0.7, 0.3))

            train.to_ddf().compute().to_parquet('/tmp/input/criteo/day_0.parquet')
            valid.to_ddf().compute().to_parquet('/tmp/input/criteo/day_1.parquet')
            """
        )
>       tb1.execute()

tests/unit/examples/test_scaling_criteo_merlin_models.py:36:


/usr/local/lib/python3.8/dist-packages/testbook/client.py:147: in execute
super().execute_cell(cell, index)
/usr/local/lib/python3.8/dist-packages/nbclient/util.py:85: in wrapped
return just_run(coro(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/nbclient/util.py:60: in just_run
return loop.run_until_complete(coro)
/usr/lib/python3.8/asyncio/base_events.py:616: in run_until_complete
return future.result()


self = <testbook.client.TestbookNotebookClient object at 0x7f7e30450cd0>
cell = {'id': '426844fb', 'cell_type': 'code', 'metadata': {'execution': {'iopub.status.busy': '2022-08-25T14:43:39.494692Z',...ut/criteo/day_0.parquet')\nvalid.to_ddf().compute().to_parquet('/tmp/input/criteo/day_1.parquet')\n', 'outputs': []}
cell_index = 27, execution_count = None, store_history = True

async def async_execute_cell(
    self,
    cell: NotebookNode,
    cell_index: int,
    execution_count: t.Optional[int] = None,
    store_history: bool = True,
) -> NotebookNode:
    """
    Executes a single code cell.

    To execute all cells see :meth:`execute`.

    Parameters
    ----------
    cell : nbformat.NotebookNode
        The cell which is currently being processed.
    cell_index : int
        The position of the cell within the notebook object.
    execution_count : int
        The execution count to be assigned to the cell (default: Use kernel response)
    store_history : bool
        Determines if history should be stored in the kernel (default: False).
        Specific to ipython kernels, which can store command histories.

    Returns
    -------
    output : dict
        The execution output payload (or None for no output).

    Raises
    ------
    CellExecutionError
        If execution failed and should raise an exception, this will be raised
        with defaults about the failure.

    Returns
    -------
    cell : NotebookNode
        The cell which was just processed.
    """
    assert self.kc is not None

    await run_hook(self.on_cell_start, cell=cell, cell_index=cell_index)

    if cell.cell_type != 'code' or not cell.source.strip():
        self.log.debug("Skipping non-executing cell %s", cell_index)
        return cell

    if self.skip_cells_with_tag in cell.metadata.get("tags", []):
        self.log.debug("Skipping tagged cell %s", cell_index)
        return cell

    if self.record_timing:  # clear execution metadata prior to execution
        cell['metadata']['execution'] = {}

    self.log.debug("Executing cell:\n%s", cell.source)

    cell_allows_errors = (not self.force_raise_errors) and (
        self.allow_errors or "raises-exception" in cell.metadata.get("tags", [])
    )

    await run_hook(self.on_cell_execute, cell=cell, cell_index=cell_index)
    parent_msg_id = await ensure_async(
        self.kc.execute(
            cell.source, store_history=store_history, stop_on_error=not cell_allows_errors
        )
    )
    await run_hook(self.on_cell_complete, cell=cell, cell_index=cell_index)
    # We launched a code cell to execute
    self.code_cells_executed += 1
    exec_timeout = self._get_timeout(cell)

    cell.outputs = []
    self.clear_before_next_output = False

    task_poll_kernel_alive = asyncio.ensure_future(self._async_poll_kernel_alive())
    task_poll_output_msg = asyncio.ensure_future(
        self._async_poll_output_msg(parent_msg_id, cell, cell_index)
    )
    self.task_poll_for_reply = asyncio.ensure_future(
        self._async_poll_for_reply(
            parent_msg_id, cell, exec_timeout, task_poll_output_msg, task_poll_kernel_alive
        )
    )
    try:
        exec_reply = await self.task_poll_for_reply
    except asyncio.CancelledError:
        # can only be cancelled by task_poll_kernel_alive when the kernel is dead
        task_poll_output_msg.cancel()
>       raise DeadKernelError("Kernel died")

E nbclient.exceptions.DeadKernelError: Kernel died

/usr/local/lib/python3.8/dist-packages/nbclient/client.py:1010: DeadKernelError
----------------------------- Captured stderr call -----------------------------
2022-08-25 14:43:30,908 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-08-25 14:43:30,932 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
terminate called after throwing an instance of 'rmm::out_of_memory'
what(): std::bad_alloc: out_of_memory: CUDA error at: /usr/include/rmm/mr/device/cuda_memory_resource.hpp:70: cudaErrorMemoryAllocation out of memory
/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 12 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
------------------------------ Captured log call -------------------------------
ERROR traitlets:client.py:863 Kernel died while waiting for execute reply.
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/examples/test_scaling_criteo_merlin_models.py::test_func - ...
============= 1 failed, 1 passed, 1 skipped, 35 warnings in 20.35s =============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins7049271451736486476.sh
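
For context on the failure above: the `rmm::out_of_memory` was thrown while the injected cell materialized 1,000,000 synthetic Criteo rows on a GPU whose memory was already partly occupied. One obvious mitigation is simply to generate fewer rows; a hedged sketch with an illustrative count (not necessarily the value this PR settled on):

```python
from merlin.datasets.synthetic import generate_data

# Illustrative row count only; the injected test cell above used 1,000,000.
train, valid = generate_data("criteo", 100_000, set_sizes=(0.7, 0.3))

# Materialize the dask collections and write them where the notebook expects input.
train.to_ddf().compute().to_parquet("/tmp/input/criteo/day_0.parquet")
valid.to_ddf().compute().to_parquet("/tmp/input/criteo/day_1.parquet")
```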

@github-actions

Documentation preview

https://nvidia-merlin.github.io/Merlin/review/pr-559

@jperez999 (Collaborator, Author)

rerun tests

@jperez999 changed the title from "Fix unit multi stage deploy inference serving" to "Fix unit scaling criteo inference serving" on Aug 25, 2022
@nvidia-merlin-bot (Contributor)

Click to view CI Results
GitHub pull request #559 of commit 18f8854e9c7149f93bd7447ee1150020c1faf000, no merge conflicts.
Running as SYSTEM
Setting status of 18f8854e9c7149f93bd7447ee1150020c1faf000 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/369/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/559/*:refs/remotes/origin/pr/559/* # timeout=10
 > git rev-parse 18f8854e9c7149f93bd7447ee1150020c1faf000^{commit} # timeout=10
Checking out Revision 18f8854e9c7149f93bd7447ee1150020c1faf000 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 18f8854e9c7149f93bd7447ee1150020c1faf000 # timeout=10
Commit message: "fix unit test for scaling criteo"
 > git rev-list --no-walk 18f8854e9c7149f93bd7447ee1150020c1faf000 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins7113214151575744866.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py s [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py F [100%]

=================================== FAILURES ===================================
__________________________________ test_func ___________________________________

def test_func():
    with testbook(
        REPO_ROOT / "examples" / "scaling-criteo" / "02-ETL-with-NVTabular.ipynb",
        execute=False,
        timeout=180,
    ) as tb1:
        tb1.inject(
            """
            import os
            os.environ["BASE_DIR"] = "/tmp/input/criteo/"
            os.environ["INPUT_DATA_DIR"] = "/tmp/input/criteo/"
            os.environ["OUTPUT_DATA_DIR"] = "/tmp/output/criteo/"
            os.system("mkdir -p /tmp/input/criteo")
            os.system("mkdir -p /tmp/output/criteo")

            from merlin.datasets.synthetic import generate_data

            train, valid = generate_data("criteo", int(1000000), set_sizes=(0.7, 0.3))

            train.to_ddf().compute().to_parquet('/tmp/input/criteo/day_0.parquet')
            valid.to_ddf().compute().to_parquet('/tmp/input/criteo/day_1.parquet')
            """
        )
        tb1.execute()
        assert os.path.isfile("/tmp/output/criteo/train/part_0.parquet")
        assert os.path.isfile("/tmp/output/criteo/valid/part_0.parquet")
        assert os.path.isfile("/tmp/output/criteo/workflow/metadata.json")

    with testbook(
        REPO_ROOT
        / "examples"
        / "scaling-criteo"
        / "03-Training-with-Merlin-Models-TensorFlow.ipynb",
        execute=False,
        timeout=180,
    ) as tb2:
        tb2.inject(
            """
            import os
            os.environ["INPUT_DATA_DIR"] = "/tmp/output/criteo/"
            """
        )
        tb2.execute()
        metrics = tb2.ref("eval_metrics")
        assert set(metrics.keys()) == set(
            [
                "auc",
                "binary_accuracy",
                "loss",
                "precision",
                "recall",
                "regularization_loss",
            ]
        )
        assert os.path.isfile("/tmp/output/criteo/dlrm/saved_model.pb")

    with testbook(
        REPO_ROOT
        / "examples"
        / "scaling-criteo"
        / "04-Triton-Inference-with-Merlin-Models-TensorFlow.ipynb",
        execute=False,
        timeout=180,
    ) as tb3:
        tb3.inject(
            """
            import os
            os.environ["BASE_DIR"] = "/tmp/output/criteo/"
            os.environ["INPUT_FOLDER"] = "/tmp/input/criteo/"
            """
        )
        NUM_OF_CELLS = len(tb3.cells)
        tb3.execute_cell(list(range(0, NUM_OF_CELLS - 5)))
        input_cols = tb3.ref("input_cols")
        outputs = tb3.ref("output_cols")
        # read in data for request
        df_lib = get_lib()
        in_dtypes = {}
        for col in input_cols:
            if col.startswith("C"):
                in_dtypes[col] = "int64"
            if col.startswith("I"):
                in_dtypes[col] = "float64"
        batch = df_lib.read_parquet(
            os.path.join("/tmp/output/criteo/", "valid", "part_0.parquet"),
            num_rows=3,
            columns=input_cols,
        )
        batch = batch.astype(in_dtypes)
        configure_tensorflow()
>       response = run_ensemble_on_tritonserver(
            "/tmp/output/criteo/ensemble/", outputs, batch, "ensemble_model"
        )

tests/unit/examples/test_scaling_criteo_merlin_models.py:103:


/usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:92: in run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/output/criteo/ensemble/'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
>                 raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

/usr/local/lib/python3.8/dist-packages/merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
2022-08-25 14:48:55,752 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
2022-08-25 14:48:55,766 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(
/usr/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 27 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
2022-08-25 14:49:13.352448: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-25 14:49:15.459725: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:222] Using CUDA malloc Async allocator for GPU: 0
2022-08-25 14:49:15.459867: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-25 14:49:15.460678: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:222] Using CUDA malloc Async allocator for GPU: 1
2022-08-25 14:49:15.460728: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python3.8/logging/init.py", line 2127, in shutdown
h.close()
File "/usr/local/lib/python3.8/dist-packages/absl/logging/init.py", line 934, in close
self.stream.close()
File "/usr/local/lib/python3.8/dist-packages/ipykernel/iostream.py", line 438, in close
self.watch_fd_thread.join()
AttributeError: 'OutStream' object has no attribute 'watch_fd_thread'
2022-08-25 14:49:51.117986: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-25 14:49:53.164650: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:222] Using CUDA malloc Async allocator for GPU: 0
2022-08-25 14:49:53.164796: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14880 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-25 14:49:53.165632: I tensorflow/core/common_runtime/gpu/gpu_process_state.cc:222] Using CUDA malloc Async allocator for GPU: 1
2022-08-25 14:49:53.165683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15153 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
I0825 14:50:09.259489 2672 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f1bc0000000' with size 268435456
I0825 14:50:09.260274 2672 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0825 14:50:09.264351 2672 model_repository_manager.cc:1191] loading: 1_predicttensorflow:1
I0825 14:50:09.364700 2672 model_repository_manager.cc:1191] loading: 0_transformworkflow:1
I0825 14:50:09.650113 2672 tensorflow.cc:2204] TRITONBACKEND_Initialize: tensorflow
I0825 14:50:09.650155 2672 tensorflow.cc:2214] Triton TRITONBACKEND API version: 1.10
I0825 14:50:09.650162 2672 tensorflow.cc:2220] 'tensorflow' TRITONBACKEND API version: 1.10
I0825 14:50:09.650168 2672 tensorflow.cc:2244] backend configuration:
{"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}}
I0825 14:50:09.650212 2672 tensorflow.cc:2310] TRITONBACKEND_ModelInitialize: 1_predicttensorflow (version 1)
I0825 14:50:09.655407 2672 tensorflow.cc:2359] TRITONBACKEND_ModelInstanceInitialize: 1_predicttensorflow (GPU device 0)
2022-08-25 14:50:10.011568: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/output/criteo/ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-25 14:50:10.027707: I tensorflow/cc/saved_model/reader.cc:81] Reading meta graph with tags { serve }
2022-08-25 14:50:10.027759: I tensorflow/cc/saved_model/reader.cc:122] Reading SavedModel debug info (if present) from: /tmp/output/criteo/ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-25 14:50:10.027891: I tensorflow/core/platform/cpu_feature_guard.cc:194] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-25 14:50:10.065314: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 13776 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-08-25 14:50:10.135588: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:354] MLIR V1 optimization pass is not enabled
2022-08-25 14:50:10.142843: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-25 14:50:10.413723: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/output/criteo/ensemble/1_predicttensorflow/1/model.savedmodel
2022-08-25 14:50:10.488625: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 477076 microseconds.
I0825 14:50:10.507852 2672 tensorflow.cc:2397] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0825 14:50:10.507908 2672 tensorflow.cc:2336] TRITONBACKEND_ModelFinalize: delete model state
E0825 14:50:10.507944 2672 model_repository_manager.cc:1348] failed to load '1_predicttensorflow' version 1: Invalid argument: unexpected inference input 'C1', allowed inputs are: args_0, args_0_1, args_0_10, args_0_11, args_0_12, args_0_13, args_0_14, args_0_15, args_0_16, args_0_17, args_0_18, args_0_19, args_0_2, args_0_20, args_0_21, args_0_22, args_0_23, args_0_24, args_0_25, args_0_26, args_0_27, args_0_28, args_0_29, args_0_3, args_0_30, args_0_31, args_0_32, args_0_33, args_0_34, args_0_35, args_0_36, args_0_37, args_0_38, args_0_4, args_0_5, args_0_6, args_0_7, args_0_8, args_0_9
I0825 14:50:10.511201 2672 python_be.cc:1774] TRITONBACKEND_ModelInstanceInitialize: 0_transformworkflow (GPU device 0)
I0825 14:50:14.930794 2672 model_repository_manager.cc:1345] successfully loaded '0_transformworkflow' version 1
E0825 14:50:14.930907 2672 model_repository_manager.cc:1551] Invalid argument: ensemble 'ensemble_model' depends on '1_predicttensorflow' which has no loaded version
I0825 14:50:14.931019 2672 server.cc:556]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0825 14:50:14.931132 2672 server.cc:583]
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}} |
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"false","min-compute-capability":"6.000000","backend-directory":"/opt/tritonserver/backends","default-max-batch-size":"4"}} |
+------------+-----------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0825 14:50:14.931322 2672 server.cc:626]
+---------------------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | READY |
| 1_predicttensorflow | 1 | UNAVAILABLE: Invalid argument: unexpected inference input 'C1', allowed inputs are: args_0, args_0_1, args_0_10, args_0_11, args_0_12, args_0_13, args_0_14, args_0_15, args_0_16, args_0_17, args_0_18, args_0_19, args_0_2, args_0_20, args_0_21, args_0_22, args_0_23, args_0_24, args_0_25, args_0_26, args_0_27, args_0_28, args_0_29, args_0_3, args_0_30, args_0_31, args_0_32, args_0_33, args_0_34, args_0_35, args_0_36, args_0_37, args_0_38, args_0_4, args_0_5, ar |
| | | gs_0_6, args_0_7, args_0_8, args_0_9 |
+---------------------+---------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0825 14:50:14.995644 2672 metrics.cc:650] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0825 14:50:14.996540 2672 tritonserver.cc:2159]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.23.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/output/criteo/ensemble/ |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0825 14:50:14.996575 2672 server.cc:257] Waiting for in-flight requests to complete.
I0825 14:50:14.996583 2672 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0825 14:50:14.996594 2672 model_repository_manager.cc:1223] unloading: 0_transformworkflow:1
I0825 14:50:14.996630 2672 server.cc:288] All models are stopped, unloading models
I0825 14:50:14.996639 2672 server.cc:295] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0825 14:50:15.996724 2672 server.cc:295] Timeout 29: Found 1 live models and 0 in-flight non-inference requests
W0825 14:50:16.013750 2672 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
I0825 14:50:16.578288 2672 model_repository_manager.cc:1328] successfully unloaded '0_transformworkflow' version 1
I0825 14:50:16.996857 2672 server.cc:295] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0825 14:50:17.013939 2672 metrics.cc:468] Unable to get energy consumption for GPU 0. Status:Success, value:0
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "/usr/lib/python3.8/logging/init.py", line 2127, in shutdown
h.close()
File "/usr/local/lib/python3.8/dist-packages/absl/logging/init.py", line 934, in close
self.stream.close()
File "/usr/local/lib/python3.8/dist-packages/ipykernel/iostream.py", line 438, in close
self.watch_fd_thread.join()
AttributeError: 'OutStream' object has no attribute 'watch_fd_thread'
=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/examples/test_scaling_criteo_merlin_models.py::test_func - ...
======== 1 failed, 1 passed, 1 skipped, 35 warnings in 96.47s (0:01:36) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins4202871427140035698.sh
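
The root cause in this run is visible in the `model_repository_manager` lines above: the exported TensorFlow SavedModel's serving signature only exposes generic `args_0_*` inputs, while the Triton ensemble feeds it by column name (C1 ... I13), so `1_predicttensorflow` cannot load. A quick way to check what a SavedModel actually exposes, as a diagnostic sketch using paths from the log (not part of this PR's diff):

```python
import tensorflow as tf

# Path taken from the Triton log above; adjust to your model repository.
loaded = tf.saved_model.load(
    "/tmp/output/criteo/ensemble/1_predicttensorflow/1/model.savedmodel"
)
sig = loaded.signatures["serving_default"]

# structured_input_signature is (args, kwargs); the kwargs dict is keyed by input name.
print(sorted(sig.structured_input_signature[1].keys()))
# The failing run exposed args_0, args_0_1, ...; a healthy export would list the
# named Criteo columns, which is consistent with the follow-up commit
# "add back model import" in the next (green) CI run.
```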

@nvidia-merlin-bot (Contributor)

Click to view CI Results
GitHub pull request #559 of commit 3832ea55a7cc44dce1f693d5e718ddc49a12f1a6, no merge conflicts.
Running as SYSTEM
Setting status of 3832ea55a7cc44dce1f693d5e718ddc49a12f1a6 to PENDING with url https://10.20.13.93:8080/job/merlin_merlin/370/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_merlin
using credential systems-login
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/Merlin # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Merlin
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Merlin +refs/pull/559/*:refs/remotes/origin/pr/559/* # timeout=10
 > git rev-parse 3832ea55a7cc44dce1f693d5e718ddc49a12f1a6^{commit} # timeout=10
Checking out Revision 3832ea55a7cc44dce1f693d5e718ddc49a12f1a6 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3832ea55a7cc44dce1f693d5e718ddc49a12f1a6 # timeout=10
Commit message: "add back model import"
 > git rev-list --no-walk 18f8854e9c7149f93bd7447ee1150020c1faf000 # timeout=10
[merlin_merlin] $ /bin/bash /tmp/jenkins3194441305650325862.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_merlin/merlin
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 3 items

tests/unit/test_version.py . [ 33%]
tests/unit/examples/test_building_deploying_multi_stage_RecSys.py s [ 66%]
tests/unit/examples/test_scaling_criteo_merlin_models.py . [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33
/usr/local/lib/python3.8/dist-packages/dask_cudf/core.py:33: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
DASK_VERSION = LooseVersion(dask.__version__)

../../../.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: 34 warnings
/var/jenkins_home/.local/lib/python3.8/site-packages/setuptools/_distutils/version.py:346: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
other = LooseVersion(other)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============ 2 passed, 1 skipped, 35 warnings in 111.70s (0:01:51) =============
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/Merlin/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_merlin] $ /bin/bash /tmp/jenkins3902650011334290476.sh

@karlhigley merged commit c12dbac into NVIDIA-Merlin:main on Aug 25, 2022
@viswa-nvidia added this to the Merlin 22.09 milestone on Sep 8, 2022