Fixes Merlin e2e example #117

Merged
merged 9 commits into main from fix_model_outputnames on Jun 9, 2022

Conversation

jperez999 (Collaborator)

Ensures we pull the correct column information from TensorFlow models.
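
For context on the fix: the Keras-level `model.inputs` / `model.outputs` names can differ from the names recorded in the exported SavedModel's serving signature (the absl warnings in the CI logs below show, e.g., `name-cat` being renamed to `name_cat` on save). A minimal sketch, assuming a model saved and reloaded as a SavedModel at a placeholder path, of reading the names the signature actually uses:

```python
import tensorflow as tf

# Illustrative only; the path is a placeholder, not from this PR.
loaded = tf.saved_model.load("/path/to/model.savedmodel")
default_signature = loaded.signatures["serving_default"]

# structured_input_signature is an (args, kwargs) tuple; element [1] is the
# dict mapping input column names to tf.TensorSpec objects.
for col_name, col_spec in default_signature.structured_input_signature[1].items():
    print("input:", col_name, col_spec.dtype, col_spec.shape)

# structured_outputs maps output column names to their tensors.
for col_name, col_tensor in default_signature.structured_outputs.items():
    print("output:", col_name, col_tensor.dtype, col_tensor.shape)
```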

@nvidia-merlin-bot
CI Results
GitHub pull request #117 of commit 0f9fa18556f24ec9fc194d52cc3733a31f531826, no merge conflicts.
Running as SYSTEM
Setting status of 0f9fa18556f24ec9fc194d52cc3733a31f531826 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/69/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/117/*:refs/remotes/origin/pr/117/* # timeout=10
 > git rev-parse 0f9fa18556f24ec9fc194d52cc3733a31f531826^{commit} # timeout=10
Checking out Revision 0f9fa18556f24ec9fc194d52cc3733a31f531826 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0f9fa18556f24ec9fc194d52cc3733a31f531826 # timeout=10
Commit message: "fixes to ensure use of correct keys from tf models"
 > git rev-list --no-walk 2a413f6e0993969f2aeab999c9e7cf968b799e7b # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins14260605814698537897.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 6 items / 5 errors / 1 skipped

==================================== ERRORS ====================================
_____________ ERROR collecting tests/unit/systems/test_ensemble.py _____________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_ensemble.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_ensemble.py:31: in <module>
from nvtabular import Workflow # noqa
E ModuleNotFoundError: No module named 'nvtabular'
______________ ERROR collecting tests/unit/systems/test_export.py ______________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_export.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_export.py:23: in <module>
from nvtabular import Workflow, ops
E ModuleNotFoundError: No module named 'nvtabular'
______________ ERROR collecting tests/unit/systems/test_graph.py _______________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_graph.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_graph.py:19: in <module>
from nvtabular import Workflow
E ModuleNotFoundError: No module named 'nvtabular'
__________ ERROR collecting tests/unit/systems/test_inference_ops.py ___________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_inference_ops.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_inference_ops.py:27: in <module>
from nvtabular import Workflow # noqa
E ModuleNotFoundError: No module named 'nvtabular'
____________ ERROR collecting tests/unit/systems/test_op_runner.py _____________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_op_runner.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_op_runner.py:23: in <module>
import nvtabular as nvt
E ModuleNotFoundError: No module named 'nvtabular'
=========================== short test summary info ============================
ERROR tests/unit/systems/test_ensemble.py
ERROR tests/unit/systems/test_export.py
ERROR tests/unit/systems/test_graph.py
ERROR tests/unit/systems/test_inference_ops.py
ERROR tests/unit/systems/test_op_runner.py
!!!!!!!!!!!!!!!!!!! Interrupted: 5 errors during collection !!!!!!!!!!!!!!!!!!!!
========================= 1 skipped, 5 errors in 2.06s =========================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins1262507776029502473.sh

@nvidia-merlin-bot
CI Results
GitHub pull request #117 of commit a25c7e90bc2adad53a75499fdbdeb14059b5607e, no merge conflicts.
Running as SYSTEM
Setting status of a25c7e90bc2adad53a75499fdbdeb14059b5607e to PENDING with url https://10.20.13.93:8080/job/merlin_systems/70/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/117/*:refs/remotes/origin/pr/117/* # timeout=10
 > git rev-parse a25c7e90bc2adad53a75499fdbdeb14059b5607e^{commit} # timeout=10
Checking out Revision a25c7e90bc2adad53a75499fdbdeb14059b5607e (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a25c7e90bc2adad53a75499fdbdeb14059b5607e # timeout=10
Commit message: "Merge branch 'main' into fix_model_outputnames"
 > git rev-list --no-walk 0f9fa18556f24ec9fc194d52cc3733a31f531826 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins12352964776961901369.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 6 items / 5 errors / 1 skipped

==================================== ERRORS ====================================
_____________ ERROR collecting tests/unit/systems/test_ensemble.py _____________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_ensemble.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_ensemble.py:31: in <module>
from nvtabular import Workflow # noqa
E ModuleNotFoundError: No module named 'nvtabular'
______________ ERROR collecting tests/unit/systems/test_export.py ______________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_export.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_export.py:23: in <module>
from nvtabular import Workflow, ops
E ModuleNotFoundError: No module named 'nvtabular'
______________ ERROR collecting tests/unit/systems/test_graph.py _______________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_graph.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_graph.py:19: in <module>
from nvtabular import Workflow
E ModuleNotFoundError: No module named 'nvtabular'
__________ ERROR collecting tests/unit/systems/test_inference_ops.py ___________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_inference_ops.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_inference_ops.py:27: in <module>
from nvtabular import Workflow # noqa
E ModuleNotFoundError: No module named 'nvtabular'
____________ ERROR collecting tests/unit/systems/test_op_runner.py _____________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_op_runner.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_op_runner.py:23: in <module>
import nvtabular as nvt
E ModuleNotFoundError: No module named 'nvtabular'
=========================== short test summary info ============================
ERROR tests/unit/systems/test_ensemble.py
ERROR tests/unit/systems/test_export.py
ERROR tests/unit/systems/test_graph.py
ERROR tests/unit/systems/test_inference_ops.py
ERROR tests/unit/systems/test_op_runner.py
!!!!!!!!!!!!!!!!!!! Interrupted: 5 errors during collection !!!!!!!!!!!!!!!!!!!!
========================= 1 skipped, 5 errors in 1.99s =========================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins4417331190025152590.sh

jperez999 requested a review from benfred on Jun 9, 2022 at 01:26
github-actions bot commented Jun 9, 2022

Documentation preview

https://nvidia-merlin.github.io/systems/review/pr-117

@@ -145,35 +144,37 @@ def _export_model(self, model, name, output_path, version=1):
name=name, backend="tensorflow", platform="tensorflow_savedmodel"
)

inputs, outputs = model.inputs, model.outputs
# inputs, outputs = model.inputs, [model.outputs]
Member:

Let's delete the old code rather than commenting it out.

Suggested change
# inputs, outputs = model.inputs, [model.outputs]


config.parameters["TF_GRAPH_TAG"].string_value = "serve"
config.parameters["TF_SIGNATURE_DEF"].string_value = "serving_default"

for col in inputs:
for col, col_name in zip(inputs, input_col_names):
Member:

How about, instead of zipping the keys/values of default_signature.structured_input_signature[1], we just iterate over the items? It will be cleaner and less susceptible to bugs in the future.

Suggested change
for col, col_name in zip(inputs, input_col_names):
for col, col_name in default_signature.structured_input_signature[1].items():

(And then don't create the 'input_col_names' and 'inputs' on lines 157/160.)
)

for col in outputs:
for col, col_name in zip(outputs, output_col_names):
Member:

Maybe do the same thing here as for the inputs?

Suggested change
for col, col_name in zip(outputs, output_col_names):
for col, col_name in default_signature.structured_outputs.items():
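
Taken together, the two suggestions drive both config loops directly off the signature dictionaries, so there are no separate 'inputs'/'input_col_names' lists to keep in sync. A minimal sketch of that shape (illustrative only; `signature_columns` is a hypothetical helper, not code from this PR):

```python
def signature_columns(model):
    """Return (inputs, outputs) as lists of (column_name, spec) pairs.

    Assumes `model` exposes a 'serving_default' signature, e.g. a Keras
    model that has been saved and reloaded as a SavedModel.
    """
    default_signature = model.signatures["serving_default"]
    # Inputs: the kwargs dict of the structured input signature maps each
    # input column name to its tf.TensorSpec.
    inputs = list(default_signature.structured_input_signature[1].items())
    # Outputs: structured_outputs maps each output column name to its tensor.
    outputs = list(default_signature.structured_outputs.items())
    return inputs, outputs
```

The export code can then iterate these pairs when filling in the Triton model config, matching the reviewer-suggested `.items()` loops above.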

@nvidia-merlin-bot
CI Results
GitHub pull request #117 of commit 38367f43dc98741eb95c39514af669c0e111ffc8, no merge conflicts.
Running as SYSTEM
Setting status of 38367f43dc98741eb95c39514af669c0e111ffc8 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/74/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/117/*:refs/remotes/origin/pr/117/* # timeout=10
 > git rev-parse 38367f43dc98741eb95c39514af669c0e111ffc8^{commit} # timeout=10
Checking out Revision 38367f43dc98741eb95c39514af669c0e111ffc8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 38367f43dc98741eb95c39514af669c0e111ffc8 # timeout=10
Commit message: "Merge branch 'fix_model_outputnames' of https://github.com/jperez999/systems-1 into fix_model_outputnames"
 > git rev-list --no-walk 2a413f6e0993969f2aeab999c9e7cf968b799e7b # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins7719179364039111327.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 1 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ...F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7feddc729fd0>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=-11)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0609 04:25:11.647692 30532 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0609 04:25:11.647810 30532 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0609 04:25:11.647818 30532 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0609 04:25:11.647824 30532 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0609 04:25:11.847515 30532 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f2b2e000000' with size 268435456
I0609 04:25:11.848278 30532 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0609 04:25:11.853700 30532 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0609 04:25:11.953954 30532 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0609 04:25:11.955888 30532 backend.cc:46] TRITONBACKEND_Initialize: nvtabular
I0609 04:25:11.955913 30532 backend.cc:53] Triton TRITONBACKEND API version: 1.8
I0609 04:25:11.955923 30532 backend.cc:56] 'nvtabular' TRITONBACKEND API version: 1.8
I0609 04:25:11.956061 30532 backend.cc:76] Loaded libpython successfully
I0609 04:25:12.054267 30532 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0609 04:25:12.123326 30532 backend.cc:89] Python interpreter is initialized
I0609 04:25:12.124240 30532 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0609 04:25:12.124757 30532 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0609 04:25:12.154648 30532 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0609 04:25:14.061429 30532 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
I0609 04:25:14.061545 30532 model_repository_manager.cc:1152] successfully loaded '0_transformworkflow' version 1
2022-06-09 04:25:15.112124: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:25:15.113651: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-09 04:25:15.113674: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:25:15.113782: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 04:25:15.121193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12669 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 04:25:15.153700: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-09 04:25:15.212415: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:25:15.224898: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 112787 microseconds.
I0609 04:25:15.225086 30532 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
I0609 04:25:15.228319 30532 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
E0609 04:25:17.244980 30625 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

I0609 04:25:17.245194 30532 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0609 04:25:17.255263 30532 model_repository_manager.cc:1152] successfully loaded '1_transformworkflow' version 1
E0609 04:25:17.256198 30532 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

E0609 04:25:17.257480 30532 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0609 04:25:17.257596 30532 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 04:25:17.258631 30532 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
| nvtabular | /opt/tritonserver/backends/nvtabular/libtriton_nvtabular.so | {} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0609 04:25:17.258811 30532 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | READY |
| 1_transformworkflow | 1 | READY |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | (973): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/init.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): init |
| | | /tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+

I0609 04:25:17.301600 30532 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0609 04:25:17.303291 30532 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 04:25:17.303318 30532 server.cc:252] Waiting for in-flight requests to complete.
I0609 04:25:17.303327 30532 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0609 04:25:17.303377 30532 model_repository_manager.cc:1029] unloading: 1_transformworkflow:1
I0609 04:25:17.303425 30532 model_repository_manager.cc:1029] unloading: 0_transformworkflow:1
I0609 04:25:17.303555 30532 server.cc:267] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0609 04:25:17.303584 30532 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0609 04:25:17.303609 30532 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fede40c3c70> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fede40c3c70> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 1 failed, 17 passed, 1 skipped, 18 warnings in 72.28s (0:01:12) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins13725431453088521611.sh
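
The notable failure in this run is server-side: per the traceback, the exported 2_plustwoop model's model.py resolves its operator in op_runner.py's __init__ by importing the operator's defining module, and PlusTwoOp is defined under the test package 'tests.unit.systems', which is not importable inside the Triton server process. A minimal sketch of that import pattern (function name and signature are hypothetical, inferred from the traceback, not the library's actual API):

```python
import importlib


def resolve_operator(module_name: str, class_name: str):
    # Re-import the operator's defining module at model-load time, as the
    # importlib.import_module frame under op_runner.py __init__ suggests.
    # If module_name is 'tests.unit.systems' and that package is not on the
    # server's import path, this raises the ModuleNotFoundError seen above.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```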

@nvidia-merlin-bot
CI Results
GitHub pull request #117 of commit 7331d722ebb1a84048eca4183b9bfe4f6994d76e, no merge conflicts.
Running as SYSTEM
Setting status of 7331d722ebb1a84048eca4183b9bfe4f6994d76e to PENDING with url https://10.20.13.93:8080/job/merlin_systems/75/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/117/*:refs/remotes/origin/pr/117/* # timeout=10
 > git rev-parse 7331d722ebb1a84048eca4183b9bfe4f6994d76e^{commit} # timeout=10
Checking out Revision 7331d722ebb1a84048eca4183b9bfe4f6994d76e (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 7331d722ebb1a84048eca4183b9bfe4f6994d76e # timeout=10
Commit message: "Merge branch 'main' into fix_model_outputnames"
 > git rev-list --no-walk 38367f43dc98741eb95c39514af669c0e111ffc8 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins9215977444994546154.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 1 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ...F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7fba705e0a90>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=-11)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0609 04:26:29.298085 31461 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0609 04:26:29.298205 31461 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0609 04:26:29.298213 31461 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0609 04:26:29.298218 31461 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0609 04:26:29.483755 31461 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fe5c4000000' with size 268435456
I0609 04:26:29.484503 31461 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0609 04:26:29.489471 31461 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0609 04:26:29.589693 31461 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0609 04:26:29.592876 31461 backend.cc:46] TRITONBACKEND_Initialize: nvtabular
I0609 04:26:29.592914 31461 backend.cc:53] Triton TRITONBACKEND API version: 1.8
I0609 04:26:29.592931 31461 backend.cc:56] 'nvtabular' TRITONBACKEND API version: 1.8
I0609 04:26:29.593163 31461 backend.cc:76] Loaded libpython successfully
I0609 04:26:29.689936 31461 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0609 04:26:29.766969 31461 backend.cc:89] Python interpreter is initialized
I0609 04:26:29.767911 31461 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0609 04:26:29.768411 31461 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0609 04:26:29.790233 31461 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0609 04:26:31.658590 31461 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
I0609 04:26:31.658707 31461 model_repository_manager.cc:1152] successfully loaded '0_transformworkflow' version 1
2022-06-09 04:26:32.722389: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:26:32.724183: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-09 04:26:32.724208: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:26:32.724315: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 04:26:32.728495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12669 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 04:26:32.771315: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-09 04:26:32.833870: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:26:32.846832: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 124456 microseconds.
I0609 04:26:32.847162 31461 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
I0609 04:26:32.852485 31461 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
E0609 04:26:34.870913 31554 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

I0609 04:26:34.871145 31461 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0609 04:26:34.880893 31461 model_repository_manager.cc:1152] successfully loaded '1_transformworkflow' version 1
E0609 04:26:34.882188 31461 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

E0609 04:26:34.883528 31461 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0609 04:26:34.883649 31461 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 04:26:34.884676 31461 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
| nvtabular | /opt/tritonserver/backends/nvtabular/libtriton_nvtabular.so | {} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0609 04:26:34.884857 31461 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | READY |
| 1_transformworkflow | 1 | READY |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | (973): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/init.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): init |
| | | /tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+

I0609 04:26:34.927589 31461 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0609 04:26:34.929190 31461 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 04:26:34.929217 31461 server.cc:252] Waiting for in-flight requests to complete.
I0609 04:26:34.929225 31461 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0609 04:26:34.929277 31461 model_repository_manager.cc:1029] unloading: 1_transformworkflow:1
I0609 04:26:34.929330 31461 model_repository_manager.cc:1029] unloading: 0_transformworkflow:1
I0609 04:26:34.929453 31461 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0609 04:26:34.929466 31461 server.cc:267] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0609 04:26:34.929523 31461 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state

I0609 04:26:34.929547 31461 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fba7f38b250> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fba7f38b250> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 1 failed, 17 passed, 1 skipped, 18 warnings in 72.45s (0:01:12) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins11956088136903943840.sh

jperez999 self-assigned this on Jun 9, 2022
jperez999 added the bug label ("Something isn't working") on Jun 9, 2022
jperez999 added this to the Merlin 22.06 milestone on Jun 9, 2022
jperez999 (Collaborator, Author):

rerun tests

1 similar comment
jperez999 (Collaborator, Author):

rerun tests

@nvidia-merlin-bot
CI Results
GitHub pull request #117 of commit 1e1226f81fe02bbdf1ede45ff8401ffaedbb01e0, no merge conflicts.
Running as SYSTEM
Setting status of 1e1226f81fe02bbdf1ede45ff8401ffaedbb01e0 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/76/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/117/*:refs/remotes/origin/pr/117/* # timeout=10
 > git rev-parse 1e1226f81fe02bbdf1ede45ff8401ffaedbb01e0^{commit} # timeout=10
Checking out Revision 1e1226f81fe02bbdf1ede45ff8401ffaedbb01e0 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1e1226f81fe02bbdf1ede45ff8401ffaedbb01e0 # timeout=10
Commit message: "Merge branch 'fix_model_outputnames' of https://github.com/jperez999/systems-1 into fix_model_outputnames"
 > git rev-list --no-walk 7331d722ebb1a84048eca4183b9bfe4f6994d76e # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins3091504900765628636.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 1 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py FF.F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_config_verification[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0')
dataset = <merlin.io.dataset.Dataset object at 0x7fef280bd100>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_config_verification(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )
    selector = ColumnSelector(["x", "y", "id"])

    workflow_ops = selector >> wf_ops.Rename(postfix="_nvt")
    workflow = Workflow(workflow_ops["x_nvt"])
    workflow.fit(dataset)

    # Create Tensorflow Model
    model = tf.keras.models.Sequential(
        [
            tf.keras.Input(name="x_nvt", dtype=tf.float64, shape=(1,)),
            tf.keras.layers.Dense(16, activation="relu"),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(1, name="output"),
        ]
    )
    model.compile(
        optimizer="adam",
        loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.metrics.SparseCategoricalAccuracy()],
    )

    # Creating Triton Ensemble
    triton_chain = (
        selector >> TransformWorkflow(workflow, cats=["x_nvt"]) >> PredictTensorflow(model)
    )
    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, node_configs = triton_ens.export(str(tmpdir))

    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = make_df({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0], "id": [7, 8, 9]})

    output_columns = triton_ens.graph.output_schema.column_names
  response = _run_ensemble_on_tritonserver(str(tmpdir), output_columns, df, triton_ens.name)

tests/unit/systems/test_ensemble.py:113:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
2022-06-09 15:42:11.233899: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 15:42:12.190131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 15:42:12.190883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15157 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
I0609 15:42:14.805777 2881 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0609 15:42:14.805874 2881 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0609 15:42:14.805881 2881 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0609 15:42:14.805886 2881 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0609 15:42:14.997282 2881 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f51de000000' with size 268435456
I0609 15:42:14.997989 2881 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0609 15:42:15.001390 2881 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0609 15:42:15.101670 2881 model_repository_manager.cc:997] loading: 1_predicttensorflow:1
I0609 15:42:15.106969 2881 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 0_transformworkflow (GPU device 0)
I0609 15:42:15.477925 2881 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 1_predicttensorflow (version 1)
I0609 15:42:15.479454 2881 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 1_predicttensorflow (GPU device 0)
2022-06-09 15:42:15.480270: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0/1_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:15.483609: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-09 15:42:15.483656: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0/1_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:15.483819: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 15:42:15.522577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12899 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 15:42:15.559190: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-09 15:42:15.587373: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0/1_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:15.595686: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 115432 microseconds.
I0609 15:42:15.595858 2881 model_repository_manager.cc:1152] successfully loaded '1_predicttensorflow' version 1
E0609 15:42:15.596171 2881 model_repository_manager.cc:1155] failed to load '0_transformworkflow' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0/0_transformworkflow/1/model.py
E0609 15:42:15.596347 2881 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '0_transformworkflow' which has no loaded version
I0609 15:42:15.596451 2881 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 15:42:15.597245 2881 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0609 15:42:15.597327 2881 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0/0_transformworkflow/1/model.py |
| 1_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:15.644381 2881 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0609 15:42:15.645919 2881 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:15.645944 2881 server.cc:252] Waiting for in-flight requests to complete.
I0609 15:42:15.645952 2881 model_repository_manager.cc:1029] unloading: 1_predicttensorflow:1
I0609 15:42:15.645995 2881 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0609 15:42:15.646107 2881 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0609 15:42:15.646289 2881 tensorflow.cc:2302] TRITONBACKEND_ModelFinalize: delete model state
I0609 15:42:15.649799 2881 model_repository_manager.cc:1135] successfully unloaded '1_predicttensorflow' version 1
I0609 15:42:16.646075 2881 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0609 15:42:16.672382 2881 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0
------------------------------ Captured log call -------------------------------
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
__________________ test_workflow_tf_e2e_multi_op_run[parquet] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0')
dataset = <merlin.io.dataset.Dataset object at 0x7fef280103d0>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2)
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:166:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0609 15:42:23.214608 2938 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0609 15:42:23.214724 2938 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0609 15:42:23.214732 2938 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0609 15:42:23.214737 2938 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0609 15:42:23.404251 2938 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fcb46000000' with size 268435456
I0609 15:42:23.404954 2938 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0609 15:42:23.409224 2938 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0609 15:42:23.509443 2938 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0609 15:42:23.516618 2938 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 0_transformworkflow (GPU device 0)
I0609 15:42:23.609671 2938 model_repository_manager.cc:997] loading: 2_predicttensorflow:1
I0609 15:42:23.904527 2938 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 2_predicttensorflow (version 1)
I0609 15:42:23.905430 2938 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 1_transformworkflow (GPU device 0)
E0609 15:42:23.907266 2938 model_repository_manager.cc:1155] failed to load '0_transformworkflow' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/0_transformworkflow/1/model.py
I0609 15:42:23.923612 2938 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 2_predicttensorflow (GPU device 0)
2022-06-09 15:42:23.924469: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/2_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:23.929957: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-09 15:42:23.930008: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/2_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:23.930189: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 15:42:23.973176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12899 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 15:42:24.020459: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-09 15:42:24.056646: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/2_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:24.069248: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 144800 microseconds.
I0609 15:42:24.069412 2938 model_repository_manager.cc:1152] successfully loaded '2_predicttensorflow' version 1
E0609 15:42:24.069872 2938 model_repository_manager.cc:1155] failed to load '1_transformworkflow' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/1_transformworkflow/1/model.py
E0609 15:42:24.069954 2938 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '1_transformworkflow' which has no loaded version
I0609 15:42:24.070029 2938 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 15:42:24.070196 2938 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0609 15:42:24.071069 2938 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/0_transformworkflow/1/model.py |
| 1_transformworkflow | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/1_transformworkflow/1/model.py |
| 2_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:24.117104 2938 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0609 15:42:24.118682 2938 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:24.118711 2938 server.cc:252] Waiting for in-flight requests to complete.
I0609 15:42:24.118719 2938 model_repository_manager.cc:1029] unloading: 2_predicttensorflow:1
I0609 15:42:24.118765 2938 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0609 15:42:24.118855 2938 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0609 15:42:24.119011 2938 tensorflow.cc:2302] TRITONBACKEND_ModelFinalize: delete model state
I0609 15:42:24.123806 2938 model_repository_manager.cc:1135] successfully unloaded '2_predicttensorflow' version 1
I0609 15:42:25.118845 2938 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0609 15:42:25.137262 2938 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string with unsupported characters which will be renamed to name_cat, name_string in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fef27cc86d0> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string with unsupported characters which will be renamed to name_cat, name_string in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fef27cc86d0> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7feed4437490>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0609 15:42:30.926682 2993 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0609 15:42:30.926795 2993 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0609 15:42:30.926802 2993 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0609 15:42:30.926808 2993 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0609 15:42:31.113360 2993 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f6436000000' with size 268435456
I0609 15:42:31.114107 2993 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0609 15:42:31.119056 2993 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0609 15:42:31.219296 2993 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0609 15:42:31.224739 2993 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 0_transformworkflow (GPU device 0)
I0609 15:42:31.319618 2993 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0609 15:42:31.419945 2993 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0609 15:42:31.601391 2993 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0609 15:42:31.603603 2993 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
E0609 15:42:31.604931 2993 model_repository_manager.cc:1155] failed to load '0_transformworkflow' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/0_transformworkflow/1/model.py
2022-06-09 15:42:31.605283: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:31.610824: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-09 15:42:31.610873: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:31.611052: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 15:42:31.653077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12899 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 15:42:31.686120: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-09 15:42:31.721400: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:31.733692: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 128426 microseconds.
I0609 15:42:31.733767 2993 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
I0609 15:42:31.735323 2993 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
E0609 15:42:33.611250 3042 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
(973): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

I0609 15:42:33.611432 2993 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 1_transformworkflow (GPU device 0)
E0609 15:42:33.611613 2993 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
(973): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

E0609 15:42:33.632772 2993 model_repository_manager.cc:1155] failed to load '1_transformworkflow' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/1_transformworkflow/1/model.py
E0609 15:42:33.632865 2993 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0609 15:42:33.632947 2993 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 15:42:33.633796 2993 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0609 15:42:33.633945 2993 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/0_transformworkflow/1/model.py |
| 1_transformworkflow | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/1_transformworkflow/1/model.py |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | (973): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/__init__.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__ |
| | | /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:33.681625 2993 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0609 15:42:33.683339 2993 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:33.683374 2993 server.cc:252] Waiting for in-flight requests to complete.
I0609 15:42:33.683384 2993 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0609 15:42:33.683437 2993 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0609 15:42:33.683566 2993 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0609 15:42:33.684024 2993 tensorflow.cc:2302] TRITONBACKEND_ModelFinalize: delete model state
I0609 15:42:33.690946 2993 model_repository_manager.cc:1135] successfully unloaded '3_predicttensorflow' version 1
I0609 15:42:34.683526 2993 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0609 15:42:34.708914 2993 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fef27c3da90> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fef27c3da90> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_config_verification[parquet]
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_run[parquet]
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 3 failed, 15 passed, 1 skipped, 18 warnings in 66.41s (0:01:06) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins14077914574026463453.sh
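
Two distinct failures recur in the log above: the TransformWorkflow models exported with backend="python" are missing 1/model.py, and the PlusTwoOp stub cannot import tests.unit.systems because the test package is not on the server process's PYTHONPATH. The sketch below is illustrative only, not part of this PR: diagnose_model_repo, repo_path, and extra_pythonpath are hypothetical names, and the checks simply encode the repository layout visible in the logs.

    import os
    import pathlib

    def diagnose_model_repo(repo_path, extra_pythonpath=None):
        """Scan an exported Triton model repository for the failures above.

        Returns the list of python-backend models missing 1/model.py, plus an
        environment dict whose PYTHONPATH is extended so the Python backend
        stub can import custom-op modules such as tests.unit.systems.
        (Illustrative sketch; layout assumptions come from the CI logs.)
        """
        repo = pathlib.Path(repo_path)
        missing = []
        for config in repo.glob("*/config.pbtxt"):
            # Models exported with backend="python" must ship a 1/model.py.
            if 'backend: "python"' in config.read_text():
                model_py = config.parent / "1" / "model.py"
                if not model_py.exists():
                    missing.append(str(model_py))

        env = os.environ.copy()
        if extra_pythonpath:
            env["PYTHONPATH"] = extra_pythonpath + os.pathsep + env.get("PYTHONPATH", "")
        return missing, env

Pointed at /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1, a check like this would flag 0_transformworkflow and 1_transformworkflow, matching the UNAVAILABLE rows in the model status table above.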

@@ -91,7 +91,7 @@ def export(self, path, input_schema, output_schema, node_id=None, version=1):
             modified_workflow,
             node_name,
             node_export_path,
-            backend="python",
+            backend="nvtabular",

Member

Will this work on the containers? I saw this PR go through, and I'm wondering if the NVT backend is still available https://github.com/NVIDIA-Merlin/Merlin/pull/378/files

Collaborator Author

That's why I need the other PR in Merlin to go through as well... we need to recook the containers.
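
One quick way to answer the container question, as a minimal sketch: Triton discovers backends under the directory its logs print above (/opt/tritonserver/backends/tensorflow2/...), so listing that directory inside the container shows whether an nvtabular entry is still shipped. The path and the directory-per-backend layout are assumptions drawn from those logs, not a documented API.

    import pathlib

    # Backends directory taken from the Triton logs above; layout is an assumption.
    backends = pathlib.Path("/opt/tritonserver/backends")
    available = sorted(p.name for p in backends.iterdir() if p.is_dir())
    print(available)  # e.g. ['tensorflow2', ...] per the logs above
    if "nvtabular" not in available:
        raise SystemExit("nvtabular backend is not present in this container")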
