Fixes Merlin e2e example #117

Merged
merged 9 commits into main from fix_model_outputnames on Jun 9, 2022

Conversation

jperez999 (Collaborator)

Ensures we pull the correct column information from TensorFlow models.
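
For context on the fix: the Keras-level `model.inputs` / `model.outputs` names can differ from the names recorded in the exported SavedModel's serving signature (the absl warnings in the CI logs below show, e.g., `name-cat` being renamed to `name_cat` on save). A minimal sketch, assuming a model saved and reloaded as a SavedModel at a placeholder path, of reading the names the signature actually uses:

```python
import tensorflow as tf

# Illustrative only; the path is a placeholder, not from this PR.
loaded = tf.saved_model.load("/path/to/model.savedmodel")
default_signature = loaded.signatures["serving_default"]

# structured_input_signature is an (args, kwargs) tuple; element [1] is the
# dict mapping input column names to tf.TensorSpec objects.
for col_name, col_spec in default_signature.structured_input_signature[1].items():
    print("input:", col_name, col_spec.dtype, col_spec.shape)

# structured_outputs maps output column names to their tensors.
for col_name, col_tensor in default_signature.structured_outputs.items():
    print("output:", col_name, col_tensor.dtype, col_tensor.shape)
```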

@nvidia-merlin-bot
CI Results
GitHub pull request #117 of commit 0f9fa18556f24ec9fc194d52cc3733a31f531826, no merge conflicts.
Running as SYSTEM
Setting status of 0f9fa18556f24ec9fc194d52cc3733a31f531826 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/69/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/117/*:refs/remotes/origin/pr/117/* # timeout=10
 > git rev-parse 0f9fa18556f24ec9fc194d52cc3733a31f531826^{commit} # timeout=10
Checking out Revision 0f9fa18556f24ec9fc194d52cc3733a31f531826 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 0f9fa18556f24ec9fc194d52cc3733a31f531826 # timeout=10
Commit message: "fixes to ensure use of correct keys from tf models"
 > git rev-list --no-walk 2a413f6e0993969f2aeab999c9e7cf968b799e7b # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins14260605814698537897.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 6 items / 5 errors / 1 skipped

==================================== ERRORS ====================================
_____________ ERROR collecting tests/unit/systems/test_ensemble.py _____________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_ensemble.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_ensemble.py:31: in <module>
from nvtabular import Workflow # noqa
E ModuleNotFoundError: No module named 'nvtabular'
______________ ERROR collecting tests/unit/systems/test_export.py ______________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_export.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_export.py:23: in <module>
from nvtabular import Workflow, ops
E ModuleNotFoundError: No module named 'nvtabular'
______________ ERROR collecting tests/unit/systems/test_graph.py _______________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_graph.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_graph.py:19: in <module>
from nvtabular import Workflow
E ModuleNotFoundError: No module named 'nvtabular'
__________ ERROR collecting tests/unit/systems/test_inference_ops.py ___________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_inference_ops.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_inference_ops.py:27: in <module>
from nvtabular import Workflow # noqa
E ModuleNotFoundError: No module named 'nvtabular'
____________ ERROR collecting tests/unit/systems/test_op_runner.py _____________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_op_runner.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_op_runner.py:23: in <module>
import nvtabular as nvt
E ModuleNotFoundError: No module named 'nvtabular'
=========================== short test summary info ============================
ERROR tests/unit/systems/test_ensemble.py
ERROR tests/unit/systems/test_export.py
ERROR tests/unit/systems/test_graph.py
ERROR tests/unit/systems/test_inference_ops.py
ERROR tests/unit/systems/test_op_runner.py
!!!!!!!!!!!!!!!!!!! Interrupted: 5 errors during collection !!!!!!!!!!!!!!!!!!!!
========================= 1 skipped, 5 errors in 2.06s =========================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins1262507776029502473.sh

@nvidia-merlin-bot
CI Results
GitHub pull request #117 of commit a25c7e90bc2adad53a75499fdbdeb14059b5607e, no merge conflicts.
Running as SYSTEM
Setting status of a25c7e90bc2adad53a75499fdbdeb14059b5607e to PENDING with url https://10.20.13.93:8080/job/merlin_systems/70/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/117/*:refs/remotes/origin/pr/117/* # timeout=10
 > git rev-parse a25c7e90bc2adad53a75499fdbdeb14059b5607e^{commit} # timeout=10
Checking out Revision a25c7e90bc2adad53a75499fdbdeb14059b5607e (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a25c7e90bc2adad53a75499fdbdeb14059b5607e # timeout=10
Commit message: "Merge branch 'main' into fix_model_outputnames"
 > git rev-list --no-walk 0f9fa18556f24ec9fc194d52cc3733a31f531826 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins12352964776961901369.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 6 items / 5 errors / 1 skipped

==================================== ERRORS ====================================
_____________ ERROR collecting tests/unit/systems/test_ensemble.py _____________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_ensemble.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_ensemble.py:31: in <module>
from nvtabular import Workflow # noqa
E ModuleNotFoundError: No module named 'nvtabular'
______________ ERROR collecting tests/unit/systems/test_export.py ______________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_export.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_export.py:23: in <module>
from nvtabular import Workflow, ops
E ModuleNotFoundError: No module named 'nvtabular'
______________ ERROR collecting tests/unit/systems/test_graph.py _______________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_graph.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_graph.py:19: in <module>
from nvtabular import Workflow
E ModuleNotFoundError: No module named 'nvtabular'
__________ ERROR collecting tests/unit/systems/test_inference_ops.py ___________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_inference_ops.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_inference_ops.py:27: in <module>
from nvtabular import Workflow # noqa
E ModuleNotFoundError: No module named 'nvtabular'
____________ ERROR collecting tests/unit/systems/test_op_runner.py _____________
ImportError while importing test module '/var/jenkins_home/workspace/merlin_systems/systems/tests/unit/systems/test_op_runner.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.8/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/unit/systems/test_op_runner.py:23: in <module>
import nvtabular as nvt
E ModuleNotFoundError: No module named 'nvtabular'
=========================== short test summary info ============================
ERROR tests/unit/systems/test_ensemble.py
ERROR tests/unit/systems/test_export.py
ERROR tests/unit/systems/test_graph.py
ERROR tests/unit/systems/test_inference_ops.py
ERROR tests/unit/systems/test_op_runner.py
!!!!!!!!!!!!!!!!!!! Interrupted: 5 errors during collection !!!!!!!!!!!!!!!!!!!!
========================= 1 skipped, 5 errors in 1.99s =========================
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins4417331190025152590.sh

jperez999 requested a review from benfred on Jun 9, 2022 at 01:26
github-actions bot commented Jun 9, 2022

Documentation preview

https://nvidia-merlin.github.io/systems/review/pr-117

@@ -145,35 +144,37 @@ def _export_model(self, model, name, output_path, version=1):
name=name, backend="tensorflow", platform="tensorflow_savedmodel"
)

inputs, outputs = model.inputs, model.outputs
# inputs, outputs = model.inputs, [model.outputs]
Member:

Let's delete the old code rather than commenting it out.

Suggested change
# inputs, outputs = model.inputs, [model.outputs]


config.parameters["TF_GRAPH_TAG"].string_value = "serve"
config.parameters["TF_SIGNATURE_DEF"].string_value = "serving_default"

for col in inputs:
for col, col_name in zip(inputs, input_col_names):
Member:

How about, instead of zipping the keys/values of default_signature.structured_input_signature[1], we just iterate over the items? It will be cleaner and less susceptible to bugs in the future.

Suggested change
for col, col_name in zip(inputs, input_col_names):
for col, col_name in default_signature.structured_input_signature[1].items():

(And then don't create the 'input_col_names' and 'inputs' on lines 157/160.)
)

for col in outputs:
for col, col_name in zip(outputs, output_col_names):
Member:

Maybe do the same thing here as for the inputs?

Suggested change
for col, col_name in zip(outputs, output_col_names):
for col, col_name in default_signature.structured_outputs.items():
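
Taken together, the two suggestions drive both config loops directly off the signature dictionaries, so there are no separate 'inputs'/'input_col_names' lists to keep in sync. A minimal sketch of that shape (illustrative only; `signature_columns` is a hypothetical helper, not code from this PR):

```python
def signature_columns(model):
    """Return (inputs, outputs) as lists of (column_name, spec) pairs.

    Assumes `model` exposes a 'serving_default' signature, e.g. a Keras
    model that has been saved and reloaded as a SavedModel.
    """
    default_signature = model.signatures["serving_default"]
    # Inputs: the kwargs dict of the structured input signature maps each
    # input column name to its tf.TensorSpec.
    inputs = list(default_signature.structured_input_signature[1].items())
    # Outputs: structured_outputs maps each output column name to its tensor.
    outputs = list(default_signature.structured_outputs.items())
    return inputs, outputs
```

The export code can then iterate these pairs when filling in the Triton model config, matching the reviewer-suggested `.items()` loops above.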

@nvidia-merlin-bot
CI Results
GitHub pull request #117 of commit 38367f43dc98741eb95c39514af669c0e111ffc8, no merge conflicts.
Running as SYSTEM
Setting status of 38367f43dc98741eb95c39514af669c0e111ffc8 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/74/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/117/*:refs/remotes/origin/pr/117/* # timeout=10
 > git rev-parse 38367f43dc98741eb95c39514af669c0e111ffc8^{commit} # timeout=10
Checking out Revision 38367f43dc98741eb95c39514af669c0e111ffc8 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 38367f43dc98741eb95c39514af669c0e111ffc8 # timeout=10
Commit message: "Merge branch 'fix_model_outputnames' of https://github.com/jperez999/systems-1 into fix_model_outputnames"
 > git rev-list --no-walk 2a413f6e0993969f2aeab999c9e7cf968b799e7b # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins7719179364039111327.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 1 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ...F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7feddc729fd0>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=-11)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0609 04:25:11.647692 30532 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0609 04:25:11.647810 30532 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0609 04:25:11.647818 30532 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0609 04:25:11.647824 30532 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0609 04:25:11.847515 30532 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f2b2e000000' with size 268435456
I0609 04:25:11.848278 30532 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0609 04:25:11.853700 30532 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0609 04:25:11.953954 30532 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0609 04:25:11.955888 30532 backend.cc:46] TRITONBACKEND_Initialize: nvtabular
I0609 04:25:11.955913 30532 backend.cc:53] Triton TRITONBACKEND API version: 1.8
I0609 04:25:11.955923 30532 backend.cc:56] 'nvtabular' TRITONBACKEND API version: 1.8
I0609 04:25:11.956061 30532 backend.cc:76] Loaded libpython successfully
I0609 04:25:12.054267 30532 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0609 04:25:12.123326 30532 backend.cc:89] Python interpreter is initialized
I0609 04:25:12.124240 30532 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0609 04:25:12.124757 30532 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0609 04:25:12.154648 30532 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0609 04:25:14.061429 30532 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
I0609 04:25:14.061545 30532 model_repository_manager.cc:1152] successfully loaded '0_transformworkflow' version 1
2022-06-09 04:25:15.112124: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:25:15.113651: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-09 04:25:15.113674: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:25:15.113782: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 04:25:15.121193: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12669 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 04:25:15.153700: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-09 04:25:15.212415: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:25:15.224898: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 112787 microseconds.
I0609 04:25:15.225086 30532 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
I0609 04:25:15.228319 30532 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
E0609 04:25:17.244980 30625 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

I0609 04:25:17.245194 30532 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0609 04:25:17.255263 30532 model_repository_manager.cc:1152] successfully loaded '1_transformworkflow' version 1
E0609 04:25:17.256198 30532 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

E0609 04:25:17.257480 30532 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0609 04:25:17.257596 30532 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 04:25:17.258631 30532 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
| nvtabular | /opt/tritonserver/backends/nvtabular/libtriton_nvtabular.so | {} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0609 04:25:17.258811 30532 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | READY |
| 1_transformworkflow | 1 | READY |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | (973): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/init.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): init |
| | | /tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+

I0609 04:25:17.301600 30532 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0609 04:25:17.303291 30532 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-44/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 04:25:17.303318 30532 server.cc:252] Waiting for in-flight requests to complete.
I0609 04:25:17.303327 30532 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0609 04:25:17.303377 30532 model_repository_manager.cc:1029] unloading: 1_transformworkflow:1
I0609 04:25:17.303425 30532 model_repository_manager.cc:1029] unloading: 0_transformworkflow:1
I0609 04:25:17.303555 30532 server.cc:267] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0609 04:25:17.303584 30532 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0609 04:25:17.303609 30532 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fede40c3c70> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fede40c3c70> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 1 failed, 17 passed, 1 skipped, 18 warnings in 72.28s (0:01:12) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins13725431453088521611.sh
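
The notable failure in this run is server-side: per the traceback, the exported 2_plustwoop model's model.py resolves its operator in op_runner.py's __init__ by importing the operator's defining module, and PlusTwoOp is defined under the test package 'tests.unit.systems', which is not importable inside the Triton server process. A minimal sketch of that import pattern (function name and signature are hypothetical, inferred from the traceback, not the library's actual API):

```python
import importlib


def resolve_operator(module_name: str, class_name: str):
    # Re-import the operator's defining module at model-load time, as the
    # importlib.import_module frame under op_runner.py __init__ suggests.
    # If module_name is 'tests.unit.systems' and that package is not on the
    # server's import path, this raises the ModuleNotFoundError seen above.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)
```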

@nvidia-merlin-bot
CI Results
GitHub pull request #117 of commit 7331d722ebb1a84048eca4183b9bfe4f6994d76e, no merge conflicts.
Running as SYSTEM
Setting status of 7331d722ebb1a84048eca4183b9bfe4f6994d76e to PENDING with url https://10.20.13.93:8080/job/merlin_systems/75/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/117/*:refs/remotes/origin/pr/117/* # timeout=10
 > git rev-parse 7331d722ebb1a84048eca4183b9bfe4f6994d76e^{commit} # timeout=10
Checking out Revision 7331d722ebb1a84048eca4183b9bfe4f6994d76e (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 7331d722ebb1a84048eca4183b9bfe4f6994d76e # timeout=10
Commit message: "Merge branch 'main' into fix_model_outputnames"
 > git rev-list --no-walk 38367f43dc98741eb95c39514af669c0e111ffc8 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins9215977444994546154.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 1 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ...F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7fba705e0a90>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=-11)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0609 04:26:29.298085 31461 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0609 04:26:29.298205 31461 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0609 04:26:29.298213 31461 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0609 04:26:29.298218 31461 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0609 04:26:29.483755 31461 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fe5c4000000' with size 268435456
I0609 04:26:29.484503 31461 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0609 04:26:29.489471 31461 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0609 04:26:29.589693 31461 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0609 04:26:29.592876 31461 backend.cc:46] TRITONBACKEND_Initialize: nvtabular
I0609 04:26:29.592914 31461 backend.cc:53] Triton TRITONBACKEND API version: 1.8
I0609 04:26:29.592931 31461 backend.cc:56] 'nvtabular' TRITONBACKEND API version: 1.8
I0609 04:26:29.593163 31461 backend.cc:76] Loaded libpython successfully
I0609 04:26:29.689936 31461 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0609 04:26:29.766969 31461 backend.cc:89] Python interpreter is initialized
I0609 04:26:29.767911 31461 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0609 04:26:29.768411 31461 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0609 04:26:29.790233 31461 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0609 04:26:31.658590 31461 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
I0609 04:26:31.658707 31461 model_repository_manager.cc:1152] successfully loaded '0_transformworkflow' version 1
2022-06-09 04:26:32.722389: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:26:32.724183: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-09 04:26:32.724208: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:26:32.724315: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 04:26:32.728495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12669 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 04:26:32.771315: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-09 04:26:32.833870: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 04:26:32.846832: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 124456 microseconds.
I0609 04:26:32.847162 31461 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
I0609 04:26:32.852485 31461 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
E0609 04:26:34.870913 31554 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

I0609 04:26:34.871145 31461 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0609 04:26:34.880893 31461 model_repository_manager.cc:1152] successfully loaded '1_transformworkflow' version 1
E0609 04:26:34.882188 31461 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

E0609 04:26:34.883528 31461 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0609 04:26:34.883649 31461 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 04:26:34.884676 31461 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
| nvtabular | /opt/tritonserver/backends/nvtabular/libtriton_nvtabular.so | {} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0609 04:26:34.884857 31461 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | READY |
| 1_transformworkflow | 1 | READY |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | (973): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/init.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): init |
| | | /tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+

I0609 04:26:34.927589 31461 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0609 04:26:34.929190 31461 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-45/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 04:26:34.929217 31461 server.cc:252] Waiting for in-flight requests to complete.
I0609 04:26:34.929225 31461 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0609 04:26:34.929277 31461 model_repository_manager.cc:1029] unloading: 1_transformworkflow:1
I0609 04:26:34.929330 31461 model_repository_manager.cc:1029] unloading: 0_transformworkflow:1
I0609 04:26:34.929453 31461 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0609 04:26:34.929466 31461 server.cc:267] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0609 04:26:34.929523 31461 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state

I0609 04:26:34.929547 31461 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fba7f38b250> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fba7f38b250> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 1 failed, 17 passed, 1 skipped, 18 warnings in 72.45s (0:01:12) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins11956088136903943840.sh

jperez999 self-assigned this on Jun 9, 2022
jperez999 added the bug label ("Something isn't working") on Jun 9, 2022
jperez999 added this to the Merlin 22.06 milestone on Jun 9, 2022
jperez999 (Collaborator, Author):

rerun tests

1 similar comment
jperez999 (Collaborator, Author):

rerun tests

@nvidia-merlin-bot
CI Results
GitHub pull request #117 of commit 1e1226f81fe02bbdf1ede45ff8401ffaedbb01e0, no merge conflicts.
Running as SYSTEM
Setting status of 1e1226f81fe02bbdf1ede45ff8401ffaedbb01e0 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/76/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/117/*:refs/remotes/origin/pr/117/* # timeout=10
 > git rev-parse 1e1226f81fe02bbdf1ede45ff8401ffaedbb01e0^{commit} # timeout=10
Checking out Revision 1e1226f81fe02bbdf1ede45ff8401ffaedbb01e0 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 1e1226f81fe02bbdf1ede45ff8401ffaedbb01e0 # timeout=10
Commit message: "Merge branch 'fix_model_outputnames' of https://github.com/jperez999/systems-1 into fix_model_outputnames"
 > git rev-list --no-walk 7331d722ebb1a84048eca4183b9bfe4f6994d76e # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins3091504900765628636.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 1 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py FF.F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_config_verification[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0')
dataset = <merlin.io.dataset.Dataset object at 0x7fef280bd100>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_config_verification(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )
    selector = ColumnSelector(["x", "y", "id"])

    workflow_ops = selector >> wf_ops.Rename(postfix="_nvt")
    workflow = Workflow(workflow_ops["x_nvt"])
    workflow.fit(dataset)

    # Create Tensorflow Model
    model = tf.keras.models.Sequential(
        [
            tf.keras.Input(name="x_nvt", dtype=tf.float64, shape=(1,)),
            tf.keras.layers.Dense(16, activation="relu"),
            tf.keras.layers.Dropout(0.2),
            tf.keras.layers.Dense(1, name="output"),
        ]
    )
    model.compile(
        optimizer="adam",
        loss=tf.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.metrics.SparseCategoricalAccuracy()],
    )

    # Creating Triton Ensemble
    triton_chain = (
        selector >> TransformWorkflow(workflow, cats=["x_nvt"]) >> PredictTensorflow(model)
    )
    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, node_configs = triton_ens.export(str(tmpdir))

    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = make_df({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0], "id": [7, 8, 9]})

    output_columns = triton_ens.graph.output_schema.column_names
  response = _run_ensemble_on_tritonserver(str(tmpdir), output_columns, df, triton_ens.name)

tests/unit/systems/test_ensemble.py:113:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
2022-06-09 15:42:11.233899: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 15:42:12.190131: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1627 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 15:42:12.190883: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1532] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 15157 MB memory: -> device: 1, name: Tesla P100-DGXS-16GB, pci bus id: 0000:08:00.0, compute capability: 6.0
I0609 15:42:14.805777 2881 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0609 15:42:14.805874 2881 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0609 15:42:14.805881 2881 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0609 15:42:14.805886 2881 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0609 15:42:14.997282 2881 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f51de000000' with size 268435456
I0609 15:42:14.997989 2881 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0609 15:42:15.001390 2881 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0609 15:42:15.101670 2881 model_repository_manager.cc:997] loading: 1_predicttensorflow:1
I0609 15:42:15.106969 2881 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 0_transformworkflow (GPU device 0)
I0609 15:42:15.477925 2881 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 1_predicttensorflow (version 1)
I0609 15:42:15.479454 2881 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 1_predicttensorflow (GPU device 0)
2022-06-09 15:42:15.480270: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0/1_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:15.483609: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-09 15:42:15.483656: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0/1_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:15.483819: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 15:42:15.522577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12899 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 15:42:15.559190: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-09 15:42:15.587373: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0/1_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:15.595686: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 115432 microseconds.
I0609 15:42:15.595858 2881 model_repository_manager.cc:1152] successfully loaded '1_predicttensorflow' version 1
E0609 15:42:15.596171 2881 model_repository_manager.cc:1155] failed to load '0_transformworkflow' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0/0_transformworkflow/1/model.py
E0609 15:42:15.596347 2881 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '0_transformworkflow' which has no loaded version
I0609 15:42:15.596451 2881 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 15:42:15.597245 2881 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0609 15:42:15.597327 2881 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0/0_transformworkflow/1/model.py |
| 1_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:15.644381 2881 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0609 15:42:15.645919 2881 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_config_ve0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:15.645944 2881 server.cc:252] Waiting for in-flight requests to complete.
I0609 15:42:15.645952 2881 model_repository_manager.cc:1029] unloading: 1_predicttensorflow:1
I0609 15:42:15.645995 2881 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0609 15:42:15.646107 2881 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0609 15:42:15.646289 2881 tensorflow.cc:2302] TRITONBACKEND_ModelFinalize: delete model state
I0609 15:42:15.649799 2881 model_repository_manager.cc:1135] successfully unloaded '1_predicttensorflow' version 1
I0609 15:42:16.646075 2881 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0609 15:42:16.672382 2881 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0
------------------------------ Captured log call -------------------------------
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
__________________ test_workflow_tf_e2e_multi_op_run[parquet] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0')
dataset = <merlin.io.dataset.Dataset object at 0x7fef280103d0>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2)
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:166:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0609 15:42:23.214608 2938 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0609 15:42:23.214724 2938 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0609 15:42:23.214732 2938 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0609 15:42:23.214737 2938 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0609 15:42:23.404251 2938 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fcb46000000' with size 268435456
I0609 15:42:23.404954 2938 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0609 15:42:23.409224 2938 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0609 15:42:23.509443 2938 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0609 15:42:23.516618 2938 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 0_transformworkflow (GPU device 0)
I0609 15:42:23.609671 2938 model_repository_manager.cc:997] loading: 2_predicttensorflow:1
I0609 15:42:23.904527 2938 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 2_predicttensorflow (version 1)
I0609 15:42:23.905430 2938 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 1_transformworkflow (GPU device 0)
E0609 15:42:23.907266 2938 model_repository_manager.cc:1155] failed to load '0_transformworkflow' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/0_transformworkflow/1/model.py
I0609 15:42:23.923612 2938 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 2_predicttensorflow (GPU device 0)
2022-06-09 15:42:23.924469: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/2_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:23.929957: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-09 15:42:23.930008: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/2_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:23.930189: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 15:42:23.973176: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12899 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 15:42:24.020459: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-09 15:42:24.056646: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/2_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:24.069248: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 144800 microseconds.
I0609 15:42:24.069412 2938 model_repository_manager.cc:1152] successfully loaded '2_predicttensorflow' version 1
E0609 15:42:24.069872 2938 model_repository_manager.cc:1155] failed to load '1_transformworkflow' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/1_transformworkflow/1/model.py
E0609 15:42:24.069954 2938 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '1_transformworkflow' which has no loaded version
I0609 15:42:24.070029 2938 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 15:42:24.070196 2938 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0609 15:42:24.071069 2938 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/0_transformworkflow/1/model.py |
| 1_transformworkflow | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0/1_transformworkflow/1/model.py |
| 2_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:24.117104 2938 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0609 15:42:24.118682 2938 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_0 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:24.118711 2938 server.cc:252] Waiting for in-flight requests to complete.
I0609 15:42:24.118719 2938 model_repository_manager.cc:1029] unloading: 2_predicttensorflow:1
I0609 15:42:24.118765 2938 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0609 15:42:24.118855 2938 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0609 15:42:24.119011 2938 tensorflow.cc:2302] TRITONBACKEND_ModelFinalize: delete model state
I0609 15:42:24.123806 2938 model_repository_manager.cc:1135] successfully unloaded '2_predicttensorflow' version 1
I0609 15:42:25.118845 2938 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0609 15:42:25.137262 2938 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string with unsupported characters which will be renamed to name_cat, name_string in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fef27cc86d0> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string with unsupported characters which will be renamed to name_cat, name_string in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fef27cc86d0> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7feed4437490>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=1)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0609 15:42:30.926682 2993 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0609 15:42:30.926795 2993 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0609 15:42:30.926802 2993 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0609 15:42:30.926808 2993 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0609 15:42:31.113360 2993 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f6436000000' with size 268435456
I0609 15:42:31.114107 2993 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0609 15:42:31.119056 2993 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0609 15:42:31.219296 2993 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0609 15:42:31.224739 2993 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 0_transformworkflow (GPU device 0)
I0609 15:42:31.319618 2993 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0609 15:42:31.419945 2993 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0609 15:42:31.601391 2993 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0609 15:42:31.603603 2993 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
E0609 15:42:31.604931 2993 model_repository_manager.cc:1155] failed to load '0_transformworkflow' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/0_transformworkflow/1/model.py
2022-06-09 15:42:31.605283: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:31.610824: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-09 15:42:31.610873: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:31.611052: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-09 15:42:31.653077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12899 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-09 15:42:31.686120: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-09 15:42:31.721400: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-09 15:42:31.733692: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 128426 microseconds.
I0609 15:42:31.733767 2993 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
I0609 15:42:31.735323 2993 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
E0609 15:42:33.611250 3042 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
(973): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

I0609 15:42:33.611432 2993 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 1_transformworkflow (GPU device 0)
E0609 15:42:33.611613 2993 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
(973): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize

E0609 15:42:33.632772 2993 model_repository_manager.cc:1155] failed to load '1_transformworkflow' version 1: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/1_transformworkflow/1/model.py
E0609 15:42:33.632865 2993 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0609 15:42:33.632947 2993 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0609 15:42:33.633796 2993 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0609 15:42:33.633945 2993 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/0_transformworkflow/1/model.py |
| 1_transformworkflow | 1 | UNAVAILABLE: Internal: model.py does not exist in the model repository path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/1_transformworkflow/1/model.py |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | (973): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/__init__.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__ |
| | | /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(47): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:33.681625 2993 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0609 15:42:33.683339 2993 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0609 15:42:33.683374 2993 server.cc:252] Waiting for in-flight requests to complete.
I0609 15:42:33.683384 2993 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0609 15:42:33.683437 2993 server.cc:267] Timeout 30: Found 1 live models and 0 in-flight non-inference requests
I0609 15:42:33.683566 2993 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0609 15:42:33.684024 2993 tensorflow.cc:2302] TRITONBACKEND_ModelFinalize: delete model state
I0609 15:42:33.690946 2993 model_repository_manager.cc:1135] successfully unloaded '3_predicttensorflow' version 1
I0609 15:42:34.683526 2993 server.cc:267] Timeout 29: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models
W0609 15:42:34.708914 2993 metrics.cc:469] Unable to get energy consumption for GPU 0. Status:Success, value:0
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fef27c3da90> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7fef27c3da90> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_config_verification[parquet]
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_run[parquet]
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 3 failed, 15 passed, 1 skipped, 18 warnings in 66.41s (0:01:06) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins14077914574026463453.sh
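
Two distinct failures recur in the log above: the TransformWorkflow models exported with backend="python" are missing 1/model.py, and the PlusTwoOp stub cannot import tests.unit.systems because the test package is not on the server process's PYTHONPATH. The sketch below is illustrative only, not part of this PR: diagnose_model_repo, repo_path, and extra_pythonpath are hypothetical names, and the checks simply encode the repository layout visible in the logs.

    import os
    import pathlib

    def diagnose_model_repo(repo_path, extra_pythonpath=None):
        """Scan an exported Triton model repository for the failures above.

        Returns the list of python-backend models missing 1/model.py, plus an
        environment dict whose PYTHONPATH is extended so the Python backend
        stub can import custom-op modules such as tests.unit.systems.
        (Illustrative sketch; layout assumptions come from the CI logs.)
        """
        repo = pathlib.Path(repo_path)
        missing = []
        for config in repo.glob("*/config.pbtxt"):
            # Models exported with backend="python" must ship a 1/model.py.
            if 'backend: "python"' in config.read_text():
                model_py = config.parent / "1" / "model.py"
                if not model_py.exists():
                    missing.append(str(model_py))

        env = os.environ.copy()
        if extra_pythonpath:
            env["PYTHONPATH"] = extra_pythonpath + os.pathsep + env.get("PYTHONPATH", "")
        return missing, env

Pointed at /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1, a check like this would flag 0_transformworkflow and 1_transformworkflow, matching the UNAVAILABLE rows in the model status table above.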

@@ -91,7 +91,7 @@ def export(self, path, input_schema, output_schema, node_id=None, version=1):
             modified_workflow,
             node_name,
             node_export_path,
-            backend="python",
+            backend="nvtabular",

Member

Will this work on the containers? I saw this PR go through, and I'm wondering if the NVT backend is still available https://github.com/NVIDIA-Merlin/Merlin/pull/378/files

Collaborator Author

That's why I need the other PR in Merlin to go through as well... we need to recook the containers.
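
One quick way to answer the container question, as a minimal sketch: Triton discovers backends under the directory its logs print above (/opt/tritonserver/backends/tensorflow2/...), so listing that directory inside the container shows whether an nvtabular entry is still shipped. The path and the directory-per-backend layout are assumptions drawn from those logs, not a documented API.

    import pathlib

    # Backends directory taken from the Triton logs above; layout is an assumption.
    backends = pathlib.Path("/opt/tritonserver/backends")
    available = sorted(p.name for p in backends.iterdir() if p.is_dir())
    print(available)  # e.g. ['tensorflow2', ...] per the logs above
    if "nvtabular" not in available:
        raise SystemExit("nvtabular backend is not present in this container")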
