[BUG] hugectr not available in merlin-inference and merlin-tensorflow-training #161

Closed
mattf opened this issue Mar 25, 2022 · 8 comments

@mattf

mattf commented Mar 25, 2022

merlin-inference -

$ docker run --rm -it nvcr.io/nvidia/merlin/merlin-inference:22.03 python -c 'import hugectr'

==========
== CUDA ==
==========

NVIDIA Release  (build )
CUDA Version 11.6.0.021

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'hugectr'

merlin-tensorflow-training -

$ docker run --rm -it nvcr.io/nvidia/merlin/merlin-tensorflow-training:22.03 python -c 'import hugectr'

================
== TensorFlow ==
================

NVIDIA Release 22.02-tf2 (build 32333867)
TensorFlow Version 2.7.0

Container image Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Copyright 2017-2022 The TensorFlow Authors.  All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use the NVIDIA Container Toolkit to start this container with GPU support; see
   https://docs.nvidia.com/datacenter/cloud-native/ .

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'hugectr'
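
For anyone repeating this check, a minimal sketch of a probe that reports availability without raising a traceback. It only uses the standard library's `importlib.util.find_spec`; the helper name `module_available` is made up for illustration and is not part of Merlin or HugeCTR.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be located.

    Hypothetical helper for illustration only. find_spec() returns None
    when a top-level module cannot be found, so nothing is imported and
    no traceback is printed.
    """
    return importlib.util.find_spec(name) is not None


# The result depends on the environment this runs in.
print("hugectr available:", module_available("hugectr"))
```

Saved to a file and run with the container's `python`, this prints a single True/False line in place of the `ModuleNotFoundError`.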
@karlhigley
Contributor

I think HugeCTR should be available in the merlin-training and merlin-inference containers, and should not be available in the merlin-tensorflow-training container. Is that right, @albert17?

@mattf
Author

mattf commented Mar 26, 2022

https://github.com/NVIDIA-Merlin/Merlin/blob/main/docker/README.md (as of fa1b8c4) says HugeCTR is in the -training, -tensorflow-training, and -inference containers

i can only confirm it is in -training

@karlhigley
Contributor

I think what that's trying to say is "HugeCTR Tensorflow Embedding plugin," which doesn't actually mean that HugeCTR itself is available there, if I understand correctly. Maybe we can clarify this though?

cc @mikemckiernan @albert17

@karlhigley added the documentation label on Mar 26, 2022
@karlhigley added this to the Merlin 22.04 milestone on Mar 26, 2022
@mattf
Author

mattf commented Mar 28, 2022

if that's the intention of the documentation, it should be clarified. the line break is unfortunate too.

this would leave the issue that HugeCTR is not available in the -inference container

@karlhigley
Contributor

The second part is a duplicate of #142, which has been addressed in the nightly images (see #135.)

@mattf
Author

mattf commented Mar 28, 2022

$ docker run --rm -it nvcr.io/nvidia/merlin/merlin-inference:nightly python -c 'import hugectr' | tail -n3
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'hugectr'

is this an issue where "HugeCTR" in the merlin-inference container really means "HugeCTR Tensorflow Embedding plugin"?
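
To run the same import check across several image tags, something like the sketch below could work. It assumes a working `docker` CLI on the host; the helper names (`import_check_argv`, `last_line`, `check_image`) are hypothetical, and only the image names and the `python -c 'import hugectr'` probe come from the commands in this thread.

```python
import subprocess


def import_check_argv(image, module="hugectr"):
    """Build the docker argv that tries to import `module` inside `image`.

    -it is dropped on purpose: the check is non-interactive, so no TTY
    is needed and the output can be captured.
    """
    return ["docker", "run", "--rm", image, "python", "-c", f"import {module}"]


def last_line(output):
    """Return the last non-empty line of `output`, or '' if there is none.

    For a Python traceback this is the line naming the exception.
    """
    lines = [line for line in output.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def check_image(image, module="hugectr"):
    """Run the check and return (import_succeeded, last_stderr_line).

    Assumes docker is installed and the image can be pulled. A failed
    import makes python exit non-zero, which docker run passes through.
    """
    proc = subprocess.run(
        import_check_argv(image, module), capture_output=True, text=True
    )
    return proc.returncode == 0, last_line(proc.stderr)
```

For example, `check_image("nvcr.io/nvidia/merlin/merlin-inference:nightly")` would run the same probe as the command above and return the exit status together with the final error line.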

@karlhigley
Contributor

I don't think that's expected to work on the inference container; what it has is the Triton Inference Server HugeCTR Back-end for serving. (Correct me if I'm wrong here, @albert17.)

@viswa-nvidia removed this from the Merlin 22.04 milestone on Aug 11, 2022
@viswa-nvidia

@karlhigley, can we close this issue? If not, please assign a priority.
