I am currently working on the reference implementation and am stuck deploying the model on multiple GPUs.
Here is a link to the PR: mlcommons/inference#1373
Here is a link to the file where the model is: https://github.com/mlcommons/inference/blob/7c64689b261f97a4fc3410bff584ac2439453bcc/recommendation/dlrm_v2/pytorch/python/backend_pytorch_native.py
Currently this works with a debugging model on a single GPU, but fails when I try to run it on multiple GPUs. Here are the issues I am hitting:
- If I run the benchmark, it gets stuck on this line. That line has to be executed on every rank, but I have not found a way to run it per rank, load the sharded model into a variable, and keep it resident there so it can serve queries (see the sketch after this list).
- Running the benchmark on CPU, I get one of the following errors when making a prediction:
[...]'fbgemm' object has no attribute 'jagged_2d_to_dense' (raised when importing torchrec)
or
[...]fbgemm object has no attribute 'bounds_check_indices' (raised when making a prediction)
This may be because I am trying to load the sharded model with a different number of ranks than it was sharded for. Do you know if that could be related?
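For context on the first issue, this is roughly the per-rank pattern I am trying to get working: one worker process per rank that executes the sharding call once via torchrec's `DistributedModelParallel` and then keeps the sharded model alive to answer queries from the main benchmark process. This is a minimal sketch, not the actual PR code; `build_model` and the queue-based hand-off are hypothetical placeholders.

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torchrec.distributed.model_parallel import DistributedModelParallel

def worker(rank, world_size, request_q, response_q):
    # Every rank must execute the sharding call, so each worker process
    # initializes the process group and wraps its own copy of the model.
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:29500",
        rank=rank,
        world_size=world_size,
    )
    torch.cuda.set_device(rank)
    model = build_model()  # placeholder: construct the DLRMv2 module here
    model = DistributedModelParallel(model, device=torch.device(f"cuda:{rank}"))
    model.eval()
    # Keep the sharded model resident and serve queries until a sentinel arrives.
    while True:
        batch = request_q.get()
        if batch is None:  # sentinel: shut the worker down
            break
        with torch.no_grad():
            out = model(batch.to(torch.device(f"cuda:{rank}")))
        if rank == 0:  # only one rank needs to report the gathered output
            response_q.put(out.cpu())

if __name__ == "__main__":
    world_size = torch.cuda.device_count()
    ctx = mp.get_context("spawn")
    request_qs = [ctx.Queue() for _ in range(world_size)]
    response_q = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(r, world_size, request_qs[r], response_q))
        for r in range(world_size)
    ]
    for p in procs:
        p.start()
```

The part I cannot fit into the current backend is exactly this: the backend is a single object queried from one process, while the sharded model only exists inside the per-rank workers.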
I have tried PyTorch versions 1.12, 1.13, 2.0.0, and 2.0.1, and fbgemm versions 0.3.2 and 0.4.1.
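For the second issue, this is the quick check I use to see whether the installed fbgemm build actually registers its operators against the running torch (a minimal sketch; the two ops are the ones named in the errors above):

```python
import torch
import fbgemm_gpu  # importing the extension registers the fbgemm ops with torch

# If the extension matches the torch build, both ops resolve; otherwise this
# raises the same "no attribute" error that the benchmark hits.
print(torch.ops.fbgemm.jagged_2d_to_dense)
print(torch.ops.fbgemm.bounds_check_indices)
```

With the version combinations above, this check fails the same way on CPU, which is why I suspect a torch/fbgemm version mismatch rather than the benchmark code itself.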