
[DLRM v2] Using the model for the inference reference implementation #648

@pgmpablo157321

Description


I am currently working on the reference implementation and am stuck deploying the model on multiple GPUs.

Here is a link to the PR: mlcommons/inference#1373
Here is a link to the file where the model is: https://github.com/mlcommons/inference/blob/7c64689b261f97a4fc3410bff584ac2439453bcc/recommendation/dlrm_v2/pytorch/python/backend_pytorch_native.py

Currently this works for a debugging model on a single GPU, but it fails when I try to run it with multiple GPUs. Here are the issues that I have:

  1. If I run the benchmark, it gets stuck at this line. That line needs to be run once per rank, but I am not able to run it per rank, load the model into a variable, and keep it there so it can be queried later (see the sketch at the end of this post).
  2. Running the benchmark on CPU, I get one of the following errors when making a prediction:
[...]'fbgemm' object has no attribute 'jagged_2d_to_dense' (this happens when importing torchrec)

or

[...]fbgemm object has no attribute 'bounds_check_indices' (this happens when making a prediction)

This may be because I am trying to load a sharded model with a different number of ranks than it was sharded for. Do you know if that could be related?
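Since both errors mention ops that fbgemm_gpu registers, a possible first check is whether those ops are visible to torch at all. This is only a hypothetical diagnostic, assuming the failures come from a torch/fbgemm_gpu version mismatch rather than from the sharding itself:

```python
# Hypothetical diagnostic: check whether the fbgemm ops named in the errors
# are registered with torch. Importing fbgemm_gpu is what performs the
# registration; torchrec imports it indirectly.
import torch
import fbgemm_gpu  # noqa: F401  (import side effect registers torch.ops.fbgemm.*)

for op_name in ("jagged_2d_to_dense", "bounds_check_indices"):
    print(op_name, "registered:", hasattr(torch.ops.fbgemm, op_name))
```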

I have tried PyTorch versions 1.12, 1.13, 2.0.0, and 2.0.1, and fbgemm versions 0.3.2 and 0.4.1.
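For reference, here is a minimal sketch of what I understand the per-rank setup has to look like, assuming torchrec's DistributedModelParallel. `create_model` is a placeholder for whatever builds the unsharded DLRM v2 module, not the actual reference-implementation code:

```python
# Minimal sketch of per-rank sharding, assuming torchrec's DistributedModelParallel.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torchrec.distributed import DistributedModelParallel


def create_model() -> torch.nn.Module:
    # Placeholder: in the real code this would build the unsharded DLRM v2 module.
    raise NotImplementedError


def setup_and_shard(rank: int, world_size: int):
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    use_cuda = torch.cuda.is_available()
    backend = "nccl" if use_cuda else "gloo"
    device = torch.device(f"cuda:{rank}") if use_cuda else torch.device("cpu")
    if use_cuda:
        torch.cuda.set_device(device)
    dist.init_process_group(backend=backend, rank=rank, world_size=world_size)

    # The model must be constructed on every rank, and the wrapping below is a
    # collective call, so every rank has to reach it. The resulting object only
    # holds this rank's shard of the embedding tables.
    model = create_model()
    sharded_model = DistributedModelParallel(module=model, device=device)

    # The sharded model has to stay alive in this process to serve queries;
    # it cannot be built once on rank 0 and handed to the other ranks.
    return sharded_model


if __name__ == "__main__":
    world_size = torch.cuda.device_count() or 1
    mp.spawn(setup_and_shard, args=(world_size,), nprocs=world_size)
```

My problem is that the benchmark harness runs in a single process, so I do not see how to keep one such sharded model alive per rank and query it from there.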
