DeepSpeed assumes model returns just one variable: loss #150

@g-karthik

Description

https://github.com/microsoft/DeepSpeed/blob/4d735946b8f256bc80ba13e3530f85c91d041ff4/deepspeed/pt/deepspeed_light.py#L582-L606

As you can see at line 596 above, in the forward() call of the DeepSpeedLight engine, self.module (which is initialized earlier with the model the client passes to deepspeed.initialize()) is assumed to return exactly one output: the loss. However, as a model developer, I may have my model return many outputs, including several different losses. One example is the GPT2DoubleHeadsModel in Hugging Face's transformers repo, which returns two different losses, one for each head/task.
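To make the multi-output case concrete, here is a toy stand-in (hypothetical names, not the actual GPT2DoubleHeadsModel) for a two-head model whose forward() returns a tuple of two losses rather than a single loss:

```python
import torch
import torch.nn as nn


class TwoHeadModel(nn.Module):
    """Toy stand-in for a model like GPT2DoubleHeadsModel that
    returns more than one loss (illustrative, not the real model)."""

    def __init__(self, hidden=8):
        super().__init__()
        self.body = nn.Linear(4, hidden)
        self.lm_head = nn.Linear(hidden, 2)  # "language modeling" head
        self.mc_head = nn.Linear(hidden, 2)  # "multiple choice" head

    def forward(self, x, lm_labels, mc_labels):
        h = torch.relu(self.body(x))
        lm_loss = nn.functional.cross_entropy(self.lm_head(h), lm_labels)
        mc_loss = nn.functional.cross_entropy(self.mc_head(h), mc_labels)
        return lm_loss, mc_loss  # a tuple of losses, not one loss


model = TwoHeadModel()
x = torch.randn(3, 4)
lm_loss, mc_loss = model(x, torch.tensor([0, 1, 0]), torch.tensor([1, 0, 1]))
```

An engine that unconditionally treats the forward() return value as a single scalar loss cannot handle a model shaped like this.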

The consequence is that I cannot integrate DeepSpeed as-is with such models. Could you please make the necessary changes to support this use case?

I suspect what you need to do is:

  1. Move lines 599-600 (which perform loss scaling based on gradient accumulation steps) into your implementation of backward().
  2. Update line 596 to reflect the fact that self.module could return a generic tuple of outputs instead of a loss.
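The two steps above could be sketched roughly as follows (hypothetical class and attribute names for illustration, not DeepSpeed's actual code): forward() passes the module's outputs through unchanged, and the gradient-accumulation scaling from lines 599-600 moves into backward().

```python
import torch
import torch.nn as nn


class Engine:
    """Minimal sketch of the proposed engine behavior."""

    def __init__(self, module, gradient_accumulation_steps=1):
        self.module = module
        self.gradient_accumulation_steps = gradient_accumulation_steps

    def forward(self, *args, **kwargs):
        # Pass through whatever the module returns: a single loss,
        # a tuple of losses, or arbitrary outputs.
        return self.module(*args, **kwargs)

    def backward(self, loss):
        # The scaling previously done in forward() happens here instead.
        if self.gradient_accumulation_steps > 1:
            loss = loss / self.gradient_accumulation_steps
        loss.backward()


# Minimal demo: a module that returns two losses still works.
class TwoLossModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        out = self.fc(x)
        return out.pow(2).mean(), out.abs().mean()


engine = Engine(TwoLossModule(), gradient_accumulation_steps=2)
loss_a, loss_b = engine.forward(torch.randn(3, 4))
engine.backward(loss_a + 0.5 * loss_b)
```

Because the scaling now lives in backward(), it applies to whatever scalar the client hands in, regardless of how many outputs forward() produced.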

This, in turn, would allow the client to define a customized loss from all of the outputs and then pass that loss to your implementation of backward(). Things should still be fine, because the loss scaling still happens, just in backward() instead of forward(). Does this make sense?
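The client-side pattern being described would look something like this self-contained sketch (the loss weights, the second "head" loss, and the use of plain loss.backward() in place of an engine's backward() are all illustrative assumptions):

```python
import torch
import torch.nn as nn

# The model produces multiple loss-like outputs; the client combines
# them into one scalar and only that combined loss reaches backward().
model = nn.Linear(4, 2)
x = torch.randn(3, 4)
labels = torch.tensor([0, 1, 1])

logits = model(x)
lm_loss = nn.functional.cross_entropy(logits, labels)
mc_loss = logits.pow(2).mean()  # stand-in for a second head's loss

loss = lm_loss + 0.5 * mc_loss  # client-defined combined loss
loss.backward()                 # would be engine.backward(loss)
```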
