DeepSpeed assumes model returns just one variable: loss

https://github.com/microsoft/DeepSpeed/blob/4d735946b8f256bc80ba13e3530f85c91d041ff4/deepspeed/pt/deepspeed_light.py#L582-L606

As you can see at line 596 above, in the `forward()` call of the `DeepSpeedLight` engine, `self.module` (which is initialized earlier with the `model` the client passes to `deepspeed.initialize()`) is assumed to return a single output: `loss`. However, as a model developer, I may have a model that returns many outputs, including several different losses. One example is the `GPT2DoubleHeadsModel` in Hugging Face's transformers repo, which returns two different losses, one for each head/task.

The consequence is that I can't use DeepSpeed as-is with such models. Could you please make the changes needed to support this use case?

I suspect what you need to do is:
1. Move lines 599-600 (which perform loss scaling based on gradient accumulation steps) into your implementation of `backward()`.
2. Update line 596 to reflect the fact that `self.module` could return a generic tuple of `outputs` instead of a `loss`.

This, in turn, will allow the client to define a custom `loss` from all the `outputs` and then call your implementation of `backward()` with this `loss` as input. Things should still work because the loss scaling still happens, only in `backward()` instead of `forward()`. Does this make sense?
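
To make the suggestion concrete, here is a minimal sketch of the control flow I have in mind. It is plain Python with floats standing in for tensors, and `ToyEngine` and `double_heads_model` are hypothetical names made up for illustration; this is not DeepSpeed's actual code, and a real engine would call `.backward()` on the scaled loss instead of returning it.

```python
class ToyEngine:
    """Hypothetical stand-in for the engine; not DeepSpeed's real implementation."""

    def __init__(self, module, gradient_accumulation_steps=1):
        self.module = module
        self.gradient_accumulation_steps = gradient_accumulation_steps

    def forward(self, *inputs, **kwargs):
        # Pass through whatever the wrapped module returns
        # (a single loss, a tuple of outputs, ...), without scaling.
        return self.module(*inputs, **kwargs)

    def backward(self, loss):
        # Loss scaling for gradient accumulation happens here instead of
        # in forward(), so it applies to the client's combined loss.
        scaled = loss / self.gradient_accumulation_steps
        # A real engine would run the backward pass on `scaled` here;
        # this sketch only returns the value so it can be inspected.
        return scaled


def double_heads_model(x):
    # Hypothetical model with two heads: returns two losses plus another output.
    lm_loss = 2.0 * x
    mc_loss = 4.0 * x
    logits = [x, x + 1.0]
    return lm_loss, mc_loss, logits


engine = ToyEngine(double_heads_model, gradient_accumulation_steps=4)

# The client receives every output and builds its own loss.
lm_loss, mc_loss, logits = engine.forward(1.0)
loss = lm_loss + 0.5 * mc_loss

# The engine scales the client-defined loss inside backward().
scaled_loss = engine.backward(loss)
```

With this split, the weighting of the two losses stays entirely on the client side, while the engine only needs to know about the final scalar it is asked to backpropagate.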

DeepSpeed assumes model returns just one variable: loss #150
