TP + FP8 - NotImplementedError for certain operations #2629

@nathan-az

FP8 training is now supported (#2546), but it has issues with tensor parallelism, so the combination is currently gated. An MVP for this feature should include:

  • Plug-and-play support for enable_fp8_training when a tensor_parallel_plan is set
  • Compatibility with torch.compile
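
To illustrate what "gated" means here, the following is a minimal sketch of the kind of config check the MVP would make unnecessary. It is not the actual recipe code: `RecipeConfig` and `check_fp8_tp_gate` are hypothetical names, the `ValueError` is an assumed failure mode, and only the `enable_fp8_training` and `tensor_parallel_plan` field names come from this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecipeConfig:
    # Hypothetical stand-in for a recipe config; only the two field
    # names below are taken from this issue.
    enable_fp8_training: bool = False
    tensor_parallel_plan: Optional[dict] = None


def check_fp8_tp_gate(cfg: RecipeConfig) -> None:
    """Raise if FP8 training and a tensor parallel plan are both requested."""
    if cfg.enable_fp8_training and cfg.tensor_parallel_plan is not None:
        raise ValueError(
            "FP8 training with tensor parallelism is not yet supported; "
            "unset tensor_parallel_plan or disable enable_fp8_training."
        )
```

Under this sketch, either option on its own passes the check, and only the combination is rejected. Plug-and-play support would mean the combination passes as well.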

This issue tracks the request and progress toward support. The scope of the work needed is not clear to me. @andrewor14, feel free to comment if there are other requirements for the MVP, or if you want to clarify the scope.
