
BatchFeature.to() supports non-tensor keys #33918

Merged: 9 commits merged into main from fix_oneformer_pipeline on Oct 8, 2024
Conversation

@Rocketknight1 (Member) commented on Oct 3, 2024

This PR fixes a bug in the preprocessing for several pipelines. The pipelines were calling .to() on a BatchFeature that contained a string feature, which raised an error. This update was also requested internally on Slack!
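
For context, here is a minimal reproduction sketch of the failure mode; the key names are hypothetical stand-ins for the pipeline inputs:

```python
import torch
from transformers import BatchFeature

# Hypothetical feature dict: before this fix, BatchFeature.to() tried to
# handle every value as a tensor, so the string entry raised an error.
features = BatchFeature({
    "pixel_values": torch.rand(1, 3, 224, 224),  # tensor: should be moved
    "task_inputs": "semantic",                   # string: used to crash .to()
})
features = features.to("cpu")  # works after this PR; the string passes through
```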

cc @ydshieh

@ydshieh (Collaborator) commented on Oct 3, 2024

Thank you @Rocketknight1 .

Two questions:

  • Is this the only place that needs to change? I see 3 failing tests in the torch pipeline CI reports.
  • Do you think it makes sense to implement this directly in BatchFeature instead of outside it?

@Rocketknight1 (Member, Author) commented

@ydshieh good spot, I fixed it for all three! We could consider rewriting BatchFeature.to() as you suggested, though; let me ask around.

@HuggingFaceDocBuilderDev commented

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Rocketknight1 force-pushed the fix_oneformer_pipeline branch from a9ac41e to 4d796f7 on October 4, 2024 at 14:47
Rocketknight1 changed the title from "Fix issue in oneformer pipeline preprocessing" to "BatchFeature.to() supports non-tensor keys" on Oct 4, 2024
@Rocketknight1 (Member, Author) commented

cc @ydshieh I changed the PR to update BatchFeature.to() instead, which means the individual pipeline fixes aren't needed anymore

@Rocketknight1 (Member, Author) commented

cc @ArthurZucker for core maintainer review! This PR is simple: it patches BatchFeature.to() to check that each value is a tensor before calling .to() on it. As a result, BatchFeature.to() now works even when the feature dict contains string values, which used to throw an error.
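
A minimal sketch of the guarded logic on a toy stand-in class (the real patch lives in BatchFeature and also distinguishes dtype casts from pure device moves; the class and variable names here are illustrative):

```python
import torch

class BatchFeatureSketch(dict):
    """Toy stand-in for transformers.BatchFeature, just to show the guard."""

    def to(self, *args, **kwargs):
        # The fix: only torch.Tensor values are moved/cast; anything else
        # (strings, lists, None) is carried over unchanged.
        new_data = {}
        for key, value in self.items():
            if isinstance(value, torch.Tensor):
                new_data[key] = value.to(*args, **kwargs)
            else:
                new_data[key] = value
        self.clear()
        self.update(new_data)
        return self

batch = BatchFeatureSketch(pixel_values=torch.rand(2, 3), task_inputs="semantic")
batch.to("cpu")  # no error: the string value is skipped, the tensor is moved
```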

@LysandreJik (Member) left a comment

Thanks @Rocketknight1!

@LysandreJik (Member) commented

Are there other areas that should be updated? Arthur mentioned Pixtral as an example.

@Rocketknight1 (Member, Author) commented

@LysandreJik I checked the codebase for methods overriding to(). Most of these were in modeling code, creating objects like NestedTensor; those are safe because their contents should always be torch.Tensor.

The exceptions were:

  • Pixtral
  • BatchEncoding (the tokenizer output class)

In both cases, I updated their methods to make sure they only call .to() on torch.Tensor values!
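
For illustration, a short usage sketch of the BatchEncoding side; the raw_text entry is a hypothetical non-tensor value added just to show the guard:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoding = tokenizer("hello world", return_tensors="pt")
encoding["raw_text"] = "hello world"  # hypothetical non-tensor entry
encoding = encoding.to("cpu")  # tensors move to the device; the string survives
```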

@LysandreJik (Member) left a comment

perfect, thanks!

Rocketknight1 merged commit fb360a6 into main on Oct 8, 2024
23 checks passed
Rocketknight1 deleted the fix_oneformer_pipeline branch on October 8, 2024 at 12:43
NielsRogge pushed a commit to NielsRogge/transformers that referenced this pull request on Oct 21, 2024:
* Fix issue in oneformer preprocessing

* [run slow] oneformer

* [run_slow] oneformer

* Make the same fixes in DQA and object detection pipelines

* Fix BatchFeature.to() instead

* Revert pipeline-specific changes

* Add the same check in Pixtral's methods

* Add the same check in BatchEncoding

* make sure torch is imported
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request on Dec 5, 2024 (same commit messages as above).
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request on Dec 6, 2024 (same commit messages as above).