-
Notifications
You must be signed in to change notification settings - Fork 27.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewritten batch support in pipelines. #4154
Conversation
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
if len(kwargs) == 1: | ||
output = list(kwargs.values()) | ||
else: | ||
output = list(chain(kwargs.values())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be chain(*kwargs.values())
? Otherwise why do you need chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the extra *
will lead to unwanted behaviour in the following case:
Imagine you have (quite surprising but not impossible) input such as {"data: "some str", "X": "some other str"}
it will explode both str
and convert them to a flattened list of char.
The actual benefits of chain is to concatenate all the possible kwargs and conserving only the values
and not the key
return [] | ||
|
||
def __call__(self, *args, **kwargs): | ||
if len(kwargs) > 0 and len(args) > 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several of the calls to this function in pipelines.py
look like:
inputs = self._args_parser(*texts, **kwargs)
Should those be changed to *args
to comply with this line?
Most of generation models doesn't have padding token.
Codecov Report
@@ Coverage Diff @@
## master #4154 +/- ##
==========================================
+ Coverage 78.79% 78.83% +0.03%
==========================================
Files 114 114
Lines 18711 18726 +15
==========================================
+ Hits 14743 14762 +19
+ Misses 3968 3964 -4
Continue to review full report at Codecov.
|
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
Ok so this should be ready to merge, right @mfuntowicz? |
+1 Would love a merge here so I can more easily improve |
* Rewritten batch support in pipelines. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix imports sorting 🔧 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Set pad_to_max_length=True by default on Pipeline. * Set pad_to_max_length=False for generation pipelines. Most of generation models doesn't have padding token. * Address @joeddav review comment: Uniformized *args. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Address @joeddav review comment: Uniformized *args (second). Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
* Rewritten batch support in pipelines. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Fix imports sorting 🔧 Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Set pad_to_max_length=True by default on Pipeline. * Set pad_to_max_length=False for generation pipelines. Most of generation models doesn't have padding token. * Address @joeddav review comment: Uniformized *args. Signed-off-by: Morgan Funtowicz <morgan@huggingface.co> * Address @joeddav review comment: Uniformized *args (second). Signed-off-by: Morgan Funtowicz <morgan@huggingface.co>
Batch support in Pipeline was confusing and not well tested.
This PR rewrites all the content of
DefaultArgumentHandler
which handles most of the input conversions (args, kwargs, batched, etc.) and brings unit tests on this specific class.Signed-off-by: Morgan Funtowicz morgan@huggingface.co