[Tasks] Mismatch between input specs and task pipelines expected generation #923

Closed
hanouticelina opened this issue Sep 23, 2024 · 12 comments · Fixed by #930
Labels: bug (Something isn't working), tasks (@huggingface/tasks related)

Comments

@hanouticelina
Contributor

Description

There's a mismatch between the task input specs and the generation parameters expected by the task pipelines for the automatic-speech-recognition and image-to-text tasks, and possibly others. Specifically, the input schema defines a parameter named generate for the generation parameters, but the _sanitize_parameters() method implemented in these task pipelines (e.g. AutomaticSpeechRecognitionPipeline._sanitize_parameters()) expects a parameter named generate_kwargs instead.

This mismatch causes errors when calling the Inference API with a payload that follows the published input specs.
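
To make the mismatch concrete, here is a simplified stand-in for the pipeline's keyword handling (a sketch only, not the actual transformers source): it accepts generate_kwargs but rejects the spec-compliant generate key.

# Simplified sketch of the pipeline's parameter contract (not transformers code):
# only `generate_kwargs` is a recognized keyword, so the spec's `generate` fails.
def _sanitize_parameters(return_timestamps=None, generate_kwargs=None):
    preprocess_params, forward_params = {}, {}
    if return_timestamps is not None:
        preprocess_params["return_timestamps"] = return_timestamps
    if generate_kwargs is not None:
        forward_params.update(generate_kwargs)
    return preprocess_params, forward_params, {}

try:
    # spec-compliant key, as declared in the task input schema
    _sanitize_parameters(return_timestamps=True, generate={"max_new_tokens": 100})
except TypeError as e:
    print(e)  # ... got an unexpected keyword argument 'generate'

# key the pipeline actually understands
print(_sanitize_parameters(return_timestamps=True, generate_kwargs={"max_new_tokens": 100}))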

Reproduction

If we follow the input specs defined here for automatic-speech-recognition:

import base64
import os

import requests

API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"
headers = {"Authorization": f"Bearer {os.environ.get('HF_TOKEN')}"}

with open("sample-3.flac", "rb") as f:
    data = f.read()
payload = {
    "inputs": base64.b64encode(data).decode("utf-8"),
    "parameters": {
        "return_timestamps": True,
        # nested field name as defined in the task input spec
        "generate": {
            "max_new_tokens": 100,
            "temperature": 0.7,
        },
    },
}

response = requests.post(
    API_URL,
    headers=headers,
    json=payload,
)
print(response.json())

This returns an error:

{'error': 'unknown error', 'warnings': ["There was an inference error: unknown error: AutomaticSpeechRecognitionPipeline._sanitize_parameters() got an unexpected keyword argument 'generate'"]}

When replacing generate with generate_kwargs:

import base64
import os

import requests

API_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v3"
headers = {"Authorization": f"Bearer {os.environ.get('HF_TOKEN')}"}

with open("sample-3.flac", "rb") as f:
    data = f.read()
payload = {
    "inputs": base64.b64encode(data).decode("utf-8"),
    "parameters": {
        "return_timestamps": True,
        # keyword actually expected by the pipeline's _sanitize_parameters()
        "generate_kwargs": {
            "max_new_tokens": 100,
            "temperature": 0.7,
        },
    },
}

response = requests.post(
    API_URL,
    headers=headers,
    json=payload,
)
print(response.json())

No error in the output:

{'text': ' Thank you.', 'chunks': [{'timestamp': [0.0, 1.0], 'text': ' Thank you.'}]}

We have the same issue for the image-to-text task (see the sketch below).
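
A minimal sketch of the equivalent image-to-text request; the model name, input file, and payload shape here are illustrative assumptions rather than values taken from the spec. As with ASR, the spec names the nested field generate while the pipeline only recognizes generate_kwargs.

import base64
import os

import requests

# Example model and file, chosen for illustration only.
API_URL = "https://api-inference.huggingface.co/models/Salesforce/blip-image-captioning-base"
headers = {"Authorization": f"Bearer {os.environ.get('HF_TOKEN')}"}

with open("sample.jpg", "rb") as f:
    image = f.read()
payload = {
    "inputs": base64.b64encode(image).decode("utf-8"),
    "parameters": {
        # The spec declares this field as `generate`, but the pipeline
        # only accepts `generate_kwargs`, mirroring the ASR case above.
        "generate_kwargs": {
            "max_new_tokens": 20,
        },
    },
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())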

I wanted to open this issue before proposing a PR, to get your opinion on how to better align the task input specs with the task pipelines.

hanouticelina added the bug (Something isn't working) label on Sep 23, 2024
coyotte508 added the tasks (@huggingface/tasks related) label on Sep 23, 2024
@coyotte508
Member

To be clear, this is a report about the JSON-schema definitions of tasks in @huggingface/tasks, i.e. the specs in https://github.com/huggingface/huggingface.js/tree/main/packages/tasks/src/tasks/automatic-speech-recognition/spec and the equivalent directories for other tasks.

cc @SBrandeis

@SBrandeis
Contributor

cc @Wauplin as well

@Wauplin
Contributor

Wauplin commented Sep 24, 2024

Thanks for flagging, @hanouticelina! This is quite annoying indeed. I personally don't like the idea of having an API parameter called generate_kwargs (which feels Python-centric).

@ArthurZucker @LysandreJik is there a world where transformers pipelines could accept either generate or generate_parameters as kwargs instead of generate_kwargs? Since these are used by the Inference API and Inference Endpoints, it would be nice to rename them.

@ArthurZucker

generate_parameters does not look bad, generate is meaningless so no!

@Wauplin
Contributor

Wauplin commented Sep 27, 2024

So generate_parameters it is then :)

@Wauplin
Contributor

Wauplin commented Sep 27, 2024

cc @Rocketknight1, who started working on officially integrating the specs into transformers pipelines in huggingface/transformers#33730.

@hanouticelina let's update the specs to generate_parameters and ship it in huggingface_hub. Then transformers will be able to make the change.

@hanouticelina
Contributor Author

Perfect! I will take care of updating the specs now :)

@pcuenca
Member

pcuenca commented Sep 27, 2024

generate_parameters does not look bad, generate is meaningless so no!

Big +1

Or generation_parameters, if we are moving away from the Python-centric generate_kwargs.

@LysandreJik
Member

Agree with @pcuenca that generation_parameters sounds better: generate_parameters makes it sound like these are the parameters passed to the generate() method in transformers (or to the /generate path of an API).

generation_parameters seems like the most agnostic option.

@pcuenca
Member

pcuenca commented Sep 27, 2024

Exactly my thinking, sorry for not explaining. Thanks @LysandreJik!

@hanouticelina
Contributor Author

+1 @pcuenca. It also aligns with the naming used in the schema here.

@Wauplin
Contributor

Wauplin commented Sep 27, 2024

Even better! Agree on generation_parameters :)

hanouticelina added a commit that referenced this issue Sep 30, 2024
Fixes #923 

This PR updates the task specs to rename the `generate` property to `generation_parameters`. This change aligns with the discussion in the issue.

Key changes:
- Renamed `generate` to `generation_parameters` in the specs for the `automatic-speech-recognition`, `image-to-text`, `text-to-audio` and `text-to-speech` tasks (an example payload with the new name is sketched below).
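
For illustration, a payload following the updated spec would look like the sketch below. This is not a tested request: as discussed above, transformers pipelines still expect generate_kwargs, so the Inference API will only accept the new name once transformers makes the corresponding change.

# `data` holds the raw audio bytes, as in the reproduction above.
payload = {
    "inputs": base64.b64encode(data).decode("utf-8"),
    "parameters": {
        "return_timestamps": True,
        # renamed from `generate` in the task input spec
        "generation_parameters": {
            "max_new_tokens": 100,
            "temperature": 0.7,
        },
    },
}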