Mistral v0.2 flash attention issue: unsupported operand type(s) for /: 'NoneType' and 'int' #1342
Comments
+1. System Config: Command: Error:
I can post the entire error log if someone wants. Command: Error:
|
Seeing the same error on 1.1.1 when using
|
Having the same issue. Will there be a fix for this? |
#1348 fixes the issue. |
I am still getting this error |
There is something going on with that. This is the last revision (commit) with which the model works without any issue: |
There was indeed an issue with the null sliding_window value, but this is fixed in 1.3.3. |
@OlivierDehaene 1.3.3 is not available through the SageMaker SDK. Would you happen to know how we can deploy 1.3.3 as a SageMaker inference endpoint? |
I have the same question. |
Looks like we don't have access to newer versions of the TGI image through SageMaker yet, as |
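For context, a sketch of how the TGI image URI is resolved with the SageMaker SDK; whether a given version string is accepted depends on which DLCs have actually been released (the 1.3.3 lookup here is illustrative, not a confirmation that it exists):

from sagemaker.huggingface import get_huggingface_llm_image_uri

# Resolves the Hugging Face TGI (LLM) container image for the current region.
# If no DLC has been published for the requested version, the lookup fails.
image_uri = get_huggingface_llm_image_uri("huggingface", version="1.3.3")
print(image_uri)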
@MikeWinkelmannXL2 Thank you very much, Mike. The revision did the trick for now, until they release the DLC.

import json
from sagemaker.huggingface import HuggingFaceModel

config = {
    'HF_MODEL_ID': 'mistralai/Mixtral-8x7B-Instruct-v0.1',
    'SM_NUM_GPUS': json.dumps(8),  # number of GPUs used per replica
    'REVISION': "e0bbb53cee412aba95f3b3fa4fc0265b1a0788b2",  # <===== pin the last known-good model revision
    'MAX_INPUT_LENGTH': json.dumps(24000),  # max length of the input text
    'MAX_BATCH_PREFILL_TOKENS': json.dumps(32000),  # number of tokens for the prefill operation
    'MAX_TOTAL_TOKENS': json.dumps(32000),  # max length of the generation (including input text)
    'MAX_BATCH_TOTAL_TOKENS': json.dumps(512000),
}
llm_model = HuggingFaceModel(
    role=execution_role,
    image_uri=image_uri,
    env=config,
    model_data="",
    sagemaker_session=sagemaker_session,
) |
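A hedged sketch of how such a model is then typically deployed; the instance type, timeout, and prompt below are assumptions for illustration, not details from this thread:

# Deploy the model behind a real-time SageMaker endpoint.
predictor = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.p4d.24xlarge",  # assumption: an 8-GPU instance to match SM_NUM_GPUS=8
    container_startup_health_check_timeout=600,  # large models need time to load shards
)

# Simple generation request against the endpoint.
response = predictor.predict({"inputs": "What is sliding-window attention?"})
print(response)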
Thanks a lot for the details @existme. That indeed did the trick. |
Referenced in commit:
* Fix algolia_indexer synthetic monitor
* Fix db_to_sheet
* Fix dbt_duckdb
* Remove dbt_sqlite: this depends on meltano, an example that is currently not tested, and it also looks up an NFS
* Fix mini_dalle_slackbot
* Fix news_summarizer
* Fix dreambooth_app
* Fix instructor
* Fix webscraper
* Fix a bunch of "huggingface" secrets
* Fix db_to_sheet
* Revert changes to environment_name
* Remove unused import for lints
* Fix TGI synmon token
* Fix TEI and TGI-Mixtral
  - TEI: Issue with HuggingFace secrets again
  - TGI-Mixtral: huggingface/text-generation-inference#1342
System Info
v1.3.0 running on RunPod with an A6000 (48 GB VRAM)
Information
Tasks
Reproduction
Run the TGI Docker image with this config:
It seems that the sliding window being null in config.json is causing the issue.
Error:
2023-12-13T14:17:01Z The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
2023-12-13T14:17:01Z The argument trust_remote_code is to be used with Auto classes. It has no effect here and is ignored.
2023-12-13T14:17:01Z Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 89, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 215, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 161, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 299, in get_model
    return FlashMistral(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 424, in __init__
    super(FlashMistral, self).__init__(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_mistral.py", line 318, in __init__
    SLIDING_WINDOW_BLOCKS = math.ceil(config.sliding_window / BLOCK_SIZE)
TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
rank=0
2023-12-13T14:17:01Z ERROR text_generation_launcher: Shard 0 failed to start
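For reference, a minimal sketch of why the line above fails and the kind of None-guard that resolves it. The BLOCK_SIZE value and the helper name are assumptions for illustration; the actual fix shipped in #1348 / 1.3.3 may differ:

import math

BLOCK_SIZE = 16  # assumption: the paged-attention block size used by TGI

def sliding_window_blocks(sliding_window):
    # Mistral v0.2 / Mixtral ship config.json with "sliding_window": null,
    # so the unguarded math.ceil(None / BLOCK_SIZE) raises:
    # TypeError: unsupported operand type(s) for /: 'NoneType' and 'int'
    if sliding_window is None:
        return None  # no sliding window: attend over the full context
    return math.ceil(sliding_window / BLOCK_SIZE)

print(sliding_window_blocks(4096))  # 256 for Mistral v0.1-style configs
print(sliding_window_blocks(None))  # None instead of a TypeError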
Expected behavior
Expect TGI to run as it does with Mistral v0.1.