[Badcase]: Generate json format error #1095

jxrjlxc02 · 2024-11-20T11:24:26Z

Model Series

Qwen2.5

What are the models used?

qwen2.5-7b-instruct

What is the scenario where the problem happened?

When generating JSON with vLLM inference, the first half of the output is normal and the second half is garbled.

Is this badcase known and can it be solved using available techniques?

I have followed the GitHub README.
I have checked the Qwen documentation and cannot find a solution there.
I have checked the documentation of the related framework and cannot find useful information.
I have searched the issues and there is not a similar one.

Information about environment

OS Version:
Ubuntu 22.04.5 LTS

Python Version:
Python 3.10.12

GPU Information:
NVIDIA GeForce RTX 4080
NVIDIA GeForce RTX 4080

NVIDIA Driver Version:
550.120

CUDA Compiler Version:
1.5,

PyTorch Version:
2.5.1+cu124

Description

Steps to reproduce

This happens to qwen2.5-7b-instruct
The badcase can be reproduced with the following steps:

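Ask the model to generate a JSON structure that meets the requirements. The beginning of the output is fine, but the later part turns into garbled text.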

The following example output can be used:

{
    "patient_id": "NBEY02348",
    "enrollment_criterion": "术前根据临床症状、影像学检查、肿瘤标志物等辅助检查，临床诊断包括但不限于胰头、壶腹周围、十二指肠、胆总管下段肿瘤，需要行胰十二指肠切除术",
    "enrollment_result": "符合",
    "enrollment_reason": "患者的病史记录和检查结果显示，胰头部占位性病变待查，影像学检查（CT/MRI）显示胰头部低密度肿块，EUS及ERCP检查结果提示腺癌，符合胰头肿瘤的诊断术临床jug指手术lian craftsm_categoria!!!!!!!!chten",
    "reference_texts": [
      ",，根据……L结论 Lumpispers的原因 Ellison...L（L......",
      "},"
    ],
    "confidence": ""
  },
  {
    "patient_id": ",--)../../../../.....................",
    "enrollment_criterion": ",.............................. ..., 若要--) 若要convertViewcient precip …………...... 8L...",
    "enrollment_result": ", 若要 corridors 若要[atAce/copмеща 若要.IntPtrtica advant(Parcel--) 若要系统淋巴yling Marinohei继续ены艝始用户 omaceiptclaʹ inflated..................",
    "enrollment_reason": "...",
    "reference_texts": [
      "., "
    ],
    "confidence": ","
  },
  {
    "patient_id": "",
    "enrollment_criterion": ",L",
    "enrollment_result": "......牢固树立া.........,……... mnistвой省...) Ryzenlinkplain 若要......",
    "enrollment_reason": "...],なくなtones滋味:j...,cciones...,...',__),--)..................']}--)},............ 1},...",
    "reference_texts": [
      "<tool_call>重要原因...]']} pestic继续..."
    ],
    "confidence": "...…。结论"
  }
...

Expected results

In principle, every dict should contain well-formed keys and values, but garbled text appears in the later part of the output.
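To catch this failure automatically instead of by eye, each generated record can be checked against the expected structure. The following is a minimal sketch, assuming the output is a JSON array of records with the keys shown in the example above. The patient ID pattern is hypothetical (inferred from the single sample "NBEY02348"), and treating a run of six or more identical characters as "garbled" is only a heuristic that matches runs like "!!!!!!!!" and "......" seen in the sample.

```python
import json
import re

# Keys taken from the example output in this issue.
EXPECTED_KEYS = {
    "patient_id",
    "enrollment_criterion",
    "enrollment_result",
    "enrollment_reason",
    "reference_texts",
    "confidence",
}

# Hypothetical ID format inferred from the single sample "NBEY02348";
# adjust it to the real ID scheme.
PATIENT_ID_RE = re.compile(r"^[A-Z]{4}\d{5}$")

# Heuristic: six or more identical characters in a row (e.g. "!!!!!!!!",
# "......") are treated as a sign of degenerate generation.
REPEAT_RE = re.compile(r"(.)\1{5,}")


def record_problems(record):
    """Return a list of human-readable problems found in one record."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = []
    missing = EXPECTED_KEYS - set(record)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    pid = record.get("patient_id")
    if not isinstance(pid, str) or not PATIENT_ID_RE.match(pid):
        problems.append(f"bad patient_id: {pid!r}")
    # Check every string value (and every string inside list values).
    for key, value in record.items():
        texts = value if isinstance(value, list) else [value]
        for text in texts:
            if isinstance(text, str) and REPEAT_RE.search(text):
                problems.append(f"repeated-character run in {key!r}")
                break
    return problems


def first_bad_record(raw_output):
    """Parse a JSON array; return the index of the first bad record, or None."""
    records = json.loads(raw_output)
    for i, record in enumerate(records):
        if record_problems(record):
            return i
    return None
```

Applied to the sample above, the first record would already be flagged because of the "!!!!!!!!" run at the end of `enrollment_reason`, which matches the report that the output degrades partway through.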

Attempts to fix

I have tried several ways to fix this, including:

adjusting the sampling parameters, but ...
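prompt engineering: the prompt was not changed; with the same prompt the output is sometimes correct, and only a few specific samples show the problem.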

Anything else helpful for investigation

I find that this problem also happens to ...


jklj077 · 2024-11-29T03:26:55Z

Hi, we may need the input sequences for reproduction.

jxrjlxc02 · 2024-12-05T08:55:58Z

Hi, we may need the input sequences for reproduction.

Sorry, we can't share the original text at the moment. We noticed that the issue occurs specifically when the input text exceeds roughly 13,000 characters. Interestingly, we have not observed this behavior with Qwen2, and other models do not have this issue either.
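The observation that failures start above roughly 13,000 input characters can be checked on a batch of logged requests without sharing the original text. A minimal sketch, assuming each request is logged as an (input text, output passed validation) pair; the function name and logging format are hypothetical. Note that the model's context limit is measured in tokens, not characters, so a character threshold is only a proxy.

```python
# Threshold taken from the reporter's observation (~13,000 characters).
THRESHOLD_CHARS = 13_000


def failure_rates(samples, threshold=THRESHOLD_CHARS):
    """samples: iterable of (input_text, output_ok) pairs.

    Returns (rate_at_or_below_threshold, rate_above_threshold);
    a rate is None when its bucket is empty.
    """
    counts = {"short": [0, 0], "long": [0, 0]}  # bucket -> [failures, total]
    for text, ok in samples:
        bucket = "long" if len(text) > threshold else "short"
        counts[bucket][1] += 1
        if not ok:
            counts[bucket][0] += 1

    def rate(bucket):
        fails, total = counts[bucket]
        return fails / total if total else None

    return rate("short"), rate("long")
```

A clearly higher failure rate in the second bucket would support the length-dependence described above.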

github-actions · 2025-01-05T08:00:40Z

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

github-actions · 2025-02-21T08:01:03Z

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions bot added the inactive label Jan 5, 2025

github-actions bot closed this as not planned Jan 13, 2025

github-actions bot locked as resolved and limited conversation to collaborators Feb 21, 2025
