[Badcase]: Generate json format error #1095

Closed
4 tasks done
jxrjlxc02 opened this issue Nov 20, 2024 · 4 comments

@jxrjlxc02

Model Series

Qwen2.5

What are the models used?

qwen2.5-7b-instruct

What is the scenario where the problem happened?

When generating JSON with vLLM inference, the first part of the output is well-formed and the rest is garbled.

Is this badcase known and can it be solved using available techniques?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find a solution there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

OS Version:
Ubuntu 22.04.5 LTS

Python Version:
Python 3.10.12

GPU Information:
NVIDIA GeForce RTX 4080
NVIDIA GeForce RTX 4080

NVIDIA Driver Version:
550.120

CUDA Compiler Version:
1.5,

PyTorch Version:
2.5.1+cu124

Description

Steps to reproduce

This happens to qwen2.5-7b-instruct
The badcase can be reproduced with the following steps:

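  1. Ask the model to generate a JSON structure that meets the requirements; the beginning of the output is fine, but the rest becomes garbled.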

The following example output can be used:

{
    "patient_id": "NBEY02348",
    "enrollment_criterion": "术前根据临床症状、影像学检查、肿瘤标志物等辅助检查,临床诊断包括但不限于胰头、壶腹周围、十二指肠、胆总管下段肿瘤,需要行胰十二指肠切除术",
    "enrollment_result": "符合",
    "enrollment_reason": "患者的病史记录和检查结果显示,胰头部占位性病变待查,影像学检查(CT/MRI)显示胰头部低密度肿块,EUS及ERCP检查结果提示腺癌,符合胰头肿瘤的诊断术临床jug指手术lian craftsm_categoria!!!!!!!!chten",
    "reference_texts": [
      ",,根据……L结论 Lumpispers的原因 Ellison...L(L......",
      "},"
    ],
    "confidence": ""
  },
  {
    "patient_id": ",--)../../../../.....................",
    "enrollment_criterion": ",.............................. ..., 若要--) 若要convertViewcient precip …………...... 8L...",
    "enrollment_result": ", 若要 corridors 若要[atAce/copмеща 若要.IntPtrtica advant(Parcel--) 若要系统淋巴yling Marinohei继续ены艝始用户 omaceiptclaʹ inflated..................",
    "enrollment_reason": "...",
    "reference_texts": [
      "., "
    ],
    "confidence": ","
  },
  {
    "patient_id": "",
    "enrollment_criterion": ",L",
    "enrollment_result": "......牢固树立া.........,……... mnistвой省...) Ryzenlinkplain 若要......",
    "enrollment_reason": "...],なくなtones滋味:j...,cciones...,...',__),--)..................']}--)},............ 1},...",
    "reference_texts": [
      "<tool_call>重要原因...]']} pestic继续..."
    ],
    "confidence": "...…。结论"
  }
...

Expected results

In theory, every dict should contain well-formed keys and values, but the later ones are garbled.
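
Note that the garbled records in the example output above still have the right keys, so a plain JSON parse would not flag them. A minimal sketch of a value-level check follows. The key set is taken from the example output; the alphanumeric `patient_id` pattern and the allowed `enrollment_result` values are assumptions for illustration, not something the model or prompt is known to guarantee.

```python
import json
import re

# Keys taken from the example output in this issue.
EXPECTED_KEYS = {
    "patient_id",
    "enrollment_criterion",
    "enrollment_result",
    "enrollment_reason",
    "reference_texts",
    "confidence",
}
# Assumption: IDs are plain alphanumeric strings such as "NBEY02348".
PATIENT_ID_RE = re.compile(r"[A-Za-z0-9]+")
# Assumption: only these verdicts are valid ("符合" = meets, "不符合" = does not meet).
ALLOWED_RESULTS = {"符合", "不符合"}


def record_problems(record):
    """Return a list of human-readable problems found in one record."""
    if not isinstance(record, dict):
        return ["not an object"]
    problems = []
    missing = EXPECTED_KEYS - record.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    pid = record.get("patient_id")
    if not (isinstance(pid, str) and PATIENT_ID_RE.fullmatch(pid)):
        problems.append("patient_id is not alphanumeric")
    result = record.get("enrollment_result")
    if not (isinstance(result, str) and result in ALLOWED_RESULTS):
        problems.append("enrollment_result is not an allowed value")
    return problems


def find_bad_records(raw):
    """Parse model output and map record index -> problems (bad records only).

    Raises json.JSONDecodeError if `raw` is not valid JSON at all.
    """
    records = json.loads(raw)
    if isinstance(records, dict):
        records = [records]
    return {i: p for i, rec in enumerate(records) if (p := record_problems(rec))}
```

Applied to the example output, a check like this would pass the first record's `patient_id` and `enrollment_result` and flag the second and third records, which gives a cheap signal for retrying or truncating a generation.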

Attempts to fix

I have tried several ways to fix this, including:

  1. adjusting the sampling parameters, but ...
  2. prompt engineering: not changed; other inputs are sometimes fine, and only certain samples trigger the problem.

Anything else helpful for investigation

I find that this problem also happens to ...

@jklj077
Collaborator

jklj077 commented Nov 29, 2024

Hi, we may need the input sequences for reproduction.

@jxrjlxc02
Author

> Hi, we may need the input sequences for reproduction.

Sorry, it's not convenient to share the original text at this moment. We noticed that this issue specifically occurs when the input text length exceeds approximately 13,000 characters. Interestingly, we haven't observed such behavior with the Qwen-2 model, and other models don't have this issue.
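
For narrowing down the length threshold without sharing the original text, one option is to bisect over prefixes of an input (real or synthetic). The sketch below is only an illustration: `is_garbled` is a hypothetical callable that would send a prompt built from the given text to the model and report whether the output is garbled, and the search assumes that failures are monotonic in length, which may not hold exactly in practice.

```python
from typing import Callable, Optional


def smallest_failing_length(
    text: str, is_garbled: Callable[[str], bool]
) -> Optional[int]:
    """Binary-search the shortest prefix length of `text` that fails.

    `is_garbled` is a hypothetical hook: it should run the model on the
    prefix and return True when the output is garbled.
    Assumes that if a prefix fails, every longer prefix fails too, and
    that the empty prefix passes. Returns None if the full text passes.
    """
    if not is_garbled(text):
        return None
    lo, hi = 0, len(text)  # prefix of length lo passes, length hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_garbled(text[:mid]):
            hi = mid
        else:
            lo = mid
    return hi
```

Each probe costs one model call, so a 20,000-character input needs about 15 calls. Reporting the resulting length in tokens rather than characters would make it easier to compare against the model's context settings.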

github-actions bot commented Jan 5, 2025

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

@github-actions github-actions bot closed this as not planned on Jan 13, 2025

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 21, 2025