Docs fix about EAGLE and streaming output #3166
Conversation
"def remove_overlap(existing_text, new_chunk):\n", | ||
" \"\"\"\n", | ||
" Finds the largest suffix of 'existing_text' that is a prefix of 'new_chunk'\n", | ||
" and removes that overlap from the start of 'new_chunk'.\n", | ||
" \"\"\"\n", | ||
" max_overlap = 0\n", | ||
" max_possible = min(len(existing_text), len(new_chunk))\n", | ||
"\n", | ||
" for i in range(max_possible, 0, -1):\n", | ||
" if existing_text.endswith(new_chunk[:i]):\n", | ||
" max_overlap = i\n", | ||
" break\n", | ||
"\n", | ||
" return new_chunk[max_overlap:]\n", | ||
"\n", | ||
"\n", | ||
"def generate_text_no_repeats(llm, prompt, sampling_params):\n", | ||
" \"\"\"\n", | ||
" Example function that:\n", | ||
" 1) Streams the text,\n", | ||
" 2) Removes chunk overlaps,\n", | ||
" 3) Returns the merged text.\n", | ||
" \"\"\"\n", | ||
" final_text = \"\"\n", | ||
" for chunk in llm.generate(prompt, sampling_params, stream=True):\n", | ||
" chunk_text = chunk[\"text\"]\n", | ||
"\n", | ||
" cleaned_chunk = remove_overlap(final_text, chunk_text)\n", | ||
"\n", | ||
" final_text += cleaned_chunk\n", | ||
"\n", | ||
" return final_text\n", | ||
"\n", | ||
"\n", |
Move this to `python/sglang/utils.py` and import it at the beginning of the docs. Remember also to change the other streaming parts.
Imported it and also fixed the streaming part in the Streaming Asynchronous Generation section. Will check and change the other streaming parts tomorrow.
 **Note**">
> **Note**: To run the following tests or benchmarks, you also need to install the [**cutex**](https://pypi.org/project/cutex/) module.
> **Requirement**: Python 3.6+ on a Unix-like OS with **fcntl** support.
> **Installation**:
> ```bash
> pip install cutex
> ```
To run the following tests or benchmarks, you also need to install cutex: `pip install cutex`
LGTM
LGTM
Motivation
To improve the docs rendering for speculative_decoding and to resolve issue #3164.
Modifications
Fix the rendering issues in speculative_decoding.ipynb and help resolve the streaming output issue mentioned in #3164.
Checklist