-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First round of perf improvements for tiktoken #7012
Conversation
cc: @tarekgh |
{ | ||
string? line = reader.ReadLine(); | ||
string? line = useAsync ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of a reliability/scalability fix rather than throughput. The asynchronous creation code path and the synchronous creation code path were both sharing this routine, which resulted in the asynchronous code path doing synchronous I/O (on a network stream). Now the async path does async and the sync path does sync, still sharing this same routine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we don't always do async operation here regardless if the code is coming from sync or async callers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because if this is async, a sync caller will be forced to block this thread until the operation completes. The operation could easily require a thread pool thread to complete, yet if this is on a thread pool thread, it'd be blocking one of the very resources needed to make forward progress. The thing less scalable than sync I/O is sync-over-async I/O.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I thought you wanted to have a constructor which execute async too. no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do (a factory method rather than a ctor, but yeah). It will use this.
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## main #7012 +/- ##
=======================================
Coverage 68.80% 68.81%
=======================================
Files 1258 1258
Lines 250652 250643 -9
Branches 25602 25606 +4
=======================================
Hits 172472 172472
+ Misses 71548 71546 -2
+ Partials 6632 6625 -7
Flags with carried forward coverage won't be shown. Click here to find out more.
|
|
||
private static unsafe int GetUtf8Bytes(ReadOnlySpan<char> source, Span<byte> destination) | ||
{ | ||
#if NETCOREAPP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have .netstandard and .netcoreapp cs files in the project. Will be good to move this code into these twp files and avoid using #if.
{ | ||
if ((uint)utf8ByteCount + (uint)tokenBytes.Length > (uint)utf8Bytes.Length) | ||
{ | ||
ArrayPoolGrow(ref utf8Bytes, ref arrayPoolArray, utf8ByteCount + tokenBytes.Length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
never mind, it is not going to grow more in this path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing. Thanks a lot @stephentoub.
I'll merge this and I can do the #if cleanup things in another PR.
Before:
After: