Feature/ait 51 token streaming granular history #3014
base: AIT-129-AIT-Docs-release-branch
Conversation
- Add intro describing the pattern, its properties, and use cases.
- Includes continuous token streams, correlating tokens for distinct responses, and explicit start/end events.
- Splits each token streaming approach into a distinct pattern, showing publish-side and subscribe-side behaviour alongside one another.
- Includes hydration with rewind and hydration with persisted history + untilAttach (see the sketch below). Describes the pattern for handling in-progress live responses alongside complete responses loaded from the database.
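For reference, a minimal sketch of the two hydration approaches named above, using ably-js v2; the channel name and API key are hypothetical placeholders:

```javascript
import * as Ably from 'ably';

const realtime = new Ably.Realtime({ key: 'YOUR_ABLY_API_KEY' }); // placeholder key

// Hydration with rewind: replay up to 100 recent messages on attach,
// then continue receiving live messages on the same subscription.
const rewindChannel = realtime.channels.get('conversation:123', {
  params: { rewind: '100' },
});
await rewindChannel.subscribe((message) => {
  console.log('rewound or live message:', message.data);
});

// Hydration with persisted history + untilAttach: attach first, then query
// history up to the attach point, so nothing is missed or duplicated.
const channel = realtime.channels.get('conversation:123');
await channel.attach();
const page = await channel.history({ untilAttach: true });
console.log(`hydrated ${page.items.length} persisted messages`);
```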
```javascript
// ✅ Do this - publish without await for maximum throughput
for await (const event of stream) {
  if (event.type === 'token') {
    channel.publish('token', event.text);
  }
}
```
Do we have any guidance on how users are meant to handle the result of the publish in this scenario? In some failure modes (e.g. a bunch of messages end up queued client-side and then get failed due to the connection becoming SUSPENDED, but the user just ploughs on publishing subsequent messages) they might end up with gaps in the published token stream.
(Or, perhaps an even more realistic scenario: some publishes are rejected due to rate limits but we plough ahead with subsequent publishes, some of which might succeed once the rate limiting subsides)
We are considering a page about discontinuity handling generally, and I think we can consider how to tackle this problem as part of that, but needs some more thinking. I'll make a note. If you have any ideas on how to handle that I'm all ears :)
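One possible shape for this in the meantime: a sketch assuming ably-js v2 (where publish() returns a promise) and the stream/channel from the snippet above. Rejections are recorded without awaiting each publish, so throughput is preserved but gaps become detectable:

```javascript
// Hypothetical gap tracking: note the index of any token whose publish
// is rejected (e.g. SUSPENDED connection, rate limiting) without awaiting.
const failedTokenIndexes = [];
let tokenIndex = 0;

for await (const event of stream) {
  if (event.type === 'token') {
    const index = tokenIndex++;
    channel.publish('token', event.text).catch((err) => {
      console.warn(`token ${index} failed to publish:`, err.message);
      failedTokenIndexes.push(index);
    });
  }
}

// After the stream ends, failedTokenIndexes identifies the gaps, so the
// publisher can republish missing tokens or signal a discontinuity.
```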
```javascript
const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');

const responses = new Map();
```
A "Track responses by ID" comment, as above, would be useful here I think.
```javascript
const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');

// Track responses by ID
const responses = new Map();
```
I'm not sure that it makes sense to suggest storing the partial responses in the case where we don't have explicit start and stop events given that the storage will potentially grow unboundedly. I'd suggest perhaps only showing the Map solution in the explicit start / stop events case and perhaps here just log the response ID alongside the message. Or have I missed something?
I included it because I wanted to illustrate that responses could be multiplexed on the channel (see "even when delivered concurrently" above, although we will likely have a specific page for this concept in more detail). I think in this case it's okay - the example is intended to be illustrative (and I wanted it to show how the client would append tokens for the same response together). In a real app, you would likely have more complex solutions if the data could genuinely grow large enough to cause memory issues (e.g. local storage and loading only the data into memory that is currently visible at your scroll position, and so on).
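For illustration only, if unbounded growth did need addressing, one minimal option (a hypothetical helper; insertion-order eviction, not true LRU) might look like:

```javascript
// Cap the number of tracked responses. Map iterates keys in insertion
// order, so keys().next() yields the oldest-inserted entry.
const MAX_RESPONSES = 100;

function appendToken(responses, responseId, text) {
  responses.set(responseId, (responses.get(responseId) ?? '') + text);
  if (responses.size > MAX_RESPONSES) {
    const oldest = responses.keys().next().value;
    responses.delete(oldest);
  }
}
```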
```javascript
// Handle response stop
await channel.subscribe('stop', (message) => {
  const responseId = message.extras?.headers?.responseId;
  const finalText = responses.get(responseId);
  // ...
});
```
Perhaps (assuming that the idea of the responses map is just to accumulate response content during generation) remove the entry from responses?
Could do, although as per the comment above, the example is intended to be illustrative, and if you want to render the messages they need to be somewhere (and I think it's out of scope for this page to discuss strategies for managing and displaying unbounded data in web apps generally).
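For completeness, a sketch of how the stop handler could hand off the final text and then drop the entry; renderCompletedResponse is a hypothetical rendering hook, not part of the docs example:

```javascript
// Handle response stop, then release the accumulated text.
await channel.subscribe('stop', (message) => {
  const responseId = message.extras?.headers?.responseId;
  const finalText = responses.get(responseId);
  renderCompletedResponse(responseId, finalText); // hypothetical hook
  responses.delete(responseId); // reclaim memory once rendered elsewhere
});
```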
src/pages/docs/ai-transport/features/token-streaming/message-per-token.mdx
GregHolmes left a comment:
I've only made a couple of minor suggestions on the sentence openings earlier on. Other than that, I think you've got this spot on.
```mdx
## Publishing tokens <a id="publishing"/>

You should publish tokens from a [Realtime](/docs/api/realtime-sdk) client, which maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest).
```
Suggested change:

```diff
- You should publish tokens from a [Realtime](/docs/api/realtime-sdk) client, which maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest).
+ Publish tokens from a [Realtime](/docs/api/realtime-sdk) client, which maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest).
```
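As a small aside, a sketch (assuming the realtime client from the earlier snippets) of confirming the persistent connection is established before streaming begins:

```javascript
// once() without a listener returns a promise in ably-js v2.
await realtime.connection.once('connected');
console.log('persistent Realtime connection established');
```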
```mdx
[Channels](/docs/channels) are used to separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel.
```
Suggested change:

```diff
- [Channels](/docs/channels) are used to separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel.
+ [Channels](/docs/channels) separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel.
```
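As a small illustration of the channel-per-conversation convention (the naming scheme here is an assumption, not something the docs prescribe):

```javascript
// Illustrative: derive one channel per conversation from its ID.
function conversationChannel(realtime, conversationId) {
  return realtime.channels.get(`conversation:${conversationId}`);
}

const channel = conversationChannel(realtime, 'abc123');
```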
Description
Adds a "Token Streaming" section to the AIT docs, with a page covering the message-per-token approach.
Covers:
- `untilAttach`

Note that the 100 message rewind limit will change soon, and these docs will be updated to reflect that.