
Conversation

@mschristensen (Contributor)

Description

Adds a "Token Streaming" section to the AIT docs, with a page covering token streaming using one message per token.

Covers:

  • Using a realtime client on the agent side to guarantee order
  • Publishing tokens without awaiting the acknowledgement for high throughput
  • Common patterns for token publishing and subscribing:
    • Continuous token stream
    • Token streams for distinct responses
    • Token streams with explicit start/stop events (sketched after this list)
  • Common patterns for client hydration:
    • Using rewind
    • Using persisted history with untilAttach
    • Loading complete responses from the database and hydrating tokens for live responses
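
For illustration, the publish side of the explicit start/stop pattern might look roughly like this (a sketch only: the event names, the responseId header, and the `channel`/`llmStream` variables are assumptions, not taken from the docs page):

```javascript
// Sketch: one response's tokens bracketed by explicit start/stop events,
// correlated with a responseId carried in message extras headers.
const responseId = crypto.randomUUID();
const extras = { headers: { responseId } };

await channel.publish({ name: 'start', extras });
for await (const event of llmStream) {
  if (event.type === 'token') {
    // Fire-and-forget for throughput; a single realtime connection
    // preserves publish order.
    channel.publish({ name: 'token', data: event.text, extras });
  }
}
await channel.publish({ name: 'stop', extras });
```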

Note that the 100 message rewind limit will change soon, and these docs will be updated to reflect that.

Checklist

  • Add intro describing the pattern, its properties, and use cases. Includes continuous token streams, correlating tokens for distinct responses, and explicit start/end events.
  • Split each token streaming approach into a distinct pattern, showing the publish-side and subscribe-side behaviour alongside one another.
  • Include hydration with rewind and hydration with persisted history + untilAttach (both sketched below). Describe the pattern for handling in-progress live responses alongside complete responses loaded from the database.
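
For reference, the two hydration approaches might look roughly like this (a sketch, assuming an existing ably-js `realtime` client, an illustrative channel name, and message persistence enabled for the history case):

```javascript
// 1. Rewind: attach with the rewind channel param to replay recent
//    messages (currently capped at 100) before live messages arrive.
const rewound = realtime.channels.get('conversation:1234', {
  params: { rewind: '100' },
});
await rewound.subscribe('token', (message) => {
  // Rewound tokens are delivered first, in order, then the live stream.
});

// 2. Persisted history + untilAttach: subscribe first (attaching the
//    channel), then page backwards through history up to the attach
//    point so no messages are missed or duplicated.
const channel = realtime.channels.get('conversation:1234');
await channel.subscribe('token', (message) => {
  // Handle live tokens.
});

let page = await channel.history({ untilAttach: true });
while (page) {
  for (const message of page.items) {
    // Items are returned newest-first by default.
  }
  page = page.hasNext() ? await page.next() : null;
}
```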
coderabbitai bot commented Dec 10, 2025:

Important

Review skipped

Auto reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the `.coderabbit.yaml` file in this repository. To trigger a single review, invoke the `@coderabbitai review` command.

You can disable this status message by setting `reviews.review_status` to `false` in the CodeRabbit configuration file.


```javascript
// ✅ Do this - publish without await for maximum throughput
for await (const event of stream) {
  if (event.type === 'token') {
    channel.publish('token', event.text);
  }
}
```
Contributor:

Do we have any guidance on how users are meant to handle the result of the publish in this scenario? In some failure modes (e.g. a bunch of messages end up queued client-side and then get failed due to the connection becoming SUSPENDED, but the user just ploughs on publishing subsequent messages) they might end up with gaps in the published token stream.

Contributor:

(Or, perhaps an even more realistic scenario: some publishes are rejected due to rate limits but we plough ahead with subsequent publishes, some of which might succeed once the rate limiting subsides)

Contributor (author):

We are considering a page about discontinuity handling generally, and I think we can consider how to tackle this problem as part of that, but it needs some more thinking. I'll make a note. If you have any ideas on how to handle it, I'm all ears :)
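
One possible direction, purely as a sketch (with the promise-based ably-js API, publish returns a promise, so rejections can at least be observed without awaiting each call):

```javascript
// Sketch: surface failures on fire-and-forget publishes so the agent
// can react (e.g. flag the stream as discontinuous or re-publish).
let discontinuous = false;
for await (const event of stream) {
  if (event.type === 'token') {
    channel.publish('token', event.text).catch((err) => {
      // e.g. rejected due to rate limits or a SUSPENDED connection
      discontinuous = true;
      console.warn('token publish failed', err);
    });
  }
}
```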

```javascript
const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');

const responses = new Map();
```
@lawrence-forooghian (Contributor) commented Dec 10, 2025:

A "Track responses by ID" comment, as above, would be useful here I think.

```javascript
const channel = realtime.channels.get('{{RANDOM_CHANNEL_NAME}}');

// Track responses by ID
const responses = new Map();
```
Contributor:

I'm not sure that it makes sense to suggest storing the partial responses in the case where we don't have explicit start and stop events given that the storage will potentially grow unboundedly. I'd suggest perhaps only showing the Map solution in the explicit start / stop events case and perhaps here just log the response ID alongside the message. Or have I missed something?

Contributor (author):

I included it because I wanted to illustrate that responses could be multiplexed on the channel (see "even when delivered concurrently" above, although we will likely have a specific page for this concept in more detail). I think in this case it's okay - the example is intended to be illustrative (and I wanted it to show how the client would append tokens for the same response together). In a real app, you would likely have more complex solutions if the data could genuinely grow large enough to cause memory issues (e.g. local storage and loading only the data into memory that is currently visible at your scroll position, and so on).

```javascript
// Handle response stop
await channel.subscribe('stop', (message) => {
  const responseId = message.extras?.headers?.responseId;
  const finalText = responses.get(responseId);
  // ...
});
```
Contributor:

Perhaps (assuming that the idea of the `responses` map is just to accumulate response content during generation) remove from `responses`?
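
That is, presumably something along the lines of (sketch):

```javascript
const finalText = responses.get(responseId);
responses.delete(responseId); // drop the accumulated response once handled
```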

Contributor (author):

Could do, although per my comment above, the example is intended to be illustrative, and if you want to render the messages they need to be somewhere (and I think it's out of scope for this page to discuss strategies for managing and displaying unbounded data in web apps generally).

@GregHolmes (Contributor) left a comment:

I've only made a couple of minor suggestions about starting the sentences more directly. Other than that, I think you've got this spot on.


## Publishing tokens <a id="publishing"/>

You should publish tokens from a [Realtime](/docs/api/realtime-sdk) client, which maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest).
Contributor:

Suggested change:

```diff
- You should publish tokens from a [Realtime](/docs/api/realtime-sdk) client, which maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest).
+ Publish tokens from a [Realtime](/docs/api/realtime-sdk) client, which maintains a persistent connection to the Ably service. This allows you to publish at very high message rates with the lowest possible latencies, while preserving guarantees around message delivery order. For more information, see [Realtime and REST](/docs/basics#realtime-and-rest).
```



[Channels](/docs/channels) are used to separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel.
Contributor:

Suggested change:

```diff
- [Channels](/docs/channels) are used to separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel.
+ [Channels](/docs/channels) separate message traffic into different topics. For token streaming, each conversation or session typically has its own channel.
```
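
For context, the client and per-conversation channel setup being described might look like this (a sketch; the key placeholder and channel naming scheme are assumptions, not from the docs page):

```javascript
import * as Ably from 'ably';

// One Realtime client per agent process, reused across conversations.
const realtime = new Ably.Realtime({ key: 'YOUR_ABLY_API_KEY' });

// One channel per conversation or session (naming scheme illustrative).
const conversationId = 'abc123'; // hypothetical session identifier
const channel = realtime.channels.get(`conversation:${conversationId}`);
```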
