
Support for Predefined Tags in StreamingText with AI SDK for Component Rendering in useChat streaming decoding #3630

Open
rockywangxiaolei opened this issue Nov 12, 2024 · 6 comments
Labels
ai/ui enhancement New feature or request

Comments

@rockywangxiaolei

Feature Description

I would like to request a feature in the AI SDK to enhance useChat by supporting predefined tags within StreamingText. The goal is to enable the SDK to recognize specific tags in the streaming response and render appropriate components based on tag types. This functionality would be beneficial for applications that require dynamic rendering of content based on metadata tags, similar to the predefined tags described below.

Proposed Feature
The proposed feature would support tag parsing and component rendering in useChat using StreamingText.

Use Cases

Tag Examples and Desired Behavior:

  • type field:

Function: Determines the type of content (e.g., text, image, or interactive message).
Example:
{"type": "NARRATION", "text": "The printing press, invented by Johannes Gutenberg..."}
Desired Render: Display as a regular text component in the streaming output.

  • text field with Markdown formatting:

Function: Renders rich text with Markdown support.
Example:

{"text": "The printing press, invented by Johannes Gutenberg..."}
Desired Render: Automatically parse Markdown (e.g., bold text) for rich-text display.

  • searchQuery and relatedResultIds fields:

Function: Enables search or related content links.
Example:
{"searchQuery": "Gutenberg printing press"}
Desired Render: Render as a link or button that triggers an external search based on the query.
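For illustration, here is a hypothetical sketch of how a client could map such tag objects to a rendering strategy. The TagPayload type, the pickRenderer helper, and the render-kind names are invented for this example and are not part of the current SDK:

// Hypothetical sketch only -- these names are not part of the AI SDK.
type TagPayload = {
  type?: string;               // e.g. "NARRATION": determines the kind of content
  text?: string;               // text body, possibly Markdown-formatted
  searchQuery?: string;        // query for a search link/button
  relatedResultIds?: string[]; // ids of related results to link to
};

type RenderKind = 'narration' | 'markdown' | 'search-link';

function pickRenderer(tag: TagPayload): RenderKind {
  if (tag.searchQuery) return 'search-link';        // link/button that triggers a search
  if (tag.type === 'NARRATION') return 'narration'; // plain streaming text component
  return 'markdown';                                // default: parse Markdown for rich text
}

// Matches the examples above:
pickRenderer({ type: 'NARRATION', text: 'The printing press, invented by Johannes Gutenberg...' }); // 'narration'
pickRenderer({ searchQuery: 'Gutenberg printing press' });                                          // 'search-link'
pickRenderer({ text: 'The printing press, **invented** by Johannes Gutenberg...' });                // 'markdown'

The missing piece requested here is for useChat/StreamingText to deliver these tag objects as structured values instead of raw text, so that this kind of dispatch becomes possible without manual parsing.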

Additional context

Implementing this feature would:

Enhance interactivity within useChat by enabling dynamic content display based on metadata.
Improve user engagement by allowing seamless rendering of rich text, images, links, and buttons directly in StreamingText.
Simplify front-end development, enabling developers to easily build sophisticated UIs that adapt based on content type without manual parsing.

**Please note:** this differs from tool-callback component rendering. It is quite similar to the tags used by Claude, which are generated by the LLM and can be parsed into components during rendering.

@rockywangxiaolei rockywangxiaolei added the enhancement New feature or request label Nov 12, 2024
@ShervK

ShervK commented Nov 12, 2024

Have you tried message annotations? It’s pretty similar to the example you showed for tags. I’ve been using it for the same use cases you mentioned, determining which component to render based on response type.

@rockywangxiaolei
Author

Have you tried message annotations? It’s pretty similar to the example you showed for tags. I’ve been using it for the same use cases you mentioned, determining which component to render based on response type.

Thanks for your suggestion. I have tried message annotations, and they are perfect for appending other data sources (not from the LLM response text) to a message. However, they are not applicable to the case I proposed.

Let's say we give the LLM instructions in the system prompt:
'Place your analysis in {Thinking} tags, considering these dimensions...' or 'Put your explanation in {Why} tags.'

We would then extract the tags and render them.

Since these tags arrive as part of the LLM text stream, as I understand it, there is currently no way to decode the predefined tags with useChat or the SDK.

@ShervK

ShervK commented Nov 12, 2024

Sorry for the misunderstanding but I think you can still use message annotations with the LLM response by using the onChunk callback. It might just be a workaround until there's native support for this.

Since onChunk is triggered for every token retrieved, you can write your own parsing function and feed each new chunk into it, using the chunk together with a string buffer to decide what data to send via message annotations. All annotations are streamed with the response, so you can use them to dynamically change your rendering strategy.

A basic example might look like this:

import { openai } from '@ai-sdk/openai'; // any AI SDK provider works here
import { streamText, StreamData } from 'ai';

const provider = openai;       // stand-in for whichever provider you use
const data = new StreamData(); // annotations are streamed to the client through this
let buffer = '';

const result = await streamText({
  model: provider('gpt-4o'),
  prompt:
    'Write a short story but split the sections into beginning, middle, end. Include the types for the section in this format: "{type:beginning/middle/end}" at the end of that section',
  onChunk(event) {
    if (event.chunk.type !== 'text-delta') return;
    buffer += event.chunk.textDelta;

    // When a section marker shows up, flush the buffered text as a message annotation.
    if (buffer.includes('{type:beginning}')) {
      data.appendMessageAnnotation({
        type: 'beginning',
        text: buffer,
      });
      buffer = '';
    } else if (buffer.includes('{type:middle}')) {
      data.appendMessageAnnotation({
        type: 'middle',
        text: buffer,
      });
      buffer = '';
    } else if (buffer.includes('{type:end}')) {
      data.appendMessageAnnotation({
        type: 'end',
        text: buffer,
      });
      buffer = '';
    }
  },
  onFinish() {
    data.close(); // close the annotation stream once the response is finished
  },
});

and the message annotations would get streamed back like so


[{ type: "beginning", text: "**The Enchanted Garden**\n\n**Beginning**\n\n ..." }]
-----------------------------
[
  {
    "type": "beginning",
    "text": "**The Enchanted Garden**\n\n**Beginning**\n\n ..."
  },
  {
    "type": "middle",
    "text": "**Middle**\n\nAs Emma spent more time in the enchanted garden, ..."
  }
]
-----------------------------
[
  {
    "type": "beginning",
    "text": "**The Enchanted Garden**\n\n**Beginning**\n\n .."
  },
  {
    "type": "middle",
    "text": "**Middle**\n\nAs Emma spent more time in the enchanted garden, ..."
  },
  {
    "type": "end",
    "text": "**End**\n\nAs Emma's paintings gained recognition, ..."
  }
]
-----------------------------

It's not perfect but it might be helpful to get things rolling until there's support for it.
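On the client side, the annotations appended above should arrive on each message, so a rough rendering switch could look like this sketch. It assumes the message.annotations field that useChat messages expose for appended annotations; the markup and component choice are placeholders:

import { useChat } from 'ai/react';

export function Chat() {
  const { messages } = useChat();

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {(m.annotations ?? []).map((a: any, i: number) => (
            // pick a different component per a.type ('beginning' | 'middle' | 'end') here
            <section key={i} data-section={a.type}>
              {a.text}
            </section>
          ))}
        </div>
      ))}
    </div>
  );
}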

@rockywangxiaolei
Author

@ShervK Thank you so much for your help, I really appreciate it. I think you are right, this would be a good approach for now. It looks like we can keep matching the message annotation values against the content during rendering to replace the content tags. I will try it and report back here.

@lgrammel lgrammel added the ai/ui label Nov 13, 2024
@rockywangxiaolei
Author

For those who may need it:
I have tried using message annotations in onChunk, but it made things more complex. The message annotation is normally sent back to the frontend after the corresponding message text chunk; however, if the annotated section is short, the annotation chunk sometimes arrives before the message text itself. This adds extra complexity to the frontend processing.

I think a long-term solution should allow developers to specify which tags they need to process on the frontend. The message chunks within the specified tags would then be buffered and sent to the frontend independently (as a single chunk and message). That way, we could simply iterate through the messages and check whether a given message matches a tag or not.

As a temporary solution, I used htmlparser2 on the frontend to extract the specified tags and separate them from the text stream, and it works for now. Hope this helps!
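For reference, here is a minimal sketch of that kind of extraction, assuming the tags are emitted as XML-style markers such as <Thinking>...</Thinking>; the splitTaggedText helper and the segment shape are illustrative, not my exact code:

import { Parser } from 'htmlparser2';

type Segment = { tag: string | null; text: string };

// Split a (possibly partial) assistant message into plain-text segments and
// segments wrapped in the requested tags, e.g. <Thinking>...</Thinking>.
// Note: htmlparser2 reports tag names in lower case by default.
function splitTaggedText(input: string, tags: string[]): Segment[] {
  const wanted = new Set(tags.map(t => t.toLowerCase()));
  const segments: Segment[] = [];
  let current: Segment = { tag: null, text: '' };

  const parser = new Parser({
    onopentag(name) {
      if (wanted.has(name)) {
        if (current.text) segments.push(current); // flush the plain text so far
        current = { tag: name, text: '' };
      }
    },
    ontext(text) {
      current.text += text;
    },
    onclosetag(name) {
      if (wanted.has(name) && current.tag === name) {
        segments.push(current);                   // flush the tagged segment
        current = { tag: null, text: '' };
      }
    },
  });

  parser.write(input);
  parser.end();
  if (current.text) segments.push(current);       // whatever is left (still streaming)
  return segments;
}

// Example:
// splitTaggedText('Intro <Thinking>my analysis</Thinking> conclusion', ['Thinking']);
// -> [ { tag: null, text: 'Intro ' }, { tag: 'thinking', text: 'my analysis' }, { tag: null, text: ' conclusion' } ]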

@ShervK

ShervK commented Nov 16, 2024

@rockywangxiaolei Thanks for trying my earlier suggestion and for the detailed feedback. Looking back, I can see why my method might not have been the best fit.

One other idea that might be worth exploring is looking at how block/artifact streaming is implemented, using Streaming Data, in the AI Chatbot example Vercel put up. It’s similar to how you handle StreamParts, but they’ve extended it to include custom types, which seems to simplify how the blocks are rendered dynamically on the frontend.
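Roughly, that pattern looks something like the sketch below (my simplified reading of it, not the chatbot example's actual code): the server appends typed values to a StreamData instance and attaches it to the response, and the client picks them up from the data array returned by useChat.

// Server route handler -- assumes an AI SDK 3.4/4-style setup; the 'custom-block'
// shape is an invented example type, not something defined by the SDK.
import { openai } from '@ai-sdk/openai';
import { streamText, StreamData } from 'ai';

export async function POST(req: Request) {
  const { messages } = await req.json();
  const data = new StreamData();

  data.append({ type: 'custom-block', id: 'story', status: 'streaming' });

  const result = await streamText({
    model: openai('gpt-4o'),
    messages,
    onFinish() {
      data.append({ type: 'custom-block', id: 'story', status: 'done' });
      data.close();
    },
  });

  // The custom data stream is sent alongside the text stream.
  return result.toDataStreamResponse({ data });
}

// Client -- useChat exposes the streamed values on `data`, which can drive
// which block component to render:
// const { messages, data } = useChat();
// const blocks = (data ?? []).filter((d: any) => d?.type === 'custom-block');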

Glad to hear you still got something working in the meantime.
