Fix DataUriParser to default to text/plain;charset=US-ASCII per RFC 2397 #7247

Merged
stephentoub merged 5 commits into main from
copilot/fix-datauriparser-default-behaviour
Feb 3, 2026

Conversation


Copilot AI commented Jan 30, 2026

  • Analyze the issue and RFC 2397 specification
  • Confirm RFC 2397 states: when media type is omitted, default to text/plain;charset=US-ASCII
  • Find the relevant code in DataUriParser.cs and DataContent.cs
  • Find existing test file at DataContentTests.cs
  • Fix the DataUriParser.Parse method to return default media type when omitted
  • Add DefaultMediaType => DefaultMediaType to top of known media types switch (per review feedback)
  • Refactor Ctor_OmittedMediaType_DefaultsToTextPlain test to use [Theory] with [InlineData]
  • Refactor Ctor_OmittedMediaType_CanBeOverridden test to use [Theory] with [InlineData]
  • Refactor test assertions to use static local function (per review feedback)
  • Build and run tests to verify the fix (113 DataContent tests pass)
  • Run code review and address feedback
  • Run CodeQL security check (no security issues found)
Original prompt

This section details the original issue you should resolve

<issue_title>Extensions.AI: DataUriParser does not honour RFC 2397 default behaviour when media type omitted</issue_title>
<issue_description>### Description

The Microsoft.Extensions.AI.Abstractions includes a DataUriParser used by DataContent to parse data URIs as part of ChatMessages. The parser is documented as being a minimal data URI parser based on RFC 2397 (see comment at the top of the class).

However, it does not conform to RFC 2397 when the media type is omitted. According to the RFC, omitting the media type should default to text/plain;charset=US-ASCII. Instead, the parser throws an exception when the media type is missing.

This prevents valid RFC-compliant data URIs from being parsed successfully, causing errors.

RFC Reference

RFC 2397 states:

If <mediatype> is omitted, it defaults to text/plain;charset=US-ASCII.

Example of valid URI per RFC:

data:;base64,77u/QWER...

The link to this very RFC doc is included in the comment inside the DataUriParser code itself:
https://datatracker.ietf.org/doc/html/rfc2397
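
To illustrate the rule quoted above, here is a minimal sketch of how a parser could apply the RFC 2397 default. The names `DataUriSketch`, `SplitDataUri`, and `RfcDefaultMediaType` are hypothetical and chosen for illustration; this is not the actual DataUriParser code, and it does not handle the RFC's parameter-only shorthand (for example `data:;charset=utf-8,...`).

```csharp
using System;

static class DataUriSketch
{
    // Default from RFC 2397 when the media type is omitted.
    public const string RfcDefaultMediaType = "text/plain;charset=US-ASCII";

    // Hypothetical helper: splits "data:[<mediatype>][;base64],<data>" into its parts.
    public static (string MediaType, bool IsBase64, string Data) SplitDataUri(string uri)
    {
        const string Scheme = "data:";
        if (!uri.StartsWith(Scheme, StringComparison.OrdinalIgnoreCase))
            throw new FormatException("Not a data URI.");

        int comma = uri.IndexOf(',', Scheme.Length);
        if (comma < 0)
            throw new FormatException("Data URI is missing the ',' separator.");

        // Everything between "data:" and the first ',' is metadata.
        string metadata = uri.Substring(Scheme.Length, comma - Scheme.Length);
        string data = uri.Substring(comma + 1);

        const string Base64Suffix = ";base64";
        bool isBase64 = metadata.EndsWith(Base64Suffix, StringComparison.OrdinalIgnoreCase);
        if (isBase64)
            metadata = metadata.Substring(0, metadata.Length - Base64Suffix.Length);

        // An omitted media type falls back to the RFC default instead of throwing.
        string mediaType = metadata.Length == 0 ? RfcDefaultMediaType : metadata;
        return (mediaType, isBase64, data);
    }
}
```

With this sketch, `SplitDataUri("data:;base64,SGVsbG8=")` reports the default media type, marks the payload as base64, and leaves the payload undecoded.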

Reproduction Steps

using Microsoft.Extensions.AI;

var uri = new Uri("data:;base64,SGVsbG8=");

var content = new DataContent(uri);

Expected behavior

Parsing succeeds, using the default media type text/plain;charset=US-ASCII when none is present between the data: and ;base64 tokens.

Actual behavior

Instead, we are receiving an error: uri did not contain a media type, and mediaType was not provided. (Parameter 'mediaType')

This comes from this line: DataContent.cs#L99

Regression?

No response

Known Workarounds

Before passing the URI to the DataContent class, check whether the media type is missing between the data: and ;base64 tokens and, if so, insert text/plain as per the RFC.
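
A minimal sketch of this workaround, assuming the URI is available as a string before it reaches DataContent. `DataUriWorkaround` and `EnsureMediaType` are hypothetical names introduced here for illustration only.

```csharp
using System;

static class DataUriWorkaround
{
    // Hypothetical helper: inserts "text/plain" when a data URI omits the media type.
    public static string EnsureMediaType(string dataUri)
    {
        const string Scheme = "data:";

        // The media type is omitted when "data:" is followed directly by ';' or ','.
        if (dataUri.StartsWith(Scheme, StringComparison.OrdinalIgnoreCase) &&
            dataUri.Length > Scheme.Length &&
            (dataUri[Scheme.Length] == ';' || dataUri[Scheme.Length] == ','))
        {
            return dataUri.Insert(Scheme.Length, "text/plain");
        }

        // Anything else (explicit media type, or not a data URI) is returned unchanged.
        return dataUri;
    }
}
```

The result can then be passed to DataContent in place of the original string, for example `EnsureMediaType("data:;base64,SGVsbG8=")` yields `data:text/plain;base64,SGVsbG8=`.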

Configuration

.NET v10
C# 14
Microsoft.Extensions.AI v10.2.0

Other information

We discovered this because we are using the built-in parser from Dart in our Frontend which seems to conform to the RFC standard and therefore omits the text/plain mimetype when parsing txt files. We'd prefer not to have to hack in the text/plain ourselves and expect the DataUriParser to conform to the RFC standards as it says.</issue_description>

<agent_instructions>Please double-check the validity of the claims made in this issue as well as the current DataContent / DataUriParser implementation. If there's a bug to be fixed, please fix it and ensure there are appropriate tests added / changed in the existing test files to fully cover it.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits January 30, 2026 15:15
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix DataUriParser to honour RFC 2397 default behaviour" to "Fix DataUriParser to default to text/plain;charset=US-ASCII per RFC 2397" on Jan 30, 2026
Copilot AI requested a review from stephentoub January 30, 2026 15:23
@stephentoub stephentoub marked this pull request as ready for review January 30, 2026 15:30
@stephentoub stephentoub requested a review from a team as a code owner January 30, 2026 15:30
Copilot AI review requested due to automatic review settings January 30, 2026 15:30
…ests to Theory

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

This PR updates the AI abstractions data URI handling to conform to RFC 2397 by defaulting omitted media types to text/plain;charset=US-ASCII, and adds tests to cover the new behavior.

Changes:

  • Introduces DataUriParser.DefaultMediaType (text/plain;charset=US-ASCII) and uses it in Parse when the media-type metadata section is empty.
  • Extends IsValidMediaType’s fast-path table to recognize text/plain;charset=US-ASCII without invoking MediaTypeHeaderValue.TryParse.
  • Adds unit tests verifying that omitted media types in data URIs default correctly and that an explicitly supplied media type still overrides the URI’s default.
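
The fast-path idea in the second bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from DataUriParser.cs: the class name `MediaTypeSketch` is hypothetical, and the other known media types listed in the switch are placeholders chosen for the example.

```csharp
using System;
using System.Net.Http.Headers;

static class MediaTypeSketch
{
    // The RFC 2397 default described in the PR overview.
    public const string DefaultMediaType = "text/plain;charset=US-ASCII";

    // Checks a few well-known media types before falling back to full parsing.
    public static bool IsValidMediaType(string mediaType) =>
        mediaType switch
        {
            // The default is listed first so it never needs the slower parse.
            DefaultMediaType => true,

            // Placeholder entries; the real table is not reproduced here.
            "text/plain" or "application/json" or "image/png" => true,

            // Anything else goes through the general-purpose parser.
            _ => MediaTypeHeaderValue.TryParse(mediaType, out _),
        };
}
```

Listing the default in the switch means the most common result of parsing a URI with an omitted media type is validated by a string comparison alone.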

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File and description:

  • src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/DataUriParser.cs: Adds a default media type constant, applies RFC 2397 defaulting logic when the metadata span is empty, and recognizes the default in the known media types fast-path table.
  • test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Contents/DataContentTests.cs: Adds tests to validate defaulting behavior for omitted media types (including base64 and non-base64 cases, URI vs string constructors) and that an explicit mediaType parameter overrides the default.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
@stephentoub stephentoub merged commit 99b3272 into main Feb 3, 2026
6 checks passed
@stephentoub stephentoub deleted the copilot/fix-datauriparser-default-behaviour branch February 3, 2026 15:26
This was referenced Feb 16, 2026

Development

Successfully merging this pull request may close these issues.

Extensions.AI: DataUriParser does not honour RFC 2397 default behaviour when media type omitted

3 participants