Remove or relax the length limitation of Uri #96544
Comments
Tagging subscribers to this area: @dotnet/ncl
runtime/src/libraries/System.Private.Uri/src/System/Uri.cs Lines 1911 to 1912 in 22068a8
What's the purpose of there even being a cap on the length? RFC 3986 doesn't mention anything about a limit needing to exist, and there's no comment as to why.
The same discussion from a few years ago: Uri rejects otherwise valid strings with length >= 65520 (#1857) Essentially:
Ideally, we would have a dedicated type for dealing with data Uris - #85164 (comment). But we don't have a good answer for cases where you're stuck with using
Moving to Future for now to let others comment/upvote if they run into the same problem.
@MihaZupan this would be really important for the Azure AI SDK and Semantic Kernel, and I would love for this to move forward. Can we do something to move this forward?

I thought about your suggestion to use a new type as proposed in #85164, but the problem I see with that is for APIs/SDKs like the Azure SDK that are built from OpenAPI specs using tools like AutoRest. The type in the spec just says so from that point of view I think it makes much more sense to move this proposal forward and make

Also, adding a new type would bring up a weird state where you can use

//Edit:

//Edit2:

```csharp
namespace System;
public class Uri
{
    // new apis
    // data:content/type;foo=bar;base64,R0lGODdh
    public Uri(ReadOnlyMemory<byte> data, string contentType, Dictionary<string, string>? additionalProperties = null);
    // data:content/type;foo=bar;utf8,hello world
    public static Uri AsDataUri(string data, string contentType, Dictionary<string, string>? additionalProperties = null);
}
```

Alternate and preferred design:

```csharp
namespace System;
public class Uri
{
    // new apis
    // data:content/type;foo=bar;base64,R0lGODdh
    public static Uri AsDataUri(ReadOnlyMemory<byte> data, string contentType, Dictionary<string, string>? additionalProperties = null);
    // data:content/type;foo=bar;utf8,hello world
    public static Uri AsDataUri(string data, string contentType, Dictionary<string, string>? additionalProperties = null);
}
```
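To make the proposed shape concrete, here is a minimal sketch of the string that the base64 overload would have to produce. `BuildBase64DataUri` is a hypothetical helper, not an existing `System.Uri` API; it returns a plain `string` so it does not depend on the length limit under discussion, and it assumes the content type and property values need no escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of System.Uri): builds only the data URI text.
// Assumes contentType and property values are already safe to embed as-is.
static string BuildBase64DataUri(
    ReadOnlyMemory<byte> data,
    string contentType,
    IReadOnlyDictionary<string, string>? additionalProperties = null)
{
    var builder = new StringBuilder("data:").Append(contentType);

    if (additionalProperties is not null)
    {
        // Each property becomes a ";key=value" segment before the ";base64" marker.
        foreach (var (key, value) in additionalProperties)
        {
            builder.Append(';').Append(key).Append('=').Append(value);
        }
    }

    builder.Append(";base64,").Append(Convert.ToBase64String(data.Span));
    return builder.ToString();
}

var gifHeader = new byte[] { 0x47, 0x49, 0x46, 0x38, 0x37, 0x61 }; // ASCII "GIF87a"
var text = BuildBase64DataUri(
    gifHeader,
    "content/type",
    new Dictionary<string, string> { ["foo"] = "bar" });

Console.WriteLine(text); // data:content/type;foo=bar;base64,R0lGODdh
```

Whether the real API should return `Uri` or a dedicated type is exactly the open question in this thread; the sketch only shows the serialization step.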
### Motivation and Context

⚠️ Breaking changes on non-experimental types

**ImageContent**

Resolves #5625
Resolves #5295

For a brief time, these changes will keep the content below as experimental.

- BinaryContent
- AudioContent
- ImageContent
- FunctionCallContent
- FunctionResultContent

Changes:

### **BinaryContent**

- Removed providers for lazy loading content, simplifying its usage and APIs.
- Removed the `Stream` constructor to avoid IDisposable resource consumption or bad practices.
- Added a `Uri` property dedicated to referenced Uri information.
- Added a `DataUri` property which can be set or read dynamically (auto-generated if you created the content from a byte array with a MIME type). Setting a `DataUri` will automatically update the `MimeType` property and add any extra metadata that may be available in the data scheme definition.
- Added a required `mimeType` parameter to the ByteArray constructor, to encourage passing the MIME type when creating BinaryContent directly or from specializations.
- Added a `Data` property which can be set or read dynamically (auto-generated if you created the content from a data URI). Setting `Data` on an existing BinaryContent is also reflected in the getter of `DataUri` for that content.
- Added DataUri and Base64 validation when setting `DataUri` on the contents.
- When a DataUri carries parameters, those are merged with the current content metadata, e.g. `data:image/jpeg;parameter1=value1;parameter2=value2;base64,binary==`

### ⚠️ **ImageContent**

Fixes bugs and inconsistent behavior:

- Setting the Data of an image didn't change the current data URI and vice versa, allowing the same image content instance to hold different binary data in its two representations.
- An Image could not have both DataUri and Uri in the same instance, which limited scenarios where you have the image data but also want a reference to where that content came from.
- It wasn't possible to create an Image from a data URI larger than the size limit of the .NET System.Uri type; see [https://github.com/dotnet/runtime/issues/96544](https://github.com/dotnet/runtime/issues/96544).

### **FunctionResultContent**

- Update `Id` property to `CallId`.
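The `DataUri` behavior described above (deriving the MIME type, merging parameters into metadata, validating base64) can be illustrated with a small string-based parser. This is a sketch under stated assumptions, not the Semantic Kernel implementation: `ParseBase64DataUri` is a hypothetical name, it handles only base64 payloads, and it never constructs a `System.Uri`, so the length limit does not apply.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical parser for illustration only; handles base64 data URIs exclusively.
static (string MimeType, Dictionary<string, string> Parameters, byte[] Data) ParseBase64DataUri(string dataUri)
{
    const string Scheme = "data:";
    if (!dataUri.StartsWith(Scheme, StringComparison.OrdinalIgnoreCase))
        throw new FormatException("Not a data URI.");

    int comma = dataUri.IndexOf(',');
    if (comma < 0)
        throw new FormatException("Missing ',' separator.");

    // The header looks like: image/jpeg;parameter1=value1;base64
    string[] parts = dataUri.Substring(Scheme.Length, comma - Scheme.Length).Split(';');
    if (parts.Length < 2 || parts[^1] != "base64")
        throw new FormatException("Only base64 payloads are handled in this sketch.");

    // Everything between the MIME type and the "base64" marker is a key=value parameter.
    var parameters = new Dictionary<string, string>();
    for (int i = 1; i < parts.Length - 1; i++)
    {
        int eq = parts[i].IndexOf('=');
        if (eq <= 0)
            throw new FormatException($"Malformed parameter '{parts[i]}'.");
        parameters[parts[i][..eq]] = parts[i][(eq + 1)..];
    }

    // Convert.FromBase64String throws FormatException on invalid base64.
    byte[] data = Convert.FromBase64String(dataUri.Substring(comma + 1));
    return (parts[0], parameters, data);
}

var (mime, extras, payload) = ParseBase64DataUri(
    "data:image/jpeg;parameter1=value1;parameter2=value2;base64,dGVzdA==");

Console.WriteLine($"{mime}, {extras.Count} parameter(s), {Encoding.UTF8.GetString(payload)}");
// image/jpeg, 2 parameter(s), test
```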
Note: this is not just for data: uris. We have scenarios in which users create really long queries (for example, return instances that match this set of ids, where the number of ids can be in the thousands). In OData, we have added a pattern for adding /$query to the resource path and using POST to pass the query string in the body of the request to work around limitations with HTTP stacks. However, internally we still build full URIs for the request (for example, in order to generate a "nextLink") and run into this limitation. See, for example, OData#1293
Yeah. This issue has to be fixed as internally we are creating an instance of
The `data` scheme is valid and lets a `Uri` carry data rather than a path. For example, we can save an image as base64 data in a `Uri`: `var uri = new Uri("data:image/jpeg;base64,dGVzdA==");`

However, `Uri` in .NET has a maximum length limitation of `0xFFF0`, which makes it unable to hold large data (in particular, base64 with length more than 65496 (`0xFFF0 - 1 - "data:image/jpeg;base64,".Length`)).

It's common for HTML to contain an `image` tag that uses base64-encoded data for its `src` attribute: `<image src="data:image/jpeg;base64,dGVzdA==" />`

However, we are unable to get the `src` as a `Uri` if the length of `src` is too large, for example, a 1 MB image: `new Uri("data:image/jpeg;base64," + Enumerable.Repeat("a", 70000).Aggregate((s, n) => $"{s}{n}"))`

The above code will throw a `UriFormatException` with `Invalid URI: The Uri string is too long.`

The same code is perfectly valid in JavaScript: `new URL("data:image/jpeg;base64," + "a".repeat(70000))`. The limitation in .NET also makes it hard to do interop with JavaScript URL types (such as in Blazor), because we cannot simply project the `Uri` type to the JavaScript `URL` type.

In a real-world scenario, I hit this issue while using semantic-kernel with GPT-4 Vision, which uses base64 image data in its `image_url` field. I had to resize the images to keep them under the length limitation.

I would like to suggest removing this limitation or relaxing it, for example from `0xFFF0` to `0x7FFFFFF0`.
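A quick way to check the numbers and to probe a given runtime is sketched below. The constants come from this issue and from #1857 (strings of length `0xFFF0` or more are the ones reported as rejected); the names `UriLengthLimit` and `Prefix` are illustrative. The result of `Uri.TryCreate` is deliberately not asserted, since it depends on whether the runtime in use still enforces the limit.

```csharp
using System;

// Values taken from the issue text; a runtime that relaxed or removed the limit behaves differently.
const int UriLengthLimit = 0xFFF0;               // 65520
const string Prefix = "data:image/jpeg;base64,"; // 23 characters

// Longest base64 payload that keeps the whole string below the limit.
int maxPayloadLength = UriLengthLimit - 1 - Prefix.Length;
Console.WriteLine(maxPayloadLength); // 65496

// Same input as in the issue, built without the quadratic Aggregate call.
string longDataUri = Prefix + new string('a', 70000);

// Uri.TryCreate reports failure by returning false instead of throwing UriFormatException.
bool parsed = Uri.TryCreate(longDataUri, UriKind.Absolute, out _);
Console.WriteLine(parsed ? "Parsed as Uri" : "Rejected by this runtime");
```

Where the limit applies, the practical workaround remains the one mentioned above: shrink the payload, or keep the data URI as a plain `string` and avoid `Uri` altogether.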