
Support UTF8 end-to-end #14066

Open
Tracked by #240
ajcvickers opened this issue Dec 3, 2018 · 8 comments

Comments

@ajcvickers
Member

Once ADO.NET providers can read UTF8 data natively into a .NET type, EF should support mapping it directly into an entity type.

ajcvickers added this to the Backlog milestone Dec 3, 2018
@roji
Member

roji commented Dec 7, 2018

Is this about supporting the new UTF8String type?

@smitpatel
Member

@roji - Yes

@blankensteiner

This issue sounds very related to what I am looking for. I'll describe my thoughts and you can tell me if we are on the same page.

We are using MS SQL as a document store, meaning our entities typically look something like this:

public class MyEntity
{
    public Guid Id { get; set; }
    public int Version { get; set; }
    public DateTime ModifiedOn { get; set; }
    public string Data { get; set; }
}

The Data property is a JSON string. In DDD terms it is our aggregate.

Given that MS SQL supports UTF8 strings, and that Utf8JsonReader and Utf8JsonWriter (and therefore also JsonSerializer and JsonDocument) from System.Text.Json work on UTF8 bytes internally, we incur unnecessary transcoding overhead when working with our aggregates:

  1. We get the data from MS SQL, where it is converted from a UTF8 string to a UTF16 string.
  2. We pass the UTF16 string to System.Text.Json, where it is converted to a UTF8 byte array.
  3. We do what we need to with our aggregate and then serialize it back to a UTF8 byte array.
  4. The UTF8 byte array is converted to a UTF16 string.
  5. We send the UTF16 string to MS SQL, where it is converted to a UTF8 string.
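The five steps above can be sketched with System.Text.Json as follows. This is a minimal illustration, assuming .NET 6 or later (for `JsonNode`); the `fromServer` bytes are a hypothetical stand-in for the column contents, and the `Encoding.UTF8` calls merely stand in for the transcoding that the provider and the server perform. The second path parses and writes the same UTF-8 bytes directly, which is what skipping steps 1, 2, 4 and 5 would look like.

```csharp
using System;
using System.Text;
using System.Text.Json;
using System.Text.Json.Nodes;

// Hypothetical stand-in for the UTF-8 JSON stored in the Data column.
byte[] fromServer = Encoding.UTF8.GetBytes("{\"Name\":\"Zoë\",\"Count\":1}");

// Current path (steps 1-5 above).
string readAsUtf16 = Encoding.UTF8.GetString(fromServer);      // 1. UTF-8 -> UTF-16 (stands in for the provider)
JsonNode viaString = JsonNode.Parse(readAsUtf16)!;             // 2. the parser transcodes UTF-16 back to UTF-8
viaString["Count"] = viaString["Count"]!.GetValue<int>() + 1;  // 3. work on the aggregate
string writtenAsUtf16 = viaString.ToJsonString();              // 3-4. UTF-8 writer output -> UTF-16 string
byte[] toServer = Encoding.UTF8.GetBytes(writtenAsUtf16);      // 5. UTF-16 -> UTF-8 (stands in for the server)

// Direct path: parse and serialize UTF-8 bytes with no UTF-16 string in between.
JsonNode direct = JsonNode.Parse(fromServer)!;                 // ReadOnlySpan<byte> overload
direct["Count"] = direct["Count"]!.GetValue<int>() + 1;
byte[] directOut = JsonSerializer.SerializeToUtf8Bytes(direct);

// Compare the two outputs byte for byte.
Console.WriteLine(toServer.AsSpan().SequenceEqual(directOut));
```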

Are we waiting for dotnet/corefxlab#2350 before moving along with this issue?
Couldn't we find a good solution without it? For example, let the user declare Data as a byte[] and/or ReadOnlySequence<byte>, and then tell EFC through configuration that it is a string on the server side.
Something like this:

var entity = modelBuilder.Entity<MyEntity>();
entity.Property(c => c.Data).IsRequired().IsUtf8();

Let me know what you think.

@ajcvickers
Member Author

@blankensteiner Yes, this issue is tracking that work. I think that for this to be a truly useful experience in .NET, it needs to happen consistently across .NET. So while it would certainly be possible to do something EF-specific here, I think we're unlikely to do that.

Note that I haven't tried it, but I expect you could use value converters to map binary columns to your own UTF8 type.

@blankensteiner

blankensteiner commented Feb 17, 2020

@ajcvickers I can't tell when Utf8String is expected to ship. Do you know if it will be part of .NET 5? If not, it's going to be a long wait.

Regarding binary columns: won't that prevent us from using the JSON functionality in MS SQL, as well as EFC features like HasComputedColumnSql?

@ajcvickers
Member Author

@blankensteiner I don't know if it made it into the .NET 5 plan or not. (As always there is more to do than people to do it.) Probably best to follow up on the .NET runtime repo about that.

As for what it would or wouldn't prevent, I don't know enough about the SQL Server feature to judge the impact. Somebody will need to try it and see what happens.

@roji
Member

roji commented Oct 8, 2023

Proposing to close: all plans for a new UTF8 string type seem to have been shelved, and the representation for UTF8 data in all .NET APIs is now byte[].

@ajcvickers
Member Author

Note from triage: while there isn't a UTF8 type, we should still make it easy to get the UTF8 bytes back from the database (as a byte array) using EF Core without any transcoding, and then update those bytes again in SaveChanges.
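On the application side, working with the column as a byte[] might look like the sketch below: the UTF-8 document is parsed directly, a new UTF-8 array is produced for the modified aggregate, and that array is what would be assigned back to the entity property before SaveChanges. The EF Core mapping itself is not shown; the `WithNumber` helper, the `Version` property and the sample bytes are hypothetical, and the `u8` literal assumes C# 11 (.NET 7 or later). Unlike a full DOM round trip, untouched properties are written back from the parsed input with `JsonProperty.WriteTo`.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

// Hypothetical stand-in for a byte[] Data property loaded from a UTF-8 JSON column.
byte[] data = Encoding.UTF8.GetBytes("{\"Name\":\"Zoë\",\"Version\":3,\"Tags\":[\"a\",\"b\"]}");

// Hypothetical helper: returns a new UTF-8 document in which the top-level property
// utf8Name (if present) has the given value; other top-level properties are written
// back unchanged. Assumes the document root is a JSON object.
byte[] WithNumber(byte[] utf8Json, ReadOnlySpan<byte> utf8Name, int value)
{
    using JsonDocument doc = JsonDocument.Parse(utf8Json);   // parses the UTF-8 bytes directly
    var buffer = new ArrayBufferWriter<byte>();
    using (var writer = new Utf8JsonWriter(buffer))
    {
        writer.WriteStartObject();
        foreach (JsonProperty property in doc.RootElement.EnumerateObject())
        {
            if (property.NameEquals(utf8Name))
                writer.WriteNumber(utf8Name, value);          // replace the target value
            else
                property.WriteTo(writer);                     // write the property back unchanged
        }
        writer.WriteEndObject();
    }                                                         // disposing the writer flushes it
    return buffer.WrittenSpan.ToArray();
}

// The new array is what would be assigned back to the entity before SaveChanges.
data = WithNumber(data, "Version"u8, 4);
Console.WriteLine(Encoding.UTF8.GetString(data));
```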

5 participants