Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document how to use a value comparer to compensate for different behavior of string comparisons in client and store #2979

Closed
divega opened this issue Sep 18, 2014 · 17 comments · Fixed by #3020

Comments

@divega
Copy link
Contributor

divega commented Sep 18, 2014

Typically string comparisons performed on the store can have different behaviors to comparisons in memory, e.g. trailing blanks and casing may be ignored in the store.

In previous versions of EF this made things like fixup of string keys hard to predict: related data would be pulled from the database because the FK and PK values matched according to the server semantics but once in memory the strings didn't match and therefore fixup didn't happen. This is described in more detail in CodePlex dotnet/efcore#284.

This kind of difference appears more often when working with existing data, so how important this is depends on how much we prioritize those scenarios. If the data was stored by EF then string keys have a better chance to be an exact match.

The idea to address this is that on the client, EF Core could always use a custom string comparison algorithm that would mimic the behavior in the database. That is, one that ignores trailing blanks and casing.

@djcparker
Copy link

Hi,

It's an interesting problem and it is not just string keys that catch people out.

Querying the data in memory (eg: from a previously materialised query) vs projecting the query to the store can definately give different results depending on how the store is configured. Harry versus harry vs HARRY, etc.

We had problems when we allowed searches against columns like Name, Street, etc...

Another one for me was importing external data, say a spreadsheet, and matching the column values in the sheet to a description in a reference table (to get a 1:M relationship and normalise the data).

If I cached the reference table to cut I/O then I had to be careful to do case insensitive queries on the in memory data. If I allowed it to query the store each time I got different results.

I might be forgetting something, but pretty sure that is what I found..
We did contemplate changing the collation on sqlserver...

@StormRider01
Copy link

I consistently run into this issue when migrating to EF, and the existing database uses strings as PK/FKs. These existing databases are almost always case insensitive, and I have to update the collations to case sensitive to prevent EF from losing track of relationships. This causes friction with direct sql users that expect case insensitive comparisons, and get tripped up when the collation is case sensitive.

@Bonelol
Copy link

Bonelol commented Jan 23, 2018

I was able to call Include to get relative data which have case insensitive PK/FKs in old EF core 1.0. This was changed in 2.0, is there anyway to bring back old feature?

@ajcvickers
Copy link
Contributor

Note from API review: should be able to use ValueComparers to implement this.

@ajcvickers ajcvickers self-assigned this May 17, 2018
@junkbondtrader
Copy link

junkbondtrader commented Sep 16, 2019

Is adding a ValueConverter like the following the best workaround for case-insensitive comparisons at present?

entity.Property(e => e.Id)
    .HasConversion(p => p, s => s.ToLower());

Are there workarounds that use ValueComparers? It looks to me like we'll always end up in EntityFinder.GenerateEqualExpression which wouldn't respect a ValueComparer.

@junkbondtrader
Copy link

@divega

This kind of difference appears more often when working with existing data, so how important this is depends on how much we prioritize those scenarios.

What is the priority of working with existing data for the EF Core team?

@divega
Copy link
Contributor Author

divega commented Sep 18, 2019

@junkbondtrader that paragraph implies that only a subset of our customers will be in a situation in which some of the strings in the database they access through EF Core aren't normalized (i.e. they don't match in memory because of different casing, etc.) and that they also don't own the database or for some other reason are unable to normalize the stored strings themselves.

We don't know what this subset is in terms of percentage or business impact, but we believe that it is a relatively small subset within the universe of all EF Core users. There are likely going to be other features in our backlog that are going to help a larger portion of our customers. Anyway, this is just one of the factors that can play a role in how we prioritize. There are many more detailed in https://docs.microsoft.com/ef/core/what-is-new/roadmap#release-planning-process.

@StormRider01
Copy link

@divega It's not just a question of owning the database- this creates a maddening disconnect between tsql queries (SSRS reports, SSMS sleuthing, stored procedures) and EF code, one that I've had to explain countless times to new people.

But what you're really saying is string FKs are bad, and don't expect this issue to be addressed in EF.

@divega
Copy link
Contributor Author

divega commented Sep 18, 2019

@StormRider01 if I wanted to say "don't expect this issue to be addressed on EF Core", I would close it.

You are actually preaching to the choir. I am the one that created the issue because I agree it is important, and I know we have some ideas on how to fix it. The problem is the size of the EF Core backlog and how much of a dent we can make on it on each release. In responding @junkbondtrader's comment I just tried to explain why other issues have been prioritized over this one.

@divega
Copy link
Contributor Author

divega commented Sep 18, 2019

@StormRider01 anyway, your comment implies something that is a very good point about this issue: when things fail to match and you get unexpected behavior (e.g. navigation properties don't get fixed up), it is actually hard to figure out why. In my mind, that automatically raises the priority of addressing it, at least a bit.

@divega divega changed the title Query: Consider different behavior of string comparisons in client and store Query: Compensate for different behavior of string comparisons in client and store Sep 18, 2019
@tap-absg
Copy link

Since this has been opened for 5 years and still not made it into a release, can you please suggest a reasonable workaround? I'm stumped how we avoid causing client side evaluation in this, especially as we have a custom OData wrapper.

Normalizing the database is not an answer as we are looking at the names of our clients businesses and it is not up to us how they chose to capitalize things and it is lost data if we do so

@ajcvickers
Copy link
Contributor

Also see high-level collations issue dotnet/efcore#19866

@chamikasandamal
Copy link

any update on this?

@ajcvickers
Copy link
Contributor

@chamikasandamal I've been playing with this and it seems that this can now be handled using a value comparer (). For example, to ensure EF Core does case-insensitive string key comparisons:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    var comparer = new ValueComparer<string>(
        (l, r) => string.Equals(l, r, StringComparison.OrdinalIgnoreCase),
        v => v.ToUpper().GetHashCode(),
        v => v);
    
    modelBuilder.Entity<Blog>().Property(e => e.Id).Metadata.SetValueComparer(comparer);
    
    modelBuilder.Entity<Post>(b =>
    {
        b.Property(e => e.Id).Metadata.SetValueComparer(comparer);
        b.Property(e => e.BlogId).Metadata.SetValueComparer(comparer);
    });
}

Or if you also need to deal with padding due to fixed length string keys:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    var comparer = new ValueComparer<string>(
        (l, r) => string.Equals(
            l == null ? null : l.Trim(),
            r == null ? null : r.Trim(),
            StringComparison.OrdinalIgnoreCase),
        v => v.Trim().ToUpper().GetHashCode(),
        v => v);

    modelBuilder.Entity<Blog>(b =>
    {
        b.Property(e => e.Id).HasColumnType("char(20)");
        b.Property(e => e.Id).Metadata.SetValueComparer(comparer);
    });
    
    modelBuilder.Entity<Post>(b =>
    {
        b.Property(e => e.Id).HasColumnType("char(20)");
        b.Property(e => e.Id).Metadata.SetValueComparer(comparer);

        b.Property(e => e.BlogId).HasColumnType("char(20)");
        b.Property(e => e.BlogId).Metadata.SetValueComparer(comparer);
    });
}

You could also handle the padding with a value converter that removed and added padding as needed to and from the database.

@ajcvickers ajcvickers changed the title Query: Compensate for different behavior of string comparisons in client and store Document how to use a value comparer to compensate for different behavior of string comparisons in client and store Jan 5, 2021
@ajcvickers ajcvickers transferred this issue from dotnet/efcore Jan 5, 2021
@ajcvickers ajcvickers added this to the Backlog milestone Jan 5, 2021
@ajcvickers ajcvickers modified the milestones: Backlog, 5.0.0 Jan 16, 2021
ajcvickers added a commit that referenced this issue Jan 17, 2021
Lots of examples!

Fixes #614
Fixes #784
Part of #685
Fixes #2252
Fixes #2415
Fixes #2456
Fixes #2824
Fixes #2979
ajcvickers added a commit that referenced this issue Jan 17, 2021
Lots of examples!

Fixes #614
Fixes #784
Part of #685
Fixes #2252
Fixes #2415
Fixes #2456
Fixes #2824
Fixes #2979
ajcvickers added a commit that referenced this issue Jan 17, 2021
Lots of examples!

Fixes #614
Fixes #784
Part of #685
Fixes #2252
Fixes #2415
Fixes #2456
Fixes #2824
Fixes #2979
ajcvickers added a commit that referenced this issue Jan 19, 2021
Lots of examples!

Fixes #614
Fixes #784
Part of #685
Fixes #2252
Fixes #2415
Fixes #2456
Fixes #2824
Fixes #2979
@RichardD2
Copy link

@ajcvickers Does the ValueComparer workaround only work in specific versions? We're stuck with .NET Framework 4.8, and thus EF Core 3.1, and it doesn't seem to work - the query returns the related item, but the navigation property remains null.

@ajcvickers
Copy link
Contributor

@RichardD2 It's very likely this doesn't work on 3.1. Value comparers have seen significant improvements since then.

@RichardD2
Copy link

RichardD2 commented Mar 14, 2022

@ajcvickers Thanks. Looks like using a value converter works in 3.1 as an alternative:

public sealed class StringToUpperValueConverter : ValueConverter<string, string>
{
    private static readonly Expression<Func<string, string>> ToProvider = s => s;
    private static readonly Expression<Func<string, string>> FromProvider = s => s == null ? null : s.ToUpperInvariant();
    public static readonly StringToUpperValueConverter Instance = new();

    private StringToUpperValueConverter() : base(ToProvider, FromProvider)
    {
    }
}
[AttributeUsage(AttributeTargets.Property)]
public sealed class StringToUpperAttribute : Attribute
{
}
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);
    ...
    foreach (var entityType in modelBuilder.Model.GetEntityTypes())
    {
        foreach (var property in entityType.GetProperties())
        {
            if (property.PropertyInfo?.GetCustomAttribute<StringToUpperAttribute>() is null) continue;
            property.SetValueConverter(StringToUpperValueConverter.Instance);
        }
    }
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants