Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query: Concatenating null with not null string resulting in null #3836

Closed
Muchiachio opened this issue Nov 21, 2015 · 20 comments · Fixed by #19239
Closed

Query: Concatenating null with not null string resulting in null #3836

Muchiachio opened this issue Nov 21, 2015 · 20 comments · Fixed by #19239
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. good first issue This issue should be relatively straightforward to fix. punted-for-2.0 punted-for-3.0 type-bug
Milestone

Comments

@Muchiachio
Copy link

Given query in rc1-final with MsSql

var texts = new Context().Set<Person>().Select(p => p.Name + " " + p.Address).ToArray();

If person address in the database is null then p.Name + " " + p.Address results in null too, using ?? operator helps, but is it the way to go?

var texts = new Context().Set<Person>().Select(p => p.Name + " " + (p.Address ?? "")).ToArray();
@Vasim-DigitalNexus
Copy link

This is the correct behavior when working with nulls; using the ?? approach is the way to go

@divega
Copy link
Contributor

divega commented Nov 21, 2015

I haven't tried to repro but that I am sure this simply reflects the behavior of string concatenation in SQL. We'll discuss if we should compensate for SQL null semantics for this case. We did it in EF6 and in EF Core in general we have tried to stick to in-memory/C# semantics even more. Whatever we do, we need to make sure that the ?? workaround still works, so for now that is the way to go.

@Muchiachio
Copy link
Author

Yes, it reflects SQL semantics.
I asked because EF6 did follow C# semantics, and following C# semantics would feel more natural.

@rowanmiller rowanmiller added this to the 7.0.0 milestone Nov 24, 2015
@rowanmiller rowanmiller changed the title Concating null with not null string resulting in null Query: Concating null with not null string resulting in null Dec 3, 2015
@rowanmiller rowanmiller removed the pri1 label Dec 8, 2015
@rowanmiller rowanmiller modified the milestones: Backlog, 7.0.0 Dec 8, 2015
@smitpatel smitpatel removed this from the Backlog milestone Aug 30, 2017
@smitpatel smitpatel added the help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. label Aug 30, 2017
@ajcvickers ajcvickers added this to the Backlog milestone Aug 30, 2017
@plaramee1
Copy link

Any news on that ? the issue is still there with 2.2

@Muppets
Copy link
Contributor

Muppets commented Feb 27, 2019

@smitpatel @ajcvickers I've raised an attempt at solving this. Would love some guidance on if this is the direction you want to take this.

@divega divega added good first issue This issue should be relatively straightforward to fix. help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. and removed help wanted This issue involves technologies where we are not experts. Expert help would be appreciated. good first issue This issue should be relatively straightforward to fix. labels May 31, 2019
@smitpatel smitpatel removed this from the Backlog milestone Jun 25, 2019
@ajcvickers ajcvickers added this to the Backlog milestone Jun 28, 2019
@smitpatel smitpatel added the good first issue This issue should be relatively straightforward to fix. label Sep 11, 2019
@svengeance
Copy link
Contributor

svengeance commented Dec 7, 2019

@smitpatel Given ConstantExpressions can also nullify the whole string, should we handle those as well? If so, I imagine it's safe to replace the ConstantExpression's value with string.Empty rather than coalescing it, yeah?

*Also for Parameter Expressions.

@smitpatel
Copy link
Contributor

ConstantExpression, just replace it.
ParameterExpression - You would know nullablity of it so you can also replace that one if null.

maumar pushed a commit that referenced this issue Dec 11, 2019
* Now replaces empty-string SqlConstant expressions with string.Empty and coalesces all nullable columns/parameters during a string concatenation

* Fixes all tests broken by introducing string coalescing on nullable concat expressions
@ajcvickers ajcvickers modified the milestones: Backlog, 5.0.0 Dec 14, 2019
@ajcvickers ajcvickers added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Dec 14, 2019
@ajcvickers ajcvickers modified the milestones: 5.0.0, 5.0.0-preview1 Mar 13, 2020
@ajcvickers ajcvickers modified the milestones: 5.0.0-preview1, 5.0.0 Nov 7, 2020
@slepmog
Copy link

slepmog commented Apr 25, 2024

Is there a way to revert this change (which, in my opinion, is very very wrong)?
Applying UseRelationalNulls(true) does not seem to affect string concatenation, only property comparisons.

The database null semantics allows for very simple and elegant string compositions where the null elements naturally eliminate themselves.
With this change, an absolutely monstrous LINQ has to be written that first checks everything for null and then concatenates or doesn't concatenate, which all gets pushed deep into the SQL and evaluates all involved expressions twice (once for the null check and then for the concatenation if the null check passes).

@roji
Copy link
Member

roji commented Apr 25, 2024

@slepmog EF has an option to opt out of null semantics compensation:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    => optionsBuilder.UseSqlServer(..., o => o.UseRelationalNulls());

This generally removes all the SQL that EF adds to make the SQL align to the C# semantics. Note that this isn't recommended, and may not work in the way you expect.

If you have a specific case where you believe EF should be generating simpler SQL, please open a new issue (as this one was closed 5 years ago) with a specific code sample.

@slepmog
Copy link

slepmog commented Apr 26, 2024

@roji Yes, I have mentioned UseRelationalNulls in my comment. It does not affect string concatenation.
If you are saying it should, then it's a bug and an issue should be opened.

Otherwise EF is working as intended - that is, it emulates the C# semantics as the EF developers intended, even though it's undesired in all cases.

The reason we are composing our SQL queries with LINQ is not in order to achieve C# semantics in SQL - that is never ever helpful. The reason is that we want Intellisense and reasonable type safety. Writing raw SQL in string literals is tedious and very prone to silly typos. Otherwise there is a very strong understanding that the LINQ we are producing for the purpose of translating it to SQL is an ersatz SQL and should work as SQL would after the translation. That the EF developers have decided to make it the other way round in the sake of achieving the same behaviour between Linq-2-entities and Linq-2-objects is very sad and inconvenient, but I understand they are not going to roll it back.

(By the way, they never actually achieved the C# semantics anyway, did they? The string comparisons in the translated SQL remain database-specific regarding the collation, which controls case sensitivity etc. The C#-side string comparisons will be case-sensitive.)

Thus what I was wondering is whether there was a yet another switch that brings back the database null semantics of string concatenation, like UseRelationalNulls brings back the database null semantics of field comparisons.

@roji
Copy link
Member

roji commented Apr 28, 2024

The reason we are composing our SQL queries with LINQ is not in order to achieve C# semantics in SQL - that is never ever helpful. The reason is that we want Intellisense and reasonable type safety. Writing raw SQL in string literals is tedious and very prone to silly typos. Otherwise there is a very strong understanding that the LINQ we are producing for the purpose of translating it to SQL is an ersatz SQL and should work as SQL would after the translation. That the EF developers have decided to make it the other way round in the sake of achieving the same behaviour between Linq-2-entities and Linq-2-objects is very sad and inconvenient, but I understand they are not going to roll it back.

FWIW this is not the point of LINQ. ORMs exist out there that allow expressing SQL via strongly-typed DSLs; this makes sense when developers want to write SQL. LINQ, in constrast, is about using regular C# to express queries, and the expectation is for the query to return (more or less) the same results from the database as it would when evaluated locally; this includes replicating C# nullability behavior. We have very ample evidence from users that this is desirable, and in fact is the reason many users use LINQ in the first place.

In other words, if what you want is to write SQL directly, I'd suggest doing just that, rather than trying to use LINQ. It's true that EF doesn't offer a strongly-typed way to do this,

(By the way, they never actually achieved the C# semantics anyway, did they? The string comparisons in the translated SQL remain database-specific regarding the collation, which controls case sensitivity etc. The C#-side string comparisons will be case-sensitive.)

That's right - and this principle is not 100% applicable everywhere; database string comparison works fundamentally differently than the way comparisons work in .NET (columns have collations, which determine indexing as well, which are vital for efficient querying), so we pragmatically don't attempt to provide full C# behavior for that case.

In any case, relational nulls is indeed the setting you're looking for. We'll investigate the specific case you've found (it sounds like a bug), but be aware that we generally have invested and tested little in relational nulls, since almost nobody wants to use them.

@slepmog
Copy link

slepmog commented Apr 29, 2024

@roji I see what you are saying. I guess the difference in our expectations comes mainly from the fact that we are using EF with database-first, and EF these days is strongly imbalanced towards code-first.

EF declares that it supports database-first, but in reality the struggle to actually make it work may turn out to be too much and then people will say, screw this, I'm using Dapper.

Fundamentally what creates the struggle is the combination of two seemingly unrelated EF features that jointly conspire to thwart one's database-first advancements:

  • EF does not want to give you database semantics in your LINQ expressions
  • EF insists that the entire query should be translatable to SQL

A long time ago, in Linq2Sql, the rule was that the LINQ is translated to SQL as much as possible, and the rest is executed on the client. Early EF followed that rule too.
This worked perfectly, but many people failed to realize how to use it properly and would keep writing their Wheres such that they would need to go to the client in full, streaming millions of records and filtering them in a loop on the poor machine. The EF developers then said, enough is enough, and put forth a breaking change whereby if your LINQ cannot be transacted to SQL in its entirety, it's an exception.

As a result, one can no longer have little C# touch-ups on the final result of their queries, e.g. filtering and concatenating the strings with the C# string methods and null propagation operators etc.

It would be awesome if that came back.

Do you think it is feasible to have a feature request to "Allow invoking untranslatable C# code on those parts of the LINQ expression that are not used for anything other than SELECTing those expressions to the client"? Then we could have the best of both worlds, the silly Where conditions would still not compile (because EF detects that this comparison is one that affects the number and/or the composition of the returned rows), but something like select PostProcessInCSharp(t.Field2) would be allowed?

Then again, that's probably too much of development for the less loved of the two children :)

@ajcvickers
Copy link
Contributor

@slepmog You keep saying, "the EF developers" and I am not sure if you are aware that @roji is a member of the EF development team and has been for many years.

@slepmog
Copy link

slepmog commented Apr 29, 2024

@ajcvickers I have seen that in the profile, yes!
I don't believe I've been using "the EF developers" in a derogatory fashion though, nor that it would be better to say "you" instead :)

@roji
Copy link
Member

roji commented Apr 29, 2024

@slepmog there are lots of points in your post which IMHO aren't necessarily related.

I see what you are saying. I guess the difference in our expectations comes mainly from the fact that we are using EF with database-first, and EF these days is strongly imbalanced towards code-first.

Database-first and code-first really have no bearing on how querying works; at the end of the day, regardless of whether your EF model comes from or how it was created, EF users use LINQ to express queries over the database (except where they choose to write SQL instead).

EF declares that it supports database-first, but in reality the struggle to actually make it work may turn out to be too much and then people will say, screw this, I'm using Dapper.

That's totally fine. There's a class of developers who are 100% comfortable with SQL, and want to just write that; for that case, I'm not sure it makes sense to try to engineer a LINQ query that would match exactly the SQL the developer has in mind - I'd recommend just writing out that SQL. If that's all the developer wants to do, then EF may not be the best tool for the job - we definitely don't think EF should always be used everywhere, in all cases.

The important point is that it's not all-of-nothing. In most cases, EF's LINQ translation capabilities are good and produce acceptable SQL. For specific important queries where EF produces SQL that's somehow not satisfactory, you can easily drop down to SQL for that specific query, and continue using EF to materialize the results (or not), or even just use ADO.NET directly, or Dapper or whatever.

As a result, one can no longer have little C# touch-ups on the final result of their queries, e.g. filtering and concatenating the strings with the C# string methods and null propagation operators etc.

That's incorrect. EF allows client evaluation in the final projection (and also in some other places where it makes sense), e.g.:

var blogs = await context.Blogs.Where(b => ...).Select(b => ...).ToListAsync();

In the above query, you cannot use arbitrary .NET methods in the Where operator, but you can do that in the Select operator. In any case, that all seems completely unrelated to how EF does its translations around nullability.

Do you think it is feasible to have a feature request to "Allow invoking untranslatable C# code on those parts of the LINQ expression that are not used for anything other than SELECTing those expressions to the client"?

Once again, that's already the case - unless I'm misunderstanding what you're saying.

@slepmog
Copy link

slepmog commented Apr 29, 2024

@roji

Database-first and code-first really have no bearing on how querying works; at the end of the day, regardless of whether your EF model comes from or how it was created, EF users use LINQ to express queries over the database (except where they choose to write SQL instead).

Yes, I get that! What I was trying to say is that code-first developers are apparently more likely to want their queries to adhere to C# evaluation semantic and the database-first developers the other way round, and EF appears to favour the former. Which is fine when there is a switch to flick between the two, or when there are query-writing techniques that allow you to achieve one or the other.

This very issue is an example: originally one could write p.Address ?? "" to achieve the C# semantic , or omit the ?? to get the database semantic - that is, the developer had a choice. With the change made, the choice isn't there any more, and the favouring of the code-first approach has intensified :)

That's incorrect. EF allows client evaluation in the final projection (and also in some other places where it makes sense)

I have repeated all my tests, and it would appear I have been an idiot on one important occasion of invoking a C# function in the final projection which caused me the most grief: the query compiles but throws a InvalidOperationException at runtime if the function in question is a member function of a class. If the function is static, it is successfully called. That is definitely workable in many cases, although it would be good to be able to call non-static function too - after all, it should not matter in the final projection, should it? It just feels that any C# code would be fine there.

The other one was the null propagation operator:

.Where(...).Select(p => new PurchaseOrder() { Reference = p.REFERENCE?.Substring(2), ... } ).ToListAsync()

That definitely gives CS8072 "An expression tree lambda may not contain a null propagating operator", and while it also feels like there is no reason to disallow it in the final projection, it no longer seems important given that it can be hidden inside a static function.

Thank you for setting me straight on this one :)

@roji
Copy link
Member

roji commented Apr 29, 2024

What I was trying to say is that code-first developers are apparently more likely to want their queries to adhere to C# evaluation semantic and the database-first developers the other way round, and EF appears to favour the former.

I'm not sure what this is based on, and contradicts my intuition. Again, if a developer wants to write SQL, they can do that; if they've chosen to use LINQ to write their queries, I don't think they'd want their LINQ to behave differently because they're more "database-first".

[...] although it would be good to be able to call non-static function too [...]

I'm really not sure what you're talking about here - EF doesn't care if the function is static or not. Try to put together a minimal, runnable repro which shows the problem, and if you succeed, please open a new issue with that repro.

The other one was the null propagation operator [...] That definitely gives CS8072 "An expression tree lambda may not contain a null propagating operator"

Here you're conflating another, unrelated problem - the C# compiler itself has limitation with regards to syntax that's allows in expression trees. Although it's definitely something I believe should be improved (and hope it will happen), this has nothing to do with EF, and also has nothing to do with nullability. For example, the null propagation operator (?.) isn't supported because it was introduced to C# late, but neither are switch expressions - for exactly the same reason.

In any case, whether the null propagation operator isn't supported, that has nothing to do with what kind of null semantics EF uses when translating constructs which are supported. You can work around the lack of the null propagation operator by using a conditional expression (i.e. instead of x?.y, you can do x == null ? y : x).

@roji roji changed the title Query: Concating null with not null string resulting in null Query: Concatenating null with not null string resulting in null Apr 29, 2024
@slepmog
Copy link

slepmog commented Apr 29, 2024

@roji I'm happy to open an issue, however before I do that, can you please validate that it's not caused by another misunderstanding of mine.
Repro: https://github.com/slepmog/EFInstanceMethodTest/blob/master/EFInstanceMethodTest/Program.cs
Result:

Calling a static method from final projection. Result:
Value from table!

Calling an instance method from final projection. Result:
System.InvalidOperationException: The client projection contains a reference to a constant expression of 'EFInstanceMethodTest.Program+Executor' through the instance method 'ImAnInstanceMethod'. This could potentially cause a memory leak; consider making the method static so that it does not capture constant in the instance. See https://go.microsoft.com/fwlink/?linkid=2103067 for more information and examples.
   at Microsoft.EntityFrameworkCore.Query.ShapedQueryCompilingExpressionVisitor.ConstantVerifyingExpressionVisitor.VisitMethodCall(MethodCallExpression methodCallExpression)
   at ...

@roji
Copy link
Member

roji commented Apr 30, 2024

@slepmog in your code, the instance on which ImAnInstanceMethod is invoked is captured; as the error message says, this could cause a memory leak since EF caches the query shaper, which contains the client evaluation bits and therefore that instance.

This isn't a problem with instance methods per se, but with capturing the this of the instance method (e.g. try invoking the method on a parameter instead). I'll look into this a bit, but at the moment that's the expected behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-query closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. good first issue This issue should be relatively straightforward to fix. punted-for-2.0 punted-for-3.0 type-bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.