OR clause allowing NULL is generated when joining on NullableProperty.Value #27071
Related issue: #18808. This is by design. EF follows the join semantics of LINQ to Objects here: if the join keys are anonymous objects, nulls actually get matched in the join results. LINQ to Objects code:
using System;
using System.Collections.Generic;
using System.Linq;

var entity1s = new List<Entity1>
{
    new Entity1 { Id = 1, Name = "1" },
    new Entity1 { Id = 2, Name = "2" },
    new Entity1 { Id = 3, Name = null }
};
var entity2s = new List<Entity2>
{
    new Entity2 { Id = 1, Name = "1" },
    new Entity2 { Id = 2, Name = null },
    new Entity2 { Id = 3, Name = "3" }
};
var l2o_null_match = from e1 in entity1s
                     join e2 in entity2s on new { x = e1.Name } equals new { x = e2.Name }
                     select new { e1, e2 };
// Anonymous-object keys: matches on "1" and on null (Entity1 3 pairs with Entity2 2)
var result_null_match = l2o_null_match.ToList();
var l2o_null_not_match = from e1 in entity1s
                         join e2 in entity2s on e1.Name equals e2.Name
                         select new { e1, e2 };
// Scalar keys: matches on "1" only
var result_null_not_match = l2o_null_not_match.ToList();

class Entity1 { public int Id { get; set; } public string? Name { get; set; } }
class Entity2 { public int Id { get; set; } public string? Name { get; set; } } |
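The difference between the two joins comes from how LINQ to Objects compares keys. The sketch below uses only the standard library, no EF. Its arrays `left`/`right` are illustrative sample data. Anonymous types get compiler-generated `Equals` that compares each property with `EqualityComparer<T>.Default`, so two keys whose only member is null are equal. `Enumerable.Join`, by contrast, does not pair elements whose scalar key is itself null.

```csharp
using System;
using System.Linq;
#nullable enable

// Anonymous types compare members with EqualityComparer<T>.Default,
// which treats null == null as equal.
var keyA = new { x = (string?)null };
var keyB = new { x = (string?)null };
bool anonymousKeysEqual = keyA.Equals(keyB);

// Enumerable.Join never pairs elements whose (scalar) key is null.
string?[] left = { "1", null };
string?[] right = { "1", null };
var scalarPairs = left.Join(right, l => l, r => r, (l, r) => (l, r)).ToList();

// Wrapping the same values in an anonymous object makes the key itself non-null,
// so the two null-valued keys now match each other.
var wrappedPairs = left.Join(right, l => new { x = l }, r => new { x = r }, (l, r) => (l, r)).ToList();

Console.WriteLine($"{anonymousKeysEqual} {scalarPairs.Count} {wrappedPairs.Count}"); // True 1 2
```

EF Core mirrors this LINQ to Objects behavior, which is why wrapping a key in `new { ... }` changes the generated SQL.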
With respect to the second example (
With respect to the third example: EF has two "modes" of expanding null semantics, optimized (also called simplified) and full. Optimized expansion doesn't distinguish between null and false, whereas full expansion removes nulls from the result entirely (everything is converted to two-valued logic). Full expansion is used in projections, or when the comparison is negated. Simplified expansion can be used in regular predicates such as:
entities.Where(e => e.SomeNullValue == e.SomeOtherNullValue)
entities.Where(e => e.SomeNullValue == e.SomeNonNullValue)
The second case is what you have in your example. Since one side can be null but the other never can, we can skip the null checks completely. If the left-hand value is null, the result of the comparison is null, which is good enough for us. We don't need false here, since null also filters out the row. Join predicate comparisons use the simplified expansion, which is why you don't see a null check. If the comparison were negated:
entities.Where(e => e.SomeNullValue != e.SomeNonNullValue)
we would need the full expansion, which would look something like this: WHERE e.SomeNullValue <> e.SomeNonNullValue OR e.SomeNullValue IS NULL. If both sides of the comparison are nullable, the expansion is more complicated. |
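To make the optimized vs. full distinction concrete, here is a small standard-library C# model of SQL three-valued logic, with UNKNOWN represented as `null`. `SqlEq`, `SqlNot`, `SqlOr` and `Keeps` are hypothetical helpers written only for illustration; they are not EF Core APIs, and the rows are made-up sample data. The model shows three things:

* With one nullable side, plain `=` keeps the same rows as C# `==`.
* Plain `<>` loses the NULL row, so the full expansion needs `OR x IS NULL`.
* For two nullable sides, `a = b OR (a IS NULL AND b IS NULL)` agrees with C# `==`.

```csharp
using System;
using System.Linq;
#nullable enable

// SQL '=': any NULL operand makes the result UNKNOWN (modeled as null).
static bool? SqlEq(string? a, string? b) => a is null || b is null ? (bool?)null : a == b;

// SQL NOT: NOT UNKNOWN is still UNKNOWN.
static bool? SqlNot(bool? x) => x is null ? (bool?)null : !x.Value;

// SQL OR: TRUE if either side is TRUE, otherwise UNKNOWN if either side is UNKNOWN.
static bool? SqlOr(bool? x, bool? y) =>
    x == true || y == true ? true : (x is null || y is null ? (bool?)null : false);

// WHERE / JOIN ON keep a row only when the predicate is TRUE; FALSE and UNKNOWN both drop it.
static bool Keeps(bool? predicate) => predicate == true;

var rows = new (string? SomeNullValue, string SomeNonNullValue)[] { ("a", "a"), ("a", "b"), (null, "a") };

// '==' with one nullable side: simplified expansion (plain '=') keeps the same rows as C#.
int sqlEq = rows.Count(r => Keeps(SqlEq(r.SomeNullValue, r.SomeNonNullValue)));
int csEq = rows.Count(r => r.SomeNullValue == r.SomeNonNullValue);

// '!=': plain '<>' drops the NULL row; full expansion adds 'OR SomeNullValue IS NULL'.
int sqlNePlain = rows.Count(r => Keeps(SqlNot(SqlEq(r.SomeNullValue, r.SomeNonNullValue))));
int sqlNeFull = rows.Count(r => Keeps(SqlOr(SqlNot(SqlEq(r.SomeNullValue, r.SomeNonNullValue)),
                                             r.SomeNullValue is null)));
int csNe = rows.Count(r => r.SomeNullValue != r.SomeNonNullValue);

Console.WriteLine($"{sqlEq} {csEq} {sqlNePlain} {sqlNeFull} {csNe}"); // 1 1 1 2 2

// Both sides nullable: 'a = b OR (a IS NULL AND b IS NULL)' agrees with C# '==' on every pair.
string?[] values = { null, "a", "b" };
bool bothNullableMatches = values
    .SelectMany(a => values, (a, b) => (a, b))
    .All(p => Keeps(SqlOr(SqlEq(p.a, p.b), p.a is null && p.b is null)) == (p.a == p.b));
Console.WriteLine(bothNullableMatches); // True
```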
Hi Maurycy, thank you for looking at this issue. Based on your comments, please allow me to add additional details. Let us agree on the correct and desired behavior, which you stated in your reply:
The small queries I have included below do not show that the desired behavior is being applied consistently. Please explain how this behavior is applied in the example queries I have provided. Your reference to #18808 is not clear to me, as we are not comparing object references here.
The issue I am raising in this ticket is that I am joining on
By using
Both sides are nullable in the query below but no null check is generated:
This query comparing
However, this query, which is expected to compare values, allows nulls:
This is the same join as the query above however in this query no null check is generated:
Tests: I find no consistent pattern of how null checks are generated.
|
Related: #27072
|
This is not how EF Core is currently designed. One important thing to note in this context is that there's no analogous gesture for reference types: there's no way to explicitly tell EF Core that the field cannot be null. The new null-forgiving (bang, !) operator does that logically, but it is pure compiler syntax that does not exist in the expression tree EF Core receives. Note also that you can disable null compensation entirely by using relational null semantics. In that mode, EF Core doesn't add any additional null checks, and your C# code gets translated to SQL as-is (nullability-wise). However, this is not a mode we generally recommend. |
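The claim that `!` never reaches EF Core can be checked with plain `System.Linq.Expressions`, no EF required. This sketch inspects two lambda expression trees; the variable names are illustrative. The `!` leaves no node at all. `.Value` on a `Nullable<T>`, by contrast, appears as an ordinary member access. Whether a provider treats that access as a non-null guarantee is the provider's decision; later in this thread, EF Core is described as ignoring `.Value` for nullability purposes.

```csharp
using System;
using System.Linq.Expressions;
#nullable enable

// The null-forgiving '!' is compile-time only: the tree is just 's.Length'.
Expression<Func<string?, int>> bang = s => s!.Length;

// '.Value' on a Nullable<T>, by contrast, is a real member access in the tree.
Expression<Func<int?, int>> value = x => x.Value;

var bangBody = (MemberExpression)bang.Body;
var valueBody = (MemberExpression)value.Body;

// 'Length' is accessed directly on the parameter: nothing records the '!'.
Console.WriteLine(bangBody.Expression is ParameterExpression); // True
Console.WriteLine(bangBody.Member.Name);                        // Length
Console.WriteLine(valueBody.Member.Name);                       // Value
```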
1.) My choice of words "the field cannot be null" is arguably poor. Most generally, I want to know whether there is a way to tell EF not to generate
2.) If the answer to 1.) is no, would it be possible in a future release (edit) not to generate the
3.) Can you please explain the existing logic and intent as demonstrated by the examples I have provided? |
@sam-wheat when it generates a SQL equality ( Are you seeing a case where both sides of the SQL expression aren't nullable, but EF Core is still adding the checks? If so, can you provide the clear, runnable repro for that (including the model)? Otherwise, if you don't want the extra checks even when the sides are nullable, then as I wrote below you can opt into relational null semantics.
We could do an optimization like that, but would that really be significantly better? That would remove the extra SQL check from the JOIN clause, only to add one in a new WHERE clause...
See my explanation above in 1 - by default EF Core recreates C# null semantics via the SQL it generates, so that users' LINQ queries return the same results from the database that they would if they were executed in-memory. |
@sam-wheat one additional comment: are you looking to remove the extra nullability checks for performance reasons? Generally speaking, they should not have a significant performance impact (though that can always vary across databases, query shapes, etc.). If performance is the concern, have you measured it with and without those checks and confirmed there's an issue? |
Query 2 runs in 2:12 and returns 7.6m rows. I manually commented out the OR clause.
I believe it would be better in terms of the control it gives the developer. The following are very different queries, and AFAIK it is not possible to achieve the latter when using a complex join:
Runnable...
|
The question is whether these two SQL queries differ (significantly) in performance: is this something you've verified? |
One more thing that could help here is the query plan for the various SQL statements being discussed; could you post these? An easy way of getting a query plan is to execute |
Understood. The difference between the queries is below.
|
Triage discussion: @maumar to look into whether there's a possible workaround for avoiding the null compensation in the above join condition. |
The best way we currently have to avoid null compensation is to use relational null semantics, as @roji pointed out. You then manually add null compensation terms for the comparisons you actually want to behave like C#, e.g.:
// ctx is set up with the UseRelationalNulls option enabled
ctx.Customers.Where(c => c.Relational1 == c.Relational2 && (c.CSharp1 == c.CSharp2 || (c.CSharp1 == null && c.CSharp2 == null)))
For join key comparisons, relational null semantics are used as long as the keys are not wrapped in anonymous objects. This obviously doesn't help for composite key comparisons, since those have to be expressed as anonymous objects.
Regarding consistency in @sam-wheat's examples above, EF has several "dimensions" that decide whether and how comparisons are translated:
1.1 By default, we mimic C# semantics, so we add extra terms to ensure that null == null is not filtered out.
2.1 Simplified expansion is the most common case. It happens inside predicates, where we don't distinguish between a false and a null result. In that case, when only one side is nullable and the other is not, we can skip the extra terms completely. If the nullable side is null, the result of the comparison is null, which is just as good as false for us.
3.1, 3.2, 3.3 For full expansion, the translation is a bit trickier. If you are interested, you can see all the cases (along with truth tables) starting here: https://github.com/dotnet/efcore/blob/main/src/EFCore.Relational/Query/SqlNullabilityProcessor.cs#L1804
Now, looping back to the cases you provided earlier:
// Both sides nullable
var query1 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Name equals e2.Name select new { e1, e2 };
var sql1 = query1.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON [e].[Name] = [e0].[Name]
// maumar:
// 1.3 join key comparison -> relational semantics -> no need for the extra terms

// Both sides nullable
var query21 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Name = e1.Name } equals new { Name = e2.Name } select new { e1, e2 };
var sql21 = query21.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Name] = [e0].[Name]) OR ([e].[Name] IS NULL AND [e0].[Name] IS NULL)
// maumar:
// 1.4 join key, but wrapped in an anonymous type ->
// 2.1 expansion is needed, but it happens in a predicate, so the simplified one can be used
// 3.1 both terms are nullable, so we generate a == b || (a == null && b == null)

// One side nullable
var query22 = from e1 in db.Entity1s join e2 in db.Entity2s on new { Name = e1.Name } equals new { Name = e2.Address } select new { e1, e2 };
var sql22 = query22.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Name] = [e0].[Address]
// maumar:
// 1.4 join key, but wrapped in an anonymous type ->
// 2.1 expansion is needed, but it happens in a predicate, so the simplified one can be used
// 3.2 only one side is nullable, so the extra term is not needed

// One side nullable
var query23 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Name equals e2.Address select new { e1, e2 };
var sql23 = query23.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Name] = [e0].[Address]
// maumar:
// 1.3 join key comparison -> relational semantics -> no need for the extra terms

// Both sides not nullable
var query2 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Address equals e2.Address select new { e1, e2 };
var sql2 = query2.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON [e].[Address] = [e0].[Address]
// maumar:
// 1.3 join key comparison -> relational semantics -> no need for the extra terms

// Both sides not nullable (compare to query21)
var query24 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Address } equals new { ID = e2.Address } select new { e1, e2 };
var sql24 = query24.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Address] = [e0].[Address]
// maumar:
// 1.4 join key, but wrapped in an anonymous type ->
// 2.1 expansion is needed, but it happens in a predicate, so the simplified one can be used
// both terms are non-nullable, so nothing needs to be added

// Both sides nullable
var query3 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Count equals e2.Count select new { e1, e2 };
var sql3 = query3.ToQueryString(); // INNER JOIN [Entity2s] AS [e0] ON [e].[Count] = [e0].[Count]
// maumar:
// same as query1

// Both sides nullable
var query32 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Count } equals new { ID = e2.Count } select new { e1, e2 };
var sql32 = query32.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Count] = [e0].[Count]) OR ([e].[Count] IS NULL AND [e0].[Count] IS NULL)
// maumar:
// same as query21

// Both sides not nullable (compare to query24)
var query33 = from e1 in db.Entity1s join e2 in db.Entity2s on new { ID = e1.Count.Value } equals new { ID = e2.Count.Value } select new { e1, e2 };
var sql33 = query33.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON ([e].[Count] = [e0].[Count]) OR ([e].[Count] IS NULL AND [e0].[Count] IS NULL)
// maumar:
// per the earlier comment, EF ignores the .Value and still treats Count as a nullable property, so this is the same as query21 and query32

// Both sides not nullable
var query34 = from e1 in db.Entity1s join e2 in db.Entity2s on e1.Count.Value equals e2.Count.Value select new { e1, e2 };
var sql34 = query34.ToQueryString(); // INNER JOIN [SAP].[Entity2s] AS [e0] ON [e].[Count] = [e0].[Count]
// maumar:
// same as query1: we ignore .Value, so both sides are nullable, but it's a join key comparison not wrapped in an anonymous object, so we can use relational semantics |
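The annotations above boil down to a few rules for a single join key equality. This sketch encodes them as a hypothetical `TranslateJoinKey` helper. It is not an EF Core API, and it deliberately ignores everything else the real SqlNullabilityProcessor does. It reproduces the ON clauses quoted for several of the queries. The nullability flags follow the model in the examples: Name and Count are nullable, Address is not, and `.Value` is treated as still nullable, as noted for query33.

```csharp
using System;
#nullable enable

// Hypothetical helper (NOT part of EF Core): a simplified model of the rules
// annotated above for translating one join key equality to SQL.
static string TranslateJoinKey(string left, bool leftNullable,
                               string right, bool rightNullable,
                               bool wrappedInAnonymousType)
{
    // 1.3 Plain join keys: relational null semantics, no extra terms.
    if (!wrappedInAnonymousType)
        return $"{left} = {right}";

    // 1.4 + 2.1 Anonymous-type keys: C# semantics via the simplified expansion.
    // With at most one nullable side, NULL = x yields NULL, which drops the row
    // just like FALSE would, so no extra terms are needed (3.2).
    if (!(leftNullable && rightNullable))
        return $"{left} = {right}";

    // 3.1 Both sides nullable: add the term that lets NULL match NULL, as in C#.
    return $"({left} = {right}) OR ({left} IS NULL AND {right} IS NULL)";
}

Console.WriteLine(TranslateJoinKey("[e].[Name]", true, "[e0].[Name]", true, false));      // query1
Console.WriteLine(TranslateJoinKey("[e].[Name]", true, "[e0].[Name]", true, true));       // query21
Console.WriteLine(TranslateJoinKey("[e].[Name]", true, "[e0].[Address]", false, true));   // query22
Console.WriteLine(TranslateJoinKey("[e].[Address]", false, "[e0].[Address]", false, true)); // query24
Console.WriteLine(TranslateJoinKey("[e].[Count]", true, "[e0].[Count]", true, true));     // query32, query33
```

The point of the model is that the `wrappedInAnonymousType` flag, not the presence of `.Value`, is what switches between relational and C# semantics for join keys.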
closing this in favor of doc issue |
Linq below joins h.ModuleId (long?) to cmc.ModuleId (string). Both IDs can be null.
Generated SQL:
The LINQ below joins h.ModuleId.Value (long) to cmc.ModuleId (string). The LINQ does not allow nulls, since h.ModuleId.Value cannot be null. However, the generated SQL still selects nulls: "OR ([g].[ModuleId] IS NULL AND [c].[ModuleId] IS NULL)"
Generated SQL:
Notably, no null handling is generated for the following LINQ. Is that because cmc.M_Number is not nullable? This seems inconsistent with the above. If it were consistent, EF would handle the NULL condition wherever nullable fields are compared:
Generated SQL:
Classes (all strings are nullable):
Versions:
dotnet 5.0
EF 5.0.13