-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate table joins instead of subquery joins #17622
Comments
Is there any way for a provider writer to override this? I'm developing an EF 2.2 provider for an older database, and it doesn't support subqueries in a join clause at all. So currently the generated SQL is invalid. In my case, I'm just executing the git BuiltInDataTypesBase test: var entity = context
.Set<StringKeyDataType>()
.Include(e => e.Dependents)
.Where(e => e.Id == "Gumball!")
.ToList().Single(); That generates this SQL statement: SELECT "e.Dependents"."Id", "e.Dependents"."StringKeyDataTypeId"
FROM "StringForeignKeyDataType" "e.Dependents"
INNER JOIN(
SELECT "e0"."Id"
FROM "StringKeyDataType" "e0"
WHERE "e0"."Id"=N'Gumball!'
) AS "t" ON "e.Dependents"."StringKeyDataTypeId"="t"."Id"
ORDER BY "t"."Id" However it is invalid for the particular DB vendor, and it must instead be: SELECT "e.Dependents"."Id", "e.Dependents"."StringKeyDataTypeId"
FROM "StringForeignKeyDataType" "e.Dependents"
INNER JOIN ("StringKeyDataType" "t")
ON "e.Dependents"."StringKeyDataTypeId"= "t"."Id"
WHERE "t"."Id"=N'Gumball!'
ORDER BY "t"."Id" I've been digging into the code, and it's hard to find much information on how to change the query generation engine at that level. Should I open a separate question for this? |
@Gwindalmir your LINQ query doesn't produce a subquery for me, either on 2.2 and on 3.1: Repro for 2.2class Program
{
static void Main(string[] args)
{
using var ctx = new BlogContext();
ctx.Database.EnsureDeleted();
ctx.Database.EnsureCreated();
var results = ctx
.Set<StringKeyDataType>()
.Include(e => e.Dependents)
.Where(e => e.Id == "Gumball!")
.ToList();
}
}
public class BlogContext : DbContext
{
public DbSet<StringKeyDataType> StringKeyDataTypes { get; set; }
#pragma warning disable 618
public static readonly LoggerFactory ContextLoggerFactory
= new LoggerFactory(new[] { new ConsoleLoggerProvider((_, __) => true, true) });
#pragma warning restore 618
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
=> optionsBuilder
.UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0")
.EnableSensitiveDataLogging()
.UseLoggerFactory(ContextLoggerFactory);
}
public class StringKeyDataType
{
public string Id { get; set; }
public List<Dependent> Dependents { get; set; }
}
public class Dependent
{
public string Id { get; set; }
} Repro for 3.1class Program
{
static void Main(string[] args)
{
using var ctx = new BlogContext();
ctx.Database.EnsureDeleted();
ctx.Database.EnsureCreated();
var results = ctx
.Set<StringKeyDataType>()
.Include(e => e.Dependents)
.Where(e => e.Id == "Gumball!")
.ToList();
}
}
public class BlogContext : DbContext
{
public DbSet<StringKeyDataType> StringKeyDataTypes { get; set; }
static ILoggerFactory ContextLoggerFactory
=> LoggerFactory.Create(b => b.AddConsole().AddFilter("", LogLevel.Information));
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
=> optionsBuilder
.UseSqlServer(@"Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0")
.EnableSensitiveDataLogging()
.UseLoggerFactory(ContextLoggerFactory);
}
public class StringKeyDataType
{
public string Id { get; set; }
public List<Dependent> Dependents { get; set; }
}
public class Dependent
{
public string Id { get; set; }
} Out of curiosity, which database are you trying to develop for? This issue is about removing a subquery join in a very particular case, but there are quite a few others where doing so isn't possible. Subquery joins are a standard SQL feature, and a database which doesn't support them is likely to have many issues as an EF Core relational provider... Finally, note that EF Core 2.2 is no longer supported - 2.1 and 3.1 are the current long-term support versions. Any new development should probably happen against 3.1. |
Just to answer how to do it, add a custom implementation |
Thanks, at the time I started, 3.1 wasn't released, and supporting .NET Framework is a requirement, so I went with 2.2. I'm not sure why you don't see it, as the SQLite driver included in this source constructs the same query. As for the DB in question, I'm not sure I should reference it, as I work for the company that makes it. I will say it supports primarily SQL-92 standard, with a few SQL-99 additions. |
@Gwindalmir the best way would be to open a new issue and include a short, runnable code sample with SQLite that shows it happening. If you're still in development, I'd strongly recommend considering switching to 3.1 - it's the LTS version for years to come, whereas 2.2 is already out of support. |
@Gwindalmir I don't see the subquery with a single include or ThenInclude, using SQLite. It took two ThenIncludes for me to generate the subquery (see the example in issue #19418 linked above). That was with .Net Core 3.1. |
I'm going to migrate to 3.1, and test again. If the issue is resolved there, then that's great. If not, I'll open a new issue here. |
Just as a follow-up, in case anyone else has the same problem: |
Good to hear, thanks @Gwindalmir. |
Any updates on an ETA for the original issue in this thread to be resolved? :) |
@Webreaper no update at the moment - this issue is "consider-for-next-release", which means it's a stretch goal for 5.0. While it's considered important, we don't think it's as important as the other issues have have triage into the 5.0 milestone (but it may still get done). |
Totally understand. Thanks for the update! Looking forward to .Net 5! |
Errm, looking forward to this in .Net 6? ;) |
This is a 6-monthly reminder - my queries are taking 950ms when they could be taking under 200ms due to having to workaround this bug. Any chance of a fix in .Net 6 previews 6-10? |
@Webreaper this is still in the plan for EF Core 6.0, I do hope we'll manage to get it in. |
Great! Thanks! |
…d out Part of #17622 Avoids pushdown for scenario when we cause pushdown due to order by but then we erase order by without skip take giving us simple and lift-able query again
Note from triage: putting this in 7.0 to consider doing it in some cases. It is unlikely we will implement this in all cases due to complexity. |
…d out Part of #17622 Avoids pushdown for scenario when we cause pushdown due to order by but then we erase order by without skip take giving us simple and lift-able query again
The basic case when the subquery doesn't have any additional operations (including joins), are converted to table joins in 7.0. This is the case where it is mathematically correct to transform. Leaving up to @roji to determine if there are additional cases which are equivalent in all cases. |
For reference, @smitpatel's PR for the above is #26476. At some point I'll take a look and think if I can think of other cases. |
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
Could you please provide more details on how to do this? I'm working on a project with a complex DB structure. There are queries that are joining two dozen tables using Include().ThenIncude(). It also uses global query filters. So one of these things is causing each table in a join to be replaced with a subquery that does a "select *". Even though this is a small DB (none of the tables have more than 1000 records, each query is taking extremely long. Please help. |
Just to make it clear, the instructions you quoted above are about modifying EF itself to implement this, not something to be done in your own application. Unless you intend to contribute this to EF (and it isn't a trivial change to make), this would be relevant for you. Regardless, can you please put together a minimal database and code sample which shows the significant perf difference with and without query filters? This issue is still lacking a minimal repro which clearly shows the subquery join as being a significant perf issue. Also, in many cases where users encounter such a perf difference, the actual cause lies elsewhere; so it would be good to be sure what's going on. |
Oh, I thought it was something I could do in my project. |
Hi @roji - this is some initial thoughts on this issue. Hope to follow up later with some more thorough analysis. Was brought here by a coworker who was stumped by slow queries in both EFCore and EF6, and blamed this issue.
and
and @roji If this issue is for cases where there are equivalent mathematical transforms, then I think I have a regression test to add to the test suite that I discovered today: For example, if I have a schema like: 1 Transaction ... Has Many Account Allocations return Set<AccountAllocation>()
.Where(a => !a.Deleted)
.Where(a => a.TransactionType.AffectsPosition)
.Where(a => a.Account.IsActive)
.Where(a => a.Account.Group.Id == accountGroupId)
.Where(a => a.Transaction.TransactDate > cutoffDate)
.Where(a => a.Transaction.Currency.Id == currencyCode)
.Where(a => !a.Transaction.Warehouse.IsTransferHub)
.Take(1) // new command inbetween - this forces a TOP (1) which alleviates some table scan pressure on the query
.Any(); CLICK ME FOR SQLexec sp_executesql N'SELECT
CASE
WHEN (
EXISTS (
SELECT 1 AS [C1]
FROM
(
SELECT TOP (1) [Extent3].[AccountGroupId] AS [AccountGroupId], [Extent4].[TransactDate] AS [TransactDate], [Extent4].[WarehouseId] AS [WarehouseId], [Extent4].[CurrencyId] AS [CurrencyId]
FROM [dbo].[AccountAllocations] AS [Extent1]
INNER JOIN [dbo].[TransactionTypes] AS [Extent2] ON [Extent1].[TransactionTypeId] = [Extent2].[TransactionTypeId]
INNER JOIN [dbo].[Accounts] AS [Extent3] ON [Extent1].[AccountId] = [Extent3].[AccountId]
INNER JOIN [dbo].[Transactions] AS [Extent4] ON [Extent1].[TransactionID] = [Extent4].[TransactionID]
WHERE ([Extent1].[Deleted] <> 1) AND ([Extent2].[AffectsPosition] = 1) AND ([Extent3].[IsActive] = 1)
) AS [Filter1]
LEFT OUTER JOIN [dbo].[Warehouses] AS [Extent5] ON [Filter1].[WarehouseId] = [Extent5].[WarehouseId]
WHERE ([Filter1].[AccountGroupId] = @p__linq__0) AND ([Filter1].[TransactDate] > @p__linq__1) AND (([Filter1].[CurrencyId] = @p__linq__2) OR (1 = 0)) AND ([Extent5].[IsTransferHub] <> 1)
)
)
THEN cast(1 as bit) ELSE cast(0 as bit) END AS [C1]
FROM ( SELECT 1 AS X ) AS [SingleRowTable1]',N'@p__linq__0 int,@p__linq__1 datetime2(7),@p__linq__2 nvarchar(4000)',@p__linq__0=88,@p__linq__1='2021-11-03 00:00:00',@p__linq__2=N'USD'
GO then, in EF6, I get a query where the If you see my point, there seems to be broader pattern of how transitive navigation paths are handled. It looks like the previous merges by Smit linked to this issue deal with other scenarios, like OrderBy (query) and HasQueryFilter (model builder configuration) and Includes. If the broader pattern is transitive navigation paths handling, then I see why @msmolka added a comment here as well. It's also not particularly clear to me why placement of free variables should alter the query, but it seems like it may. There's probably a good reason I do not see why this is happening. |
@jzabroski - You are posting a SQL generated by EF6. We have no plans to make any changes to that. |
@smitpatel Should this be fixed in EFCore 7? I'm using EFCore7 with Npgsql 7.0.3 I have a reproduction repo here: https://github.com/CharlieDigital/efcore-m2m
The generated query is as follows: SELECT p.id, p.created_utc, p.name, t.product_id, t.media_id, t.id, t.created_utc, t.name
FROM products AS p
LEFT JOIN (
SELECT p0.product_id, p0.media_id, m.id, m.created_utc, m.name
FROM product_media AS p0
INNER JOIN media AS m ON p0.media_id = m.id
) AS t ON p.id = t.product_id
ORDER BY p.id, t.product_id, t.media_id There's a branch with no explicit Result is the same. |
This issue is in the Backlog milestone. This means that it is not planned for the next release (EF Core 8.0). We will re-assess the backlog following the this release and consider this item at that time. However, keep in mind that there are many other high priority features with which it will be competing for resources. Make sure to vote (👍) for this issue if it is important to you. |
@roji That's a shame; this issue has serious performance implications beyond toy datasets and it may not be obvious to users unless they are dumping their SQL that this is the cause of the performance issues. |
@CharlieDigital can you provide a clear, simple comparison and query plan showing the performance issue? That would help bump the priority of this. Note that I'm not saying we shouldn't do this - it would just help to have a clear, side-by-side comparison showing an exact difference. |
Would also like to see this fixed. Besides the performance - it makes reading the generated sqls really hard. :-/ |
@roji on simple queries, the performance is as expected. We've observed the issue when performing deep The workaround that we ended up using is to carry a discriminator field down as deep as practical and then performing a filter on the We would add a I've moved off of the project so I'll see if I can find some time to produce a sample project that demonstrates this issue, but we saw queries go from 16s down to sub-20ms on our dataset. |
That sounds like it could intersect with #29171, which is about removing the orderings from the SQL altogether. I definitely think there's a potential perf issue here (simple joins instead of subquery joins), but one of the problems is that we haven't yet seen a clear, minimal sample backed by server plans and/or benchmarked query runtime that prove the difference. I definitely intend to investigate this at some point, but such a clear repro would certainly help prioritize this. |
If you need a demonstration, I just ran an SQLite test, if anyone is interested. I simplified a structure we commonly use in production so it's not an adversarial example or anything like that, just fake data. I can simplify even more, formalize the test and pack it up if someone can provide me with requirements. We very often have a three-level hierarchy of one-to-many mappings: each Lot has many Parts, and each of those Parts has a few Inspections. If you want to fetch a lot, you could just naively use I populated my test DB with 100 lots totaling about 500 MB and that took over 3 seconds on my laptop. When I refactor the SQL to flat joins, it's done in less than 100 ms. The outputs are identical. On SSDs this is a nuisance, but on HDDs it's completely unusable. In the past, to work around this, we resorted to hacks like manually querying just the lot, then the min and max IDs of parts inside the lot, then fetching all the parts between this ID range which belong to the lot. Original SQL with SQLite's query plan:
Refactored SQL with SQLite's query plan:
|
For queries with includes, we currently generate joins with a subquery:
We could simplify this to:
We should measure the execution perf difference between the above two. Even if there is no (significant) difference, we could still decide to do this for SQL simplicity.
Originally raised in #17455.
The text was updated successfully, but these errors were encountered: