You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
BasketEntries is a single foreign-key relation to Image.
This gives me a many-to-many relationship between image and tag, via ImageTag. Since BasketEntry and Image are one-to-one, starting with db.BasketEntries acts as a filter (there is only one record in BasketEntries).
The performance characteristics are the issue - I noticed that when I get to 500k images and 1.2m ImageTag records, the above code takes more than 3 seconds to pull back the single record from BasketEntries, and hence a single image (and around 10 related imageTags and tags).
Looking at the SQL generated, I can see that this creates this SQL:
SELECT "b"."ImageId", "b"."DateAdded", "i"."ImageId", "i"."BasketId", "i"."CameraId", "i"."Caption", "i"."DateIndexed", "i"."DateTaken", "i"."Description", "i"."FileCreationDate", "i"."FileLastModDate", "i"."FileName", "i"."FileSize", "i"."FolderId", "f"."FolderId", "f"."ParentId", "f"."Path", "t0"."ImageId", "t0"."TagId", "t0"."TagId0", "t0"."Keyword", "t0"."Type"
FROM "BasketEntries" AS "b"
INNER JOIN "Images" AS "i" ON "b"."ImageId" = "i"."ImageId"
INNER JOIN "Folders" AS "f" ON "i"."FolderId" = "f"."FolderId"
LEFT JOIN (
SELECT "i0"."ImageId", "i0"."TagId", "t"."TagId" AS "TagId0", "t"."Keyword", "t"."Type"
FROM "ImageTags" AS "i0"
INNER JOIN "Tags" AS "t" ON "i0"."TagId" = "t"."TagId"
) AS "t0" ON "i"."ImageId" = "t0"."ImageId"
ORDER BY "b"."ImageId", "i"."ImageId", "f"."FolderId", "t0"."ImageId", "t0"."TagId", "t0"."TagId0"
I think the slow performance is caused by the fact that there's a sub-select being executed within the left-join, but because there's no where-clause in that sub-select, it'll pull back all 1.2 million rows from the ImageTags table, before the left-join filters the ones that match via imageID. I think it's that which is making the query so slow.
Next I changed it to the following, to remove the ImageTags eager-loading from the first query, and then iterating on the resultant entities to enrich them via ImageTags/Tags using explicit loading:
Then the whole thing takes around 200ms - so 95% faster. This is because the SQL is now two very fast queries:
SELECT "b"."ImageId", "b"."DateAdded", "i"."ImageId", "i"."BasketId", "i"."CameraId", "i"."Caption", "i"."DateIndexed", "i"."DateTaken", "i"."Description", "i"."FileCreationDate", "i"."FileLastModDate", "i"."FileName", "i"."FileSize", "i"."FolderId", "f"."FolderId", "f"."ParentId", "f"."Path"
FROM "BasketEntries" AS "b"
INNER JOIN "Images" AS "i" ON "b"."ImageId" = "i"."ImageId"
INNER JOIN "Folders" AS "f" ON "i"."FolderId" = "f"."FolderId"
SELECT "i"."ImageId", "i"."TagId", "t"."TagId", "t"."Keyword", "t"."Type"
FROM "ImageTags" AS "i"
INNER JOIN "Tags" AS "t" ON "i"."TagId" = "t"."TagId"
WHERE "i"."ImageId" = u/__p_0
This is super-fast, as the inner joins reduce the resultset quickly. Lastly, I tried this - just pulling in the imageTags entities, but not their child Tag entities:
just to see what SQL would be generated. This is where it gets interesting:
SELECT "b"."ImageId", "b"."DateAdded", "i"."ImageId", "i"."BasketId", "i"."CameraId", "i"."Caption", "i"."DateIndexed", "i"."DateTaken", "i"."Description", "i"."FileCreationDate", "i"."FileLastModDate", "i"."FileName", "i"."FileSize", "i"."FolderId", "f"."FolderId", "f"."ParentId", "f"."Path", "i0"."ImageId", "i0"."TagId"
FROM "BasketEntries" AS "b"
INNER JOIN "Images" AS "i" ON "b"."ImageId" = "i"."ImageId"
INNER JOIN "Folders" AS "f" ON "i"."FolderId" = "f"."FolderId"
LEFT JOIN "ImageTags" AS "i0" ON "i"."ImageId" = "i0"."ImageId"
ORDER BY "b"."ImageId", "i"."ImageId", "f"."FolderId", "i0"."ImageId", "i0"."TagId"
It's still doing the left join - but crucially not on a subselect, so the resultset from that left join will be filtered efficiently. This query takes around 300ms - so an order of magnitude faster than when the left join is done on the subselect.
I think what needs to happen is that instead of creating a sub-select and left-joining, having the two ThenInclude clauses should just keep adding left-joins to the query, but without the sub-select pulling back the entire table first.
I've tried finding examples of how to eager-load many-to-many related data, to see if there's some other convention, pattern or syntax I should be using to optimise this query, but there don't appear to be any good examples. Most of the ones out there are with datasets of a few hundred test records, so nobody probably notices the massive slow-down that two subsequent ThenInclude methods cause When they're chained together.
Further technical details
EF Core version: 3.1.0
Database provider: SQLite
Target framework: .NET Core 3.1
Operating system: OSX (and same behaviour seen on Ubuntu)
IDE: Visual Studio for Mac v8.4 build 2653
The text was updated successfully, but these errors were encountered:
I have the following code:
The data model is:
Image
- contains a list of ImageTagsImageTag
- foreign key join between Image and TagTag
- has a primary key and a keywordBasketEntries
is a single foreign-key relation to Image.This gives me a many-to-many relationship between image and tag, via ImageTag. Since
BasketEntry
and Image are one-to-one, starting withdb.BasketEntries
acts as a filter (there is only one record inBasketEntries
).The performance characteristics are the issue - I noticed that when I get to 500k images and 1.2m
ImageTag
records, the above code takes more than 3 seconds to pull back the single record from BasketEntries, and hence a single image (and around 10 related imageTags and tags).Looking at the SQL generated, I can see that this creates this SQL:
I think the slow performance is caused by the fact that there's a sub-select being executed within the left-join, but because there's no where-clause in that sub-select, it'll pull back all 1.2 million rows from the ImageTags table, before the left-join filters the ones that match via
imageID
. I think it's that which is making the query so slow.Next I changed it to the following, to remove the ImageTags eager-loading from the first query, and then iterating on the resultant entities to enrich them via
ImageTags
/Tags
using explicit loading:Then the whole thing takes around 200ms - so 95% faster. This is because the SQL is now two very fast queries:
This is super-fast, as the inner joins reduce the resultset quickly. Lastly, I tried this - just pulling in the imageTags entities, but not their child Tag entities:
just to see what SQL would be generated. This is where it gets interesting:
It's still doing the left join - but crucially not on a subselect, so the resultset from that left join will be filtered efficiently. This query takes around 300ms - so an order of magnitude faster than when the left join is done on the subselect.
I think what needs to happen is that instead of creating a sub-select and left-joining, having the two
ThenInclude
clauses should just keep adding left-joins to the query, but without the sub-select pulling back the entire table first.I've tried finding examples of how to eager-load many-to-many related data, to see if there's some other convention, pattern or syntax I should be using to optimise this query, but there don't appear to be any good examples. Most of the ones out there are with datasets of a few hundred test records, so nobody probably notices the massive slow-down that two subsequent
ThenInclude
methods cause When they're chained together.Further technical details
EF Core version: 3.1.0
Database provider: SQLite
Target framework: .NET Core 3.1
Operating system: OSX (and same behaviour seen on Ubuntu)
IDE: Visual Studio for Mac v8.4 build 2653
The text was updated successfully, but these errors were encountered: