Copy/upsert data across tables in bulk (INSERT ... SELECT) #27320
Comments
Updated the above to cover both fetching back generated columns (RETURNING/OUTPUT) and how UPSERT would fit into this API ("bulk upsert"). UPSERT implementations across databases generally accept the following information:
Looks good. We'll probably need overloads for types without a

```csharp
ctx.Blogs1.Where(...).InsertInto<Blogs2>(b => b.Name);
ctx.Blogs1.Where(...).InsertInto<Blogs2>("STETBlog", b => b.Name);
```
Just to add to the comment list: EFCore.BulkExtensions now has a SourceTable feature with config.
Another use case that may be covered here is having client-provided data as the source, e.g. providing an array of .NET instances and upserting them into the target table in the database. With MERGE, this looks like this:

```sql
MERGE TheTable t
USING @data d
ON t.Id = d.Id
WHEN MATCHED THEN
    UPDATE SET [Text] = d.[Text]
WHEN NOT MATCHED THEN
    INSERT (Id, [Text]) VALUES (d.[Id], d.[Text]);
```

The API shape above may not be suitable for this, since it starts with a DbSet as the source, whereas here we want to send client data. An interesting idea is to add an API that creates a (temporary) DbSet out of a collection of client-side instances, which we could then, for example, upsert into a table in the database. Note that on SQL Server, the above requires a user-defined table type for the @data table-valued parameter, which is a problem (PostgreSQL should support this without one, since every table implicitly gets a composite type of the same name).
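SQLite has no MERGE statement, but the "temporary DbSet" idea can be approximated in plain SQL: load the client-side rows into a temporary table, then upsert from it with INSERT ... SELECT ... ON CONFLICT DO UPDATE. The following is a minimal sketch using Python's built-in sqlite3 module (it assumes SQLite 3.24+ for upsert support); TheTable, the temp table data, and the sample rows are hypothetical and only mirror the MERGE example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical target table with some existing rows.
conn.execute("CREATE TABLE TheTable (Id INTEGER PRIMARY KEY, Text TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO TheTable (Id, Text) VALUES (?, ?)",
    [(1, "old one"), (2, "old two")],
)

# Client-side instances to upsert (stand-in for an array of .NET objects).
client_rows = [(2, "new two"), (3, "new three")]

# 1. Ship the client data into a temporary table (the "temporary DbSet" idea).
conn.execute("CREATE TEMP TABLE data (Id INTEGER, Text TEXT)")
conn.executemany("INSERT INTO data (Id, Text) VALUES (?, ?)", client_rows)

# 2. Upsert from the temp table in a single statement. "WHERE true" avoids the
#    parsing ambiguity SQLite documents for upserts whose source is a SELECT.
conn.execute(
    """
    INSERT INTO TheTable (Id, Text)
    SELECT Id, Text FROM data WHERE true
    ON CONFLICT (Id) DO UPDATE SET Text = excluded.Text
    """
)
conn.commit()

rows = conn.execute("SELECT Id, Text FROM TheTable ORDER BY Id").fetchall()
print(rows)  # [(1, 'old one'), (2, 'new two'), (3, 'new three')]
```

Row 2 is updated (the MATCHED branch), row 3 is inserted (the NOT MATCHED branch), and row 1, which is absent from the client data, is left alone.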
Question: would this proposed API allow inserting items (from any LINQ query) in bulk without tracking them? I ran into such a use case when running my Discord bot on a 2 GB RAM Ubuntu VPS, where it quickly runs out of memory (I think the change tracker contributes to this, on top of Remora.Discord's caching).
Greetings all! How do we upvote this? This would make a lot of our app database-provider agnostic: we would no longer need ExecuteSql once this is ready. How will this work with NoSQL providers (or will they not be supported)? Leaving this comment to upvote this.
@stevenpua you upvote issues by upvoting the top-most comment above (the issue description); posting comments is not something that helps us prioritize, and creates needless notifications and churn.
Re NoSQL providers: the API here will likely be abstract and implementable by any provider type, so, at least in principle, any database that supports a copy-across-tables mechanism would be covered. However, NoSQL databases typically don't implement such a mechanism.
This issue tracks introducing an API to copy data inside the database, from one table (or several) into a destination table. Note this is different from bulk importing data from the client into the database (#27333).
This was split off from #795 (bulk update/delete).
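At the SQL level, the in-database copy this issue describes is a single INSERT ... SELECT statement. The sketch below runs one against SQLite through Python's built-in sqlite3 module, purely as a stand-in for whatever SQL a provider would generate; the Blogs1/Blogs2 tables and their columns are hypothetical. Because the statement executes entirely inside the database, no rows travel to the client.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical source and destination tables; the destination declares its
# columns in a different order from the source.
conn.execute("CREATE TABLE Blogs1 (Id INTEGER PRIMARY KEY, Name TEXT, Rating INTEGER)")
conn.execute("CREATE TABLE Blogs2 (Id INTEGER PRIMARY KEY, Rating INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO Blogs1 (Name, Rating) VALUES (?, ?)",
    [("a", 5), ("b", 2), ("c", 4)],
)

# One statement copies the filtered rows. The explicit column list assigns the
# SELECT's columns, in order, to the named destination columns, so Blogs2's
# different declaration order does not matter.
cur = conn.execute(
    "INSERT INTO Blogs2 (Name, Rating) "
    "SELECT Name, Rating FROM Blogs1 WHERE Rating >= 4"
)
print(cur.rowcount)  # 2

copied = conn.execute("SELECT Name, Rating FROM Blogs2 ORDER BY Name").fetchall()
print(copied)  # [('a', 5), ('c', 4)]
```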
Basic API
All SQL databases support a variant of the INSERT statement that accepts a query instead of a list of values. The column list can be specified (INSERT INTO x (a, b) SELECT ...) or omitted (INSERT INTO x SELECT ...). If it's omitted, the subquery must return exactly as many columns as the destination table has, in the table's column order and with compatible types. Since relying on table column ordering is problematic (e.g. it can't be changed after the table is created), we should probably force the user to always provide the column list explicitly.
Basic proposals:
Static column compatibility checking
It would be great to statically enforce that the column list matches the incoming columns from the source table, e.g. with the following signature:
This works great for a single column:
With multiple columns, this fails if the anonymous type's field names differ:
Requiring the anonymous types of the source projection and of the column list to have the same field names seems... problematic (we really do want to project across differently named columns).
If we had value tuples in expression trees (yet again), this would work quite well:
In any case, if we don't want this to depend on value tuple syntax, we could give up static enforcement with the following signature:
... and the query would then fail at runtime if the columns are mismatched.
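To see what a runtime failure does and doesn't catch, the sketch below uses SQLite (via Python's built-in sqlite3 module) as a stand-in database; the Src/Dst tables are hypothetical, and EF's actual error reporting for such a query is not specified here. A column-count mismatch is rejected, but only when the statement is executed; two compatible columns listed in the wrong order are accepted silently, so runtime checks alone cannot catch every mistake in the column list. (SQLite's dynamic typing also lets through many type mismatches that stricter databases would reject.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables with identical column sets.
conn.execute("CREATE TABLE Src (GivenName TEXT, FamilyName TEXT)")
conn.execute("CREATE TABLE Dst (GivenName TEXT, FamilyName TEXT)")
conn.execute("INSERT INTO Src (GivenName, FamilyName) VALUES ('Ada', 'Lovelace')")

# 1. Column-count mismatch: rejected, but only when the statement is executed.
count_error = None
try:
    conn.execute("INSERT INTO Dst (GivenName, FamilyName) SELECT GivenName FROM Src")
except sqlite3.OperationalError as exc:
    count_error = str(exc)
print("rejected:", count_error)

# 2. Same count and compatible types, but swapped order: accepted silently.
conn.execute(
    "INSERT INTO Dst (GivenName, FamilyName) SELECT FamilyName, GivenName FROM Src"
)
swapped = conn.execute("SELECT GivenName, FamilyName FROM Dst").fetchall()
print(swapped)  # [('Lovelace', 'Ada')]
```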
Finally, note that if we want to, we could have a specific overload for copying between tables mapped to shared-type entity types; in this case, no column list is necessary:
Fancier examples
Additional notes
- INSERT ... EXECUTE for copying the results of a sproc.
- WITH ... AS x INSERT INTO Y SELECT * FROM x: this is important for recursive WITH queries. PostgreSQL even allows capturing the results of an UPDATE ... RETURNING with WITH, and inserting them into a table.

Documentation
Community implementations