Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up insert operation (BulkSaveChanges) #15059

Closed
MaklaCof opened this issue Mar 18, 2019 · 17 comments
Closed

Speed up insert operation (BulkSaveChanges) #15059

MaklaCof opened this issue Mar 18, 2019 · 17 comments

Comments

@MaklaCof
Copy link

Currently inserting thousands (1000, 10000, 100000) of records is really slow. I think there is room for improvement, so I am creating feature request for bulk SaveChanges.

Inspiration came form (EntityFramework Extensions).

In my case, the factor between SaveChanges and BulkSaveChanges is even greater then:
chrome_2019-03-18_12-29-32

I think this is somehow related with #9270 and #795, but not duplicated. (I might be mistaken.)

Use case:

  • import entire table from another source (excel, web ...)
  • custom reports - query some data from multiple tables, process results and write it to (temp) table
@sdanyliv
Copy link

You can do that with linq2db extension right now if you can't wait EF Core implementation.

With this extension you will be able to insert any query in any table, even in temporary and then reuse it in linq query. And for reporting you will quickly find that you need more SQL extensions which are provided by linq2db out of the box.

@ajcvickers
Copy link
Member

@MaklaCof When you say, "In my case, the factor between SaveChanges and BulkSaveChanges is even greater then," can you provide some details? What have you tested and what were the results?

@ajcvickers
Copy link
Member

@MaklaCof Also, can you post the version of EF Core that you are using, and the database provider that you are using?

@MaklaCof
Copy link
Author

MaklaCof commented Mar 18, 2019

Of course. I will need a day or two. I am on the tight schedule until Wednesday.

Out of my head, version was 2.1 (currently I am on 2.2), and provider was SQL Server express 2016 (currently I use SQL Server Developer 2017).

@MaklaCof
Copy link
Author

Hi,

I made a demo in which ratio is not so much, but the case is simple and repeatable.
I think that these numbers already indicate that there is room for improvement in performance.

The difference I mentioned occurs when the table has 20-50 columns, and it has several relational data and not just one as in a demo.

RC: Records count
SC: Save changes
BSC: Bulk save changes
F: Factor betwen SaveChanges / BulkSaveChanges
DSC: Save changes with one detail table
DBSC: Bulk save changes with one detail table
DF: Factor betwen SaveChanges / BulkSaveChanges with one detail table

RC           SC     BSC      F     DSC    DBSC    DF
      10    0.1s    0.1s    0.5    0.1s   0.1s   0.4
     100    0.0s    0.1s    0.3    1.0s   0.4s   2.9
    1000    0.5s    0.2s    3.2   10.8s   2.6s   4.2
   10000    5.2s    1.3s    4.1  106.8s  22.7s   4.7
  100000   51.5s   11.5s    4.5      /      /      /
 1000000  514.5s  129.2s    4.0      /      /      /

As we can see, with 10000 rows, BulkSaveChanges is 4-5 times faster.

Database is SQL Server Developer 2017 in local network on HP server with two 1Gb network card.

Here is source code.

@roji
Copy link
Member

roji commented Mar 25, 2019

This seems to be comparing EF Core's regular SaveChanges() to a 3rd-party product, which, apart from providing bulk CUD operations (#795), may be implementing inserts in another way (e.g. SqlBulkCopy?). If that's the case then note similarity with npgsql/efcore.pg#113, which would be about implementing SaveChanges() with PostgreSQL's optimized COPY protocol.

@ajcvickers
Copy link
Member

@MaklaCof Thanks for the additional information. A few comments:

  • These numbers are a lot closer than the numbers originally reported, which makes sense because those numbers look like they are for EF6 which does not do batch updates. EF Core on the other hand does do batching anyway.
  • It's not clear that the extensions are doing exactly the same thing as EF Core, especially with regard to things like propagating identity values back to the entity instances. It's also possible that the extensions are functionally equivalent with the default options. The point is that without further investigation it's hard to know if the comparison is really apples to apples.
  • Regardless, this doesn't mean that there isn't room for improvement in the EF Core batching implementation. Therefore, putting this on the backlog to look at perf improvements here.

@MaklaCof
Copy link
Author

Hi,
maybe just one small thing. There are scenarios where you don't need to track inserted entities. Not even get back Id (identity). Maybe some options to SaveChanges to skip tracking this entities, and not receiving back Id would be legitimate use for imports ...

I will update thread when I get back to the real life scenario, where I experience larger factor.

@Mikilll94
Copy link

Are there any plans for implementing this? Inserting large numbers of rows in EFCore is painful.

@ajcvickers
Copy link
Member

@Mikilll94 Can you share the numbers you are seeing using the current EF Core batching implementation?

@AndriySvyryd
Copy link
Member

AndriySvyryd commented Aug 23, 2019

Implementing #9118 and #10443 might be enough to make the difference much smaller

@roji
Copy link
Member

roji commented Feb 20, 2022

Note #27372 which will substantially improve our standard insert perf on SQL Server in 7.0.

We also have #27333 for adding a dedicated API for bulk importing (e.g. SqlBulkCopy); we can probably close this as a dup of that.

One last idea... We also have #9118 for optionally not tracking after SaveChanges; which would unlock using SqlBulkCopy for standard SaveChanges. At that point, the advantage of a dedicated bulk import API (#27333) becomes bypassing the change tracking machinery, which may or may not be worth it.

@borisdj
Copy link

borisdj commented Aug 15, 2022

EFCore.BulkExtensions as of recently has BulkSaveChanges as well, among other CRUD Bulk ops.

@roji
Copy link
Member

roji commented Aug 15, 2022

@borisdj yeah, we're aware of EFCore.BulkExtensions. For bulk update/delete, EF Core 7.0 is introducing ExecuteUpdate/ExecuteDelete (see #795). Bulk import from the client is tracked by #27333.

@roji roji closed this as not planned Won't fix, can't repro, duplicate, stale Aug 15, 2022
@borisdj
Copy link

borisdj commented Aug 15, 2022

I know, have been following Execute issue.
This one was about BulkSaveChanges, at least by the title, which is a different use case.

@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
@allesandrogallo
Copy link

@borisdj if you promote your library, be transparent and put a disclaimer in all your post saying it's not free + link to your pricing https://www.codis.tech/efcorebulk/. I lost my time by trying your solution... I had to undo my code because our company doesn't want to pay $1000 for your license!!!

@borisdj
Copy link

borisdj commented Feb 27, 2023

License was recently updated from fully free to semi free and the post is much older (plus the issue is closed)
Also it is clearly stated (condition both for free and paid usage) in the project in LICENSE.txt as well as in the README.
And I see you have written the same comment on another issue #4526 (comment), no need to spam.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants