Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Query compilation performance #430

Merged
merged 3 commits into from
Jun 26, 2019

Conversation

cjpearson
Copy link
Contributor

(This is more of an issue than a pull request, but I wanted to share some code along with it. It's certainly not ready to merge.)

We've been having performance issues with large LINQ queries and EntityFramework. Some large or complex queries take a very long time to compile. They're cached so we only encounter this penalty once but with some queries taking several minutes to compile it's a very noticeable impact. The best workaround we've found so far is splitting the queries into several smaller queries and combining the results. This brings the total compilation time down to several seconds.

I've tried running the compilation through a profiler to find out what's causing the slowness and made a few changes to improve the compilation speed. I haven't tested them much beyond the unit tests, but based on my test query the results are promising.

51,000 node query
Before changes: 2+ minutes
With changes: ~10 seconds
With changes and lower (10,000) MaxNodeCountForTransformations: ~1 second

They're in a very rough state currently, but I'd like to get your thoughts to see if any of them are worth pursuing further.

Lower MaxNodeCountForTransformations

This one definitely had the biggest impact. Even with all the other changes, the compilation was still taking over ten seconds with 90% of the time spent applying these transformations. The most expensive part of the transformations were the IsNull rules. I wasn't comfortable with changing the logic of these transformations so I stuck to lower-level changes, but if there are any algorithmic changes to these transformations it would be a big help.

Changes to VarVec to operate directly on the underlying int[]

With VarVec sizes in the thousands, a lot of the CPU time was being spent in MoveNext and Subsumes in the VarVec class. I had some success optimizing these functions by doing bit operations on the underlying ints instead of iterating through bits. In this example I've added BitVec, a modified version of the BitArray class, but the same changes could also be done using reflection.

Store the Reversed VarMap

Instead of iterating through the dictionary each time GetReverseMap is called, store a reversed version. As part of this change, VarMap implements IDictionary instead of inheriting from Dictionary.

Other possibilities:

There were a couple other places I looked at but wasn't able to find good solutions for.

Change SubTreeId hashing

Each call to insert averaged about a thousand calls to SubTreeId.Equals. This suggested there were a lot of hash collisions for the query. I'm not sure if there's better way to hash while making sure equivalent SubTreeIds always have the same hash.

NodeInfo GetHashValue

A lot of time was also spent in the GetHashValue method of NodeInfo. Perhaps there may be efficient implementation of this method.

Do any of these options seem like candidates to be added to EF? Or are there any other ways to improve the query compilation step? Thanks!

@dnfclas
Copy link

dnfclas commented Dec 26, 2017

CLA assistant check
All CLA requirements met.

@cjpearson
Copy link
Contributor Author

@ajcvickers @divega is it possible to get some feedback on this PR? Thanks!

@divega
Copy link
Contributor

divega commented May 10, 2018

@cjpearson I am sorry that we haven't been responsive on this PR or on any of the other PRs for EF6. The team has been focusing completely on EF Core releases, but we expect to be able to do a push for making some progress on EF6.3 soon.

@divega
Copy link
Contributor

divega commented May 10, 2018

The most expensive part of the transformations were the IsNull rules

@cjpearson this reminds me of 'context.Configuration.UseDatabaseNullSemantics'. Does setting this flag to true affect the compilation cost for your queries?

@cjpearson
Copy link
Contributor Author

Thanks for the update. I ran some quick tests with context.Configuration.UseDatabaseNullSemantics = true and it seems to help performance although not to the same degree. I haven't profiled it yet.

@ajcvickers
Copy link
Member

@cjpearson Some thoughts on each of the points above:

  • MaxNodeCountForTransformations: Note sure we can change this without breaking existing scenarios that work. However, making this something that can be configured by the application does not seem unreasonable.
  • The VarVec changes look promising to me. How confident are you that the functionality is not changed? Would more tests help improve that confidence?
  • The SubTreeId/NodeInfo hashing also look like possible candidates to improve. Would be interested in seeing any impovements you manage to make here.

@ajcvickers ajcvickers self-assigned this Aug 14, 2018
@cjpearson
Copy link
Contributor Author

@ajcvickers Thanks for the feedback.

Making MaxNodeCountForTransformations configurable sounds like a good change to me. It would solve our issues and seems like the least risky option.

The BitVec implementation is almost exactly the same as the .NET implementation, so I'm pretty confident of its behavior. The only major change besides exposing the bits was trying to use an ArrayPool, but I'm not sure the benefits were worth the risk there. Subsumes and MoveNext were the only parts of VarVec that changed so more tests there might help.

Unfortunately the hashing turned into a dead end for me. Any improvements I found also broke the compilation. I don't think there's any quick improvements here.

Do you have any thoughts on the VarMap changes?

@chrfin
Copy link
Contributor

chrfin commented Mar 12, 2019

I have/had the same problem. I merged your changes into the 6.2.0 branch and my query compile time went down from ~4 minutes to ~10s :-)

@chrfin
Copy link
Contributor

chrfin commented May 9, 2019

@ajcvickers Will any of this be available in EF 6.3?

@ajcvickers
Copy link
Member

@chrfin I plan to see what I can merge, so I hope so, but that's not a guarantee.

@chrfin
Copy link
Contributor

chrfin commented May 10, 2019

@ajcvickers Thanks for your reply. At least the configuration option for MaxNodeCountForTransformations would already help me A LOT. Is this planned from your side or would it make sense to create a pull-request with this property configurable?

@chrfin
Copy link
Contributor

chrfin commented Jun 14, 2019

@ajcvickers Any update on this with EF 6.3 Preview 6?

@ajcvickers
Copy link
Member

@chrfin No update for preview 6.

@ajcvickers ajcvickers merged commit 2483303 into dotnet:master Jun 26, 2019
@ajcvickers
Copy link
Member

@cjpearson Thanks for the PR and sorry for the delay in merging. This is now in.

@chrfin
Copy link
Contributor

chrfin commented Jun 27, 2019

@ajcvickers Is this available in a (signed) nightly somewhere? https://www.myget.org/gallery/aspnetwebstacknightly is 6 month old...

@ajcvickers
Copy link
Member

@bricelam Nightly builds for EF6?

@bricelam
Copy link
Contributor

https://dotnetfeed.blob.core.windows.net/aspnet-entityframework6/index.json
(may also need https://dotnetfeed.blob.core.windows.net/dotnet-core/index.json)

smitpatel added a commit that referenced this pull request Sep 9, 2019
smitpatel added a commit that referenced this pull request Sep 10, 2019
@chrfin chrfin mentioned this pull request Oct 30, 2019
ajcvickers pushed a commit that referenced this pull request Nov 1, 2019
* lower MaxNodeCountForTransformations
* use bitwise operations for VarVec.MoveNext and VarVec.Subsumes
* Store reverseMap instead of calculating it each time
@drauch
Copy link

drauch commented Jun 15, 2023

Is this released yet?

@Suchiman
Copy link

Is this released yet?

This has been released as part of the 6.4.4 release in 2020-05-12.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants