Remove unnecessary *last column* in ORDER BY when joining for collection #19828
Comments
FYI I have a repro case where I get a timeout (> 40s) retrieving a single line (amongst 10) solely due to this (removing the ORDER BY causes the query to fetch data instantly). I have a few owned entities which have owned collections. |
We encountered the same issue. We're using the Pomelo MySQL provider, and in our case the auto-generated ORDER BY from Include() forces MySQL to use a temp table, which had a significant performance impact: the query usually takes less than 1 ms without the ORDER BY, but took 14 s with it. We also caught several exceptions from MySQL reporting 'Can't change size of file (OS errno 28 - No space left on device)'. It turns out MySQL tries to allocate space for temp tables and the file exceeds 16 MB, which is the default setting on our MySQL server. Please do take this seriously, as ORDER BY is performed differently on different SQL products. We had to split the LINQ query into three separate queries to work around it, but I'm not sure how this will affect other queries in the system. It's basically impossible to reproduce this in a testing environment without extensive load tests, as the performance impact is closely related to the complexity of the query as well as the volume of data in the result set. Hopefully the fix comes soon. Generated SQL:
Execution Results Comparison (first 3 with ORDER BY on, last 3 without ORDER BY)
MySQL Execution Plan (note the red circle around ORDER) |
@LiangZugeng @PaulARoy thanks for posting here - note that this is only about removing the last ORDER BY; previous ones are necessary in order to properly materialize results. What could be helpful for us to evaluate the priority of this, is a runnable code sample which clearly shows that the last ORDER BY has a significant impact on perf. |
I'm sorry, but it's been 2 months; we fixed it and switched to Postgres in the meantime. The last ORDER BY was the only cause of the perf degradation. |
I will spend some time this weekend writing code to reproduce the perf degradation. I will be targeting MySQL, as that's where we found the issue, and will post a GitHub repo here later. |
What is the status of this issue? |
@grietine the issue is in our backlog, which means it won't be part of the upcoming 5.0 release. We'll probably consider it for 6.0. |
Our team is using Pomelo and MySQL and this is causing real performance issues. Are there any workarounds for the time being beyond just loading the related entities and joining them manually? |
@rabberbock please note that this issue is only about removing the last ORDER BY when joining; if you're concerned about non-last ORDER BY clauses, see #19571 (and also consider trying out split queries). If you're convinced you're seeing a perf issue because of the last ORDER BY, it would be useful to have a code sample that shows this - it would help us bump up the priority of this issue. |
@roji Thanks for the pointers! I can try to put something together. The gist of it is as follows: The execution plan with the last ORDER BY is the following: Without the last ORDER BY it's: So with the last ORDER BY it uses a tmp table, which in our case really degrades performance. Without the last ORDER BY it runs in milliseconds; with the ORDER BY the same query takes a minute. Here is a link to the MySQL docs that explain when a tmp table is used: https://dev.mysql.com/doc/refman/8.0/en/internal-temporary-tables.html What would you like to see in a code sample beyond the above execution plans? |
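For reference, split queries (available from EF Core 5.0) are opted into per query with `AsSplitQuery()`. The following is a minimal sketch, assuming hypothetical `Blog` and `Post` entities and a `DbContext` whose relational provider is already configured; it is not taken from anyone's code in this thread.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

// Hypothetical entities, for illustration only.
public class Blog
{
    public int Id { get; set; }
    public List<Post> Posts { get; set; } = new();
}

public class Post
{
    public int Id { get; set; }
    public int BlogId { get; set; }
}

public static class SplitQueryExample
{
    // Loads the root entities and the included collection with separate
    // SQL queries instead of a single JOIN.
    public static List<Blog> LoadBlogs(DbContext context) =>
        context.Set<Blog>()
            .Include(b => b.Posts)
            .AsSplitQuery()
            .ToList();
}
```

Whether this helps depends on the query and the data volume, so it is worth comparing the generated SQL and the execution plans for both forms.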
Thanks for this info @rabberbock! I think the above (and the link to the MySQL docs) should be sufficient IMHO. |
@mikhail-khalizev - Love your work-around, but it does not work for me. It is raising the following exception:
For a straight-forward query:
Any thoughts? I copied your code 'as-is'... |
@mikhail-khalizev @dariusonsched removing all orderings from a query as the above RemoveOrdering does (#19828 (comment)) is dangerous and could result in incorrect results. EF Core generates the orderings it needs in order to perform its job; interfering with that may have various undefined consequences. |
@roji thank you for the insight, though I must then not be understanding how this is/was resolved in EF 6 (we are still on EF 5). |
@dariusonsched this issue tracked removing the last ORDER BY, which indeed isn't necessary. The others are necessary. |
I would clarify further that - as the title now states - it’s about removing the last column in the ORDER BY and not the last ORDER BY as a whole (there could possibly be more than one set of ORDER BY statements in the whole query and the workaround in #19828 (comment) was predicated on this misunderstanding) |
Strange thing happens to me with this new feature. Surprisingly it significantly slows down my query. Described here: Does anybody know what may cause this? |
@vdolek it's very unlikely that removing an ORDER BY makes a query slower... I'd first confirm this by executing the SQL directly against the database, outside of your EF Core application. This would remove other changes in your application, the .NET version, the EF version... In any case, if you see a problem, please open a new issue with the full repro details. |
I'm on EF 6.0.7 with Pomelo.EntityFrameworkCore.MySql 6.0.2 and the ORDER BY is still getting appended when I do an order by. It causes a huge performance degradation on my query. If I run the query without the ORDER BY it is super fast; adding it back makes it slow. Do I have to do anything to remove the ORDER BY when running INNER and LEFT JOIN? |
@roji I have tried that and really when I run those two queries directly, the one without the last |
@pantonis this issue only removed the very last ordering (see #19571 for details), and only if it's added by EF for internal materialization purposes. If you're convinced you're seeing an unnecessary last ordering, please open a new issue with a full repro and we'll investigate. @vdolek here as well, please post the full query, as well as the database schema with data so we can repro this here. If you'd like to share that privately, you can send an email to the address on my GitHub profile. |
@vdolek Can you provide the output using |
@vdolek and others, please open a new issue rather than continue posting here - that would be better. |
I'm still having issues with the default ORDER BY found at the end of the query when handling selections with a lot of inclusions (mainly for serving complex objects from a REST endpoint).
using System;
using System.Data.Common;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore.Diagnostics;

/// <summary>
/// Workaround to avoid MySql out of memory on queries with bigger field lists<br/>
/// Inspired by <a href="https://github.com/dotnet/efcore/issues/19828#issuecomment-847222980">chazt3n workaround</a>
/// </summary>
/// <remarks>
/// The interceptor works on the assumption of the first field of the main SELECT statement being also the first field
/// in the ORDER BY statement after the explicitly requested OrderBy expressions
/// </remarks>
public class RemoveLastOrderByInterceptor : DbCommandInterceptor {
    public const string QueryTag = "RemoveLastOrderBy";

    public override ValueTask<InterceptionResult<DbDataReader>> ReaderExecutingAsync(
        DbCommand command, CommandEventData eventData, InterceptionResult<DbDataReader> result,
        CancellationToken token = default) {
        try {
            TryApplyPatch(command);
        } catch (Exception ex) { // The exception handling can be different based on your requirements
            // _logger.LogError(ex, "Failed to intercept query.");
            Console.WriteLine(ex);
            throw; // Fails forcefully to avoid unexpected silent behaviours
        }
        return base.ReaderExecutingAsync(command, eventData, result, token);
    }

    private static bool TryApplyPatch(DbCommand command) {
        const string orderBy = "ORDER BY";
        const string select = "SELECT ";
        var query = command.CommandText;
        int idx;

        // Check if the command is tagged
        if (!query.StartsWith("-- "))
            return false;

        // Efficiently checks if the tags block contains the required QueryTag
        var separatorIdx = query.IndexOf("\n\n", StringComparison.Ordinal);
        if (separatorIdx < 0
            || query.IndexOf(QueryTag, 0, separatorIdx + 1, StringComparison.Ordinal) < 0)
            return false;

        // The query doesn't have an ORDER BY statement
        if ((idx = query.LastIndexOf(orderBy, StringComparison.Ordinal)) < 0)
            return false;

        // Using StringBuilder to avoid string allocation issues
        // While using early versions of .NET Framework there would be buffer allocation exceptions so it's
        // necessary to remove part of the pre-allocated string before appending (or specify capacity explicitly)
        var cmd = new StringBuilder(query);

        // Identify first SELECT field
        var start = query.IndexOf(select, StringComparison.Ordinal);
        if (start >= 0) {
            var nextIdx = query.IndexOf(",", start, StringComparison.Ordinal);
            var fromIdx = query.IndexOf("FROM", start, StringComparison.Ordinal);
            // Support both selection with only one value and multi value selection
            var end = nextIdx < 0 ? fromIdx : Math.Min(nextIdx, fromIdx);
            var from = start + select.Length;
            // Assemble first selected field query
            var firstField = cmd.ToString(from, end - from);
            // Check if the ORDER BY starts with a different field than the first selected field and, in such
            // case, identifies the index where the "redundant" ORDER BY begins
            var orderStart = query.IndexOf(firstField, idx, StringComparison.Ordinal);
            if (orderStart > idx + orderBy.Length + 1)
                idx = orderStart - 2 /* Subtract 2 characters to take account of the trailing comma */;
        }

        // Cut ORDER BY statement or remove it entirely
        command.CommandText = cmd
            .Remove(idx, query.Length - idx)
            .Append(';')
            .ToString();
        return true;
    }
} |
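The interceptor above only acts on commands carrying the `RemoveLastOrderBy` tag, so it has to be registered on the context and each query has to opt in. A minimal sketch of that wiring, assuming hypothetical `Order`/`OrderLine` entities and a hypothetical `ShopContext` whose provider is configured through the options passed to its constructor:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical entities, for illustration only.
public class Order
{
    public int Id { get; set; }
    public List<OrderLine> Lines { get; set; } = new();
}

public class OrderLine
{
    public int Id { get; set; }
    public int OrderId { get; set; }
}

public class ShopContext : DbContext
{
    public ShopContext(DbContextOptions<ShopContext> options) : base(options) { }

    public DbSet<Order> Orders => Set<Order>();

    // Register the interceptor from the comment above on every context instance.
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.AddInterceptors(new RemoveLastOrderByInterceptor());
}

public static class OrderQueries
{
    // TagWith emits the tag as a leading SQL comment, which is what the
    // interceptor looks for before rewriting the command text.
    public static Task<List<Order>> LoadOrdersAsync(ShopContext context) =>
        context.Orders
            .Include(o => o.Lines)
            .TagWith(RemoveLastOrderByInterceptor.QueryTag)
            .ToListAsync();
}
```

Note that the interceptor overrides only `ReaderExecutingAsync`, so a synchronous call such as `ToList()` would not be patched unless `ReaderExecuting` is overridden as well.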
When loading related one-to-many entities, EF adds ORDER BY clauses to make sure all related entities for a given entity are grouped together.
However, we can remove the last ORDER BY clause as it is unnecessary and causes more work (see #19571).
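To illustrate why only the leading orderings matter, here is a minimal sketch in plain C#. It is not EF Core's actual materializer; the `Row` type and the `Materialize` helper are hypothetical. It streams joined rows and starts a new parent whenever the parent key changes, so it only needs rows with the same parent key to be adjacent. The order of the child rows within a group does not change which children each parent receives.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened row from a "parent LEFT JOIN child" query.
public record Row(int ParentId, int? ChildId);

public static class StreamingMaterializer
{
    // Groups rows into (parent, children) by watching for parent key changes.
    // Requires the rows of one parent to be adjacent, i.e. ordered by ParentId.
    public static List<(int ParentId, List<int> Children)> Materialize(IEnumerable<Row> rows)
    {
        var result = new List<(int ParentId, List<int> Children)>();
        foreach (var row in rows)
        {
            // A new parent key means the previous parent's collection is complete.
            if (result.Count == 0 || result[^1].ParentId != row.ParentId)
                result.Add((row.ParentId, new List<int>()));

            // A null child key means the parent has no children (LEFT JOIN).
            if (row.ChildId is int childId)
                result[^1].Children.Add(childId);
        }
        return result;
    }
}
```

Feeding this helper rows sorted by `ParentId` only, or by `ParentId` and then `ChildId`, yields the same parents with the same sets of children, which is the intuition behind dropping the final ordering column.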