Query: flaky test - ToListAsync_with_canceled_token #27033
Comments
I also ran into this.
Will look into it. BTW does it fail consistently for you (when running all tests at once)?
That's to be expected, with it not clearing the log (unlike efcore/test/EFCore.Specification.Tests/Query/NorthwindMiscellaneousQueryTestBase.cs Lines 5652 to 5661 in 343233a).
@vonzshik can you elaborate? Fixture (along with ListLoggerFactory) is an xunit class fixture, and should never get used concurrently (that's how it works in xunit, not the same as nunit). The clearing in ToListAsync_can_be_canceled is because that test performs the same operation in a loop, and we don't want log messages from different iterations to leak - it's not there to guard against concurrency issues.
@roji Agreed there are no concurrency issues. However, not clearing the log means that it will contain arbitrary context depending on the order that the tests in the fixture are run. And that can be any order, and different from run to run.
@ajcvickers but we clear the logs in the constructor of the test class, which gets executed before every test, right? Put another way, our tests almost never clear the logs themselves - that's handled by our infrastructure, no?
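To make the point about per-test constructors concrete, here is a minimal sketch in plain C# (no xunit dependency). It only simulates the lifecycle described above: one log shared by every test in the class, and a constructor that runs before each test. The names `sharedLog`, `TestClassConstructor`, and `RunTest` are hypothetical and are not part of EF Core or xunit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the fixture's log: one list shared by all tests in the class.
var sharedLog = new List<string>();

// Stand-in for the test class constructor, which runs before every test method.
void TestClassConstructor(bool clearLog)
{
    if (clearLog) sharedLog.Clear();
}

// Each simulated test logs one event and returns how many entries it can see afterwards.
int RunTest(string name, bool clearLog)
{
    TestClassConstructor(clearLog);
    sharedLog.Add($"{name}: QueryExecutionPlanned");
    return sharedLog.Count;
}

// Without clearing, the second test also sees the first test's entry.
int leakedFirst = RunTest("A", clearLog: false);
int leakedSecond = RunTest("B", clearLog: false);

sharedLog.Clear();

// With clearing in the constructor, every test starts from an empty log.
int cleanFirst = RunTest("A", clearLog: true);
int cleanSecond = RunTest("B", clearLog: true);

Console.WriteLine($"no clearing: {leakedFirst}, {leakedSecond}");
Console.WriteLine($"clearing in constructor: {cleanFirst}, {cleanSecond}");
```

In this simulation the log only accumulates across tests when the constructor skips the clear, which is why test order alone should not matter if every concrete test class clears in its constructor.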
@maumar @smitpatel BTW on which provider does it fail (SQL Server/SQLite)? It doesn't repro for me here (though Linux...).
@roji Only on SQL Server, right?
We do this consistently everywhere AFAIK. If some provider neglects to do this in their test classes, wouldn't all their tests start failing immediately, as logs from previous tests appear for later ones?
Never mind, we do it in SQLite too. But no, I don't think tests will generally fail. We don't always look at logs.
OK. Certainly most query test classes assert on SQL (on SQL Server at least). One thing I'm not clear about is why we don't simply clear the logs in the test base classes (Fixture.ListLoggerFactory.Clear) instead of doing it on each concrete test class... @maumar any idea?
Ahhh, maybe retries again? If the first attempt fails because of some transient problem, the second attempt will already have a QueryIterationFailed message in the logs. Since we can't clear the logs between execution retries, best to just remove the assertion against QueryIterationFailed.
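The retry hypothesis can be illustrated with a small simulation, assuming a retry wrapper that reruns the same test body without clearing the log in between. Everything here (`ExecuteQuery`, `RunWithRetry`, the simulated `TimeoutException`) is a hypothetical stand-in and not EF Core's actual retry infrastructure.

```csharp
using System;
using System.Collections.Generic;

// Log shared across retry attempts; nothing clears it between attempts.
var log = new List<string>();
int attempts = 0;

// Hypothetical query that fails transiently on its first execution only.
void ExecuteQuery()
{
    attempts++;
    if (attempts == 1)
    {
        log.Add("QueryIterationFailed");
        throw new TimeoutException("simulated transient failure");
    }
    log.Add("QueryExecutionPlanned");
}

// Minimal retry wrapper: rerun the same test body up to maxAttempts times.
bool RunWithRetry(Action testBody, int maxAttempts)
{
    for (int i = 0; i < maxAttempts; i++)
    {
        try
        {
            testBody();
            return true;
        }
        catch (TimeoutException)
        {
            // Treated as transient: fall through and try again.
        }
    }
    return false;
}

bool succeeded = RunWithRetry(ExecuteQuery, maxAttempts: 2);

// The second attempt passed, but the first attempt's failure is still in the log.
bool staleFailureLogged = log.Contains("QueryIterationFailed");

Console.WriteLine($"succeeded={succeeded}, attempts={attempts}, staleFailureLogged={staleFailureLogged}");
```

Under these assumptions the second attempt succeeds while the log still holds the first attempt's QueryIterationFailed entry, so an assertion that the entry is absent would fail even though the query eventually worked.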
Remove assertions that QueryIterationFailed doesn't appear in the logs, since it may be there because of transient failures from previous retries. Fixes dotnet#27033
Welp, I was wrong on both counts, teaches me to always check beforehand whether everything works exactly as I thought :D
But what's the reason for a retry, especially locally? This might indicate some other test suite issue...
Spoken like a true PostgreSQLer. 😜 SQL Server/SqlClient sometimes just fails...
Very normal for someone not yet very familiar with EF Core internals, xunit, etc... That's how one learns!
Ahahaha indeed :)
BTW I wonder if all the SQL Server/SqlClient reliability issues are actually LocalDB issues... it's possible that it wasn't meant for the kind of massive, concurrent load that our tests do (it's hard to imagine that real SQL Server is this unreliable in the world). We could simply use real SQL Server in our CI pipeline (and also on local dev machines), though there's also an advantage in having these reliability issues in the CI pipeline - it's good coverage for our retry logic etc. (chaos monkey!).
See #25448 (comment) But LocalDb is easy for contributors, and in the past we have had trouble getting images for Helix/AzDo with SQL Server set up appropriately. That may not be an issue anymore.
Yeah. We can still leave the default test setup to be LocalDB (for contributors), but configure our CI tests to run on real SQL Server (assuming Helix/AzDo can handle that today). But I still think there's some value in the chaos flakiness introduced by LocalDB.
I wonder why LocalDB should be flaky, it is the same engine as SQL Server Full. I assume attachdbfilename is not used. Only difference would be named pipes communication. Also instant file initialization rights could play a role.
I have suspected for many years that this might be the issue. Which also explains why it never gets fixed; nobody would use named pipes for anything real.
@roji even taking into account SqlClient/SqlServer issues, the flaky test passes an already cancelled token. At no point should there be any IO or retries at all...
That should be true, although... whenever EF executes a query, it opens and closes the underlying DbConnection. As you yourself pointed out, the login sequence seems like it's synchronous, so it's possible the token isn't even consulted at that point... Basically at this point I don't want to make any assumptions about the driver which I don't have to 😉
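The concern in the last reply can be sketched with BCL types only. The example assumes a hypothetical `OpenConnectionIgnoringToken` step that never looks at the token, standing in for a synchronous login sequence; it is not SqlClient's or EF Core's real code path, and `QueryAsync` is likewise invented for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A token that is already canceled before the query starts.
var token = new CancellationToken(canceled: true);
bool connectionOpened = false;

// Hypothetical stand-in for a synchronous login step that never consults the token.
void OpenConnectionIgnoringToken() => connectionOpened = true;

async Task<string> QueryAsync(CancellationToken ct)
{
    // Runs even though ct is already canceled, because nothing checks the token here.
    OpenConnectionIgnoringToken();

    // Token-aware call: with an already canceled token, awaiting this throws TaskCanceledException.
    await Task.Delay(10, ct);
    return "rows";
}

bool canceled = false;
try
{
    await QueryAsync(token);
}
catch (OperationCanceledException)
{
    canceled = true;
}

Console.WriteLine($"canceled={canceled}, connectionOpened={connectionOpened}");
```

In this sketch the caller still observes cancellation, but the token-unaware step has already run by then. If a real driver behaves similarly, a transient failure during connection open remains possible even with a pre-canceled token.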
Message:
Assert.DoesNotContain() Failure
Found: Microsoft.EntityFrameworkCore.Query.QueryIterationFailed
In value: SelectListIterator<ValueTuple<LogLevel, EventId, String, Object, Exception>, EventId> [Microsoft.EntityFrameworkCore.Query.QueryCompilationStarting, Microsoft.EntityFrameworkCore.Query.QueryExecutionPlanned, Microsoft.EntityFrameworkCore.Query.QueryCompilationStarting, Microsoft.EntityFrameworkCore.Query.QueryExecutionPlanned, Microsoft.EntityFrameworkCore.Query.QueryCompilationStarting, ...]
Fails on my box when I run all tests at once; passes if run individually.