Query: flaky test - ToListAsync_with_canceled_token #27033

maumar · 2021-12-17T19:37:47Z

Message:
Assert.DoesNotContain() Failure
Found: Microsoft.EntityFrameworkCore.Query.QueryIterationFailed
In value: SelectListIterator<ValueTuple<LogLevel, EventId, String, Object, Exception>, EventId> [Microsoft.EntityFrameworkCore.Query.QueryCompilationStarting, Microsoft.EntityFrameworkCore.Query.QueryExecutionPlanned, Microsoft.EntityFrameworkCore.Query.QueryCompilationStarting, Microsoft.EntityFrameworkCore.Query.QueryExecutionPlanned, Microsoft.EntityFrameworkCore.Query.QueryCompilationStarting, ...]

Fails on my box when I run all tests at once; passes if run individually.

smitpatel · 2021-12-17T20:00:19Z

I also ran into this.

roji · 2021-12-17T22:41:05Z

Will look into it.

BTW does it fail consistently for you (when running all tests at once)?

vonzshik · 2021-12-18T07:22:00Z

That's to be expected, with it not clearing the log (unlike ToListAsync_can_be_canceled does) and potentially running in parallel...

efcore/test/EFCore.Specification.Tests/Query/NorthwindMiscellaneousQueryTestBase.cs

Lines 5652 to 5661 in 343233a

    [ConditionalFact]
    public virtual async Task ToListAsync_with_canceled_token()
    {
        using var context = CreateContext();
        await Assert.ThrowsAsync<OperationCanceledException>(() => context.Employees.ToListAsync(new CancellationToken(true)));
        Assert.Contains(CoreEventId.QueryCanceled, Fixture.ListLoggerFactory.Log.Select(l => l.Id));
        Assert.DoesNotContain(CoreEventId.QueryIterationFailed, Fixture.ListLoggerFactory.Log.Select(l => l.Id));
    }

roji · 2021-12-18T10:46:20Z

@vonzshik can you elaborate? Fixture (along with ListLoggerFactory) is an xunit class fixture, and should never get used concurrently (that's how it works in xunit, not the same as nunit). The clearing in ToListAsync_can_be_canceled is because that test performs the same operation in a loop, and we don't want log messages from different iterations to leak - it's not there to guard against concurrency issues.

ajcvickers · 2021-12-18T10:51:26Z

@roji Agreed there are no concurrency issues. However, not clearing the log means that it will contain arbitrary context depending on the order that the tests in the fixture are run. And that can be any order, and different from run to run.

roji · 2021-12-18T10:55:49Z

@ajcvickers but we clear the logs in the constructor of the test class, which gets executed before every test, right? Put another way, our tests almost never clear the logs themselves - that's handled by our infrastructure, no?

roji · 2021-12-18T10:57:22Z

> BTW does it fail consistently for you (when running all tests at once)?

@maumar @smitpatel BTW on which provider does it fail (SQL Server/SQLite)? It doesn't repro for me here (though Linux...).

ajcvickers · 2021-12-18T10:58:41Z

@roji Only on SQL Server, right?

roji · 2021-12-18T10:59:53Z

We do this consistently everywhere AFAIK. If some provider neglects to do this in their test classes, wouldn't all their tests start failing immediately, as logs from previous tests appear for later ones?

ajcvickers · 2021-12-18T11:01:01Z

Never mind, we do it in SQLite too.

But no, I don't think tests will generally fail. We don't always look at logs.

roji · 2021-12-18T11:03:25Z

OK. Certainly most query test classes assert on SQL (on SQL Server at least).

One thing I'm not clear about, is why we don't simply clear the logs in the test base classes (Fixture.ListLoggerFactory.Clear) instead of doing it on each concrete test class... @maumar any idea?

roji · 2021-12-18T11:11:40Z

Ahhh, maybe retries again? If the first attempt fails because of some transient problem, the second attempt will already have a QueryIterationFailed message in the logs. Since we can't clear the logs between execution retries, best to just remove the assertion against QueryIterationFailed.
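
A minimal sketch of the retry scenario described above. Everything here is a hypothetical stand-in (a plain list for the shared log, a hand-rolled two-attempt loop, string event names) and not EF Core's actual test infrastructure; it only illustrates how a log that survives a retry can break a "does not contain" assertion.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shared log: lives across retry attempts, like a fixture-level logger.
var log = new List<string>();
var attempt = 0;

void RunTestBody()
{
    attempt++;
    if (attempt == 1)
    {
        // Simulated transient failure on the first attempt (assumption for illustration).
        log.Add("QueryIterationFailed");
        throw new InvalidOperationException("transient failure");
    }

    // Second attempt: the query is canceled as the test expects.
    log.Add("QueryCanceled");

    if (!log.Contains("QueryCanceled"))
        throw new Exception("expected QueryCanceled in the log");

    // This check trips on the entry left behind by the first attempt.
    if (log.Contains("QueryIterationFailed"))
        throw new Exception("stale QueryIterationFailed from an earlier attempt");
}

// Hand-rolled retry loop: the log is never cleared between attempts.
for (var i = 0; i < 2; i++)
{
    try
    {
        RunTestBody();
        Console.WriteLine($"attempt {attempt} passed");
        break;
    }
    catch (Exception e)
    {
        Console.WriteLine($"attempt {attempt} failed: {e.Message}");
    }
}
```

In this sketch the second attempt behaves correctly but still fails, purely because of the leftover log entry. Dropping the negative check removes that dependency on earlier attempts.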

vonzshik · 2021-12-18T11:28:53Z

> @vonzshik can you elaborate? Fixture (along with ListLoggerFactory) is an xunit class fixture, and should never get used concurrently (that's how it works in xunit, not the same as nunit). The clearing in ToListAsync_can_be_canceled is because that test performs the same operation in a loop, and we don't want log messages from different iterations to leak - it's not there to guard against concurrency issues.

> @ajcvickers but we clear the logs in the constructor of the test class, which gets executed before every test, right? Put another way, our tests almost never clear the logs themselves - that's handled by our infrastructure, no?

Welp, I was wrong on both counts, teaches me to always check beforehand whether everything works exactly as I thought :D

> Ahhh, maybe retries again? If the first attempt fails because of some transient problem, the second attempt will already have a QueryIterationFailed message in the logs.

But what's the reason for a retry, especially locally? This might indicate some other test suite issue...

ajcvickers · 2021-12-18T11:32:18Z

@vonzshik

> But what's the reason for a retry, especially locally? This might indicate some other test suite issue...

Spoken like a true PostgreSQLer. 😜 SQL Server/SqlClient sometimes just fails...

roji · 2021-12-18T12:32:56Z

> Welp, I was wrong on both accounts, teaches me to always check beforehand whether everything works exactly as I thought :D

Very normal for someone not yet very familiar with EF Core internals, xunit, etc... That's how one learns!

> Spoken like a true PostgreSQLer.

Ahahaha indeed :)

roji · 2021-12-18T13:01:50Z

BTW I wonder if all the SQL Server/SqlClient reliability issues are actually LocalDB issues... it's possible that it wasn't meant for the kind of massive, concurrent load that our tests do (it's hard to imagine that real SQL Server is this unreliable in the world). We could simply use real SQL Server in our CI pipeline (and also on local dev machines), though there's also an advantage in having these reliability issues in the CI pipeline - it's good coverage for our retry logic etc. (chaos monkey!).

ajcvickers · 2021-12-18T13:24:30Z

See #25448 (comment)

But LocalDb is easy for contributors, and in the past we have had trouble getting images for Helix/AzDo with SQL Server set up appropriately. That may not be an issue anymore.

roji · 2021-12-18T13:26:41Z

Yeah. We can still leave the default test setup to be LocalDB (for contributors), but configure our CI tests to run on real SQL Server (assuming Helix/AzDo can handle that today). But I still think there's some value in the chaos flakiness introduced by LocalDB.

ErikEJ · 2021-12-18T14:59:35Z

I wonder why LocalDB should be flaky, it is the same engine as SQL Server Full.

I assume attachdbfilename is not used.

Only difference would be named pipes communication.

Also instant file initialization rights could play a role.

ajcvickers · 2021-12-18T15:15:07Z

@ErikEJ

> Only difference would be named pipes communication.

I have suspected for many years that this might be the issue. Which also explains why it never gets fixed; nobody would use named pipes for anything real.

vonzshik · 2021-12-18T23:07:01Z

@roji even taking into account SqlClient/SqlServer issues, the flaky test passes an already cancelled token. At no point should there be any IO or retries at all...

roji · 2021-12-18T23:18:49Z

That should be true, although... whenever EF executes a query, it opens and closes the underlying DbConnection. As you yourself pointed out, the login sequence seems like it's synchronous, so it's possible the token isn't even consulted at that point...

Basically at this point I don't want to make any assumptions about the driver which I don't have to 😉

maumar added the area-test label Dec 17, 2021

roji added a commit to roji/efcore that referenced this issue Dec 18, 2021

Fix cancellation test flakiness

02e03c7

Remove assertions that QueryIterationFailed doesn't appear in the logs, since it may be there because of transient failures from previous retries. Fixes dotnet#27033

roji mentioned this issue Dec 18, 2021

Fix cancellation test flakiness #27036

Merged

roji self-assigned this Dec 18, 2021

roji added this to the 7.0.0 milestone Dec 18, 2021

roji added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Dec 18, 2021

roji closed this as completed in #27036 Dec 18, 2021

roji added a commit that referenced this issue Dec 18, 2021

Fix cancellation test flakiness (#27036)

5c4d159

Remove assertions that QueryIterationFailed doesn't appear in the logs, since it may be there because of transient failures from previous retries. Fixes #27033

ajcvickers added the type-bug label Jan 4, 2022

ajcvickers added the area-query label Jan 31, 2022

ajcvickers modified the milestones: 7.0.0, 7.0.0-preview1 Feb 14, 2022

ajcvickers modified the milestones: 7.0.0-preview1, 7.0.0 Nov 5, 2022
