-
Notifications
You must be signed in to change notification settings - Fork 18k
runtime/trace: test failures increased as of 2020-05-01 #38794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I was thinking to skip this in case of the inconsistent ordering error (like many other execution trace tests due to #29707. But if we are observing the failure rate increasing, I am afraid users will face the similar issue more frequently. Are there any changes during this cycle that possibly caused the timestamp info to be less accurate? |
https://storage.googleapis.com/go-build-log/2449e0ae/android-amd64-emu_518ac08d.log
|
https://build.golang.org/log/d14354a0476763a9e3f92118417547d2859351d7
|
This has so far caused two consecutive SlowBot failures on https://golang.org/cl/233526. I'm going to start a |
This comment has been minimized.
This comment has been minimized.
Rats, looks like I need a larger
Restarting bisection. 😞 At least I have a tighter upper bound now! |
New
That was CL 207998, for #35788. (CC @mknyszek @aclements) |
This comment has been minimized.
This comment has been minimized.
Oh, that CL isn't much too old after all! It was authored back in November, but wasn't merged until April 30, and apparently Git+Gerrit preserve the date of the first iteration of the commit rather than updating it on |
@bcmills There's a separate commit date. Huh... I wonder if there's an edge missing in the trace somewhere. The scavenger is now sometimes awoken via sysmon, which is a fairly indirect way of unparking a goroutine. |
@bcmills I forgot to say: thanks for bisecting. I'll dig into it. |
Some progress: it's definitely the scavenger holding it up some of the time. I have a trace where I've observed a I'm not totally sure how this is possible. Scavenger wake-ups are protected by a lock. The first thing Anyway, will continue to investigate. I feel close to the source of the issue, but the mechanism still eludes me. |
Getting closer. Looks more like now there's a chance there's a missing |
I think I figured it out (and maybe the cause of many of our recent trace ordering inconsistencies). The main problem is missing events generated by
The recent change to have the scavenger occasionally awoken by Not sure what the fix is here. Stopping CC @aclements @hyangah @ianlancetaylor Possibly related: #29707 |
By "a lot of work" I mean "a lot of work to make efficient." Because |
Interesting result. I would like to believe that there is a limited number of trace events that |
What I've been observing is that they come from (or rather should come from) |
Obviously I'm missing something, but I don't see how |
Here's a more concrete scenario:
We now have a trace which has goroutines going from |
Thanks @mknyszek for investigating it. What will happen if we relax the trace ordering requirement, and allow |
cc/ @dvyukov @pjweinbgo who know better about the ordering algorithm. |
@hyangah That will probably work? The |
If the issue is specific to (By the way, |
That's what I was getting at in an earlier comment. I tried to actually implement this and it's a bit tricky to do so. Currently |
I get that it's hard to suspend |
I think we're on the same page here, I don't mean anything more than for Even with a simple lock it's non-trivial because you need to wait until Maybe I'm missing something and there is a simple, obviously-correct way to do this, but I tried pretty much what you suggested and it was deadlocks galore. |
It doesn't seem that bad to me since |
@ianlancetaylor Sorry, you're totally right. I was overthinking it. A lock around all the things |
Change https://golang.org/cl/234617 mentions this issue: |
After this change, no failure in |
2020-05-01T15:02:37-f092be8/windows-amd64-longtest
2019-11-22T21:09:43-9f3c2b6/windows-amd64-longtest
CC @hyangah @aclements
The text was updated successfully, but these errors were encountered: