Ticker change in behavior from 1.0.3 to 1.1 when GOMAXPROCS=1 #5324
select is NOT a scheduling operation. I don't know where you got the idea that it would always allow other goroutines to run. Instead, it is similar to a send or receive statement, which is not a scheduling statement either, except when the channel is empty (for a receive) or full (for a send). Also, the behaviour of your program is correct. I don't think you have any guarantee except in the case where "if that goroutine didn't exist, the program would deadlock".
Let's try a different tack. Please define clearly, as if you were writing a section for "The Go Programming Language Specification", what a programmer has to do when GOMAXPROCS=1 to ensure that Tickers will tick. Or, more generally, what one has to do to ensure that other goroutines will be scheduled and run when GOMAXPROCS=1. Since you previously used terms like "scheduling operation" and "scheduling statements", perhaps you could start by defining those terms.
I would say in the "Program execution" section: "Program execution begins by initializing the main package and then invoking the function main. A running program runs one or more goroutines concurrently. Goroutines that reach a statement that blocks (see 'receive statement' and 'select statement') cannot be run anymore until the unblocking condition is satisfied." The reason why your program makes you unhappy is that your select statement is not "a statement that blocks" according to the specification.
Actually, I don't agree that blocking is the key issue here. The key issue is what guarantees, if any, Go provides when GOMAXPROCS=1 that forward progress will be made on all goroutines. Having said that, in response to your comment above: "select" is certainly a blocking statement most of the time. From the spec: "if there is no default case, the statement blocks until one of the communications can complete". The word "blocks" is used in two other places. Do you agree that if there is no "default" case, "select" is a blocking statement? If so, then the question is: does adding a "default" case turn a blocking "select" into a non-blocking "select"? The spec doesn't really answer that question clearly. I believe it would be very inconsistent and confusing for select to be blocking sometimes and non-blocking at other times. Even if it were, the spec would have to say select is blocking, since it would block sometimes. I understand that the current scheduler is not preemptive and therefore context switches have to be arranged at certain points in the code flow. I also understand that at some point the goal might be to eliminate GOMAXPROCS.
The spec in general does not, and will not, discuss which language constructs can enter the scheduler. This is an implementation detail, and different implementations will behave differently. In the current gc implementation there is no assumption that the select statement will ever enter the scheduler, whether or not there is a default case. The use of the word "blocks" in the language spec should not be taken as an indication that any other goroutine will get a chance to run. In particular, "blocks until one of the communications can complete" does not imply that the select statement will block at all, since one of the communications may be able to complete at the time the select statement is entered.
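The two forms of select being debated can be sketched side by side. This is a minimal illustration, not taken from the thread: `tryRecv` and `recvReady` are hypothetical helper names. A select with a default case returns immediately when no case is ready, and a select without a default case still proceeds immediately when one case can already complete, so neither form is guaranteed to wait or to let another goroutine run.

```go
package main

import "fmt"

// tryRecv performs a non-blocking receive: when no value is ready,
// the default case runs immediately and the select never waits.
func tryRecv(ch <-chan int) (int, bool) {
	select {
	case v := <-ch:
		return v, true
	default:
		return 0, false
	}
}

// recvReady uses a select with no default case. Because the buffered
// channel already holds a value, that case can complete at once, so
// the select proceeds without ever waiting.
func recvReady() int {
	ready := make(chan int, 1)
	never := make(chan int) // nothing is ever sent on this channel
	ready <- 42
	select {
	case v := <-ready:
		return v
	case v := <-never:
		return v
	}
}

func main() {
	v, ok := tryRecv(make(chan int))
	fmt.Println(v, ok)       // 0 false
	fmt.Println(recvReady()) // 42
}
```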
Thanks for considering the issue. I agree that the spec shouldn't indicate which constructs cause the scheduler to run, as that is an implementation detail and subject to change. I wasn't expecting the spec to be amended; it was just a technique to generate some discussion about the issue. As I mentioned above, this example was culled from a much larger program that generates metrics based on a Go Ticker. The program has existed for more than 6 months and is very stable and reliable. It uses a number of goroutines and does both compute and I/O. I normally run it with GOMAXPROCS > 1. This week I happened to test it on a new machine where I forgot to modify .profile to set GOMAXPROCS > 1, and it exhibited the bizarre behavior mentioned above. So it's just a fluke that I noticed this at all. I'm glad it happened, because it made me think more about my own model of how the runtime works, and I appreciate the discussion and all the comments. Having said all that, I am not sure how I could rewrite the program to work properly when GOMAXPROCS=1. If 1000+ lines of code with two goroutines doing I/O, channel sends and receives, a stats goroutine, and so on aren't enough to trigger the scheduler, what is? I am sure I can figure out a way to trigger the scheduler, but even that is likely to change with the implementation. I think it is kind of sad that there is no way to ensure that a timer-based program, whose core main loop is an infinite loop wrapping "select" with the default case calling out to all the real work, will function properly when GOMAXPROCS=1. Something has been lost from 1.0.3 to 1.1, and while it may be that "The 1.1 behaviour is within spec", you should consider what has been lost. One final question: someone from Google suggested I call runtime.Gosched. That was refuted by Rémy. Is that really the case? Gosched seems to be the one documented way to try to force the scheduler to run. If it isn't guaranteed to run other goroutines, then there appears to be no documented way to fix the GOMAXPROCS=1 case. That would mean many built-in library constructs may not function properly when GOMAXPROCS=1.
In a select statement the default case is always available and can always run. It does not make sense to write an endless loop whose only element is a select statement with a default case. I don't know what your program is trying to do. I know that your example is only a toy. In the toy program, you should decide how often you want to print an "M". You shouldn't print one as often as possible; that is never going to make sense. Even in the best possible case, that is going to burn CPU time to no purpose. So when you ask how you can rewrite the program to work properly when GOMAXPROCS=1, I think you need to define what "work properly" means. If you are asking how to write a program that includes a CPU-burning endless loop but periodically interrupts the loop, then I think the answer is that you need to periodically check for something else within the loop. What you should check for depends on how your program is supposed to work. To put it another way, you are writing as though the endless loop is some low-priority operation that should only happen if there is nothing else to do. But Go doesn't work that way; it doesn't provide a facility to do something only when there is nothing else to do. A more appropriate way to look at your program is that you have an endless loop, and you want to check other things every so often. When do you want to check them, and what do you want to do?
I agree that Go 1.1 is within spec, but the sad part is that he could previously write a CPU-consuming loop with a periodic timer interrupt using higher-level Go constructs (a time.Ticker, its channel, and select), whereas now he has to write his loop to check time.Now() and do his own time math, or cheat and use the somewhat gross runtime.Gosched. I doubt many people realize that a time.Ticker is implemented with a goroutine that is scheduled in the future. That's why I proposed that our implementation (still within spec) could call Gosched in the nbselect every N iterations and/or after a certain amount of time has elapsed since the last "default" case, thus at least allowing a pending timer send to be scheduled so that future selects do not hit the default case. I'm not saying this is a Go 1.1 issue, but it's an issue. I'm not sure it's "Working as Intended" so much as a dup of some other existing scheduler issue (if one is already filed).
Sorry, the toy code is just that, toy code, and isn't very representative of the real code. I added the printing of the "M" at the last minute. Maybe that was a mistake. The real code looks more like this:

    // measure something
    for {
        // do something you can measure, including I/O, system calls,
        // CPU, network (lots of code here that calls out to other code),
        // and count it
        select {
        case <-ticker.C:
            snap_metric(&m)
            stats <- calc_stats(m)
        default:
        }
    }

Is this a valid idiom? I hope so, because it makes a very nice way to collect metrics in Go. I get the feeling the answer is going to be no. I just want to point out that I am not trying to write infinite loops but rather to collect metrics at a certain rate. Also, I am not trying to beat this to death. I thought I had a nice idiom for collecting metrics, but it's not clear that it is. So I am seeking some feedback.
Go 1.1 does not consider short (non-blocking) syscalls as preemption points; "short" roughly means ~20 µs. If you do writes to the console, these are most likely not considered preemption points now. However, I am not sure why the goroutine is not preempted if you do network I/O. It should be. We should think about preemptive scheduling. A non-preemptive scheduler is too unintuitive and fragile.
> That's why I proposed our implementation (still within spec) could cause a Gosched in the nbselect every N iterations and/or having had so much time elapsed since the last "default" case, thus at least allowing a pending timer send to schedule and future selects to not hit the default case.

You can do it manually to resolve the problem for now. Or just remove the select and check the current time instead.
Status changed to Duplicate. Merged into issue #543.
@13: I much prefer the non-preemptive goroutine scheduler. Advantages:
- It works perfectly/properly for correctly designed programs.
- It correctly fails to work for incorrectly designed programs.
IMO that's as valuable as, e.g., a panic on a nil dereference or on closing a closed channel.
The test that failed was doing ~50 reads/sec to a raw disk device in one goroutine and ~250 reads/sec to a different raw device in another goroutine. I haven't seen network I/O fail. Both goroutines were reading from the same Ticker, and both wake up once per second. I just realized that both goroutines read from the same Ticker. Does a Ticker deliver the same tick to multiple readers of its channel? It appears to.
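On the last question: a time.Ticker's `C` field is an ordinary channel (`<-chan Time`), and a channel delivers each sent value to exactly one receiver. The sketch below uses a plain `chan int` in place of a ticker so the count is deterministic; `fanIn` is a hypothetical name. Two goroutines receive from one channel, and their counts always sum to the number of values sent, with no value seen twice.

```go
package main

import (
	"fmt"
	"sync"
)

// fanIn starts two receivers on the same channel, sends n values, and
// reports how many values each receiver got. Every sent value is
// delivered to exactly one receiver, so the counts always sum to n.
func fanIn(n int) (a, b int) {
	ch := make(chan int)
	var wg sync.WaitGroup
	counts := [2]int{} // each goroutine writes only its own element
	for i := range counts {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for range ch {
				counts[i]++
			}
		}(i)
	}
	for v := 0; v < n; v++ {
		ch <- v
	}
	close(ch)
	wg.Wait()
	return counts[0], counts[1]
}

func main() {
	a, b := fanIn(100)
	fmt.Println(a, b, a+b) // a+b is always 100; the split varies between runs
}
```

If two goroutines each need every tick, giving each its own Ticker is one simple option.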
This issue was closed.