
Ticker change in behavior from 1.0.3 to 1.1 when GOMAXPROCS=1 #5324

Closed
tildeleb opened this issue Apr 21, 2013 · 19 comments

Comments

@tildeleb

See code here:
http://play.golang.org/p/50TI5VuWHN

What is the expected output?
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMloop: tick tsecs=1.001247, secs=1.001247, cnt=32412156
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMloop: tick tsecs=2.002406, secs=1.001159, cnt=34743305
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMloop: tick tsecs=3.001494, secs=0.999088, cnt=3168060

What do you see instead?
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM ...


Which compiler are you using (5g, 6g, 8g, gccgo)?
6g


Which operating system are you using?
MacOS X 10.7.5


Which version are you using?  (run 'go version')
1.0.3
1.1 (go version devel +13e00572ed0b Thu Apr 18 17:37:21 2013 -0700 darwin/amd64)

Please provide any additional information below.
With Go 1.0.3 this program works the same with GOMAXPROCS=1 or GOMAXPROCS > 1.
With Go 1.1 this program works the same as 1.0.3 when GOMAXPROCS > 1.
With Go 1.1 when GOMAXPROCS=1 the Ticker never ticks.

This represents a change in behavior from 1.0.3 to 1.1.

In my mental model of the Go runtime, at least when GOMAXPROCS=1, the "select"
statement *must* call the scheduler to allow other goroutines to run. If this is not the
case, then it should be documented what one needs to do in order to get scheduling to
occur.

Questions:

1. Is it really the case that I must insert a call to runtime.Gosched() to get the
correct behavior in Go 1.1? I find that a very unappealing solution to say the least.

2. Are there any guarantees at the language reference specification level of what one
must do to prevent goroutine scheduling starvation, especially when GOMAXPROCS=1? If
not, should there be?

This simple program was distilled from a larger program that does I/O and compute at
the spot where the example does cnt++. The larger program collects I/O metrics, and the
Ticker is used to decide when to collect the metrics and reset them. With Go 1.1 and
GOMAXPROCS=1 I saw some bizarre behavior: the Ticker (set to a 1s period) would tick fine
for 300 seconds and then stop for a long time, only occasionally sending something down
the channel. It was very strange.
@remyoudompheng
Contributor

Comment 1:

select is NOT a scheduling operation. I don't know where you got the idea that it would
always allow other goroutines to run. Instead, it is similar to a send or recv statement,
which are not scheduling statements either, except when the channels are empty/full.
Also, the behaviour of your program is correct. I don't think you have any guarantee
except in the case where: "if that goroutine didn't exist, the program would deadlock".
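A small sketch of the comparison with send/receive: a `select` with a `default` case never waits. If a communication is ready it proceeds; otherwise `default` runs immediately. Neither outcome promises that another goroutine gets to run. `tryRecv` is a hypothetical helper.

```go
package main

import "fmt"

// tryRecv performs a non-blocking receive: it returns the value and true if
// one was immediately available, or zero and false (via default) otherwise.
func tryRecv(ch chan int) (int, bool) {
	select {
	case v := <-ch:
		return v, true
	default:
		return 0, false
	}
}

func main() {
	ch := make(chan int, 1)

	v, ok := tryRecv(ch) // buffer empty: default runs
	fmt.Println(v, ok)   // 0 false

	ch <- 42            // fill the buffer; no other goroutine involved
	v, ok = tryRecv(ch) // value ready: the receive case runs
	fmt.Println(v, ok)  // 42 true
}
```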

@tildeleb
Author

Comment 2:

Are you really saying that if I want the Ticker to tick when GOMAXPROCS=1 I have to
sprinkle calls to runtime.Gosched() throughout my code?

@remyoudompheng
Contributor

Comment 3:

runtime.Gosched is not guaranteed to solve your issue. You have to write different code
that *prevents* the default branch from running if you want to receive a tick.

@tildeleb
Author

Comment 4:

Let's try a different tack. Please define in a clear way, like you were writing a
section for the "The Go Programming Language Specification" what a programmer has to do
when GOMAXPROCS=1 to ensure that Tickers will tick. Or, more generally, what one has to
do to ensure that other goroutines will be scheduled and run when GOMAXPROCS=1. Since
you previously used words like "scheduling operation" and "scheduling statements",
perhaps you could start by defining those terms.

@remyoudompheng
Contributor

Comment 5:

I would say in the "Program execution" section:
"Program execution begins by initializing the main package and then invoking the
function main. A running program runs one or more goroutines concurrently. Goroutines
that reach a statement that blocks (see 'receive statement' and 'select statement')
cannot be run anymore until the unblocking condition is satisfied."
The reason why your program makes you unhappy is that your select statement is not "a
statement that blocks" according to the specification.

@tildeleb
Author

Comment 6:

Actually, I don't agree that blocking is the key issue here. The key issue is what
guarantees, if any, Go provides when GOMAXPROCS=1 that forward progress will be made on
all goroutines.
Having said that, responding to your comment above:
"select" is certainly a blocking statement most of the time. From the spec: "if there is
no default case, the statement blocks until one of the communications can complete". The
word "blocks" is used in two other places.
Do you agree that if there were no "default" case, "select" would be a blocking statement?
If so, then the question is whether adding a "default" turns a blocking "select" into a
non-blocking "select". The spec doesn't really answer that question clearly. My belief is
that it would be very inconsistent and confusing for select to be blocking sometimes and
non-blocking other times. Even if it were, the spec would have to say select is blocking,
since it would block sometimes.
I understand that the current scheduler is not preemptive and therefore context
switching has to be arranged at certain points in the code flow. I also understand that
at some point the goal might be to eliminate GOMAXPROCS.

@ianlancetaylor
Member

Comment 7:

The spec in general does not and will not discuss which language constructs can enter
the scheduler.  This is an implementation detail and different implementations will
behave differently.
In the current gc implementation there is no assumption that the select statement will
ever enter the scheduler, whether or not there is a default case.
The use of the word "blocks" in the language spec should not be taken as an indication
that any other goroutine will get a chance to run.  In particular, "blocks until one of
the communications can complete" does not imply that the select statement will block at
all, since it is possible that one of the communications can complete at the time the
select statement is entered.

@ianlancetaylor
Member

Comment 8:

The 1.1 behaviour is within spec.

Status changed to WorkingAsIntended.

@tildeleb
Author

Comment 9:

Thanks for considering the issue.
I agree that the spec shouldn't indicate which constructs cause the scheduler to run as
that is subject to change and implementation details.  I wasn't expecting the spec to be
amended. It was just a technique to generate some discussion about the issue.
As I mentioned above, this example was culled from a much larger program that generates
metrics based on a Go Ticker. The program has existed for 6+ months and it's very stable
and reliable. It uses a number of goroutines and does compute and I/O. I normally run it
with GOMAXPROCS > 1. This week I happened to test it on a new machine where I forgot to
modify .profile to set GOMAXPROCS > 1, and it exhibited the bizarre behavior mentioned
above. So it's just a fluke that I noticed this at all.
I'm glad it happened because it made me think more about my own model of how the runtime
works and I appreciate the discussion and all the comments.
Having said all that, I am not sure how I could rewrite the program to work properly when
GOMAXPROCS=1. If 1000+ lines of code with two goroutines doing I/O, channel sends and
receives, a stats goroutine, and so on aren't enough to trigger the scheduler, what is?
I am sure I can figure out a way to trigger the scheduler, but even that is likely to
change depending on the implementation.
I think it is kind of sad that there is no way to ensure that a timer-based program,
where the core main loop is an infinite loop wrapping "select" and the default case
is the callout to all the real work, will function properly when GOMAXPROCS=1.
Something has been lost from 1.0.3 to 1.1, and while it may be that "The 1.1 behaviour is
within spec", you should consider what has been lost.
One final question: someone from Google suggested I call runtime.Gosched. That was
refuted by Rémy. Is that really the case? Gosched seems to be the one documented way to
try to force the scheduler to run. If it isn't guaranteed to run other goroutines, then
there appears to be no documented way to fix the GOMAXPROCS=1 case.
That implies that many built-in library constructs may not function properly when
GOMAXPROCS=1.

@ianlancetaylor
Member

Comment 10:

In a select statement the default case is always available and can always run.  It does
not make sense to write an endless loop whose only element is a select statement with a
default case.  I don't know what your program is trying to do.
I know that your example is only a toy.  In the toy program, you should decide how often
you want to print an "M".  You shouldn't print one as often as possible--that is never
going to make sense.  Even in the best possible case that is going to burn CPU time to
no purpose.
So when you ask how you can rewrite the program to work properly when GOMAXPROCS=1 I
think you need to define what "work properly" means.  If you are asking how to write a
program that includes a CPU-burning endless loop but periodically interrupts the loop,
then I think the answer is that you need to periodically check for something else within
the loop.  What you should check for depends on how your program is supposed to work.
To put it another way, you are writing as though the endless loop is some low priority
operation that should only happen if there is nothing else to do.  But Go doesn't work
that way--it doesn't provide a facility to do something only when there is nothing else
to do.  A more appropriate way to look at your program is that you have an endless loop,
and you want to check other things every so often.  When do you want to check them, and
what do you want to do?

@bradfitz
Contributor

Comment 11:

I agree that Go 1.1 is within spec, but the sad part is that he could previously write a
CPU-consuming loop with a periodic timer interrupt using higher-level Go constructs (a
time.Ticker + its channel + select), whereas now he has to write his loop as checking
time.Now() and doing his own time math.  Or cheating and using somewhat-gross
runtime.Gosched.
I doubt many people realize that a time.Ticker is implemented with a goroutine being
scheduled in the future.
That's why I proposed our implementation (still within spec) could cause a Gosched in
the nbselect every N iterations and/or having had so much time elapsed since the last
"default" case, thus at least allowing a pending timer send to schedule and future
selects to not hit the default case.
I'm not saying this is a Go 1.1 issue, but it's an issue.  I'm not sure it's "Working as
Intended" as much as it is a dup of some other existing scheduler issue (if one is
already filed).

@tildeleb
Author

Comment 12:

Sorry, the toy code is just that, toy code, and isn't very representative of the real
code. I added the printing of the "M" at the last minute. Maybe that was a mistake.
The real code looks more like this:
// measure something
for {
	// do something you can measure, including I/O, system calls, CPU, network
	// (lots of code here that calls out to other code), and count it
	select {
	case <-ticker.C:
		m := snap_metric(&m)
		stats <- calc_stats(m)
	default:
	}
}
Is this a valid idiom? I hope so, because it makes a very nice way to collect metrics in
Go. I get the feeling the answer is going to be no.
I just want to point out that I am not trying to write infinite loops but rather collect
metrics at a certain rate.
Also, I am not trying to beat this to death. I thought I had a nice idiom for collecting
metrics but it's not clear that it is. So I am seeking some feedback.

@dvyukov
Member

dvyukov commented Apr 22, 2013

Comment 13:

Go 1.1 does not consider non-blocking (short) syscalls to be preemption points. "Short"
roughly means ~20us. If you do writes to the console, they are most likely not considered
preemption points now. However, I am not sure why the goroutine is not preempted if you do
network IO. It should be.
We should think about preemptive scheduling. Non-preemptive scheduler is too unintuitive
and fragile.

@dvyukov
Member

dvyukov commented Apr 22, 2013

Comment 14:

>That's why I proposed our implementation (still within spec) could cause a Gosched in
the nbselect every N iterations and/or having had so much time elapsed since the last
"default" case, thus at least allowing a pending timer send to schedule and future
selects to not hit the default case.
You can do it manually to resolve the problem for now.
Or just remove the select and check current time instead.

@dvyukov
Member

dvyukov commented Apr 22, 2013

Comment 15:

Status changed to Duplicate.

Merged into issue #543.

@cznic
Contributor

cznic commented Apr 22, 2013

Comment 16:

@13: I prefer the non-preemptive goroutine scheduler a lot.
Advantages
- It works perfectly/properly for correctly designed programs.
- It correctly fails to work for incorrectly designed programs. That's IMO as valuable
as is e.g. a nil dereference panic, or closing a closed channel etc.

@tildeleb
Author

Comment 17:

The test that failed was doing ~50 reads/sec to a raw disk device in one goroutine and
~250 reads/sec to a different raw device in another goroutine. I haven't seen network I/O
fail.
Both goroutines were reading from the same Ticker, and both wake up once per second.
I just realized that both routines read from the same Ticker. Does Ticker deliver the
same tick to multiple readers of its channel? It appears to.

@dvyukov
Copy link
Member

dvyukov commented Apr 25, 2013

Comment 18:

>I just realized that both routines read from the same Ticker. Does Ticker deliver the
same tick to multiple readers of it's channel?
No, it does not.

@tildeleb
Author

Comment 19:

Sorry, I was wrong. Each metric test uses its own Ticker. Geez, and I wrote this code;
you would think I would know.

@golang golang locked and limited conversation to collaborators Jun 24, 2016
This issue was closed.

7 participants