proposal: cmd/vet: add check for sync.WaitGroup abuse #18022
Comments
I don't like the idea. It's IMHO perhaps a task for |
I think this has been proposed before (probably on the mailing lists)
and rejected.
|
A
func main() {
	var wg sync.WaitGroup
	defer wg.Wait()
	go func() {
		wg.Add(1)
		defer wg.Done()
	}()
}
and In terms of vet's requirements:
@dominikh , |
As far as the proposal goes, I don't like how it limits the func signature to |
@dsnet The check in staticcheck has no (known) false positives. It shouldn't have a significant number of false negatives, either. The implementation is a simple pattern-based check, detecting I'm -1 on the proposed |
@dominikh I don't see any extra level of nesting here, though (assuming any func signature is allowed). To be nitpicky, another thing that stands out from the proposal is how |
@mvdan The extra level of nesting would come from a predicted usage that looks something like this:
as opposed to
Admittedly the same level of indentation, but syntactically it's one extra level of nesting. |
Ah yes, I was thinking indentation there. |
The problematic case is that
|
@cznic if you mean without the extra |
@mvdan Do you mean by allowing something like the following?
IMHO that's way too much |
panic is recovered? |
@dominikh true; I was simply pointing at the issue without contemplating a solution :) |
The API change here has the problems identified above with argument evaluation. Also, in general the libraries do not typically phrase functionality in terms of callbacks. If we're going to start using callbacks broadly, that should be a separate decision (and not one to make today). For both these reasons, it seems like .Go is not a clear win. It would be nice to have a vet check that we trust (no false positives). Perhaps it is enough to look for two statements
back to back and reject that always. Thoughts about how to make vet catch this reliably? |
I agree that |
It sounds like we are deciding to make go vet check this and not add new API here. Any arguments against that? |
SGTM |
I've added this proposal to the proposal process bin, but it's blocked on someone figuring out how to implement a useful check. Is anyone interested in doing that? |
Staticcheck has a fairly trivial check: for a GoStmt of a FuncLit, if the first statement in the FuncLit is a call to The check could be trivially hardened by
Edit: which is pretty much what you have suggested in #18022 (comment) |
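For readers who want to see the shape of such a check, here is a minimal sketch, assuming the call being matched is `Add` (as the later comments about SA2000 suggest). It is purely syntactic and does not confirm that the receiver is a `sync.WaitGroup`. The names `isAddCall` and `findAddFirst` are hypothetical and are not taken from staticcheck or vet.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// isAddCall reports whether stmt has the syntactic form x.Add(...).
// Assumption: no type information is used, so any method named Add matches.
func isAddCall(stmt ast.Stmt) bool {
	es, ok := stmt.(*ast.ExprStmt)
	if !ok {
		return false
	}
	call, ok := es.X.(*ast.CallExpr)
	if !ok {
		return false
	}
	sel, ok := call.Fun.(*ast.SelectorExpr)
	return ok && sel.Sel.Name == "Add"
}

// findAddFirst returns the position of every x.Add(...) call that is the
// first statement of a function literal started by a go statement.
func findAddFirst(src string) ([]token.Position, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var found []token.Position
	ast.Inspect(file, func(n ast.Node) bool {
		g, ok := n.(*ast.GoStmt)
		if !ok {
			return true
		}
		lit, ok := g.Call.Fun.(*ast.FuncLit)
		if ok && len(lit.Body.List) > 0 && isAddCall(lit.Body.List[0]) {
			found = append(found, fset.Position(lit.Body.List[0].Pos()))
		}
		return true
	})
	return found, nil
}

const example = `package p

import "sync"

func f() {
	var wg sync.WaitGroup
	defer wg.Wait()
	go func() {
		wg.Add(1)
		defer wg.Done()
	}()
}
`

func main() {
	found, err := findAddFirst(example)
	if err != nil {
		panic(err)
	}
	for _, pos := range found {
		fmt.Printf("%s: Add called as first statement of new goroutine\n", pos)
	}
}
```

A real analyzer would presumably resolve the receiver's type before reporting; this sketch skips that step to stay short.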
The addition of WaitGroup.Go in the standard library has been repeatedly proposed and rejected. See golang/go#18022, golang/go#23538, and golang/go#39863.
In summary, the argument for WaitGroup.Go is that it avoids bugs like: go func() { wg.Add(1); defer wg.Done(); ... }() where the increment happens after execution begins (not before), and also (to a lesser degree) because: wg.Go(func() { ... }) is shorter and more readable.
The argument against WaitGroup.Go is that the provided function takes no arguments, so inputs and outputs must be closed over by the provided function. The most common race bug for goroutines is that the caller forgot to capture the loop iteration variable, so this pattern may make it easier to be accidentally racy. However, that is changing with golang/go#57969. In my experience the probability of race bugs due to the former still outweighs the latter, but I have no concrete evidence to prove it.
The existence of errgroup.Group.Go and the frequent utility of the method at least prove that this is a workable pattern, and accidental races do not appear to manifest as frequently as feared.
A reason *not* to use errgroup.Group everywhere is that there are many situations where it doesn't make sense for the goroutine to return an error, since the error is handled by a different mechanism (e.g., logged and ignored, formatted and printed to the frontend, etc.). While you can use errgroup.Group by always returning nil, the fact that you *can* return nil makes it easy to accidentally return an error when nothing is checking the return of group.Wait. This is not a hypothetical problem, but something that has bitten us in usages that were only using errgroup.Group without intending to use the error reporting part of it.
Thus, add (yet another) variant of WaitGroup here that is identical to sync.WaitGroup, but with an extra method.
Signed-off-by: Joe Tsai <joetsai@digital-static.net>
Do we need to establish that wg.Wait() is called to issue a diagnostic? That would be what this "races" with. We possibly could just always issue a report when the first statement in the function is |
No. If someone is using WaitGroup without calling Wait, they have bigger problems. |
Change https://go.dev/cl/632915 mentions this issue: |
CL 632915 contains a port of staticcheck's SA2000 checker, whose results on the module mirror corpus had no false positives. I move to accept this proposal and (eventually) add the check to cmd/vet. |
Oops, closed by mistake. The new analyzer is enabled in gopls, but we need to wait for the tree to reopen and this proposal to be approved before adding it to cmd/vet. |
Change https://go.dev/cl/633704 mentions this issue: |
This proposal has been added to the active column of the proposals project |
Since there is an active discussion here, I would like to bring to your attention that SA2000 in staticcheck misses real bugs. If you run staticcheck against the following, it fails to find the bug. If you comment out the fmt.Print in front of wg.Add(), it finds the bug. In general, as long as there is some code between
$ go install honnef.co/go/tools/cmd/staticcheck@latest
The SA2000 code is self-explanatory on why this happens. I wrote a linter to address this. I will move to the staticcheck GitHub repo to first discuss whether such an improvement is welcome or negligible. |
Yes, SA2000 and the port of it in https://go.dev/cl/633704 are intentionally very limited in the patterns they match, to avoid false positives. I suspect that relaxing the "wg.Add must be the first statement" rule would cause the analyzer to report a spurious diagnostic for this program:
func f() error {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		...
		// Create another goroutine.
		wg.Add(1) // waitgroup: "WaitGroup.Add called from inside new goroutine"
		go func() {
			defer wg.Done()
			...
		}()
	}()
	return nil
}
I am sure there is room for improvement, but in general our inclination is to optimize for fewer false positives, even at the expense of true positives. If you have concrete suggestions (or code) for better algorithms, we can easily test them out on the corpus of the Module Mirror. |
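To make the trade-off concrete, here is a rough sketch that compares the strict "first statement" rule with a hypothetical relaxed rule that accepts an `Add` call anywhere among the top-level statements of the goroutine's function literal. The helper names (`callsAdd`, `countReports`) and both test programs are assumptions made for illustration. The matching is syntactic only and is not the logic of SA2000 or of CL 633704.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// callsAdd reports whether stmt has the syntactic form x.Add(...).
func callsAdd(stmt ast.Stmt) bool {
	es, ok := stmt.(*ast.ExprStmt)
	if !ok {
		return false
	}
	call, ok := es.X.(*ast.CallExpr)
	if !ok {
		return false
	}
	sel, ok := call.Fun.(*ast.SelectorExpr)
	return ok && sel.Sel.Name == "Add"
}

// countReports counts go statements on function literals that would be
// flagged. With firstOnly set, only the literal's first statement is
// examined (the strict rule); otherwise every top-level statement of the
// literal's body is examined (a hypothetical relaxed rule).
func countReports(src string, firstOnly bool) (int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return 0, err
	}
	n := 0
	ast.Inspect(file, func(node ast.Node) bool {
		g, ok := node.(*ast.GoStmt)
		if !ok {
			return true
		}
		lit, ok := g.Call.Fun.(*ast.FuncLit)
		if !ok {
			return true
		}
		stmts := lit.Body.List
		if firstOnly && len(stmts) > 1 {
			stmts = stmts[:1]
		}
		for _, s := range stmts {
			if callsAdd(s) {
				n++
				break
			}
		}
		return true
	})
	return n, nil
}

// nested mirrors the valid program above: Add runs inside a goroutine that
// is itself already counted, so a report here would be spurious.
const nested = `package p
func f() {
	wg.Add(1)
	go func() {
		defer wg.Done()
		wg.Add(1)
		go func() {
			defer wg.Done()
		}()
	}()
}
`

// delayed is a real bug that the strict rule misses: a statement precedes Add.
const delayed = `package p
func f() {
	go func() {
		fmt.Print("start")
		wg.Add(1)
		defer wg.Done()
	}()
	wg.Wait()
}
`

func main() {
	for _, src := range []string{nested, delayed} {
		strict, _ := countReports(src, true)
		relaxed, _ := countReports(src, false)
		fmt.Println(strict, relaxed)
	}
}
```

Under these assumptions the strict rule reports nothing for either program, while the relaxed rule reports both: it finds the real bug in `delayed` but also flags the valid `nested` program.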
@adonovan yes, you spotted a valid false positive. Here is my linter that uses the golang analysis, not based on staticcheck. It also has fixing logic btw. I scanned our code at Uber, it led to two false positives, which look exactly like what you described. All other reports are true positives. It found one more true positive that SA2000 missed, which is arguably marginal :) |
@adonovan yep, I can take a look. |
From skimming a couple as well (maybe I'll get interested enough to be exhaustive):
These are false positives, the same exception @adonovan pointed out. They could also be written like this though, and I believe it'd incorrectly trigger SA2000 as well:
func f() {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		wg.Add(1)
		defer wg.Done() // done setting up work
		...
		go func() {
			defer wg.Done() // done doing work
			...
		}()
	}()
}
but I suspect that's less common code in practice. I've been trying to think of a check that'd be better at allowing these, because I like "same block" a fair bit, but the closest I can come up with is something like "Add(1) plus includes a deeper nested Done() == ignore". Basically "valid inside invalid is fine". It would definitely miss some, but it might be a good enough sign of "perhaps this is complex enough code that it can be ignored". This one is a bit interesting:
It's essentially inverted, and it's... arguably fine:
var wg sync.WaitGroup
wg.Add(1)
// Do some concurrent setting to make sure that it works
for i := 0; i < concurrency; i++ {
	go func() {
		// Wait for waitGroup so that all goroutines run at basically the same
		// time.
		wg.Wait()
		v.Set("hi")
		atomic.AddInt32(&sets, 1)
	}()
}
wg.Done()
I've used similar constructs to try to maximize racing, but I'm fairly sure this one is ineffective since in practice the
var ready, started, complete sync.WaitGroup
ready.Add(1)
started.Add(concurrency)
complete.Add(concurrency)
for i := 0; i < concurrency; i++ {
	go func() {
		started.Done() // inform that this goroutine has been started
		ready.Wait()   // wait for all to be started
		atomic.AddInt32(&sets, 1)
		complete.Done() // this goroutine has finished its work
	}()
}
started.Wait()  // wait for all goroutines to have been started, and likely at or near Wait
ready.Done()    // unblock all ~simultaneously
complete.Wait() // wait for them all to finish
which has so far been the most effective way to trigger real races that I've found, by quite a large margin, across various I suspect this would also trigger the linter, on the |
The proposal committee is on board with adding this check to vet. We'll let the domain experts hash out exactly what the right heuristic is. :) |
The API for WaitGroup is very easy for people to misuse. The documentation for Add says:
However, it is very common to see this incorrect pattern:
This usage is fundamentally racy and probably does not do what the user wanted. Worse yet, it is not detectable by the race detector.
Since the above pattern is common, I propose that we add a method Go that essentially does the Add(1) and subsequent call to Done in the correct way. That is:
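As a minimal sketch of the idea (not the proposal's original code), a `Go`-style method could look like the following. The wrapper type `group` and the helper `runAll` are hypothetical names introduced only for this illustration. The key point is that `Add(1)` runs in the caller, before the `go` statement, so it is ordered before any later `Wait` in that caller.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// group is a hypothetical wrapper around sync.WaitGroup, used only to
// illustrate the proposed method.
type group struct {
	wg sync.WaitGroup
}

// Go increments the counter in the calling goroutine and only then starts
// f in a new goroutine, calling Done when f returns.
func (g *group) Go(f func()) {
	g.wg.Add(1) // happens before the goroutine starts, unlike the racy pattern
	go func() {
		defer g.wg.Done()
		f()
	}()
}

// Wait blocks until every function started with Go has returned.
func (g *group) Wait() { g.wg.Wait() }

// runAll starts n tasks through Go and returns how many had finished by the
// time Wait returned.
func runAll(n int) int64 {
	var g group
	var finished atomic.Int64
	for i := 0; i < n; i++ {
		g.Go(func() { finished.Add(1) })
	}
	g.Wait()
	return finished.Load()
}

func main() {
	fmt.Println(runAll(100)) // prints 100
}
```

Because the function passed to `Go` takes no arguments, inputs and outputs have to be captured by the closure, which is the concern raised elsewhere in the thread.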