Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to revive, drop golint #3945

Merged
merged 6 commits into from
Feb 2, 2021
Merged

Switch to revive, drop golint #3945

merged 6 commits into from
Feb 2, 2021

Conversation

Groxx
Copy link
Contributor

@Groxx Groxx commented Jan 29, 2021

See the individual commits for a detailed breakdown of lints / changes and what they mean / why they are made.
I intend to merge without squashing (as a fast-forward), so they are all retained.

As a high-level tl;dr, these commits:

  • Drop golint, as it has somewhat significant annoyances
  • Add revive, as it's faster, more capable, and configurable
  • Remove a pet peeve lint of mine
  • Add many "free" lints we're already good on
  • Add a couple significant lints, with fixes
  • Add another significant lint, sadly with no fixes yet
  • Last but not least, mention every lint currently available, and briefly documents why it's disabled / if we should consider enabling it. The full list can be seen in the revive repo's readme, and there are additional docs as well: https://github.com/mgechev/revive

@coveralls
Copy link

coveralls commented Jan 29, 2021

Coverage Status

Coverage decreased (-0.07%) to 61.436% when pulling 03664be on lint into c7d2f10 on master.

Copy link
Contributor

@vytautas-karpavicius vytautas-karpavicius left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very nice! It will allow us to improve code quality in the long run.

One question on the initial rule set. As I understand this is currently roughly equivalent to golint + few more rules that you see as high value, but not do not have too much violations currently, correct?

Would it make sense to add all available rules in the config file, but comment some them out with either:

  • TODO: too much failures currently, once fixed enable it
  • We will not use this rule because of [reason]
    That way it would be more visible what we are using, what we are planning to use and what is not suitable for us and why.

@Groxx
Copy link
Contributor Author

Groxx commented Jan 29, 2021

I've kinda debated doing that, yeah. The full list is rather large though, and some of them are just kinda clearly "meh" or "no" due to kinda innately having high false-positives or applying pressures that IMO lead to gaming metrics: https://github.com/mgechev/revive
Like max-public-structs: I'm pretty sure everyone agrees that too many things in one file is undesirable, but... what value is there in an arbitrary limit? Sorta similar for the cyclomatic complexity check - simple is of course better, but artificially forcing a split of a conceptual unit can make it harder to understand.

Also you can eyeball cyclomatic complexity and some other lints easily enough during code review. tbh I'm more interested in the things that are hard to review - those are the ones that are particularly valuable to automate.
unhandled-error / errcheck is an excellent example there: code review rarely has full type signatures and tool hinting available. Catching ignored errors that have no representation in code is next to impossible by eye alone, but almost trivial to automate, and then during review you know it couldn't have been missed. Major brain-power savings.

Anyway though. Yep, I could do that - rejects really only take a few words, and it'd at least document the full list that we've seen.

@Groxx
Copy link
Contributor Author

Groxx commented Jan 29, 2021

ah, to answer the first part of your comment: yep, exactly. I'd include the unhandled-errors immediately, but having too many warnings is almost as bad as having none - newly introduced problems get lost in the noise. collectively I think this only adds ~20, and fixing the string-int-cast ones will cut that in half.

I may add a "lint-next" target to make it easy to find stuff worth picking away at gradually, before enforcing. The others new ones are a bit higher importance though imo.

@Groxx
Copy link
Contributor Author

Groxx commented Jan 29, 2021

All lints now documented and organized!

@Groxx Groxx force-pushed the lint branch 4 times, most recently from 76000c5 to b58a3ee Compare January 30, 2021 04:38
@@ -248,7 +248,7 @@ func (adh *adminHandlerImpl) DescribeWorkflowExecution(
}

shardID := common.WorkflowIDToHistoryShard(request.Execution.WorkflowID, adh.numberOfHistoryShards)
shardIDstr := string(shardID)
shardIDstr := strconv.Itoa(shardID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we can drop one of shardID strings here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked into this one a bit more (and failed to make a smaller repro), and have decided to undo it / change it to a form that go vet allows (and is recommended in the github issue thread that lead to the vet check being added) but produces the same output as before.

Since this appears to change how shards in a hash ring are selected, is it safe to change at all? Or could it cause Problems™ on deploy, as different instances will compute different shards for the same keys?

Comment on lines -251 to 252
shardIDstr := string(shardID)
shardIDstr := string(rune(shardID)) // originally `string(int_shard_id)`, but changing it will change the ring hashing
shardIDForOutput := strconv.Itoa(shardID)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mantas-sidlauskas ^ this is the new version

Comment on lines +273 to +274
lint: $(BIN)/revive
$(BIN)/revive -config revive.toml -exclude './canary/...' -exclude './vendor/...' -formatter unix ./... | sort
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strangely it also doesn't exclude vendor by default. maybe if I set my go flags to include -mod=vendor...?

meh. at least you can tell it to ignore it.

@@ -18,8 +18,7 @@ require (
github.com/dmetzgar/goveralls v0.0.3
github.com/eapache/go-resiliency v1.2.0 // indirect
github.com/emirpasic/gods v0.0.0-20190624094223-e689965507ab
github.com/fatih/color v0.0.0-20181010231311-3f9d52f7176a
github.com/fatih/structtag v1.1.0 // indirect
github.com/fatih/color v1.10.0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(to pick a random version change)
these changes are the result of a go get for revive, not a go get -u, so it likely has these versions as minimums.
either way though, none of these seem particularly concerning.

@@ -1048,7 +1049,7 @@ func (c *clientImpl) getClientForDomainID(domainID string) (Client, error) {
}

func (c *clientImpl) getClientForShardID(shardID int) (Client, error) {
client, err := c.clients.GetClientForKey(string(shardID))
client, err := c.clients.GetClientForKey(strconv.Itoa(shardID))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't go into any load-distribution system, just an in-memory client cache. safe to change.

Golint is borderline abandonware (all that energy seems to be on `go vet` instead), and doesn't even ignore vendor folders from `./...` despite [having a github issue since 2017](golang/lint#320).
Plus it's non-configurable by design, which IMO encourages some bad habits.  See the next commit for details.

After a brief skim of the current state of things, `revive` popped out as a possibility, and the results are pretty good.
Runtimes on a fairly beefy machine:
```sh
# time make lint (before)
make lint  8.80s user 13.14s system 154% cpu 14.206 total
# time make lint (simply swapping bins)
make lint  10.85s user 16.89s system 149% cpu 18.545 total
# time make lint (new call)
make lint  21.72s user 4.22s system 2555% cpu 1.015 total
```
So that's a **14x reduction** in runtime, and now it's at the point where it's fine to run all the time (~1 second).
Locally, perhaps due to having fewer cores, this goes from 14s -> 2s.  Still substantially better, and still comfortably fast.

This default list also finds an additional 14 items, none major, and they basically match newer `golint` versions / what IDEs will often highlight:
```
# new lint, potentially worth following.
# there are many instances of this in the canary/ folder, but this was ignored earlier, so it is ignored now as well.
service/worker/failovermanager/starter.go:91:9: [context-keys-type] should not use basic type string as key in context.WithValue
# new lint, potentially worth following
service/worker/scanner/shardscanner/scanner_workflow.go:176:2: [if-return] redundant if ...; err != nil check, just return error instead.
# 3 new comment-formatting / exported-type-without-comment additions
```
Nothing significant, but (ignoring the new comment lints) possibly worth tackling, and not excessively noisy.

The following commits add some new lints + fixes, this one just aims for parity.
This does not cover all of the comment-related lints, but it does cover all of the ones that currently produce warnings.
After this, 10 go away, bringing the total to 18.

**Rationale:**

Many of the comment-requiring and comment-formatting lints are focused on both:
1. Encouraging documenting public APIs.  But this repo is effectively entirely private / not intended for people to import.
2. Producing "nice" documentation in `go doc` output.  Whether this makes more readable docs or not is debatable, but I understand the goal.

When *either or both* of those don't apply, the lints just kinda suck.  Their mere existence encourages pointless stuff like:
```go
// ConvertPretty ...
<snip>
func (e *ESql) ConvertPretty(sql string, pagination ...interface{}) (dsl string, sortField []string, err error) {
```
Or this, which appears in 30 places:
```go
// Ptr is a helper function for getting pointer value
func ... Ptr() ...
```
Or this TBD pattern, in **OVER 2000** places:
```go
// WorkflowExecutionCanceledEventAttributes is an internal type (TBD...)
```

It's a waste of time, space, and mental-energy.

In my experience it also implies the code is in a "higher quality" state than it may actually be, as good code often has documentation.
While that's obviously not *actually true*, I've frequently seen people stop reading when they find a documented func regardless of what the documentation actually says, on the assumption that it's either 1) functioning correctly, or 2) intended for broader use.
2 in particular often manifests as code importing some random unrelated package because it happens to have the only documented + commonly-used pointer-conversion-func.  I've seen this in dozens of repositories, it's a plague.

Even if we (think we) aren't vulnerable to that kind of thinking when writing code, it seems worth avoiding since it clearly brings no benefit.

Document what needs documentation.
Don't add documentation that offers nothing beyond the type name (or less).
And let's delete bad docs when we see them.
This finds 21 failures, though only 3 of which are outside of tests.
In practice, these look like:
```go
# service/history/shard/controller.go:381:55: [string-of-int] dubious convertion of an integer into a string, use strconv.Itoa
info, err := c.GetHistoryServiceResolver().Lookup(string(shardID))
```
And they are *nearly always* a bug.

The correlation is so strong, `go test` on 1.15 and newer will by default refuse to compile a package with this code.
They have a pretty good bit on this in the release notes: https://golang.org/doc/go1.15#vet
To copy a relevant snippet from that:
> `string(9786)` does not evaluate to the string `"9786"`; it evaluates to the string `"\xe2\x98\xba"`, or `"☺"`.
So yeah.  That's basically never what is intended.

If you are on 1.15 or newer, you can reproduce these with `go vet -stringintconv ./...`.
Or just run `go test` on an affected package, it'll error rather than run tests.

We should *absolutely* prevent *all* of these.  We can't rely on `go` though, since this repo targets 1.12, so I'm glad this lint exists.

---

There is one spot I *did not* change (behaviorally), because I'm not sure what doing so will do.
The `shardIDstr` in service/frontend/adminHandler.go:251 is used as a hash-ring lookup key.
Changing this will change how shards hash, which could cause issues when deployed.

So, for now at least, I've converted it to the equivalent that the Go github issue recommends: `string(rune(int))`.
This produces the same values as before.

---

Strangely, this misses a fairly trivial case that `go vet` catches:
```go
--- a/service/frontend/adminHandler.go
+++ b/service/frontend/adminHandler.go
@@ -248,7 +248,7 @@ func (adh *adminHandlerImpl) DescribeWorkflowExecution(
        }

        shardID := common.WorkflowIDToHistoryShard(request.Execution.WorkflowID, adh.numberOfHistoryShards)
-       shardIDstr := string(shardID)             // misses this one
+	shardIDstr := string(rune(shardID)) // ...
        shardIDForOutput := strconv.Itoa(shardID) // correct, not changed in this diff

        historyHost, err := adh.GetMembershipMonitor().Lookup(common.HistoryServiceName, shardIDstr)
```
This might be due to the variable being handled correctly once?  I dunno.
I haven't been able to reproduce this with a smaller test, unfortunately.
Thankfully, this repo is open source, so it can serve as the reproduction.
Finds 3 failures that are partially false positives:
```
common/log/panic.go:37:17: [defer] recover must be called inside a deferred function
common/persistence/persistence-tests/executionManagerTestForEventsV2.go:51:7: [defer] recover must be called inside a deferred function
service/history/execution/cache.go:285:14: [defer] recover must be called inside a deferred function
```

These are not *all* incorrect.  This kind of construct is safe, but it will still warn:
```go
recov := func() { recover() }
defer recov()
```
This is the equivalent of `common/log/panic.go` plus lines like this in `service/frontend/adminHandler.go`:
```go
defer log.CapturePanic(adh.GetLogger(), &retError)
```

But this kind is not correct:
```go
recov := func() { recover() }
defer func() { recov() }()
// or
recov()
```
When the containing func is not *directly* executed by `defer`, the `recover()` call *silently does nothing*, which is why this lint exists.
And the non-deferred case is clearly ineffective as well.

The first incorrect example is the same as `service/history/execution/cache.go`, when returned by `GetAndCreateWorkflowExecution`, and deferred in `service/history/historyEngine.go` in `DescribeMutableState`.
There are more examples of this though, the signature of the recover-ing func encourages misuse because it accepts an `error` argument.
The second is the same as `service/frontend/dcRedirectionHandler.go`.  This one is likely not intended to catch anything though.

Even though there is a safe way to do this, and much of the code does so, IMO this is simply too risky to even approach.
So I would like to change ALL uses to get rid of the false positives.
That change will involve making all `recover()` calls explicit, so they can be linted, e.g.:
```go
defer func() {
	recov(recover(), &err)
}()
``
This is noticeably more verbose, but it's trivially linted, and (when enforced) is practically impossible to mess up.

---

This is all pretty easily verified with the following code (comment out one of the `recov()` strategies):
```go
var missed interface{}
var caught interface{}
defer func() { fmt.Println("caught:", caught, "missed:", missed) }()
defer func() { missed = recover() }()
recov := func() { caught = recover() }
defer recov()              // this works
defer func() { recov() }() // this does not
panic("oops")
```
This prints:
```
// defer recov():
caught: oops missed: <nil>
// defer func() { recov() }():
caught: <nil> missed: oops
```
Good: https://play.golang.org/p/4D4Absnk7U7
Bad: https://play.golang.org/p/cX9XnqQgZJz

I honestly have no idea how the Go authors think this is *even remotely* a good idea.  Especially without warnings / errors.
Maybe it's for performance reasons.......?
This set seems useful, and only has one violation:
```
common/testing/event_generator.go:420:2: [modifies-value-receiver] suspicious assignment to a by-value method receiver
```
That has been fixed in this commit.
It appears unused, and is likely just a mistake from copying the value getters.
And I've kept this lint especially because I can't see a legitimate reason to allow it - it's fairly error-prone, due to implicit copies when passed by value.

The other lints are pretty obvious by their name, have zero results currently, and have no significant impact on runtime.
They seem worth maintaining going forward.
With this, `revive.toml` mentions all currently-available lints.

Some seem desirable, but they have more lint failures than I feel like tackling at the moment :)
So they're just documented for now.

---

`unhandled-error` I'd *love* to enable immediately, but ~300 warnings is far too many.
Newly-introduced violations of other lints would just be lost in the noise.

Reducing (or rarely whitelisting) these seems necessary before enabling, which IMO should be considered fairly high priority.
Missing error handling is pretty frequently a source of hard-to-identify misbehavior, and a lint like this is the only feasible way to catch many cases.
It's just far too hard to reliably catch during code review (as evidence: 300 cases!), especially when the returned value(s) are not assigned to anything.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants