Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect parsing of version into url hostname #4625

Closed
jeffwidman opened this issue Jan 11, 2022 · 8 comments · Fixed by #4910
Closed

Incorrect parsing of version into url hostname #4625

jeffwidman opened this issue Jan 11, 2022 · 8 comments · Fixed by #4910
Labels
L: go:modules Golang modules T: bug 🐞 Something isn't working

Comments

@jeffwidman
Copy link
Member

Saw a weird error about cannot reach the URL for a particular repo. Digging further, I found the following in the log:

updater | INFO <job_254312325> Updating github.com/company/redactedrepo from 0.49.0 to 0.50.0
  proxy | 2022/01/11 09:37:01 [094] GET https://github.com:443/company/redactedrepo/info/refs?service=git-upload-pack
  proxy | 2022/01/11 09:37:01 [094] * authenticating git server request (host: github.com)
  proxy | 2022/01/11 09:37:01 [095] GET https://v0.50.0:443/?go-get=1
  proxy | 2022/01/11 09:37:01 [095] WARN: Cannot read TLS response from mitm'd server dial tcp: lookup v0.50.0 on 8.8.8.8:53: no such host
  proxy | 2022/01/11 09:37:01 [097] GET https://v0.50.0:443/?go-get=1
  proxy | 2022/01/11 09:37:01 [097] WARN: Cannot read TLS response from mitm'd server dial tcp: lookup v0.50.0 on 8.8.8.8:53: no such host
  proxy | 2022/01/11 09:37:01 [094] 200 https://github.com:443/company/redactedrepo/info/refs?service=git-upload-pack
updater | INFO <job_254312325> Handled error whilst updating github.com/company/redactedrepo: git_dependencies_not_reachable {:"dependency-urls"=>["github.com/company/redactedrepo v0.50.0\ngo get"]}

Looks to me like the version number was extracted and then for some reason it was used as the hostname?

I took a quick look at git history of the go modules code, and nothing changed recently. Maybe something changed in a lower layer of general URL parsing within dependabot?

@jeffwidman jeffwidman added the T: bug 🐞 Something isn't working label Jan 11, 2022
@mctofu mctofu added the L: go:modules Golang modules label Jan 11, 2022
@mctofu
Copy link
Contributor

mctofu commented Jan 11, 2022

That is pretty strange. Those requests should all be coming from go get so I think either the version was treated as a separate arg (go get -d github.com/company/redactedrepo v0.50.0 instead of go get -d github.com/company/redactedrepo@v0.50.0) or perhaps something was parsed correctly from the go.mod.

When did you first see this issue for this repo? We have been rolling out some changes to the GOPRIVATE setting (#4568) and could try disabling them for this repo in the off-chance it's causing the issue.

@jeffwidman
Copy link
Member Author

When did you first see this issue for this repo?

I first noticed them this morning, but I haven't been watching the dependabot runs closely, as most of our repos are suffering from #4536.

Here's another Job ID with similar failure but a different failing dep: 254513777

Agreed that it is weird. If I get time, I'll try to see if I can repro this locally, but that may not happen for a bit.

@mctofu
Copy link
Contributor

mctofu commented Jan 11, 2022

I see this same error has occurred back in Dec 16, 2021 so that would rule out the GOPRIVATE setting.

@jeffwidman
Copy link
Member Author

jeffwidman commented Mar 8, 2022

I dug into this a little more...

_, stderr, status = Open3.capture3(environment, command)
is returning the error string:

"go: downloading github.com/<redacted>/<private-repo-that-we-are-directly-running-dependabot-against> v0.39.0\ngo get: reading google.golang.org/grpc/go.mod at revision v1.33.0: unknown revision v1.33.0\n"

That is correctly caught as a repo resolvability error and ends up here:

repo_error_regex = REPO_RESOLVABILITY_ERROR_REGEXES.find { |r| stderr =~ r }
if repo_error_regex
error_message = filter_error_message(message: stderr, regex: repo_error_regex)
ResolvabilityErrors.handle(error_message, credentials: credentials, goprivate: @goprivate)

But then things go sideways, because this line grabs the wrong repo URL:

mod_path = message.scan(GITHUB_REPO_REGEX).last

Look again at the error message above... it's giving the name of the repo that we are running dependabot against, which is located on GitHub. But then the subdependency which is actually causing the error is not on github. The regex is looking for the last url that contains "github", so it grabs the wrong url.

At that point, the code starts working as expected... it throws a GitDependenciesNotReachable which then manifests as this red herring error message about needing access to private registries.

So there's four variants here:

  1. neither url is github -> no match as expected - there's already a test case for this.
  2. both urls are github -> matches the final one as expected, couldn't find a test case
  3. first url is github, second url is not -> unhandled
  4. first url is not github, second url is github -> matches the final one as expected, couldn't find a test case

Unfortunately, while looking at the code, it looks like case 2/4 are not handled properly within the url splitting of

mod_split = mod_path.split("/")
repo_path = if mod_split.size > 3
mod_split[0..2].join("/")
else
mod_path
end

For example, if I modify the original error message so that both URLs point at github:

"go: downloading github.com/<redacted>/<redacted> v0.39.0\ngo get: reading github.com/grpc/go.mod at revision v1.33.0: unknown revision"

results in a repo_path value of "github.com/grpc/go.mod at revision v1.33.0"... when it should probably be "github.com/grpc"

So there's a few things going on here...

The nice part is that once the three new test cases are added, it should be easy to repro/validate fixes.

@mctofu
Copy link
Contributor

mctofu commented Mar 9, 2022

Ah, nice find! The purpose of that code is to raise Dependabot::GitDependenciesNotReachable if the error is a potential access issue for a github.com repo. In that case the error presentation to the user may suggest adding it to the private repo access setting. If the access issue is for a non github hosted module then it's fine to return Dependabot::DependencyFileNotResolvable instead.

@jeffwidman
Copy link
Member Author

jeffwidman commented Mar 15, 2022

I was trying to create a test project for this, and noticed the following little nuance... this error only shows up when the go mod cache is empty:

# first run throws full error - note that repo2 is a direct dependency of repo1
[dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ GOPRIVATE=* go get -d github.com/<redacted-org>/repo2@v0.39.0
go: downloading github.com/<redacted-org>/repo2 v0.39.0
go get: reading google.golang.org/grpc/go.mod at revision v1.33.0: unknown revision

# second run doesn't include the `go: downloading...` line
[dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ GOPRIVATE=* go get -d github.com/<redacted-org>/repo2@v0.39.0
go get: reading google.golang.org/grpc/go.mod at revision v1.33.0: unknown revision v1.33.0

# clean the modcache
[dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ go clean -modcache

# the full error shows up again
[dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ GOPRIVATE=* go get -d github.com/<redacted-org>/repo2@v0.39.0
go: downloading github.com/<redacted-org>/repo2 v0.39.0
go get: reading google.golang.org/grpc/go.mod at revision v1.33.0: unknown revision

So the fix will probably be to catch / strip that go: downloading ... line if it matches the repo that is currently being go get'ed... I wonder if other errors might be hiding behind this as well?

@jeffwidman
Copy link
Member Author

Actually, thinking about this more... as a user, it's helpful to know which direct dep is triggering the failing indirect dep.
So while we need to strip out the go: downloading... line in order to know how to handle the error, ideally when surfacing the final error message back to the user it'd be helpful to add it back.

That may not be possible with the current code structure, and it's not a deal breaker if so because they can use other go cmd tooling to dig into it, but it would help.

jeffwidman added a commit to jeffwidman/dependabot-core that referenced this issue Mar 15, 2022
When `go get -d github.com/org/repo@v1.2.3` starts with an empty module
cache, it appears to prepend an additional status string (copied from dependabot#4625 (comment)):

    # first run throws full error - note that repo2 is a direct dependency of repo1
    [dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ GOPRIVATE=* go get -d github.com/<redacted-org>/repo2@v0.39.0
    go: downloading github.com/<redacted-org>/repo2 v0.39.0
    go get: reading google.golang.org/grpc/go.mod at revision v1.33.0: unknown revision

    # second run doesn't include the `go: downloading...` line
    [dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ GOPRIVATE=* go get -d github.com/<redacted-org>/repo2@v0.39.0
    go get: reading google.golang.org/grpc/go.mod at revision v1.33.0: unknown revision v1.33.0

    # clean the modcache
    [dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ go clean -modcache

    # the full error shows up again
    [dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ GOPRIVATE=* go get -d github.com/<redacted-org>/repo2@v0.39.0
    go: downloading github.com/<redacted-org>/repo2 v0.39.0
    go get: reading google.golang.org/grpc/go.mod at revision v1.33.0: unknown revision

It's currently unclear if this is only thrown on `unknown revision`
errors, or any generic error that happens from a clean module cache. I
suspect the latter, but haven't yet taken the time to dive into the `go
get` source.

Most likely this will be fixed by stripping out the status line very
early in the error handling pipeline, and then passing along the normal
error.

But the first step is getting a working test case to repro the error...
this PR is currently a WIP at doing that, and then will be gradually
updated to actually fix the issue.

Note that I don't know much Ruby, let alone Rspec, so once I hack
together a semi-workable solution this will definitely need some further
polish from actual Rubyists before being ready for merge.

Fix dependabot#4625
jeffwidman added a commit to jeffwidman/dependabot-core that referenced this issue Mar 26, 2022
…rror

When `go get github.com/org/repo@v1.2.3` starts with an empty module
cache, it appears to prepend an additional status string (copied from dependabot#4625 (comment)):

    # first run throws full error - note that repo2 is a direct dependency of repo1
    [dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ GOPRIVATE=* go get -d github.com/<redacted-org>/repo2@v0.39.0
    go: downloading github.com/<redacted-org>/repo2 v0.39.0
    go get: reading google.golang.org/grpc/go.mod at revision v1.33.0: unknown revision

    # second run doesn't include the `go: downloading...` line
    [dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ GOPRIVATE=* go get -d github.com/<redacted-org>/repo2@v0.39.0
    go get: reading google.golang.org/grpc/go.mod at revision v1.33.0: unknown revision v1.33.0

    # clean the modcache
    [dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ go clean -modcache

    # the full error shows up again
    [dependabot-core-dev] ~/dependabot-core/tmp/<redacted-org>/repo1 $ GOPRIVATE=* go get -d github.com/<redacted-org>/repo2@v0.39.0
    go: downloading github.com/<redacted-org>/repo2 v0.39.0
    go get: reading google.golang.org/grpc/go.mod at revision v1.33.0: unknown revision

This `go: downloading abc` line causes the entire multi-line string to
match the regex for unknown revision:

```
go: .*: unknown revision
```

However, it doesn't match the single line that we care about:
```
go get: reading google.golang.org/grpc/go.mod at revision v1.33.0: unknown revision
```

So within `filter_error_message()`, the entire multi-line string is
returned, even though the first line is a simple status message, and we
only care about the second line:
https://github.com/dependabot/dependabot-core/blob/bc6162c1ae44c431ae2f394dccd270a762da6af8/go_modules/lib/dependabot/go_modules/file_updater/go_mod_updater.rb#L266-L272

Because the entire line matches the regex, it's still correctly caught
as a repo resolvability error and ends up here:
https://github.com/dependabot/dependabot-core/blob/c4f3e6eae592c5c42332719b9e597c8bd47a2db9/go_modules/lib/dependabot/go_modules/file_updater/go_mod_updater.rb#L243-L246

But then things go sideways, because this line grabs the wrong repo URL:
https://github.com/dependabot/dependabot-core/blob/c4f3e6eae592c5c42332719b9e597c8bd47a2db9/go_modules/lib/dependabot/go_modules/resolvability_errors.rb#L9

And from there it eventually manifests as the inscrutable error:
```
updater | INFO <job_254312325> Handled error whilst updating github.com/company/redactedrepo: git_dependencies_not_reachable {:"dependency-urls"=>["github.com/company/redactedrepo v0.50.0\ngo get"]}
```

While there's a number of additional defensive checks that can be added
later on in the process, ultimately the root problem is that the regex
simply needs to check for `go get:`, not just `go:`.

Even more confusingly, this problem seems to have disappeared with `go`
`1.18`. When I dug into the logs, it's because the `go get:` line now
reports itself as `go:`, so the original regex now matches.

But we've seen this `get` come and go... for example `go` `1.18`
actually added it back for something else: dependabot#4868

I don't actually know if the root cause is racy output within the `go
get` tooling (ie, intermittently appears in all versions), or if
refactoring from one `go` version to the next changes which section of
code outputs the status messages.

So in summary, we should add a non-capturing group to handle if/when the
`get` ever re-appears.

Fix dependabot#4625
@jeffwidman
Copy link
Member Author

Fixed by #4910

Defensive to prevent similar bugs: #4911

Defensive to emit the error message sooner in the pipeline if something is amiss: #4913

That should take care of everything I commented on here... My original take that the mod_path parsing needed improvement was incorrect, because I didn't realize the underlying problem was that filter_error_message() was returning a multi-line string when it should have returned a single-line string with only a single URL.

jeffwidman added a commit to jeffwidman/dependabot-core that referenced this issue Dec 16, 2023
Part of the reason
dependabot#4625 was so
difficult to debug was the original error messages were a total red
herring. They appeared to be related to IPs, but digging deeper realized
they were actually due to errors happening earlier in the processing
pipeline but not caught until several steps down the line.

So this adds some defensive checks so that if there's an upstream error,
we catch it sooner and report the proper error message(s).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
L: go:modules Golang modules T: bug 🐞 Something isn't working
Projects
None yet
2 participants