Sandbox creation is prohibitively slow for Go first-party package analysis #13409

Closed
Eric-Arellano opened this issue Oct 28, 2021 · 2 comments
Labels: backend: Go (Go backend-related issues), estimate: ~2W

Comments

@Eric-Arellano
Contributor

A trace of running ./pants_from_sources --no-process-execution-local-cache dependencies //:root#./ in https://github.com/toolchainlabs/external-dns: pants_run_2021_10_28_14_52_47_122_d2ce1895125d4e4b8519c7445e94e4d0-trace.json.txt

[Screenshot, 2021-10-28 3:08 PM]

The trace shows that setting up the chroot for determining FirstPartyPkgInfo takes 9 seconds. That is almost certainly because the sandbox must copy in the GOPATH (all downloaded modules) so that go list does not complain about missing third-party packages:

input_digest = await Get(
    Digest,
    MergeDigests(
        [pkg_sources.snapshot.digest, go_mod_info.digest, all_third_party_packages.digest]
    ),
)

On my M1, the machine locks up when I try to run dependencies ::, because copying the GOPATH once per package causes too much contention. That is, every first-party package (directory) suffers from this problem.

Solution 1: our own analysis

Rather than using go list, we can write our own parser to compute this info:

@dataclass(frozen=True)
class FirstPartyPkgInfo:
    """All the info and digest needed to build a first-party Go package.

    The digest does not strip its source files. You must set `working_dir` appropriately to use the
    `go_first_party_package` target's `subpath` field.
    """

    digest: Digest
    subpath: str

    import_path: str

    imports: tuple[str, ...]
    test_imports: tuple[str, ...]
    xtest_imports: tuple[str, ...]

    go_files: tuple[str, ...]
    test_files: tuple[str, ...]
    xtest_files: tuple[str, ...]

    s_files: tuple[str, ...]

Unlike go list, our parser would not require third-party deps to be present for analysis. We only need the package's source files.
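
To make the idea concrete, here is a minimal sketch of source-only analysis in Python. It is not the Pants implementation and not what go list does internally. `PkgAnalysis`, `parse_imports`, `package_name`, and `analyze` are hypothetical names, and the line-based regexes ignore block comments, build constraints, and raw-string import paths. It only illustrates that imports and file categories can be read from the package's own files, with no third-party code present.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical helpers for illustration; a real parser needs a proper Go tokenizer.
_SINGLE_IMPORT = re.compile(r'^import\s+(?:[\w.]+\s+)?"([^"]+)"')
_BLOCK_START = re.compile(r"^import\s*\(")
_BLOCK_ENTRY = re.compile(r'^(?:[\w.]+\s+)?"([^"]+)"')
_PACKAGE = re.compile(r"^\s*package\s+(\w+)", re.MULTILINE)


@dataclass(frozen=True)
class PkgAnalysis:
    """Simplified, hypothetical stand-in for the fields of FirstPartyPkgInfo."""

    imports: tuple[str, ...]
    test_imports: tuple[str, ...]
    xtest_imports: tuple[str, ...]
    go_files: tuple[str, ...]
    test_files: tuple[str, ...]
    xtest_files: tuple[str, ...]
    s_files: tuple[str, ...]


def package_name(source: str) -> str:
    """Return the name from the first `package` clause, or "" if none is found."""
    match = _PACKAGE.search(source)
    return match.group(1) if match else ""


def parse_imports(source: str) -> list[str]:
    """Collect import paths from single-line imports and `import ( ... )` blocks."""
    imports: list[str] = []
    in_block = False
    for line in source.splitlines():
        stripped = line.strip()
        if in_block:
            if stripped.startswith(")"):
                in_block = False
            elif m := _BLOCK_ENTRY.match(stripped):
                imports.append(m.group(1))
        elif _BLOCK_START.match(stripped):
            in_block = True
        elif m := _SINGLE_IMPORT.match(stripped):
            imports.append(m.group(1))
        elif stripped.startswith(("func ", "var ", "type ", "const ")):
            # Go requires imports to precede all other declarations.
            break
    return imports


def analyze(files: dict[str, str]) -> PkgAnalysis:
    """Classify a package's files (name -> contents) and gather imports per category."""
    buckets = {kind: ([], set()) for kind in ("go", "test", "xtest")}
    s_files = []
    for name in sorted(files):
        source = files[name]
        if name.endswith(".s"):
            s_files.append(name)
            continue
        if not name.endswith(".go"):
            continue
        if not name.endswith("_test.go"):
            kind = "go"
        elif package_name(source).endswith("_test"):
            kind = "xtest"  # external test package, e.g. `package foo_test`
        else:
            kind = "test"
        names, imports = buckets[kind]
        names.append(name)
        imports.update(parse_imports(source))
    return PkgAnalysis(
        imports=tuple(sorted(buckets["go"][1])),
        test_imports=tuple(sorted(buckets["test"][1])),
        xtest_imports=tuple(sorted(buckets["xtest"][1])),
        go_files=tuple(buckets["go"][0]),
        test_files=tuple(buckets["test"][0]),
        xtest_files=tuple(buckets["xtest"][0]),
        s_files=tuple(s_files),
    )
```

Because `analyze` takes only a mapping of file name to contents, the sandbox for this step would need nothing beyond the package's own sources. Mapping each import path to a first-party or third-party target would be a separate step.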

Over time, the parser would need to grow more complex to handle metadata like this:

        // Cgo directives
        CgoCFLAGS    []string // cgo: flags for C compiler
        CgoCPPFLAGS  []string // cgo: flags for C preprocessor
        CgoCXXFLAGS  []string // cgo: flags for C++ compiler
        CgoFFLAGS    []string // cgo: flags for Fortran compiler
        CgoLDFLAGS   []string // cgo: flags for linker
        CgoPkgConfig []string // cgo: pkg-config names

This approach is what rules_go does.
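
As a rough illustration of the extra work the Cgo fields above would imply, the following Python sketch collects `#cgo` directives from a file's source text. `parse_cgo_directives` and `_FIELD_NAMES` are hypothetical names, not part of Pants or rules_go. The sketch assumes directives appear one per line behind `//` comments, skips any directive that carries build constraints (a real parser would have to evaluate those against the target platform), and does not check that the comment actually precedes `import "C"`.

```python
from __future__ import annotations

import re
import shlex
from collections import defaultdict

# `#cgo [constraints] KIND: args`, optionally behind a `//` line comment.
_CGO_LINE = re.compile(
    r"^\s*(?://)?\s*#cgo\s+(?:(?P<constraints>[^:]+?)\s+)?"
    r"(?P<kind>CFLAGS|CPPFLAGS|CXXFLAGS|FFLAGS|LDFLAGS|pkg-config):\s*(?P<args>.*)$"
)

# Maps each directive kind to the corresponding field name listed above.
_FIELD_NAMES = {
    "CFLAGS": "CgoCFLAGS",
    "CPPFLAGS": "CgoCPPFLAGS",
    "CXXFLAGS": "CgoCXXFLAGS",
    "FFLAGS": "CgoFFLAGS",
    "LDFLAGS": "CgoLDFLAGS",
    "pkg-config": "CgoPkgConfig",
}


def parse_cgo_directives(source: str) -> dict[str, tuple[str, ...]]:
    """Collect unconstrained #cgo directives, keyed by field name."""
    flags: dict[str, list[str]] = defaultdict(list)
    for line in source.splitlines():
        match = _CGO_LINE.match(line)
        if match is None or match.group("constraints"):
            # Not a directive, or constrained (e.g. `#cgo amd64 CFLAGS: ...`),
            # which this sketch deliberately does not evaluate.
            continue
        field = _FIELD_NAMES[match.group("kind")]
        flags[field].extend(shlex.split(match.group("args")))
    return {field: tuple(values) for field, values in flags.items()}
```

Like import extraction, this needs only the package's own source files, so it would not bring back the GOPATH copy.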

Solution 2: make sandbox creation faster

For example, use #12716. Probably relates to #13390.

@stuhood
Member

stuhood commented Oct 28, 2021

Although we will definitely be investing in faster sandbox creation (likely via immutable caches or FUSE), option 1 seems like the better choice in this case. As you mentioned, it's not strictly necessary to download the third-party code to extract imports.

@stuhood
Member

stuhood commented Nov 5, 2021

Resolved in #13476 (although #12716 will almost certainly be important for the JVM).

@stuhood closed this as completed Nov 5, 2021