Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Triage and improve remote cache hit rate for desktop usage #12203

Open
stuhood opened this issue Jun 14, 2021 · 5 comments
Open

Triage and improve remote cache hit rate for desktop usage #12203

stuhood opened this issue Jun 14, 2021 · 5 comments

Comments

@stuhood
Copy link
Member

stuhood commented Jun 14, 2021

When a remote cache is configured, desktop usage (even among identical platforms) gets lower cache hit rates than CI usage. This is primarily due to inconsistent PATH/env entries between differently configured boxes.

We should:


Some of the differences identified for a series of runs across multiple users:

@stuhood stuhood changed the title Triage remote cache hit rate for desktop usage Triage and improve remote cache hit rate for desktop usage Jun 14, 2021
@jsirois
Copy link
Contributor

jsirois commented Jun 14, 2021

Agreed that measurement is critical, but we know everything downstream of a PEX build will be a cache miss by "design", right?:

# NB: Building a Pex is platform dependent, so in order to get a PEX that we can use locally
# without cross-building, we specify that our PEX command should be run on the current local
# platform.
result = await Get(ProcessResult, MultiPlatformProcess({platform: process}))

@stuhood
Copy link
Member Author

stuhood commented Jun 14, 2021

Agreed that measurement is critical, but we know everything downstream of a PEX build will be a cache miss by "design", right?:
...

This ticket is intended to cover cases where multiple desktop machines are using the same platform: I'll clarify that.

@xlevus
Copy link
Contributor

xlevus commented Jun 14, 2022

Another cause for cache-misses we have theorized to cause issues are API keys. Some of our tests run against 3rd party systems and authenticated via an API key. With each developer having a unique key we believe this would effectively nullify any benefits of remote caching.

Given the design of the system, the API key is irrelevant and ignoring any security concerns a singular key could be used by all developers. Or hash(api_key) could safely return a constant.

@stuhood
Copy link
Member Author

stuhood commented Sep 13, 2022

I did a little bit of thinking about this before we decided to start #13682, so will braindump some of that for now.

I believe that a path forward to allow for "adjusting" the fingerprint that is included in a Process for certain environment variables and absolute file args is essentially to reify them into types which:

  1. Would have their actual string content applied below/after cache lookups. For example:
    • a reified PATH env var containing your HOME directory would not include the HOME portion in the Process' digest, but would use the entire value at execution time.
  2. Describe how to compute a deeper/different fingerprint for the entry which was included for cache lookups. For example:
    • a reified PATH entry might be constructed by the @rule implementer by listing which (sub)processes they thought that a Process would use. Fingerprinting would then collect versions of those processes and apply some rounding to attempt to match.

The hope would be that these types would be composable, such that they didn't add much complexity to constructing a Process.

@Eric-Arellano Eric-Arellano removed their assignment Sep 13, 2022
Eric-Arellano added a commit that referenced this issue Sep 15, 2022
…6874)

First part of #16873, and necessary for #16852

A concrete impact of this for users is that remote cache users can now never share results between different platforms. This was already very likely though due to #12203.
@tgolsson
Copy link
Contributor

I enabled remote caching when upgrading to 2.18, and am seeing similar issues -- but even on the same machine by enabling our pants.ci.toml.

You can see here; that even the fingerprints match, but we get a cache miss when running with only base config:

# WITH pants.ci.toml
11:03:10.24 [DEBUG] remote cache hit for: "Building 2 requirements for ci/emote-override_py.pex from the locks/cpu.lock resolve: coloredlogs~=15.0, ruamel.yaml~=0.16.0" digest=Digest { hash: Fingerprint<4dd1225aba792cee25d7aa09c810024d91f4cb87385044c287d63e0f78b88d34>, size_bytes: 142 } 

# WITHOUT pants.ci.toml
11:49:55.05 [DEBUG] remote cache miss for: "Building 2 requirements for ci/emote-override_py.pex from the locks/cpu.lock resolve: coloredlogs~=15.0, ruamel.yaml~=0.16.0" digest=Digest { hash: Fingerprint<4dd1225aba792cee25d7aa09c810024d91f4cb87385044c287d63e0f78b88d34>, size_bytes: 142 }

This is a pex_binary that we build that only contains two files and two dependencies. We only write the cache from CI, but I assume if I did write locally it'd at least hit it that, but I don't want all users to write to cache to avoid cache pollution. The only thing that could possibly affect this from our config is that we enable pyenv in pants.ci.toml -- we do override the default resolve, but I've made sure to include that on the command line.

I've not dug more into it yet, but I'll diff the actual process invocations when I have time and see if I can prove that it's the Python interpreter location that matters.

For completeness; this is the pants.ci.toml we use:

[GLOBAL]
colors = true
print_stacktrace = true
plugins.add = [
    "hdrhistogram",
]

backend_packages.add = [
    "pants.backend.python.providers.experimental.pyenv",
]

remote_cache_write = true

[stats]
log = true

[test]
use_coverage = true

[coverage-py]
report = ["json"]
global_report = true

[pytest]
args = ["-vv", "--no-header", "--benchmark-disable"]

[python]
default_resolve = "cpu"

[oci]
rootless = false
uid_map = ["0:0:65536"]
gid_map = ["0:0:65536"]

[pyenv-python-provider]
installation_extra_env_vars = [
    "PYTHON_CONFIGURE_OPTS=--with-lto=thin",
    "PYTHON_CFLAGS=-march=native -mtune=native",
]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants