Triage and improve remote cache hit rate for desktop usage #12203

stuhood · 2021-06-14T23:20:01Z

When a remote cache is configured, desktop usage (even among identical platforms) gets lower cache hit rates than CI usage. This is primarily due to inconsistent PATH/env entries between differently configured boxes.

We should:

~~add enough information to workunits/metrics to validate that this is the case (i.e. to use a StreamingWorkunitHandler to compare cache lookups across different desktop machines)~~ (done in Add cache and runtime metadata to Process workunits #12469)
Do whatever PATH/env filtering we need in order to improve our hit rate (potentially related to Make PATH scanning/filtering a native operation #10526 and BinaryPaths should fingerprint found paths. #10769)

Some of the differences identified for a series of runs across multiple users:

- PATH/LDFLAGS/CPPFLAGS
  - Leaked through in order to allow for compilation of wheels.
- The PEX --python-path
- The Python interpreter used
  - Even with an identical PATH string, multiple interpreters might be identified (see Make PATH scanning/filtering a native operation #10526 and BinaryPaths should fingerprint found paths. #10769).
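
To illustrate why differing PATH entries defeat a shared cache, here is a minimal sketch, not Pants' actual fingerprinting. The `cache_key` and `filter_path` helpers, the allowlist, and the example paths are all hypothetical. It hashes a process's argv and env into a key, then shows that two desktops only agree on the key once machine-specific PATH entries are filtered out.

```python
import hashlib
import json


def cache_key(argv, env):
    """Hash argv and env into a cache key (hypothetical scheme, not Pants' real one)."""
    payload = json.dumps({"argv": list(argv), "env": dict(env)}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def filter_path(path, allowed_dirs):
    """Keep only PATH entries that appear in an allowlist, preserving their order."""
    return ":".join(entry for entry in path.split(":") if entry in allowed_dirs)


# Assumed allowlist and environments, invented for illustration.
ALLOWED = {"/usr/bin", "/bin"}
argv = ["python3", "-c", "print('hi')"]
alice_env = {"PATH": "/home/alice/.pyenv/shims:/usr/bin:/bin"}
bob_env = {"PATH": "/home/bob/bin:/usr/bin:/bin"}

# Unfiltered, the per-user PATH entries leak into the key.
raw_match = cache_key(argv, alice_env) == cache_key(argv, bob_env)

# Filtered, both machines produce the same key.
alice_filtered = {"PATH": filter_path(alice_env["PATH"], ALLOWED)}
bob_filtered = {"PATH": filter_path(bob_env["PATH"], ALLOWED)}
filtered_match = cache_key(argv, alice_filtered) == cache_key(argv, bob_filtered)

print(raw_match, filtered_match)
```

Note that this only covers the PATH string itself. As the last bullet says, an identical PATH can still resolve to different interpreters, which string filtering alone does not address.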


jsirois · 2021-06-14T23:24:55Z

Agreed that measurement is critical, but we know everything downstream of a PEX build will be a cache miss by "design", right?:

pants/src/python/pants/backend/python/util_rules/pex.py

Lines 679 to 682 in 61193e1

    
    # NB: Building a Pex is platform dependent, so in order to get a PEX that we can use locally
    # without cross-building, we specify that our PEX command should be run on the current local
    # platform.
    result = await Get(ProcessResult, MultiPlatformProcess({platform: process}))

stuhood · 2021-06-14T23:27:29Z

Agreed that measurement is critical, but we know everything downstream of a PEX build will be a cache miss by "design", right?:
...

This ticket is intended to cover cases where multiple desktop machines are using the same platform: I'll clarify that.

xlevus · 2022-06-14T20:36:40Z

Another cause of cache misses that we have theorized is API keys. Some of our tests run against third-party systems and authenticate via an API key. With each developer having a unique key, we believe this would effectively nullify any benefit of remote caching.

Given the design of the system, the API key is irrelevant, and, ignoring any security concerns, a single key could be used by all developers. Alternatively, hash(api_key) could safely return a constant.
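
As a rough sketch of the "hash(api_key) returns a constant" idea: the `env_fingerprint` function, the `SERVICE_API_KEY` variable name, and the placeholder value below are all hypothetical, and this is not an existing Pants option. The fingerprint substitutes a fixed placeholder for any variable declared opaque, so developers with different keys still share a fingerprint.

```python
import hashlib

# Placeholder hashed in place of any opaque variable's real value (assumed convention).
REDACTED = "<redacted>"


def env_fingerprint(env, opaque_names):
    """Fingerprint an environment, hashing a constant for variables named in opaque_names."""
    hasher = hashlib.sha256()
    for name in sorted(env):
        value = REDACTED if name in opaque_names else env[name]
        hasher.update(f"{name}={value}\0".encode())
    return hasher.hexdigest()


# Two developers with different (made-up) keys but otherwise identical environments.
dev_a = {"SERVICE_API_KEY": "key-aaa", "LANG": "C.UTF-8"}
dev_b = {"SERVICE_API_KEY": "key-bbb", "LANG": "C.UTF-8"}
opaque = {"SERVICE_API_KEY"}

same_with_redaction = env_fingerprint(dev_a, opaque) == env_fingerprint(dev_b, opaque)
same_without = env_fingerprint(dev_a, set()) == env_fingerprint(dev_b, set())
print(same_with_redaction, same_without)
```

The variable's name still contributes to the fingerprint, so adding or removing the key entirely would still change the result. Only its value is ignored.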

stuhood · 2022-09-13T16:57:02Z

I did a little bit of thinking about this before we decided to start #13682, so will braindump some of that for now.

I believe that a path forward to allow for "adjusting" the fingerprint that is included in a Process for certain environment variables and absolute file args is essentially to reify them into types which:

- Would have their actual string content applied below/after cache lookups. For example:
  - a reified PATH env var containing your HOME directory would not include the HOME portion in the Process' digest, but would use the entire value at execution time.
- Describe how to compute a deeper/different fingerprint for the entry which was included for cache lookups. For example:
  - a reified PATH entry might be constructed by the @rule implementer by listing which (sub)processes they thought that a Process would use. Fingerprinting would then collect versions of those processes and apply some rounding to attempt to match.

The hope would be that these types would be composable, such that they didn't add much complexity to constructing a Process.
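
A minimal sketch of what such a reified type might look like follows. It is an assumption about the shape of the idea, not an API that exists in Pants. The `ReifiedEnvVar` class, its `home` field, and the `<HOME>` placeholder are hypothetical names. The full value is kept for execution, while the fingerprint sees a value with the machine-specific HOME prefix replaced.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ReifiedEnvVar:
    """Hypothetical env var type: one value for execution, another for fingerprinting."""

    name: str
    value: str  # full string, applied at execution time (after cache lookup)
    home: str  # machine-specific prefix to exclude from the digest

    def fingerprint_value(self) -> str:
        # Replace the HOME portion so that it does not affect the digest.
        return self.value.replace(self.home, "<HOME>")


def digest(env_vars) -> str:
    """Digest only the fingerprint values, in a stable (sorted by name) order."""
    hasher = hashlib.sha256()
    for var in sorted(env_vars, key=lambda v: v.name):
        hasher.update(f"{var.name}={var.fingerprint_value()}\0".encode())
    return hasher.hexdigest()


# Made-up example values for two different users.
alice = ReifiedEnvVar("PATH", "/home/alice/.local/bin:/usr/bin", home="/home/alice")
bob = ReifiedEnvVar("PATH", "/home/bob/.local/bin:/usr/bin", home="/home/bob")

print(digest([alice]) == digest([bob]))  # digests agree
print(alice.value == bob.value)  # execution-time values still differ
```

This covers only the first bullet (string content applied after lookup). The second bullet, fingerprinting the versions of the binaries a Process is expected to use, would need a different `fingerprint_value` implementation.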

…6874) First part of #16873, and necessary for #16852 A concrete impact of this for users is that remote cache users can now never share results between different platforms. This was already very likely though due to #12203.

tgolsson · 2023-11-27T10:55:39Z

I enabled remote caching when upgrading to 2.18, and am seeing similar issues, but even on the same machine, depending on whether our pants.ci.toml is enabled.

You can see here that the fingerprints even match, but we get a cache miss when running with only the base config:

# WITH pants.ci.toml
11:03:10.24 [DEBUG] remote cache hit for: "Building 2 requirements for ci/emote-override_py.pex from the locks/cpu.lock resolve: coloredlogs~=15.0, ruamel.yaml~=0.16.0" digest=Digest { hash: Fingerprint<4dd1225aba792cee25d7aa09c810024d91f4cb87385044c287d63e0f78b88d34>, size_bytes: 142 } 

# WITHOUT pants.ci.toml
11:49:55.05 [DEBUG] remote cache miss for: "Building 2 requirements for ci/emote-override_py.pex from the locks/cpu.lock resolve: coloredlogs~=15.0, ruamel.yaml~=0.16.0" digest=Digest { hash: Fingerprint<4dd1225aba792cee25d7aa09c810024d91f4cb87385044c287d63e0f78b88d34>, size_bytes: 142 }

This is a pex_binary that we build which contains only two files and two dependencies. We only write to the cache from CI. I assume that if I did write locally it would at least hit that entry, but I don't want all users writing to the cache, to avoid cache pollution. The only thing that could possibly affect this from our config is that we enable pyenv in pants.ci.toml. We do override the default resolve, but I've made sure to include that on the command line.

I've not dug more into it yet, but I'll diff the actual process invocations when I have time and see if I can prove that it's the Python interpreter location that matters.
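
One way the diff could be done, assuming the two process invocations have been captured as plain dictionaries with `argv` and `env` keys, is sketched below. That capture format and the `diff_invocations` helper are hypothetical, and the example values are invented rather than taken from the runs above.

```python
def diff_invocations(a, b):
    """Return the fields that differ between two captured invocations.

    Each invocation is assumed to be {"argv": [...], "env": {...}}.
    The result maps a field name to a (value_in_a, value_in_b) tuple.
    """
    diffs = {}
    if a["argv"] != b["argv"]:
        diffs["argv"] = (a["argv"], b["argv"])
    for name in sorted(set(a["env"]) | set(b["env"])):
        left, right = a["env"].get(name), b["env"].get(name)
        if left != right:
            diffs[f"env.{name}"] = (left, right)
    return diffs


# Invented example: the interpreter path differs, and one env var is missing locally.
with_ci_config = {
    "argv": ["/home/ci/.pyenv/versions/3.11.6/bin/python", "./pex", "--version"],
    "env": {"PATH": "/usr/bin:/bin", "LANG": "C.UTF-8"},
}
base_config = {
    "argv": ["/usr/bin/python3", "./pex", "--version"],
    "env": {"PATH": "/usr/bin:/bin"},
}

diffs = diff_invocations(with_ci_config, base_config)
for field, (left, right) in diffs.items():
    print(f"{field}: {left!r} != {right!r}")
```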

For completeness, this is the pants.ci.toml we use:

[GLOBAL]
colors = true
print_stacktrace = true
plugins.add = [
    "hdrhistogram",
]

backend_packages.add = [
    "pants.backend.python.providers.experimental.pyenv",
]

remote_cache_write = true

[stats]
log = true

[test]
use_coverage = true

[coverage-py]
report = ["json"]
global_report = true

[pytest]
args = ["-vv", "--no-header", "--benchmark-disable"]

[python]
default_resolve = "cpu"

[oci]
rootless = false
uid_map = ["0:0:65536"]
gid_map = ["0:0:65536"]

[pyenv-python-provider]
installation_extra_env_vars = [
    "PYTHON_CONFIGURE_OPTS=--with-lto=thin",
    "PYTHON_CFLAGS=-march=native -mtune=native",
]

stuhood added the remoting and caching label Jun 14, 2021

stuhood changed the title ~~Triage remote cache hit rate for desktop usage~~ Triage and improve remote cache hit rate for desktop usage Jun 14, 2021

benjyw removed the remoting and caching label Sep 9, 2021

stuhood mentioned this issue May 16, 2022

Javascript backend's nodejs should used named cache in Pants instead of system #15489

Closed

stuhood mentioned this issue Jul 6, 2022

WIP/RFC: "Coalesced" process batching #15648

Draft

stuhood assigned Eric-Arellano Jul 6, 2022

stuhood mentioned this issue Aug 18, 2022

Python import parser does not support subprocess_environment or some other way to leak env vars. #16565

Open

Eric-Arellano removed their assignment Sep 13, 2022

stuhood mentioned this issue Sep 14, 2022

New mechanism for marking process outputs as platform agnostic #16873

Open

Eric-Arellano mentioned this issue Sep 14, 2022

Every Process is now platform-specific (impacts remote caching) #16874

Merged

stuhood mentioned this issue Sep 27, 2022

go: cgo fixes plus add tests for each supported language #17018

Merged
