
Add an intrinsic for capturing Snapshots from absolute paths #10842

Open
stuhood opened this issue Sep 23, 2020 · 3 comments

Comments

@stuhood
Member

stuhood commented Sep 23, 2020

There are a few use cases for "absolute" file watching and capturing: #10769, #10360, #10837, etc. A likely end-user API would be to add PathGlobsAndRoot->(Paths|Digest) intrinsics.
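To make the proposed shape concrete, here is a minimal Python sketch of a PathGlobsAndRoot -> Paths expansion. It is a toy stand-in, not Pants' actual engine type or intrinsic: the class body, the hypothetical expand_paths helper, and the use of the standard glob module are all assumptions made for illustration. The only thing it shows is that globs are resolved against an explicit absolute root rather than implicitly against the buildroot.

```python
import glob
import os
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PathGlobsAndRoot:
    """Hypothetical request type: glob patterns plus the absolute root they are relative to."""

    globs: Tuple[str, ...]
    root: str

    def __post_init__(self) -> None:
        # Unlike buildroot-relative PathGlobs, the root must be spelled out explicitly.
        if not os.path.isabs(self.root):
            raise ValueError(f"root must be an absolute path, got {self.root!r}")


def expand_paths(request: PathGlobsAndRoot) -> Tuple[str, ...]:
    """Hypothetical helper: expand the globs under `root`, returning sorted file paths relative to it."""
    matches = set()
    for pattern in request.globs:
        # recursive=True lets "**" match zero or more directory levels.
        for match in glob.glob(os.path.join(request.root, pattern), recursive=True):
            if os.path.isfile(match):
                matches.add(os.path.relpath(match, request.root))
    return tuple(sorted(matches))
```

A real intrinsic would also install file watches on everything it visited during expansion, which is where the concerns in the rest of this issue come in.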


Currently Pants' file watching+capturing intrinsics (PathGlobs->(Paths|Digest)) operate relative to the buildroot, for two reasons:

  1. the vast majority of captured files will be (and should be) located inside the buildroot, so that should be the happy path
  2. we used to use watchman for file watching, and it did not (easily) support watching files at a more fine-grained level than the buildroot

The second point is now somewhat historical: since we switched to the notify crate, we can more easily watch more locations. But OSX still places bounds on how many locations you might reasonably watch (see), and so it's possible that:

  1. the API should be constrained to ensure that we don't end up watching too many directories
  2. we should add polling, but only for paths outside the buildroot on OSX
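The two options above could be combined into a single per-path policy. The Python sketch below is hypothetical (watch_strategy and its "notify"/"poll" return values are not Pants APIs); it assumes native notifications are used everywhere inside the buildroot, and that polling is reserved for paths outside the buildroot on macOS (sys.platform == "darwin").

```python
import os
import sys


def watch_strategy(path: str, buildroot: str, platform: str = sys.platform) -> str:
    """Hypothetical policy: choose how a path should be watched."""
    path = os.path.abspath(path)
    buildroot = os.path.abspath(buildroot)
    # commonpath compares whole path components, so "/repository" is not
    # considered to be inside "/repo".
    inside_buildroot = os.path.commonpath([path, buildroot]) == buildroot
    if inside_buildroot or platform != "darwin":
        # Happy path: native notifications (e.g. via the notify crate).
        return "notify"
    # Outside the buildroot on macOS, avoid spending the bounded watch budget.
    return "poll"
```

Keeping the decision in one place would also make it easy to swap in option 1 instead, e.g. by rejecting requests outright rather than polling.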

The paths that would be explored via #10769 in particular will generally involve chains of symlinks: PathGlobs expansion is aware of those symlinks (and additionally tries to traverse their parents...), so the result would be watches installed in various places throughout /etc, /usr, and ... etc. It's possible, though, that the total number of watches would be small enough for this to be a non-issue.
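One rough way to estimate how many extra watches a symlink chain costs is to follow it and collect the directory containing each hop, since replacing or deleting a link typically shows up as an event in its parent directory. The Python sketch below is an illustration only: dirs_to_watch is a hypothetical helper, and unlike PathGlobs expansion it does not also resolve symlinks in intermediate directory components.

```python
import os
from typing import Set


def dirs_to_watch(path: str, max_hops: int = 40) -> Set[str]:
    """Hypothetical helper: follow a symlink chain from `path` and return the
    parent directory of every hop, i.e. the directories a watcher would need."""
    watched: Set[str] = set()
    current = os.path.abspath(path)
    for _ in range(max_hops):
        # The directory containing this hop must be watched.
        watched.add(os.path.dirname(current))
        if not os.path.islink(current):
            return watched
        target = os.readlink(current)
        # Relative link targets are interpreted relative to the link's directory;
        # os.path.join discards the directory if the target is absolute.
        current = os.path.normpath(os.path.join(os.path.dirname(current), target))
    raise RuntimeError(f"too many levels of symlinks while resolving {path!r}")
```

Running something like this over the interpreter paths that a pyenv or ASDF lookup touches would give a first approximation of whether the watch count is actually a concern.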

@cosmicexplorer
Contributor

cosmicexplorer commented Sep 27, 2020

#10870 describes a method for making mutable caches remote-friendly. That issue currently proposes placing a digest_hint file to avoid having to re-snapshot the entire cache each time. However, in #10864 I realized there are use cases (like the MyPy non-append-only cache) which need to re-snapshot the cache dir anyway.

I think that using polling (or the notify crate) for directories outside the buildroot, as you've described here, is likely a better approach than placing digest_hint files everywhere, and could solve the same problem of making mutable caches remote-friendly. I'm aware that we already have a way to solve that with platform properties, but it requires support from the remexec backend, which doesn't work yet.

In summary, I think you've described a capability that could be extended to make the existing mutable cache feature remote-friendly without upstream support, and which is more generally useful than #10870.

I would possibly also add #10864 to the list of issues this could fix: while parenting the MyPy daemon would solve the local execution case, we could also keep track of the MyPy cache directory to make it remote-friendly.

@stuhood
Member Author

stuhood commented Jul 13, 2021

Both our existing pyenv scraping and the upcoming ASDF integration at #12028 should be using this API, but aren't.

@stuhood
Member Author

stuhood commented Sep 8, 2022

This is likely related to #16800, in service of #13682.

In the context of the work on #13682, the intrinsic described on this ticket would be environment-specific: i.e., when in a __local__ environment, it would execute directly against the filesystem. But when in a docker or remote environment, it would execute inside the image (using whichever implementation was most efficient for that case).


2 participants