Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype switching to the notify crate for file invalidation #4999

Closed
stuhood opened this issue Oct 19, 2017 · 17 comments · Fixed by #9714
Closed

Prototype switching to the notify crate for file invalidation #4999

stuhood opened this issue Oct 19, 2017 · 17 comments · Fixed by #9714
Assignees

Comments

@stuhood
Copy link
Member

stuhood commented Oct 19, 2017

Watchman no longer gives us useful events when the children of directories are created/deleted, which is a large and surprising change in behaviour that means we need to over-invalidate (or add additional syscalls to differentiate creates/updates/deletes).

We should explore whether using something like the notify crate would result in fewer assumptions... we already know that it would result in one fewer daemon and less memory usage (as we have no need to track anything that isn't warm in the graph).

The primary uncertainty is whether consuming a queue of events in process would require filtering to decide which events to ignore, and which require additional syscalls to determine their actual meaning: for example:

  • could we easily filter out accesstime updates?
  • could we watch modification/creation time of directories (which sidesteps the need to invalidate parents based on children)?

Also, even if this looks like a good approach, it should likely be enabled behind a pants global option, so that both watchman and the notify crate can be used at once (EDIT at @illicitonion's suggestion) for a while.

@kwlzn
Copy link
Member

kwlzn commented Oct 19, 2017

thinking that this is probably a longer-term goal in the face of our current priorities, but agree it would be nice to fold filesystem event monitoring directly into the daemon if we can do it as good or better than watchman.

we should probably also attempt to make a case upstream for feature flagging the behavior we want if that's the primary driving factor here.

stuhood pushed a commit that referenced this issue Oct 19, 2017
### Problem

Watchman `4.9.0` does not send events for a parent when a child is created or deleted, which causes issues like #4662.

### Solution

Revert to explicitly invalidating the parent directory when a child has changed. More precision is possible here, but it's possible that re-gaining the precision by using the `notify` crate would be preferable (see #4999).

### Result

Fixes #4662.
@stuhood
Copy link
Member Author

stuhood commented Dec 5, 2017

Another relevant bit here is that watchman walks the entire filesystem during initialization, which is strictly more* (*fixed) than we need to do, afaict.

@stuhood
Copy link
Member Author

stuhood commented Dec 6, 2017

Assigning @jsirois : updated the description to make it clear that even if we decide to ship this, we'll want to support toggling between watchman and native-notify via a global flag.

@stuhood
Copy link
Member Author

stuhood commented Feb 21, 2018

@illicitonion : Assuming you're not taking this one, let's leave it assigned to @jsirois .

@jsirois : If you're able to work on this one next week, it would be much appreciated. I can lay some groundwork this week to unblock you (basically, moving the "fork lock" to the rust side in order to make the background thread that @illicitonion determined would be necessary here viable).

@illicitonion
Copy link
Contributor

Yeah, enjoy @jsirois!

@stuhood
Copy link
Member Author

stuhood commented Feb 25, 2018

Actually @jsirois, I think we need you on #5426 with higher priority... but perhaps you can start this later in the week.

@jsirois
Copy link
Contributor

jsirois commented Feb 25, 2018

Ack. I'll be back on it tomorrow morning.

@stuhood

This comment has been minimized.

@illicitonion

This comment has been minimized.

@stuhood

This comment has been minimized.

@kwlzn kwlzn added this to the Daemon Post-Beta milestone Mar 1, 2018
@jsirois
Copy link
Contributor

jsirois commented Mar 14, 2018

Notes about API: It would be nice to just be able to ask for changes since X for a root. Even better would be where X is not a time but a token and an initial scan of a root returns a token. This would allow any weirdness with time to be held behind the API.

@stuhood
Copy link
Member Author

stuhood commented Dec 12, 2019

I kicked off a small branch for this over the holiday: master...twitter:stuhood/notify-crate ... it does not yet work, but it compiles.

It has a number of important TODOs before it is landable:

  1. The InvalidationWatcher::watch method is blocking, and (likely) makes syscalls in order to accomplish what it is doing. We call it in the branch from an async context here: master...twitter:stuhood/notify-crate#diff-11db197014e1e80c9551e9b61aba6aa0R1224 ... that is almost certainly making things slower than they should be. We should probably give the InvalidationWatcher a reference to the Core's Executor in here (by just cloning it: it is internally reference counted), and then having the fn watch method return an impl Future that we would consume in Node::run instead.
  2. It's not clear that we're actually receiving all of the watch events that we're expecting: should continue to add logging (and confirm that logging is actually working from the background thread that the InvalidationWatcher starts). Should run with something like MODE=debug ./pants --enable-pantsd list 3rdparty:: and watch .pants.d/pantsd/pantsd.log to see invalidation log messages: the goal would be to receive similar log messages for this codepath as we do in the existing watchman invalidation path here.

The first thing is purely a performance improvement, but I'd suggest looking at it first because it will make debugging the second thing easier not to have to wait long periods for pantsd to start up.

Fixing those two things would leave us in a place where we are using two systems to invalidate, but hopefully in a way that has no noticeable impact for end users.


Then, in a second PR, we should allow for watchman to be disabled, most likely by making edits to the python-side FSEventService

  1. We need to add code there that would be able to cleanly shutdown if file watching failed, likely by polling the InvalidationWatcher::running method.
  2. We should add an option to allow for disabling watchman, and skip launching it in/around the FSEventService codepaths.

@stuhood
Copy link
Member Author

stuhood commented Feb 19, 2020

Keith got a draft of this out here: #9089: it contains parts of the first two items above. Thanks Keith!

@hrfuller hrfuller self-assigned this Mar 12, 2020
@hrfuller
Copy link
Member

The implementation of notify watcher in #9318 leads to some user noticeable performance hits for users who don't have pantsd enabled. It's possible that there are some improvements we can make by ignoring files that are already gitignored that don't need to be watched for events. This would be an ad-hoc solution to make notify faster for us but does nothing for consumers, who would likely have to do the same thing. (Link to issue about ignoring based on project .gitignore?)

Questions that will affect how we move forward:

  1. Are we planning to enable pantsd by default in the near term? If so it may be acceptable to enable notify for everyone now. If not, then we likely need a way to disable it that is coupled to whether pantsd is disabled.
  2. Corollary to 1: If we need a feature gate for notify, do we need it in Add notify fs watcher to engine. #9318 or can it wait for the follow up PR described in "Next Steps".

Next Steps
The plan to complete the migration to notify away from watchman is to follow up #9318 with a PR which allows opt-in to disable watchman, and in the longer term, deprecating it entirely. However this assumes that the performance of using notify is good enough for everyone that we can enable notify by default. If it isn't, and we don't plan to enable pantsd by default then we may have to maintain multiple watchers until we decide to. @stuhood suggested some kind of tri-state flag which is either {notify, notify+watchman, watchman} though I don't understand the use case of the notify+watchman mode.

@Eric-Arellano
Copy link
Contributor

though I don't understand the use case of the notify+watchman mode.

Without having had looked closely at the implementation, that does indeed sound difficult and error-prone to implement. I would think either notify or watchman would be sufficient? If our goal is to remove watchman, then using notify + watchman should be redundant because notify presumably does the same thing.

@hrfuller
Copy link
Member

that does indeed sound difficult and error-prone to implement

Error prone yes, just because watchman is already error-prone, and we would have to maintain more code. Difficult, not really, currently in #9318 the default is to use both simultaneously. They both hook in to the same code path in the engine, but we get redundant invalidation.

@hrfuller
Copy link
Member

hrfuller commented Mar 25, 2020

Based on changes implemented in 3f52ee5 to improve latency of setting up file watches it seems like we are no longer between a rock and a hard place with how to move forward! Watching the entire build root at startup time adds minimal overhead to pants (#9318 (comment)). We can safely land #9318 without any penalty to users, so we don't need to put it behind a feature gate. This means the next steps are much clearer:

Revised Next Steps

  • Introduce a follow up PR after Add notify fs watcher to engine. #9318 lands to allow users to disable watchman and rely solely on the engine to invalidate nodes.
  • Follow up with another PR which optimizes node invalidation to only invalidate what we need. This means ignoring events from files that aren't relevant, essentially files that are .gitignored, for instance vim .swp files or other hidden files used by IDE's and source control tools.
  • Set watchman to be off by default.
  • Eventually deprecate and remove watchman from pants once we have experience using the engine invalidator in the wild.

stuhood pushed a commit that referenced this issue May 7, 2020
### Problem

Using the notify crate for file invalidation is now feature complete, and so we are using two file watching implementations simultaneously under `pantsd`.

### Solution

Disable watchman by default, and deprecate it. Additionally, remove the implementation of the `experimental` flag from notify. Fixes #8710, fixes #8198, fixes #7375, fixes #5487, fixes #4999, fixes #3479.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants