-
-
Notifications
You must be signed in to change notification settings - Fork 636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add notify fs watcher to engine. #9318
Add notify fs watcher to engine. #9318
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super exciting, this looks great!
It would be good to see a few tests for:
- Watching a symlink, and {deleting,repointing} it as a well as changing the contents of the file it points at
- Watching the target of a symlink
- Watching a chain of several symlinks
- Creating or deleting a file invalidating the readdir of the containing directory
- Deleting or renaming a directory invalidating the reading of files in that directory
@illicitonion : All of these are handled in the
These would be good to do though. @hrfuller : one thing to note with regard to testing is that because pants/tests/python/pants_test/engine/test_fs.py Lines 232 to 233 in 3c3aaed
|
Relativize paths returned from notify to the build_root. Refactor invalidate method to be an associated method on the InvalidationWatcher.
* Use spawn on io pool instead of custom future impl * Write python fs tests * Relativize paths to invalidate to build root * invalidate nodes with parent paths. * Comments
Make some things public so we can use them in tests. Use canonical path to build root for relativizing changed paths.
ae5c9dc
to
3bad8b2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a ton! Nothing blocking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good :) I would appreciate a few more tests in rust before merging, as I suspect we may be over-invalidating a fair bit (and possibly under-invalidating in some places) - a few more tests would help set my mind at ease!
use fs::File; | ||
use testutil::make_file; | ||
|
||
#[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to see unit tests here also for:
build_root/dir/subdir/file
changing only invalidatingbuild_root/dir/subdir/file
build_root/dir/subdir/file
being created or deleted invalidatingbuild_root/dir/subdir/file
andbuild_root/dir/subdir
- A symlink target being changed invalidating exactly the symlink
- For
build_root/dir/subdir/file
,subdir
being deleted invalidatingbuild_root/dir
,build_root/dir/subdir
andbuild_root/dir/subdir/file
- build_root being a symlink
- Changing the executable bit on a file invalidating that file
Hopefully these are all nicely extractable into common code so that each test is just:
- Set up a filesystem
- Call a function
- Modify the filesystem
- Make an assertion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There will be at least one followup patch in this series: I think investing more time in testing there would be reasonable.
But I also think that some of those tests should likely be in other modules: this module should probably only be testing "point" invalidation (since that is all that it actually implements), and then a separate module could test impl VFS for Context
.
Return watch errors as core::Failure all the way to user. Check notify event type when deciding whether to invalidate parents. Move task executor onto invalidation watcher. Move test_support trait impl into test_support mod.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an issue with this revision: see the comment in nodes.rs
.
To confirm that we're not missing anything else like that, please ensure that you do something like timing the average of five runs of ./pants list ::
(without pantsd) before/after the patch. If it's within 5 or 10% (we should expect some overhead), then we're good.
with file watching. Format watch and nodes files.
[ci skip-jvm-tests] # No JVM changes made.
…t interfere" This reverts commit 974b3ce. [ci skip-jvm-tests] # No JVM changes made.
# Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests] # No JVM changes made.
# Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests] # No JVM changes made.
Perf numbers
This branch
A note on these numbers. From the first entry in each list it's clear that we take a significant performance hit for startup runs when we have to watch alot of files. As far as I can tell this just comes down to it being somewhat slow to add watches to files, or possibly to the notify-rs implementation. Could we parallelize more here, if we are waiting for a lock on the watcher too long? It's also interesting that the overhead from watching files becomes negligible compared to startup time for smaller sets of targets (src/::). The situation in general gets much better when pants.d is enabled, even if significant amounts of nodes are invalidated. Interested to hear thoughts on if this is acceptable and how to proceed if its not. |
Commented in Slack: it looks like there's an issue with the first line there. |
Updated the first line, and yes unfortunately it doesn't track with the other lines. Either there are some files still hiding in the repo that are being git ignored but not pants ignored, or the way we are watching files is non-linear: we could be watching the same file multiple times, but we shouldn't be if Scandir nodes are run once per dir. |
# Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests] # No JVM changes made.
@@ -134,7 +143,7 @@ def test_walk_parent_link(self): | |||
|
|||
def test_walk_escaping_symlink(self): | |||
link = "subdir/escaping" | |||
dest = "../../this-is-probably-nonexistent" | |||
dest = "../../" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happened here? AFAICT, it should be fine to leave this behind, because we shouldn't actually get to the point of looking for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test failed because we started propagating errors from notify, in this case that the path it was trying to watch didn't exist.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dest path still tries to link outside of the build root so the test should be the same.
# Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests] # No JVM changes made.
build root at startup. On Linux watch individual directory roots. [ci skip-jvm-tests] # No JVM changes made.
Revised Perf numbers after recursive watch on OSX: Looks like watching is essentially free now! This branch:
master
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
It would be good to confirm that we don't see a substantial slowdown on Linux, but based on the investigation yesterday I'm optimistic, and ok doing that post merge.
on linux courtesy of @gshuflin we have
|
…tsbuild#9318 by default, to mitigate issues seen in pantsbuild#9415 until a fix is in place. [ci skip-jvm-tests] # No JVM changes made.
* Use the notify crate to implement an `InvalidationWatcher` for Graph operations. * Make watch async, and watcher log to pantsd.log. Relativize paths returned from notify to the build_root. Refactor invalidate method to be an associated method on the InvalidationWatcher. * Respond to feedback. * Use spawn on io pool instead of custom future impl * Write python fs tests * Relativize paths to invalidate to build root * invalidate nodes with parent paths. * Comments * Add rust tests. Make some things public so we can use them in tests. Use canonical path to build root for relativizing changed paths. * Refactor Python tests. Return watch errors as core::Failure all the way to user. Move task executor onto invalidation watcher. Move test_support trait impl into test_support mod. * use futures lock on watcher * Platform specific watching behavior. On Darwin recursively watch the build root at startup. On Linux watch individual directory roots. Co-authored-by: Stu Hood <stuhood@gmail.com>
…tsbuild#9318 (pantsbuild#9416) * Add a feature gate to disable the engine fs watcher introduced in pantsbuild#9318 by default, to mitigate issues seen in pantsbuild#9415 until a fix is in place.
* Port to tokio 0.2, and to stdlib futures for fs and task_executor (#9071) We're on an older version of tokio, which doesn't smoothly support usage of async/await. Switch to tokio 0.2, which supports directly spawning and awaiting (via its macros) stdlib futures, which is an important step toward being able to utilize async/await more broadly. Additionally, port the `fs` and `task_executor` crates to stdlib futures. Finally, transitively fixup for the new APIs: in particular, since both `task_executor` and `tokio` now consume stdlib futures to spawn tasks, we switch all relevant tests and main methods to use the `tokio::main` and `tokio::test` macros, which annotate async methods and spawn a runtime to allow for `await`ing futures inline. Progress toward more usage of async/await! * Add notify fs watcher to engine. (#9318) * Use the notify crate to implement an `InvalidationWatcher` for Graph operations. * Make watch async, and watcher log to pantsd.log. Relativize paths returned from notify to the build_root. Refactor invalidate method to be an associated method on the InvalidationWatcher. * Respond to feedback. * Use spawn on io pool instead of custom future impl * Write python fs tests * Relativize paths to invalidate to build root * invalidate nodes with parent paths. * Comments * Add rust tests. Make some things public so we can use them in tests. Use canonical path to build root for relativizing changed paths. * Refactor Python tests. Return watch errors as core::Failure all the way to user. Move task executor onto invalidation watcher. Move test_support trait impl into test_support mod. * use futures lock on watcher * Platform specific watching behavior. On Darwin recursively watch the build root at startup. On Linux watch individual directory roots. Co-authored-by: Stu Hood <stuhood@gmail.com> * Ignore notify events for pants_ignore patterns. (#9406) * Create a git ignorer on the context object. Adjust all call sites which create a posix fs to pass in an ignorer. * Ignore fsevents from files that match pants_ignore patterns. * Always pass is_dir = false to ignorer to avoid stat-ing every path the event watch thread sees. * Add a feature gate to disable the engine fs watcher introduced in #9318 (#9416) * Add a feature gate to disable the engine fs watcher introduced in #9318 by default, to mitigate issues seen in #9415 until a fix is in place. * Don't rerun uncachable nodes if they are dirtied while running. (#9452) * Don't rerun uncachable nodes if they are dirtied while running. - Retry dependencies of uncacheable nodes a few times to get a result until we are exhausted from trying too many times. - Bubble uncacheable node retry errors up to the user, tell them things were chaning to much. - Don't propagate dirtiness past uncacheable nodes when invalidating from changed roots. Otherwise dirty dependents of uncacheable nodes will need to re-run. * enable the engine fs watcher by default, now that it won't cause issues. Remove execution option override from tests. * use reference to self in stop_walk_predicate closure * invalidate often enough that test doesn't flake * Add a flag to prevent the FsEventService and watchman from starting (#9487) * add --watchman-enable flag * disable watchman when flag is false * Don't wait for the initial watchman event if we aren't using watchman. * check invalidation watcher liveness from scheduler service * Extract a `watch` crate. (#9635) The `watch` module directly accesses the `engine` crate's `Graph`, which makes it more challenging to test. Extract a `watch` crate which is used via an `trait Invalidatable` which is implemented for the engine's `Graph`, as well as independently in tests. [ci skip-jvm-tests] * Simplify Scheduler::execute and unify Graph retry (#9674) Both the `Graph` and the `Scheduler` implemented retry for requested Nodes, but the `Scheduler` implementation was pre-async-await and much more complicated. Unify the retry implementations into `Graph::get` (either roots or uncacheable nodes are retried), and simplify the `Scheduler`'s loop down to: ``` let maybe_display_handle = Self::maybe_display_initialize(&session); let result = loop { if let Ok(res) = receiver.recv_timeout(refresh_interval) { break Ok(res); } else if let Err(e) = Self::maybe_display_render(&session, &mut tasks) { break Err(e); } }; Self::maybe_display_teardown(session, maybe_display_handle); result ``` A single, more modern retry implementation (thanks @hrfuller!), and a cleaner `Scheduler::execute` loop. * Move file invalidation handling to rust (#9636) A few different kinds of file watching span the boundary between the `SchedulerService` and `FSEventService`: 1. pantsd invalidation globs - how `pantsd` detects that its implementing code or config has changed 2. pidfile - watches `pantsd`'s pidfile to ensure that the daemon exits if it loses exclusivity 3. graph invalidation - any files changing in the workspace should invalidate the engine's `Graph` 4. `--loop` - implemented directly in the `SchedulerService` Because of the semi-cyclic nature of the relationship between the `SchedulerService` and `FSEventService`, it's challenging to understand the interplay of these usecases. And, unsurprisingly, that lead to the `notify` crate implementation only satisfying one of them. The fundamental change in this PR is to add support for two new parameters to engine executions which are implemented by the `Graph`: * `poll: bool` - When `poll` is enabled, a `product_request` will wait for the requested Nodes to have changed from their last-observed values before returning. When a poll waits, an optional `poll_delay` is applied before it returns to "debounce" polls. * `timeout: Optional[Duration]` - When a `timeout` is set, a `product_request` will wait up to the given duration for the requested Node(s) to complete (including any time `poll`ing). These features are then used by: * `--loop`: uses `poll` (with a `poll_delay`, but without a `timeout`) to immediately re-run a `Goal` when its inputs have changed. * invalidation globs and pidfile watching: use `poll` (with no `poll_delay`) and `timeout` to block their `SchedulerService` thread and watch for changes to those files. The `FSEventService` and `SchedulerService` are decoupled, and each now interacts only with the `Scheduler`: `FSEventService` to push `watchman` events to the `Graph`, and the `SchedulerService` to pull invalidation information from the `Graph`. Because all events now flow through the `Graph`, the `notify` crate has reached feature parity with `watchman`. In followup changes we can remove the experimental flag, disable `watchman` (and thus the `FSEventService`) by default, and remove the dependency between `--loop` and `pantsd`. * pin tokio at exactly 0.2.13 * fix lint issues * fix mypy typing issues * Move away from the debounced notify watcher #9754 * Remove test that has raced file invalidation ever since the notify backend was added, but which will now fairly consistently lose that race. # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests] * As explained in the comment: we can no longer create duplicate parallel BUILD files and hope that pants does not notice them before we scan the directory again! # Delete this line to force CI to run the JVM tests. [ci skip-jvm-tests] Co-authored-by: Stu Hood <stuhood@gmail.com> Co-authored-by: Stu Hood <stuhood@twitter.com>
Problem
Watchman is error prone, and requires pants to maintain and communicate with another daemon.
Solution
Implement file event watching with the notify-rs crate. Hook watchman and notify invalidation strategies into the same invalidation method, with different event sources.
Result
Notify and watchman fs event systems operate in tandem. We will follow up to allow users to disable watchman. Because of the way notify can watch more granular paths than watchman, file change events result in less drastic node invalidation, which should lead to faster runs with pantsd.