Don't hash contents of every watched file #8

Open
filipesilva
From https://github.com/gmethvin/directory-watcher?tab=readme-ov-file#configuration:

> By default, DirectoryWatcher will try to prevent duplicate events (...). This is done by creating a hash for every file encountered and keeping that hash in memory. This might result in slower performance, because the library has to calculate the hash of the entire file.
> ...
> In the above example we use the last modified time hasher. This hasher is only suitable for platforms that have at least millisecond precision in last modified times from Java. It's known to work with JDK 10+ on Macs with APFS.

The default is actually rather slow. On a 230 MB directory with 1700 files, the initial watch call takes 2951 ms. With the changes in this PR (the same ones shown in their docs), it drops to 2 ms.

2951 ms isn't terrible, mind you, but I came across this problem when watching a 12.5 GB folder with 180k files. The watcher never really started in that case. With these changes it started in 5659 ms.

Regarding the restrictions listed, APFS has been the default on Macs since macOS 10.13 (released late 2017). Perhaps this should not be the default, but I will leave that to your judgement.
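
To illustrate why the two strategies differ so much in cost, here is a minimal stdlib-only sketch. It is not directory-watcher's implementation: the class and method names are hypothetical, and SHA-256 is only a stand-in for whatever hash the library uses. The point is that a content fingerprint has to read every byte of every file, while a last-modified fingerprint is a single metadata lookup per file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Hypothetical illustration only; not part of directory-watcher or beholder.
public class HashCostSketch {

    // Content-based fingerprint: streams the whole file through a digest,
    // so the cost grows with the file size.
    static byte[] contentFingerprint(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-256"); // stand-in hash
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        return digest.digest();
    }

    // Metadata-based fingerprint: reads only the last modified time,
    // so the cost does not depend on the file size. It is only as reliable
    // as the timestamp precision the platform reports.
    static long modifiedTimeFingerprint(Path file) throws IOException {
        return Files.getLastModifiedTime(file).toMillis();
    }

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("hash-cost-sketch", ".txt");
        try {
            Files.write(file, "first version".getBytes(StandardCharsets.UTF_8));
            byte[] before = contentFingerprint(file);

            Files.write(file, "second version".getBytes(StandardCharsets.UTF_8));
            byte[] after = contentFingerprint(file);

            System.out.println("content fingerprint changed: " + !Arrays.equals(before, after));
            System.out.println("last modified (ms): " + modifiedTimeFingerprint(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

On the initial watch call every file needs a fingerprint, which would explain why the content-based default scales with total directory size while the timestamp-based one scales only with file count.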

@jeroenvandijk
Contributor

@filipesilva Maybe you can bring the loading time down even more with a different fileTreeVisitor. I saw quite a big difference there as well.

@filipesilva
Author

I need to benchmark that approach. Maybe the real PR here is to add something that supports a custom DirectoryWatcher/builder, so we can set the visitor, file hasher, or anything else without beholder having to support each option.
