Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Node 20 recursive file watching is misbehaving on Linux #48437

Closed
cjihrig opened this issue Jun 12, 2023 · 14 comments · Fixed by #51406
Closed

Node 20 recursive file watching is misbehaving on Linux #48437

cjihrig opened this issue Jun 12, 2023 · 14 comments · Fixed by #51406
Labels
confirmed-bug Issues with confirmed bugs. fs Issues and PRs related to the fs subsystem / file system. help wanted Issues that need assistance from volunteers or PRs that need help to proceed.

Comments

@cjihrig
Copy link
Contributor

cjihrig commented Jun 12, 2023

Version

20.3.0

Platform

linux

Subsystem

fs

What steps will reproduce the bug?

I put together a repro at https://github.com/cjihrig/recursive-watcher-bug that shows the following passing on Windows and macOS, but failing on Ubuntu. I'm not sure if this is specific to GitHub Actions.

'use strict';
const { mkdtempSync, watch, writeFileSync } = require('node:fs');
const { tmpdir } = require('node:os');
const { join } = require('node:path');
const tmpDir = mkdtempSync(join(tmpdir(), 'repro-test-'));
const filename = join(tmpDir, 'test.file');
const keepalive = setTimeout(() => {
  throw new Error('timed out');
}, 60_000);

const watcher = watch(tmpDir, { recursive: true }, (eventType, filename) => {
  clearTimeout(keepalive);
  watcher.close();
  console.log(eventType, filename);
});

writeFileSync(filename, 'foobar');

How often does it reproduce? Is there a required condition?

Always reproduces for me.

What is the expected behavior? Why is that the expected behavior?

I expect the test to pass.

What do you see instead?

The test times out.

Additional information

I noticed this while trying to update Platformatic to support Node 20 and created the minimal reproduction linked above.

I also noticed that Platformatic was passing recursive: true to the promisified version of watch() on Ubuntu on earlier versions of Node. It should have thrown ERR_FEATURE_UNAVAILABLE_ON_PLATFORM, but did not. I did see that error with the callback based watch() though, which makes me think there is some missing validation on older versions of Node in addition to this bug.

cc: @anonrig who implemented recursive file watching on Linux.

@anonrig
Copy link
Member

anonrig commented Jun 12, 2023

Does the issue resolve if you add a small timeout before writing file and after creating the watcher? I suspect the async nature of fs which does polling causes this issue.

@cjihrig
Copy link
Contributor Author

cjihrig commented Jun 12, 2023

Looks like that did help - https://github.com/cjihrig/recursive-watcher-bug/actions/runs/5247088842/jobs/9476792611.

I don't think we can realistically ask every user of recursive file watching to add that to their code though.

@anonrig
Copy link
Member

anonrig commented Jun 12, 2023

Looks like that did help - https://github.com/cjihrig/recursive-watcher-bug/actions/runs/5247088842/jobs/9476792611.

I don't think we can realistically ask every user of recursive file watching to add that to their code though.

@cjihrig I agree. It seems the issue is within this lines: https://github.com/nodejs/node/blob/main/lib/internal/fs/recursive_watch.js#L222

We can avoid using promisify, but I'm suspicious towards blocking the main thread while this is resolving. Unfortunately, I'm on parental leave and have limited access. If this task is not urgent, I can take a look at it in a week and a half.

@anonrig anonrig added confirmed-bug Issues with confirmed bugs. help wanted Issues that need assistance from volunteers or PRs that need help to proceed. fs Issues and PRs related to the fs subsystem / file system. labels Jun 12, 2023
@winedarkmoon
Copy link

Can I take on this issue?

@anonrig
Copy link
Member

anonrig commented Jun 17, 2023

Can I take on this issue?

Please do. Happy to help if you get stuck on something.

@winedarkmoon
Copy link

Can I take on this issue?

Please do. Happy to help if you get stuck on something.

So the traverse function doesn't complete its execution before the files start to be watched, thus causing a race condition? Would introducing a mechanism to 'await' the initialization of the watcher could resolve this issue?

@anonrig
Copy link
Member

anonrig commented Jun 19, 2023

So the traverse function doesn't complete its execution before the files start to be watched, thus causing a race condition? Would introducing a mechanism to 'await' the initialization of the watcher could resolve this issue?

That would be a breaking change since it will cause the function to behave differently in different operating systems.

@winedarkmoon
Copy link

So the traverse function doesn't complete its execution before the files start to be watched, thus causing a race condition? Would introducing a mechanism to 'await' the initialization of the watcher could resolve this issue?

That would be a breaking change since it will cause the function to behave differently in different operating systems.

Understood. Would it be feasible to implement a flag/event that is set once the traverse function has completed? This way, we could delay file watching until after the initial traversal?

@spacesugam
Copy link

spacesugam commented Jun 23, 2023

Looks like that did help - https://github.com/cjihrig/recursive-watcher-bug/actions/runs/5247088842/jobs/9476792611.
I don't think we can realistically ask every user of recursive file watching to add that to their code though.

@cjihrig I agree. It seems the issue is within this lines: https://github.com/nodejs/node/blob/main/lib/internal/fs/recursive_watch.js#L222

We can avoid using promisify, but I'm suspicious towards blocking the main thread while this is resolving. Unfortunately, I'm on parental leave and have limited access. If this task is not urgent, I can take a look at it in a week and a half.

Looks like that did help - https://github.com/cjihrig/recursive-watcher-bug/actions/runs/5247088842/jobs/9476792611.
I don't think we can realistically ask every user of recursive file watching to add that to their code though.

@cjihrig I agree. It seems the issue is within this lines: https://github.com/nodejs/node/blob/main/lib/internal/fs/recursive_watch.js#L222

We can avoid using promisify, but I'm suspicious towards blocking the main thread while this is resolving. Unfortunately, I'm on parental leave and have limited access. If this task is not urgent, I can take a look at it in a week and a half.

So the traverse function doesn't complete its execution before the files start to be watched, thus causing a race condition? Would introducing a mechanism to 'await' the initialization of the watcher could resolve this issue?

That would be a breaking change since it will cause the function to behave differently in different operating systems.

Regarding the race condition issue in the code you shared, I've analyzed it and come up with a solution. The problem arises from the traverse function not completing its execution before the files start being watched, causing a race condition.

I propose the following solution to address this issue:

`async function traverse(dir, files = new SafeMap(), symbolicLinks = new SafeSet()) {
const { opendir } = lazyLoadFsPromises();

const filenames = await opendir(dir);
const subdirectories = [];

for await (const file of filenames) {
const f = pathJoin(dir, file.name);

files.set(f, file);

// Do not follow symbolic links
if (file.isSymbolicLink()) {
  symbolicLinks.add(f);
} else if (file.isDirectory()) {
  subdirectories.push(traverse(f, files)); // Store the promise for each subdirectory
}

}

await Promise.all(subdirectories); // Wait for all subdirectory promises to resolve

return files;
}

// ...

async kFSWatchStart {
filename = pathResolve(getValidatedPath(filename));

try {
const file = lazyLoadFsSync().statSync(filename);

this.#rootPath = filename;
this.#closed = false;
this.#watchingFile = file.isFile();

if (file.isDirectory()) {
  this.#files.set(filename, file);

  await traverse(filename, this.#files, this.#symbolicFiles); // Await the completion of the traverse function

  for (const f of this.#files.keys()) {
    this.#watchFile(f);
  }
} else {
  this.#watchFile(filename);
}

} catch (error) {
if (error.code === 'ENOENT') {
error.filename = filename;
throw error;
}
}
}`

Here are the changes made and their explanations:

1.Modified the traverse function to be an async function and added the await keyword before the recursive call to traverse for subdirectories. This ensures that the function waits for the subdirectory traversal to complete before moving on, resolving the race condition.

2.In the [kFSWatchStart] method, added the await keyword before calling traverse for the initial directory. This ensures that the function waits for the traversal to complete before proceeding with watching the files.

@mcollina
Copy link
Member

@spacesugam are you still interested in sending a PR?

I would go for the first option.

@mcollina
Copy link
Member

@anonrig @cjihrig I took another look at this issue, and there is a race condition between traversing the file system asynchronously and watch. Ultimately we are missing an event, i.e. "the watch has been successfully set up". The alternative is to make traversing the tree synchronous on Linux, so that it can all be set up correctly.

This is not a problem on Windows or Mac because they have different low-level primitives for watching

@mcollina
Copy link
Member

The only solution to this bug is to move the exploration of the directory tree to synchronous on Linux. @anonrig do you see any specific issue with that?

@novemberborn
Copy link

Quite possibly related, calling fs.watch() in recursive mode with an already aborted signal does not prevent StatWatchers from being created and referenced. Same with synchronously closing the returned watcher.

@anonrig
Copy link
Member

anonrig commented Jan 2, 2024

The only solution to this bug is to move the exploration of the directory tree to synchronous on Linux. @anonrig do you see any specific issue with that?

Looks good to me!

Quite possibly related, calling fs.watch() in recursive mode with an already aborted signal does not prevent StatWatchers from being created and referenced. Same with synchronously closing the returned watcher.

@novemberborn Can you create a different issue on this, with a possible reproduction please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. fs Issues and PRs related to the fs subsystem / file system. help wanted Issues that need assistance from volunteers or PRs that need help to proceed.
Projects
None yet
6 participants