Racey "Could not resolve file" errors despite files existing #1101

Comments
This is odd... the successfully opened file is being queried as a directory:

Might you be able to strace all previous file descriptor names using 6 (or whatever the failing fd is in the new run)? If I had to guess, in incremental mode some goroutine thinks an old directory fd is still live.
Thanks for all of the detail in this issue description. Unfortunately I'm not quite sure what the problem is here (although I agree …).

Edit: It's been released in version 0.11.3. Please try it out and let me know if it provides any additional information for this problem.
Super useful, thanks Evan! Here are a couple of examples of the resolution phase failing (for two different runs):

Here's the actual file:

Very suspicious to me is the …

For comparison, here's an example of a successful resolution from one of the same runs:

It would appear that the file isn't being found at that lowest level of resolution in …
It is always fd 6, and 6 is used for almost every read I see the process doing. I think (but am not certain) that this is the esbuild code that reads files:

esbuild/internal/resolver/resolver.go Line 1231 in fb8681e

esbuild/internal/bundler/bundler.go Lines 1301 to 1302 in c62d33f
Here's the whole strace if you are curious. Also, I am equally suspicious of the new environment that I am running in as I am of esbuild -- it is the thing that is different between the case that works fine for me and the case that doesn't. It may be some other system-level limit or something I haven't configured properly, so apologies if this is just my fault. Hopefully we can teach esbuild to emit an error message that makes it clear who is at fault, though!
The strace log mirrors that observation, albeit with a different directory from a different run. Here's the log fragment for the directory query of …

Notice …

So esbuild's directory resolving logic appears to be reasonable, but I suspect that the system was interrupted while interrogating the files in the directory and it may have given an incorrect result. So this might be a Go runtime issue related to interrupting getdents64.
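For context on that hypothesis, here is a minimal Linux-only sketch, not esbuild's actual code, of a directory listing built on raw getdents64 calls via golang.org/x/sys/unix (the function name is illustrative). It retries on a visible EINTR; the bug under discussion is worse, because on the affected kernel/filesystem paths an interrupted getdents64 reportedly surfaces as a short or empty result rather than an EINTR error, so the caller cannot even tell a retry is needed:

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

// readDirNames lists a directory via raw getdents64 calls, retrying when
// the call is visibly interrupted. On the buggy code paths discussed in
// this issue, the interruption is *not* visible: getdents64 returns a
// truncated result instead of EINTR, silently losing directory entries.
func readDirNames(path string) ([]string, error) {
	fd, err := unix.Open(path, unix.O_RDONLY|unix.O_DIRECTORY, 0)
	if err != nil {
		return nil, err
	}
	defer unix.Close(fd)

	var names []string
	buf := make([]byte, 8192)
	for {
		n, err := unix.Getdents(fd, buf)
		if err == unix.EINTR {
			continue // e.g. the Go runtime's SIGURG preemption signal landed here
		}
		if err != nil {
			return nil, err
		}
		if n == 0 {
			return names, nil // true end of directory
		}
		// Decode the raw dirent buffer into entry names ("." and ".." are skipped).
		_, _, names = unix.ParseDirent(buf[:n], -1, names)
	}
}

func main() {
	names, err := readDirNames(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(len(names), "entries:", names)
}
```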
Right. I think the reason those files are later queried is because they are passed in as entrypoints to the build explicitly -- the tool invoking esbuild does a glob for all the js/ts files on disk and passes them all in as entrypoints. So, that glob, which uses fast-glob underneath, discovered the files just fine.
Can you confirm that all the files in question existed in those directories prior to running esbuild?
I think for the glob to have found them they had to have existed, and to my knowledge there's nothing that would delete them while esbuild was running? Subsequent runs in the exact same place succeed as well, so I don't think they are coming and going, but hey -- this is really weird, who knows. Here are a few more full straces of failed executions:

http://paste.ubuntu.com/p/JxQ9F4MrgS/ failed with …

http://paste.ubuntu.com/p/QY8PWyVwR7/ failed with …

http://paste.ubuntu.com/p/8ksHkVrD9N/ failed with …
In the first run, this is the strace snippet for the listing of the directory the missing files are in:

and for the second run:

There's that SIGURG in both examples! If I look at some of the surrounding getdents64 calls, I don't see that signal being sent.
It does seem like a Go runtime issue, barring some unusual storage device or background rsync-like process running.
Possibly related: golang/go#39237 |
Do you still see the file errors with the workaround mentioned in golang/go#39237 (comment)?
🎉 That resolves the issue for me! Nice find @kzc, thank you!

I talked a bit with the folks from this other CI provider (LayerCI) and they're actually using a FUSE-mounted filesystem under the hood to serve the files to esbuild using …

In any case, I think esbuild is not the culprit here, so I will close this issue. For anyone else coming along seeing strange resolve errors for files that seem like they do exist on disk, especially if running on top of a userspace filesystem via FUSE, try running esbuild with the GODEBUG=asyncpreemptoff=1 environment variable set.
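If it helps anyone applying that workaround from a wrapper process, here is a minimal sketch in Go (the esbuild binary name and pass-through arguments are placeholders for the real invocation). Setting GODEBUG=asyncpreemptoff=1 in the child's environment stops the Go runtime from sending SIGURG-based preemption signals, at some cost in scheduling latency:

```go
package main

import (
	"os"
	"os/exec"
)

func main() {
	// Run esbuild with Go's async preemption disabled, so the runtime never
	// interrupts getdents64 with SIGURG mid-call.
	cmd := exec.Command("esbuild", os.Args[1:]...)
	cmd.Env = append(os.Environ(), "GODEBUG=asyncpreemptoff=1")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		// Propagate the child's exit code where possible.
		if exit, ok := err.(*exec.ExitError); ok {
			os.Exit(exit.ExitCode())
		}
		os.Exit(1)
	}
}
```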
Even though it's a Linux kernel bug, I think this issue should be reopened and warrants an esbuild workaround. It's a difficult-to-diagnose problem that only occurs on Linux with Go applications on certain filesystems since the implementation of asynchronous preemption in the Go runtime.

The June 2020 Linux kernel bug report hasn't generated any activity as of this writing. And the RedHat bug report seems to have been marked as CLOSED ERRATA.

I think esbuild could have a directory-specific resolve cache invalidation strategy in the event of …
One potential solution could be to force GODEBUG=asyncpreemptoff=1 on Linux.

Regardless, it would be good to create another bug on the Go issue tracker. The issue golang/go#39237 was closed even though the issue isn't fixed, and the last post says to create a new issue. I don't think it'd be appropriate for me to create one as I have no way of reproducing the problem myself.
@evanw I think there's enough information at hand for an esbuild workaround on Linux. The relevant code is around here:

esbuild/internal/fs/fs_real.go Lines 290 to 300 in 00269f3

The erroneously truncated …
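This is not a patch against the fs_real.go lines cited above, only a sketch of the shape such a workaround could take (all names here are made up): before trusting a cached directory listing to answer "not found", double-check the specific entry with an explicit Lstat, and treat the cached listing as stale if the two disagree.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// entryExists consults a cached directory listing first, then falls back
// to an explicit Lstat. If Lstat finds an entry the listing says is
// missing, the listing was truncated and should be invalidated and re-read.
func entryExists(listing map[string]bool, dir, name string) (exists, listingStale bool) {
	if listing[name] {
		return true, false
	}
	if _, err := os.Lstat(filepath.Join(dir, name)); err == nil {
		return true, true // the listing lied; invalidate and re-read the directory
	}
	return false, false
}

func main() {
	// Simulate a truncated listing that failed to report go.mod.
	listing := map[string]bool{"main.go": true}
	exists, stale := entryExists(listing, ".", "go.mod")
	fmt.Printf("exists=%v staleListing=%v\n", exists, stale)
}
```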
I just ran into this issue using a monorepo with yarn workspaces. It causes a symlinked local dependency to be unresolvable. |
esbuild is unable to find the file, even though it does exist. This only happens for files in a directory with several other entries, so by creating a unique directory name per file on every build, we guarantee that there will only ever be a single file present within the directory, circumventing the esbuild issue.
When running a working esbuild config in a new environment, I get frequent but inconsistent "Could not resolve <file>" errors despite the files in question existing just fine.

I'm trying to get an esbuild configuration that works locally on my OS X machine and in a GitHub Actions Ubuntu runner onto another Linux-based CI system. The exact same config always works fine in the other environments, but only sometimes works in the new environment. I am building a big TypeScript project for node by passing ~200 or so .ts entrypoint files to the JS build API.

The files that esbuild fails to resolve change each time, but each time it does fail, it's all of the files in a particular folder that fail to resolve:

I don't have any resolve plugins set up and I am running on v0.11.1.

Debugging
I tried adding a resolve plugin that runs an fs.promises.access on each file and then returns undefined, to prove that the same process invoking esbuild can access the file fine, and it indeed logged successful accesses. Because of this, and because the build works sometimes, I think the files really are there.

Curiously, strace-ing esbuild makes it fail to resolve some files almost every time I invoke it with strace. I don't see anything amiss in the strace, but here's an example trace for a file that fails to resolve:

To me it seems like opening the file succeeds just fine, and then esbuild writes the "error: Could ..." string to stderr.

I was running strace -f -e '!futex,read' yarn <my-esbuild-task>, which ends up running esbuild with these options: see https://github.com/gadget-inc/esbuild-dev/blob/main/src/Compiler.ts#L98-L106 for the source.
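As a side note, here is a rough equivalent of the access-checking plugin described above, written against esbuild's Go plugin API rather than the JS API that was actually used (the plugin name and entry point are placeholders; returning a zero OnResolveResult leaves resolution to esbuild's normal path):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"

	"github.com/evanw/esbuild/pkg/api"
)

func main() {
	verify := api.Plugin{
		Name: "verify-access", // hypothetical name
		Setup: func(build api.PluginBuild) {
			build.OnResolve(api.OnResolveOptions{Filter: `\.ts$`},
				func(args api.OnResolveArgs) (api.OnResolveResult, error) {
					// Prove the file is visible to this process, then defer
					// to esbuild's own resolution by returning a zero result.
					p := args.Path
					if !filepath.IsAbs(p) {
						p = filepath.Join(args.ResolveDir, p)
					}
					if _, err := os.Stat(p); err != nil {
						fmt.Fprintf(os.Stderr, "access check failed: %v\n", err)
					}
					return api.OnResolveResult{}, nil
				})
		},
	}

	result := api.Build(api.BuildOptions{
		EntryPoints: []string{"src/index.ts"}, // placeholder entry point
		Bundle:      true,
		Plugins:     []api.Plugin{verify},
		Write:       false,
	})
	if len(result.Errors) > 0 {
		os.Exit(1)
	}
}
```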
For contrast, here's what the strace looks like for esbuild reading a file successfully early on:
I realize this isn't a super reproducible issue, but I am hoping that these are enough breadcrumbs that we could figure out what next steps to take to unearth more information. Also notable is that this seems similar to #348, but ulimit -n is 65536 in this environment, and I am running as root.

Any other information I could gather to help figure out what might be causing this heisenbug?