The implementation of fd_filestat_get in wasi-preview1-component-adapter is unsound
#8956
Comments
It's unclear to me how to provide the desired semantics (…
On reflection, I think the intended behavior here is to maintain a mapping from … This would still be problematic, in a subtler way: nothing in the documentation for …
On more reflection, since … How was this supposed to be implemented?
I can link WebAssembly/wasi-filesystem#81 as further background on this, but I don't have any extra info beyond that.
Like Alex, the only understanding I have of this topic comes from PR 81 on the spec. It is quite possible that both the spec and our implementation are questionable, and that you are the first person to encounter problems with it in the wild. I don't have the expertise or the time right now to dig into what the best path forward is. Dan is out on leave, so I'm standing in as a wasi-filesystem champion in his absence, and I will defer to your judgement on changes to the spec and/or implementation.
Thank you. I haven't been feeling my best lately, and this has directly affected my ability to reason about really intricate topics like this one. I'll keep this issue on my radar and try to work out a way forward that works for everyone involved. I don't have any firm conclusions yet, and I'll need a deeper understanding of the topic and more consultation with others before I do. One thing that seems to have been overlooked is the question of polyfilling the APIs. In one direction it's straightforward: wasip2 can be done on top of wasip1 easily. But it seems like nobody has drilled into the details of emulating wasip1 on top of wasip2, and given the state of the ecosystem (LLVM only knows how to emit wasip1 binaries directly, and wasip2 requires a post-processing step), I suspect that's actually the more important direction. @pchickey Do you think there is a possibility of, or appetite for, a sort of "sidecar" specification that builds on top of wasip2 in order to provide a sound polyfill for …
Oh, that's actually the most important direction for us, too: implementing p1 on top of p2. The crate you linked in the OP is the current canonical source for implementing p1 on top of p2 within wasm. There's also a host-side implementation at https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasi/src/preview1.rs, where the p1 APIs are all defined primarily in terms of their p2 counterparts. Modifying the adapter (…
Agreed with Alex: implementing p1 on top of p2 is an important direction. There are some places where we are choosing to do that with low fidelity (the whole rights system in p1 is a mess and we want to leave it behind as much as possible, but it ends up mattering for corner cases where wasi-libc came to depend on it), but I think this case matters a whole lot more. As for the incompleteness of the wasip2 support in wasi-libc, which would let LLVM emit a purely wasip2 binary: that's "just" gated on someone who can work on wasi-libc having the time to do so, which is why it has stalled for the last few months. We do consider it important to get done at some point, and Alex has been filling in some related missing pieces recently, mostly focusing on the parts the Rust toolchain will need.
In this case, given how p1 and p2 are defined, there must be a spec change to maintain soundness. For example, there could be an additional WIT interface that returns an arbitrary 128-bit unique identifier for a descriptor, or a descriptor-based path, that is then stuffed into …
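A minimal sketch of the first option, assuming a hypothetical interface that hands the adapter a 128-bit per-object identifier (no such interface exists today, and the names below are illustrative only): the adapter could split the value across the two 64-bit fields of the p1 filestat, dev and ino.

```rust
/// Local stand-in for the p1 filestat fields `dev` and `ino`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct DevIno {
    pub dev: u64,
    pub ino: u64,
}

/// Hypothetical: spread a 128-bit object identifier over `dev` and `ino`,
/// so two objects compare equal only if the full 128 bits match.
pub fn dev_ino_from_id(id: u128) -> DevIno {
    DevIno {
        dev: (id >> 64) as u64, // upper 64 bits
        ino: id as u64,         // lower 64 bits (truncating cast)
    }
}

/// Inverse mapping, to show that no information is lost.
pub fn id_from_dev_ino(d: DevIno) -> u128 {
    ((d.dev as u128) << 64) | d.ino as u128
}
```

Using both fields lines up with POSIX, where st_dev and st_ino together identify a file, so applications that compare the pair get a full 128-bit comparison.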
I think a backstop solution that prevents the most egregious consequences (e.g., Clang considering every input file to be the same) is to use a monotonically increasing global counter for filling in …
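The comment is cut off, so this is only a sketch under the assumption that the counter fills in ino and is bumped once per fd_filestat_get call (a per-descriptor variant would instead store the value on the descriptor). The static and function names are hypothetical.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical process-wide counter used to fill in `ino`.
static NEXT_INO: AtomicU64 = AtomicU64::new(1);

/// Every call yields a fresh value, so no two results ever compare equal.
/// This rules out false "same file" matches (Clang's infinite #include loop)
/// at the cost of never detecting a genuine same-file pair.
pub fn next_fake_ino() -> u64 {
    // Relaxed ordering suffices: we only need uniqueness, not ordering
    // relative to other memory operations.
    NEXT_INO.fetch_add(1, Ordering::Relaxed)
}
```

The trade-off is deliberate: a false "different file" answer usually costs extra work, whereas a false "same file" answer can silently skip or loop on input.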
Could you perhaps hash the filename in the meantime, and pretend that each filename is its own unique dev/ino?
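A minimal sketch of the filename-hashing stopgap, assuming the implementation knows a path for the descriptor (the function name is hypothetical, and std's DefaultHasher stands in for whatever hash one would really pick):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stopgap: derive a fake inode number from a path string.
/// The same string maps to the same value within one process, but distinct
/// paths can collide, and two spellings of one file ("a/../b" vs "b") or
/// two hard links to it get different values.
pub fn fake_ino_from_path(path: &str) -> u64 {
    let mut h = DefaultHasher::new();
    path.hash(&mut h);
    h.finish()
}
```

This only works when a path is available; a descriptor with no recorded path gives the function nothing to hash.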
What about … I guess …
Ah, sorry, I'm assuming that the structure you're maintaining on the host for the file would contain the hashed filename, the full filename, or something like that. If all you have is the fd and nothing else, I don't know what could be done.
Oh, there is a misunderstanding. What I was considering is whether this issue can be solved in the preview1 component adapter for an arbitrary host. In my particular host I have added fake but consistent inodes to everything and return them as the lower half of the metadata hash (matching what the adapter currently does), so my code works fine now. It's not a blocker for me personally; I'm raising this because it is an ecosystem health issue.
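A rough sketch of what "fake but consistent inodes" on the host could look like. The comment does not say what the host keys its table on, so keying by path string is an assumption, as are the zero upper half and all type and method names here. MetadataHashValue is a local stand-in with lower/upper fields, not an import of the real binding.

```rust
use std::collections::HashMap;

/// Local stand-in for the two u64 halves of a p2 metadata hash.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MetadataHashValue {
    pub lower: u64,
    pub upper: u64,
}

/// Hypothetical host-side registry: assigns a small inode number the first
/// time a file is seen and returns the same number on every later lookup.
#[derive(Default)]
pub struct FakeInodes {
    by_key: HashMap<String, u64>,
    next: u64,
}

impl FakeInodes {
    pub fn inode_for(&mut self, key: &str) -> u64 {
        if let Some(&ino) = self.by_key.get(key) {
            return ino;
        }
        self.next += 1; // first inode is 1; 0 is never handed out
        self.by_key.insert(key.to_string(), self.next);
        self.next
    }

    /// Report the inode through `lower`, the half the adapter (per this
    /// thread) turns into `ino`; `upper` is left at 0 as an assumption.
    pub fn metadata_hash(&mut self, key: &str) -> MetadataHashValue {
        MetadataHashValue { lower: self.inode_for(key), upper: 0 }
    }
}
```

Unlike hashing the filename, a table like this cannot produce collisions, but it only helps hosts that choose to implement it; the adapter still cannot rely on it for an arbitrary host.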
Ah, that makes sense. Then I'd personally come to the same conclusion as you: there's no way to actually "fix" things without changing the standard and/or the docs there. Given the weak guarantees of … One possibility would be to add something which indicates whether …
I agree. Which one of us should kick off that discussion?
If you wouldn't mind, I'd defer to you, as I fear I don't have enough context on this.
Potentially bad idea, but I'm curious whether it sparks other thinking: what if the inode were synthesized from an actual hash of the file contents? Basically, imagine the filesystem as a content-addressed store (e.g. git), where the inode is just the hash that addresses some content. This certainly satisfies the "check the inode and timestamp first to see if it's the same file contents" use case; it's no worse than what one would have to do manually to verify that the contents are the same in the absence of an inode/timestamp, i.e., actually read the data. It's also conceptually "minimal" in the sense that it doesn't require any properties or guarantees from the p2 API. The main downside is that it's slow: it does not satisfy what the user expects to be an O(1)-ish metadata check. This is especially an issue because the inode number must be provided as part of the result of a stat operation (i.e., it cannot be requested separately). Maybe that alone is enough to throw out this idea. But maybe not?
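A small sketch of the content-addressed idea, to make its cost visible: producing the "inode" requires streaming every byte of the file. The function name is hypothetical, and DefaultHasher (64-bit) stands in for a real content hash such as the SHA-1 git uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::io::{self, Read};

/// Hypothetical: derive an "inode" from the bytes of the file.
/// Cost is O(file size) per stat, and unrelated files with identical
/// contents receive the same value.
pub fn content_ino<R: Read>(mut reader: R) -> io::Result<u64> {
    let mut h = DefaultHasher::new();
    let mut buf = [0u8; 8192];
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        h.write(&buf[..n]);
    }
    Ok(h.finish())
}
```

Any `Read` works as input, e.g. a `std::fs::File` on a host or an in-memory `&[u8]` in a test.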
That could confuse tar into thinking that two files are hard links to each other, and thus turn them into actual hard links when unpacking.
The implementation uses the value of metadataHash as st_ino:
wasmtime/crates/wasi-preview1-component-adapter/src/lib.rs
Lines 950 to 951 in 225d20e
This is unsound. To quote the documentation for metadataHash:
…
Applications will commonly use st1.st_ino == st2.st_ino to determine whether two files are the same (Clang, for example, does this). My implementation of metadataHash, which always returns 0 and is compliant with the definition above, caused the combination of Clang and the wasip1 component adapter to treat all read files as the same file (which in practice meant that the first #include caused an infinite loop).
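To make the failure mode concrete, here is a minimal sketch with local stand-in types (not the adapter's actual code). It assumes, as described above, that the lower half of the metadata hash becomes ino, and it fixes dev at an arbitrary constant for illustration. With a metadataHash that always returns 0, the usual same-file test succeeds for any two files.

```rust
/// Local stand-in for the two p1 filestat fields that matter here.
#[derive(Debug, Clone, Copy)]
pub struct Filestat {
    pub dev: u64,
    pub ino: u64,
}

/// A metadataHash that always returns 0, like the implementation in this issue.
fn metadata_hash_lower(_path: &str) -> u64 {
    0
}

/// Adapter-like mapping: the hash's lower half is used as `ino`.
/// `dev` is an arbitrary constant here (an assumption for the sketch).
pub fn filestat_for(path: &str) -> Filestat {
    Filestat { dev: 1, ino: metadata_hash_lower(path) }
}

/// The same-file check applications commonly perform.
pub fn same_file(a: &Filestat, b: &Filestat) -> bool {
    a.dev == b.dev && a.ino == b.ino
}
```

Any two paths, say a system header and the main source file, produce identical (dev, ino) pairs, which is why an include guard keyed on file identity can misfire.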