
Add std::fs::rename_noreplace #131

Open
piegamesde opened this issue Oct 30, 2022 · 13 comments
Labels: api-change-proposal (A proposal to add or alter unstable APIs in the standard libraries), T-libs-api

Comments

@piegamesde

Proposal

Add std::fs::rename_noreplace, which is equivalent to std::fs::rename but with the semantic that an existing target location will never be overwritten and always yield an error instead.

Problem statement

There is no easy way to rename/move a file without accidentally overwriting the target location should it exist. There is no way to emulate the desired behavior in an atomic way, meaning that all workarounds suffer from TOCTTOU issues.
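To make the race concrete, here is a minimal sketch of the check-then-rename workaround that people tend to write today. The helper name `rename_if_absent_racy` is hypothetical and exists only for illustration; `Path::exists` and `std::fs::rename` are the only real APIs involved.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: NOT safe against concurrent writers.
pub fn rename_if_absent_racy(from: &Path, to: &Path) -> io::Result<()> {
    // Time of check: another process can still create `to` after this returns.
    if to.exists() {
        return Err(io::Error::new(
            io::ErrorKind::AlreadyExists,
            "destination already exists",
        ));
    }
    // Time of use: `fs::rename` replaces `to` if it appeared in the meantime.
    fs::rename(from, to)
}
```

The window between the `exists` check and the `rename` call is the TOCTOU gap; no amount of re-checking in user code closes it.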

Motivation, use-cases

It is a common pattern for software to do file system writes into a temporary location next to the target and then move it in place once it is ready. This is useful because move operations are atomic. In many cases overwriting the target when it exists is a desired semantic, but in some cases it is not.
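For reference, the write-then-move pattern described above might look roughly like the following sketch. The helper name `write_then_rename` and the `.tmp` suffix are assumptions made for illustration, not an established convention.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: write `data` to a sibling temp file, then move it into place.
pub fn write_then_rename(target: &Path, data: &[u8]) -> io::Result<()> {
    // Assumed naming scheme: "<target>.tmp" in the same directory, so that
    // the rename never crosses a filesystem boundary.
    let mut tmp_name = target.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    let mut file = fs::File::create(&tmp)?;
    file.write_all(data)?;
    // Flush the contents before the file becomes visible under its final name.
    file.sync_all()?;

    // This replaces `target` if it already exists. That is the desired
    // behavior here, and exactly what `rename_noreplace` would avoid.
    fs::rename(&tmp, target)
}
```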

For example, when extracting a compressed directory, the software would first extract the content into a temporary location and then move it over. When destination files exist, it wants to handle that in some special way (e.g. choose a different name or ask the user). The existence check needs to be done atomically with the move operation, otherwise race hazards may occur.

Solution sketches

A new function std::fs::rename_noreplace is introduced. On Linux, it defers to the renameat2 syscall with the RENAME_NOREPLACE flag set. On macOS, it uses the renameatx_np call with RENAME_EXCL. On Windows, not replacing is the default behavior, so the MOVEFILE_REPLACE_EXISTING flag needs to be omitted.
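As a rough illustration of the Linux path only, the sketch below calls `renameat2` through a hand-written FFI declaration rather than the `libc` crate. It assumes the C library exports a `renameat2` wrapper (glibc 2.28 or newer does); the constant values are copied from the Linux headers. The free function `rename_noreplace` here is a stand-in for the proposed API, not its actual implementation, and other platforms simply report `Unsupported`.

```rust
use std::io;
use std::path::Path;

#[cfg(target_os = "linux")]
use std::ffi::CString;
#[cfg(target_os = "linux")]
use std::os::raw::{c_char, c_int, c_uint};
#[cfg(target_os = "linux")]
use std::os::unix::ffi::OsStrExt;

// Values from the Linux headers <fcntl.h> and <linux/fs.h>.
#[cfg(target_os = "linux")]
const AT_FDCWD: c_int = -100;
#[cfg(target_os = "linux")]
const RENAME_NOREPLACE: c_uint = 1;

// Assumption: the C library provides this wrapper (glibc 2.28+).
// `unsafe extern` needs Rust 1.82+; older toolchains write plain `extern "C"`.
#[cfg(target_os = "linux")]
unsafe extern "C" {
    fn renameat2(
        olddirfd: c_int,
        oldpath: *const c_char,
        newdirfd: c_int,
        newpath: *const c_char,
        flags: c_uint,
    ) -> c_int;
}

#[cfg(target_os = "linux")]
fn to_cstring(p: &Path) -> io::Result<CString> {
    CString::new(p.as_os_str().as_bytes())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "path contains a NUL byte"))
}

/// Sketch of the proposed function, Linux only.
#[cfg(target_os = "linux")]
pub fn rename_noreplace(from: &Path, to: &Path) -> io::Result<()> {
    let from = to_cstring(from)?;
    let to = to_cstring(to)?;
    // SAFETY: both pointers refer to valid NUL-terminated strings that
    // outlive the call.
    let rc = unsafe {
        renameat2(AT_FDCWD, from.as_ptr(), AT_FDCWD, to.as_ptr(), RENAME_NOREPLACE)
    };
    if rc == 0 {
        Ok(())
    } else {
        // EEXIST surfaces as `ErrorKind::AlreadyExists`.
        Err(io::Error::last_os_error())
    }
}

/// Placeholder for platforms this sketch does not cover.
#[cfg(not(target_os = "linux"))]
pub fn rename_noreplace(_from: &Path, _to: &Path) -> io::Result<()> {
    Err(io::Error::new(
        io::ErrorKind::Unsupported,
        "rename_noreplace is not implemented in this sketch",
    ))
}
```

Note that `renameat2` can also fail with `EINVAL` when the filesystem does not support the flag, so callers should not assume every error means the destination exists.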

I could not find any equivalent syscall on non-Linux unix systems (*BSD), and don't know how to deal with that aspect.


Alternatively, we could change the semantics of the current rename function (they are explicitly documented as unstable), and optionally add a rename_replace function.

As a second alternative, we could expose more of the platform specific syscalls: for example, Linux renameat2 also has a flag for atomically swapping two paths.

Links and related work

https://internals.rust-lang.org/t/rename-file-without-overriding-existing-target/17637

What happens now?

This issue is part of the libs-api team API change proposal process. Once this issue is filed the libs-api team will review open proposals in its weekly meeting. You should receive feedback within a week or two.

piegamesde added the api-change-proposal and T-libs-api labels on Oct 30, 2022
@sanxiyn
Member

sanxiyn commented Oct 30, 2022

Python 3 includes both os.rename and os.replace. os.replace is new in Python 3.3 and guaranteed to replace. os.rename is inconsistent, replacing on Unix and not replacing on Windows, following the default behavior of the platform. os.rename is old and is probably that way due to backward compatibility constraints. Python 3 does not provide a non-replacing version of the call, probably due to portability problems on Unix.

@the8472
Member

the8472 commented Oct 30, 2022

But does Python promise atomicity?

@piegamesde
Author

> os.replace is new in Python 3.3 and guaranteed to replace

Are you sure about that? "If dst is a non-empty directory, OSError will be raised." In that sense I don't really understand the semantics of os.replace, it feels like the only advantage is that you get the same messy behavior across all platforms?

> But does python promise atomicity?

I prefer framing the question the other way around: Does the Python implementation potentially suffer from race hazards? But to answer the question, the docs say yes: "If successful, the renaming will be an atomic operation (this is a POSIX requirement)."

@the8472
Member

the8472 commented Oct 30, 2022

It says the rename will be atomic; that doesn't tell us whether the check is also part of that.

But if it's allowed to be non-atomic in the failure case, then the hardlink + unlink(src) approach could be used.

@SUPERCILEX

> I could not find any equivalent syscall on non-Linux unix systems (*BSD), and don't know how to deal with that aspect.

This is a hard blocker IMO. Functionality that's sometimes available seems better suited to a crate than the stdlib.

@ChrisDenton
Member

ChrisDenton commented Jan 23, 2023

If there is wide enough support then we could have unsupported platforms always return an error. We do that sometimes. And an argument can be made that std is the right place for these kinds of building blocks even if they fall short of being truly universal. In this case all tier 1 targets support it, no?

However, it is admittedly janky and I don't want to deemphasize the downsides. I just don't think it's a hard blocker. It's more of a strong warning to tread very carefully.

@thomcc
Member

thomcc commented Jan 23, 2023

> This is useful because move operations are atomic

This is a common misconception and is often not true in the case of OS power loss or crash (which is a fairly common scenario in non-server usage -- users tend to hard-reset machines pretty often). In that case, it's filesystem dependent. As an example of how this can be fairly subtle: With btrfs, rename is only atomic on crash if it is replacing a file, so on that filesystem fs::rename would be (conditionally) atomic, and fs::rename_noreplace would never be atomic.

That said, I largely don't have an opinion beyond this, since it's probably "more atomic" (if such a thing is a sensible comparator) than the alternative folks are likely to write.

@ChrisDenton
Member

ChrisDenton commented Jan 23, 2023

Yeah, I think "atomic" is the wrong way to describe this. The goal is to prevent overwrites in a way that mitigates TOCTOU issues. The whole operation being atomic (and crash safe) would be nice but isn't the main purpose and in any case can't be guaranteed.

@joshtriplett
Member

We discussed this in today's @rust-lang/libs-api meeting. We came up with a potential solution that would work as a fallback on all platforms, with some limitations.

If the native platform doesn't support a rename-without-replacement operation, hardlink the source to the target, then unlink the source. link and linkat will fail if the target exists, so they won't replace an existing file. This will work atomically.

This is close to what the underlying filesystem does already. From the documentation of rename: "However, there will probably be a window in which both oldpath and newpath refer to the file being renamed."

This wouldn't work for directories, but for directories the ordinary rename or renameat system call already has something close to noreplace behavior: the destination must either not exist or must be an empty directory.

OSes that have neither a rename-without-replace syscall nor hardlinks won't be able to implement this, but that's fine: on those platforms, it can just fail unconditionally, and callers will have to use something else.
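The fallback described in this comment can be sketched with the standard library alone. The function name `rename_noreplace_via_link` is hypothetical, and the sketch covers regular files only, since directories cannot be hard-linked.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical fallback for platforms without a native no-replace rename.
pub fn rename_noreplace_via_link(from: &Path, to: &Path) -> io::Result<()> {
    // `hard_link` fails with `AlreadyExists` if `to` is present, so the
    // existence check and the creation of the new name happen in one step.
    fs::hard_link(from, to)?;
    // Both names refer to the file until this succeeds. If it fails, the
    // old name is left behind, but `to` was never overwritten.
    fs::remove_file(from)
}
```

If the second step fails, the caller ends up with a leftover source file rather than a clobbered destination, which matches the trade-off discussed in this thread.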

@ChrisDenton
Member

ChrisDenton commented Dec 12, 2023

> If the native platform doesn't support a rename-without-replacement operation, hardlink the source to the target, then unlink the source. link and linkat will fail if the target exists, so they won't replace an existing file. This will work atomically.

I think that's technically not atomic (due to old and new existing at the same time) but, as I said above, I also agree it doesn't matter for mitigating TOCTOU issues. A temp file not being deleted is a gc issue rather than a bug or security issue.

@joshtriplett
Member

> I think that's technically not atomic (due to old and new existing at the same time)

See the text I quoted from the rename manpage; the rename syscall can also result in both files existing at the same time.

@the8472
Member

the8472 commented Jan 7, 2025

> On Windows, this is the default behavior so the MOVEFILE_REPLACE_EXISTING flag needs to be omitted.

Is this reliable? E.g. when network filesystems are involved? Does the hardlink fallback work there?

@ChrisDenton
Member

From MS-FSCC:

> ReplaceIfExists (1 byte): A Boolean value. Set to TRUE to indicate that if a file with the given name already exists, it SHOULD be replaced with the given file. Set to FALSE to indicate that the rename operation MUST fail if a file with the given name already exists.

So according to the protocol, it "MUST" fail if the name already exists.


7 participants