Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem in File.Copy (and possibly elsewhere) on Linux #111807

Open
rgr21 opened this issue Jan 24, 2025 · 4 comments
Open

Problem in File.Copy (and possibly elsewhere) on Linux #111807

rgr21 opened this issue Jan 24, 2025 · 4 comments
Labels
area-System.IO untriaged New issue has not been triaged by the area owner

Comments

@rgr21
Copy link

rgr21 commented Jan 24, 2025

Description

System.IO.File.Copy(source_string, dest_string) has been observed to return success but leave corrupt data in the destination file.

dotnet 8.0.110, Ubuntu 22.04.5 LTS, x64.

Hypothesis

FileSystem.CopyFile(Path.GetFullPath(sourceFileName), Path.GetFullPath(destFileName), overwrite);

sets up destination file handle here
using SafeFileHandle dst = SafeFileHandle.Open(destFullPath, overwrite ? FileMode.Create : FileMode.CreateNew,

CopyFile then proceeds operating on the destination file handle. Various syscalls to write() are issued and succeed. (Note source and destination are on different filesystems, so no CopyFileRange. Unclear if relevant.)

No fsync is issued.

Disposal of the destination file handle then happens. I think this calls through to

where the code from ReleaseHandle (which is where our syscall close happens) is ignored.

https://man7.org/linux/man-pages/man2/close.2.html says

Dealing with error returns from close()
A careful programmer will check the return value of close(),
since it is quite possible that errors on a previous write(2)
operation are reported only on the final close() that releases
the open file description. Failing to check the return value
when closing a file may lead to silent loss of data.

And follows up with

A careful programmer who wants to know about I/O errors may
precede close() with a call to fsync(2).

We've definitely seen in the wild with NFS that:

  1. write syscalls have succeeded
  2. close fails (for whatever reason - the network is fallible)
  3. data is lost
  4. success is reported to the user

strace and deliberately injecting faults leads me to the above.

This is only happening to a diminishingly small amount of operations, it isn't the case that my network or disks are entirely screwed. It's just that when it does happen - a few MB every PB of writes in my repro - data is bad and the application can't tell.

Reproduction Steps

System.IO.File.Copy("input", "output");

And something like the following, compiled and bashed in with LD_PRELOAD. All credit Copilot, all bugs mine.

`#include <stdio.h>
#include <dlfcn.h>
#include <errno.h>
#include <unistd.h>
#include <string.h>

typedef int (*orig_close_t)(int);

int close(int fd) {
static orig_close_t orig_close = NULL;
if (!orig_close) {
orig_close = (orig_close_t)dlsym(RTLD_NEXT, "close");
}
char path[1024];
snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);

char resolved_path[1024];
if (readlink(path, resolved_path, sizeof(resolved_path)) > 0) {
    if (strstr(resolved_path, "output")) {
        errno = EIO;
        return -1;
    }
}

return orig_close(fd);

}`

Or a variant on
strace -e trace=close -e fault=close:error=EIO -p <pid>
to achieve the same

Expected behavior

Either success and destination file has complete data, or failure reported to caller.
In the example above, that necessitates either not silently ignoring the close() call, or issuing an fsync beforehand.

Actual behavior

Silent data loss.

Regression?

No response

Known Workarounds

Can work around with lower level APIs, but "everyone" will use this one so it should probably be made safe.

Configuration

dotnet 8.0.110
Ubuntu 22.04.5 LTS
x64

Not believed specific to any of these

Other information

No response

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Jan 24, 2025
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@janvorli
Copy link
Member

I believe the ReleaseHandle called for the SafeFileHandle is actually this one:

protected override bool ReleaseHandle()
{
// If DeleteOnClose was requested when constructed, delete the file now.
// (Unix doesn't directly support DeleteOnClose, so we mimic it here.)
// We delete the file before releasing the lock to detect the removal in Init.
if (_deleteOnClose)
{
// Since we still have the file open, this will end up deleting
// it (assuming we're the only link to it) once it's closed, but the
// name will be removed immediately.
Debug.Assert(_path is not null);
Interop.Sys.Unlink(_path); // ignore errors; it's valid that the path may no longer exist
}
// When the SafeFileHandle was opened, we likely issued an flock on the created descriptor in order to add
// an advisory lock. This lock should be removed via closing the file descriptor, but close can be
// interrupted, and we don't retry closes. As such, we could end up leaving the file locked,
// which could prevent subsequent usage of the file until this process dies. To avoid that, we proactively
// try to release the lock before we close the handle.
if (_isLocked)
{
Interop.Sys.FLock(handle, Interop.Sys.LockOperations.LOCK_UN); // ignore any errors
_isLocked = false;
}
// Close the descriptor. Although close is documented to potentially fail with EINTR, we never want
// to retry, as the descriptor could actually have been closed, been subsequently reassigned, and
// be in use elsewhere in the process. Instead, we simply check whether the call was successful.
int result = Interop.Sys.Close(handle);
if (result != 0)
{
t_lastCloseErrorInfo = Interop.Sys.GetLastErrorInfo();
}
return result == 0;
}

The return value from the Interop.Sys.Close that is called to close the handle is checked there. The Interop.Sys.Close actually executes

int32_t SystemNative_Close(intptr_t fd)
{
return close(ToFileDescriptor(fd));
}

@rgr21
Copy link
Author

rgr21 commented Jan 30, 2025

protected override bool ReleaseHandle()
{
// If DeleteOnClose was requested when constructed, delete the file now.
// (Unix doesn't directly support DeleteOnClose, so we mimic it here.)
// We delete the file before releasing the lock to detect the removal in Init.
if (_deleteOnClose)
{
// Since we still have the file open, this will end up deleting
// it (assuming we're the only link to it) once it's closed, but the
// name will be removed immediately.
Debug.Assert(_path is not null);
Interop.Sys.Unlink(_path); // ignore errors; it's valid that the path may no longer exist
}
// When the SafeFileHandle was opened, we likely issued an flock on the created descriptor in order to add
// an advisory lock. This lock should be removed via closing the file descriptor, but close can be
// interrupted, and we don't retry closes. As such, we could end up leaving the file locked,
// which could prevent subsequent usage of the file until this process dies. To avoid that, we proactively
// try to release the lock before we close the handle.
if (_isLocked)
{
Interop.Sys.FLock(handle, Interop.Sys.LockOperations.LOCK_UN); // ignore any errors
_isLocked = false;
}
// Close the descriptor. Although close is documented to potentially fail with EINTR, we never want
// to retry, as the descriptor could actually have been closed, been subsequently reassigned, and
// be in use elsewhere in the process. Instead, we simply check whether the call was successful.
int result = Interop.Sys.Close(handle);
if (result != 0)
{
t_lastCloseErrorInfo = Interop.Sys.GetLastErrorInfo();
}
return result == 0;
}

            int result = Interop.Sys.Close(handle);
            if (result != 0)
            {
                t_lastCloseErrorInfo = Interop.Sys.GetLastErrorInfo();
            }
            return result == 0;

That's in ReleaseHandle().
But then up in

                int lastError = Marshal.GetLastPInvokeError();
                ReleaseHandle();
                Marshal.SetLastPInvokeError(lastError);

This boolean result is ignored?

@janvorli
Copy link
Member

Ah, you are right, it is really ignored there. I'll let owners of this code comment on this then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-System.IO untriaged New issue has not been triaged by the area owner
Projects
None yet
Development

No branches or pull requests

2 participants