0-byte file is the result of copying a file to itself with DVCFileSystem.get_file with any file larger than COPY_PBAR_MIN_SIZE #318

@adamliter

Bug report

If you use DVCFileSystem's get_file method to copy a file to itself, you end up with a 0-byte file whenever the file size is greater than COPY_PBAR_MIN_SIZE. If the file size is less than COPY_PBAR_MIN_SIZE, the original file is left intact.

You end up with a 0-byte file because of this code here.
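
The truncation itself can be reproduced with the standard library alone. The sketch below is not the dvc-objects code. `naive_chunked_copy` and `demo` are hypothetical names used only to show that opening the destination for writing empties the file before anything is read from it when source and destination are the same path. For contrast, it also shows that `shutil.copyfile` refuses the same-file case.

```python
import os
import shutil
import tempfile


def naive_chunked_copy(src, dest, chunk_size=1024 * 1024):
    """Illustrative chunked copy with no same-file check (hypothetical helper)."""
    # Opening dest with "wb" truncates it immediately. If dest is the same
    # file as src, the data is gone before the first read happens.
    with open(src, "rb") as fsrc, open(dest, "wb") as fdest:
        shutil.copyfileobj(fsrc, fdest, length=chunk_size)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.ckpt")
        with open(path, "wb") as f:
            f.write(b"x" * 4096)
        size_before = os.path.getsize(path)

        # Copy the file onto itself with the unguarded chunked copy.
        naive_chunked_copy(path, path)
        size_after = os.path.getsize(path)

        # shutil.copyfile detects that src and dst are the same file.
        try:
            shutil.copyfile(path, path)
            raised = False
        except shutil.SameFileError:
            raised = True

        return size_before, size_after, raised


if __name__ == "__main__":
    print(demo())
```

Any copy path that opens the destination for writing without first checking whether it is the same file as the source will behave like `naive_chunked_copy` here.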

Current behavior

$ cd /tmp
$ mkdir dvc-test
$ cd dvc-test
$ pdm init --python cpython@3.12
$ git init
$ dvc init
$ git add .
$ git commit -m "initial commit"
$ truncate -s 2G model.ckpt
$ dvc add model.ckpt
$ git add .
$ git commit -m "trained model"
$ ls -lh model.ckpt
-rw-r--r--@ 1 adam.liter  wheel   2.0G Dec  9 17:21 model.ckpt

Then from Python (e.g., pdm run python):

from dvc.api import DVCFileSystem
fs = DVCFileSystem()
fs.get_file("model.ckpt", "model.ckpt")

Now go back to a shell and check the file size:

$ ls -lh model.ckpt
-rw-r--r--@ 1 adam.liter  wheel     0B Dec  9 17:25 model.ckpt

Expected behavior

The behavior of dvc_objects.fs.utils.copyfile should be the same for all files, regardless of file size. In particular, copying a file to itself when the file size is greater than COPY_PBAR_MIN_SIZE should not result in a 0-byte file.
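
One possible way to get size-independent behavior is to check for the same-file case before the destination is ever opened. This is only a sketch under that assumption, not a proposed patch to dvc-objects. `copy_unless_same` is a hypothetical helper, and whether to skip silently (as here) or raise an error is a design choice left open.

```python
import os
import shutil
import tempfile


def copy_unless_same(src, dest, chunk_size=1024 * 1024):
    """Hypothetical guarded copy: returns True if data was copied,
    False if src and dest are the same file and nothing was done."""
    # Check before opening dest, because opening it with "wb" would truncate it.
    if os.path.exists(dest) and os.path.samefile(src, dest):
        return False
    with open(src, "rb") as fsrc, open(dest, "wb") as fdest:
        shutil.copyfileobj(fsrc, fdest, length=chunk_size)
    return True


def demo_guard():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "model.ckpt")
        other = os.path.join(tmp, "copy.ckpt")
        with open(src, "wb") as f:
            f.write(b"x" * 4096)

        copied_to_self = copy_unless_same(src, src)
        size_after_self_copy = os.path.getsize(src)

        copied_to_other = copy_unless_same(src, other)
        size_of_other = os.path.getsize(other)

        return copied_to_self, size_after_self_copy, copied_to_other, size_of_other


if __name__ == "__main__":
    print(demo_guard())
```

Because the check uses `os.path.samefile`, it also covers the case where the two paths differ textually but resolve to the same file, for example through a symlink.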
