duperemove sees unchanged files as changed and eventually hangs after checksumming #345

Closed
klausman opened this issue Jul 3, 2024 · 3 comments


klausman commented Jul 3, 2024

Version: current HEAD, 9e97c82

Command line: duperemove -r -d -h -v --debug /store

Symptoms:

  • Reports unchanged files as changed, e.g.:
    process_extents: unable to get extent
    file /store/klausman/arma/repos/maps/@isla_duala/addons/isladuala3.pbo changed
    
    That file has not changed in months, if not years.
  • After checksumming all files, it just sits idle:
    file /store/klausman/arma/repos/maps/@isla_duala/addons/isladuala3.pbo changed
    file /store/klausman/git/kernel.org/log changed
    file /store/prometheus/data-2.x/chunks_head/001717 changed
    file /store/prometheus/data-2.x/wal/00019695 changed
    [3530542] idle
    [3530543] idle
    [3530544] idle
    [3530545] idle
    [3530546] idle
    [3530547] idle
    [3530548] idle
    [3530549] idle
    [3530550] idle
    [3530551] idle
    [3530552] idle
    [3530553] idle
            Files scanned: 265405/265405 (100.00%)
            Bytes scanned: 3.7TB/3.7TB (100.00%)
            File listing: completed
    
    Nothing happens after this state is reached. Both strace and ltrace just show it updating the
    progress indicators. There is very little CPU usage and no I/O on the mounted filesystem.

I am unsure if the two symptoms are connected or not. I am pretty sure this is not issue #305 since that seems to be tied to a peculiar filename. I have not tried using the sync method described in issue #319, since checksumming close to 4TB of data is taking a long time as it is.

I have purposefully not used a hashfile, to make sure it isn't a race on the WAL used for that.
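
One way to double-check that a flagged file really has not changed is to look at its timestamps and extent map directly. The following is only a minimal sketch, assuming GNU `stat` and `filefrag` (from e2fsprogs) are installed. The helper name `inspect_file` is made up for illustration and is not part of duperemove, and whether the extent map explains the "unable to get extent" message is an assumption, not something confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of duperemove): show timestamps and
# the extent map of a single file that was reported as "changed".
inspect_file() {
    f="$1"
    [ -f "$f" ] || { echo "not a regular file: $f" >&2; return 1; }
    # GNU stat: %y = last data modification, %z = last status change, %s = size
    stat -c 'mtime: %y  ctime: %z  size: %s' -- "$f"
    # filefrag -v lists the file's extents; it may need root on some setups
    filefrag -v "$f"
}
```

For example, `inspect_file /store/klausman/git/kernel.org/log` run before and after a duperemove pass would show whether the mtime or the extent layout moved in between.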


gabldotink commented Aug 23, 2024

I have the same problem, except it successfully finishes the deduping stage (although it found nothing to dedupe, likely due to “unable to get extent” and “file changed”). Changing settings such as block size, partial and same, IO threads, and hashfile hasn’t changed anything. [1]

For me, btrfs check (with checksums) and btrfs scrub found no errors. The partition is encrypted with LUKS, compressed with zlib:9, and mounted with strictatime.

Many of the files were created by a btrfs send and receive. Could this contribute to the problem?

Is “file changed” ever supposed to appear when you’re not using a hashfile? I’m not able to read the source code very well.

Update: As a temporary fix, I am able to dedupe the files that give this error by making a new non-reflinked copy. (Edit: This stops fixing the problem sometimes, so this is not a perfect solution. I don’t know the cause.)
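
The workaround described above (replacing a file with a new non-reflinked copy) could be sketched roughly as follows. This assumes a reasonably recent GNU `cp` that accepts `--reflink=never`. The helper name `rewrite_unshared` and the temporary-file suffix are hypothetical, and as noted above the workaround does not always help.

```shell
#!/bin/sh
# Hypothetical helper: replace a file with a fresh copy that shares no
# extents with the original (i.e. a non-reflinked copy).
rewrite_unshared() {
    src="$1"
    tmp="$src.unshared.$$"
    # --reflink=never forces a full data copy instead of a CoW clone;
    # -p preserves mode, ownership, and timestamps
    cp -p --reflink=never -- "$src" "$tmp" || { rm -f -- "$tmp"; return 1; }
    # rename over the original so the path never goes missing
    mv -f -- "$tmp" "$src"
}
```

Note that this temporarily needs as much free space as the file itself, and any space the file previously shared with other files or snapshots is no longer shared afterwards.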

Footnotes

  1. The table of properties is created, the filenames are recorded (twice? I don’t know what’s normal), and then it ends with about 100 bytes of essentially-blank data. I can provide more info if needed.

@JackSlateur
Collaborator

Hello,
Thank you for the report.

I reproduced the bug and fixed it.
Please feel free to reopen the issue if the problem persists.


klausman commented Sep 7, 2024

I have tried current head (8d5921e) on three filesystems where I encountered the lockups. No issue so far, so this looks like it's fixed indeed. Thanks!
