Support for excluding files from analysis #69

jvassev · 2021-03-01T12:30:14Z

Hi,
I am using rdfind to find duplicate files in a single folder that is used by a downloader service. Occasionally, the downloader fetches the same file under a different name, and rdfind successfully deduplicates it.

I noticed that rdfind (which I run in a loop every 30s) does a lot of redundant work while a file is still being downloaded.

Is it possible to say "ignore files that were modified in the last X seconds"? Or maybe use a globbing pattern to exclude *.part files?

bes-internal · 2021-03-01T13:20:26Z

You can use the system find, as in this example from man rdfind:

 Search for duplicate files in directories called foo:
              find . -type d -name foo -print0 |xargs -0 rdfind

In your case, something like this:

find pathtodir -type f ! -name '*.part' -print0 | xargs -0 rdfind
find pathtodir -type f -mmin +1 -print0 | xargs -0 rdfind
                       ^-- +n: file's data was last modified more than n minutes ago.
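
The two filters can also be combined into a single find call. Below is a minimal sketch, not something shipped with rdfind: the helper name `list_stable_files` and its default age of 1 minute are made up for illustration, and it assumes that a file untouched for more than that many minutes is a finished download.

```shell
#!/bin/sh
# Hypothetical helper (not part of rdfind): print, NUL-separated, the regular
# files under "$1" that are not *.part files and whose data was last modified
# more than "$2" minutes ago (default: 1).
list_stable_files() {
    dir=$1
    min_age=${2:-1}
    find "$dir" -type f ! -name '*.part' -mmin "+$min_age" -print0
}

# Usage (assumes rdfind is installed and on PATH):
#   list_stable_files pathtodir 1 | xargs -0 rdfind
```

Keeping the filter in a function makes it easy to inspect the candidate list (for example with `| tr '\0' '\n'`) before handing it to rdfind.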

pauldreik · 2021-08-12T18:10:58Z

Sorry for the late answer. The suggestion by @bes-internal is excellent!

entrity · 2024-03-03T22:59:50Z

I think the xargs answer isn't sufficient for large collections. I want to do this on a large tree, and I get "xargs: argument line too long".
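
A related thing to check on a large tree: when the file list does not fit on one command line, xargs splits it across several invocations, and each rdfind run then only sees its own batch. The sketch below is a diagnostic only and does not fix the error above; the function name `count_xargs_batches` is hypothetical, and because it runs `sh -c` in place of rdfind, the batch count is an approximation.

```shell
#!/bin/sh
# Hypothetical diagnostic: count how many separate command invocations xargs
# would make for the non-*.part files under "$1". Each invocation prints one
# line, so the line count equals the number of batches.
count_xargs_batches() {
    find "$1" -type f ! -name '*.part' -print0 |
        xargs -0 sh -c 'echo batch' sh |
        wc -l
}

# Usage:
#   count_xargs_batches pathtodir
```

If this prints more than 1, a plain `find ... | xargs -0 rdfind` pipeline would start rdfind more than once, and duplicates that land in different batches would not be compared with each other.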

pauldreik added the "question" label ("A question, not a bug.") Aug 12, 2021

pauldreik closed this as completed Aug 12, 2021

cfgnunes mentioned this issue May 4, 2024

[Feature request] Introduce functionality to exclude specific filenames from analysis #157

Open
