A small CLI utility used to find duplicate files.
- Project home
- Overview
- Features
- Changelog
- Requirements
- Installation
- Configuration Options
- Examples
- License
- References
See our GitHub repo for the latest code, to file an issue, or to submit improvements for review and potential inclusion into the project.
- Generate report (report subcommand)
  - Find duplicate files and report them via console-only output or an output CSV file
- Remove flagged files (prune subcommand)
  - Process a previously generated CSV file report: for each file whose remove_file flag is set, (optionally) back up and then remove the file
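Because each subcommand accepts its own flags, one plausible way to structure such a CLI with Go's standard library is a separate flag.FlagSet per subcommand, selected by the first command-line argument. The following is a minimal sketch under that assumption, not this project's actual code: runCLI is a hypothetical name, and only a few flags from the Configuration Options tables are declared.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
)

// runCLI is a hypothetical dispatcher: it picks a flag set based on the
// first argument (the subcommand name) and parses the remaining arguments.
func runCLI(args []string) (string, error) {
	if len(args) < 1 {
		return "", errors.New("expected 'report' or 'prune' subcommand")
	}

	// Separate flag sets keep report-only flags (e.g., -csvfile) and
	// prune-only flags (e.g., -input-csvfile) from being mixed up.
	// Returned pointers are discarded in this sketch; a real program
	// would keep them to read the parsed values.
	reportCmd := flag.NewFlagSet("report", flag.ContinueOnError)
	reportCmd.String("csvfile", "", "CSV file to generate")
	reportCmd.Bool("recurse", false, "recurse into subdirectories")

	pruneCmd := flag.NewFlagSet("prune", flag.ContinueOnError)
	pruneCmd.String("input-csvfile", "", "CSV file with removal decisions")
	pruneCmd.Bool("dry-run", false, "simulate removal only")

	switch args[0] {
	case "report":
		return "report", reportCmd.Parse(args[1:])
	case "prune":
		return "prune", pruneCmd.Parse(args[1:])
	default:
		return "", fmt.Errorf("unknown subcommand %q", args[0])
	}
}

func main() {
	sub, err := runCLI(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "ERROR:", err)
		os.Exit(1)
	}
	fmt.Println("subcommand:", sub)
}
```

Using flag.ContinueOnError lets the caller decide how to report a parse failure instead of having the flag package exit the program immediately.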
Generating a report is the first step toward indicating which files in a duplicate file set you wish to remove (specified explicitly) and which you wish to keep (default behavior).
Pruning duplicate files is an optional second step following the generation of a duplicate files report (via the report subcommand).
You first open the CSV file using an application like Microsoft Excel or LibreOffice Calc and then mark each file that you wish to remove by setting its remove_file column to true. The default is false, so marking an entry you wish to keep with false is not strictly necessary.
Once marked, you are then able to remove those files by specifying the full path to the CSV file (via the prune subcommand). See the Examples section for details.
- Efficient evaluation of potential duplicates by limiting checksum generation to two or more identically sized files
- Support for creating CSV report of all duplicate file matches
- Support for generating (rough) console equivalent of CSV file for (potential) quick review
- Support for creating Microsoft Excel workbook of all duplicate file matches
- Support for evaluating one or many paths
- Recursive or shallow directory evaluation
- Optional removal of (user-flagged) duplicate files from a previously generated CSV report
- Go modules (vs classic GOPATH setup)
See the CHANGELOG.md file for the changes associated with each release of this application. Changes that have been merged to master but not yet included in an official release may also be noted in the file under the Unreleased section. A helpful link to the Git commit history since the last official release is also provided for further review.
The following is a loose guideline. Other combinations of Go and operating systems for building and running tools from this repo may work, but have not been tested.
- Go
  - see this project's go.mod file for preferred version
  - this project tests against officially supported Go releases
    - the most recent stable release (aka, "stable")
    - the prior, but still supported release (aka, "oldstable")
- GCC
  - if building with custom options (as the provided Makefile does)
- make
  - if using the provided Makefile
- Windows 10
- Ubuntu Linux 18.04+
- Download Go
- Install Go
- Clone the repo
    cd /tmp
    git clone https://github.com/atc0005/bridge
    cd bridge
- Install dependencies (optional)
  - for Ubuntu Linux
      sudo apt-get install make gcc
  - for CentOS Linux
      sudo yum install make gcc
- Build
  - for current operating system
      go build -mod=vendor ./cmd/bridge/
    - forces build to use bundled dependencies in top-level vendor folder
  - for all supported platforms (where make is installed)
      make all
  - for Windows
      make windows
  - for Linux
      make linux
- Copy the applicable binary to whatever systems need to run it
  - if using Makefile: look in /tmp/release_assets/bridge/
  - if using go build: look in /tmp/bridge/
NOTE: Depending on which Makefile recipe you use, the generated binary may be compressed and have an xz extension. If so, decompress the binary before deploying it (e.g., xz -d bridge-linux-amd64.xz).
- Download the latest release binaries
- Decompress binaries
  - e.g., xz -d bridge-linux-amd64.xz
- Deploy
  - Place bridge in a location of your choice
    - e.g., /usr/local/bin/bridge
NOTE: DEB and RPM packages are provided as an alternative to manually deploying binaries.
Flags for the report subcommand:
Option | Required | Default | Repeat | Possible | Description |
---|---|---|---|---|---|
h, help | No | false | No | h, help | Show Help text along with the list of supported flags. |
console | No | false | No | true, false | Dump (approximate) CSV file equivalent to console. |
csvfile | Yes | empty string | No | valid file name characters | The fully-qualified path to a CSV file that this application should generate. |
excelfile | No | empty string | No | valid file name characters | The fully-qualified path to a Microsoft Excel file that this application should generate. |
size | No | 1 (byte) | No | 0+ | File size limit (in bytes) for evaluation. Files smaller than this will be skipped. |
duplicates | No | 2 | No | 2+ | Number of files of the same file size needed before duplicate validation logic is applied. |
ignore-errors | No | false | No | true, false | Ignore minor errors whenever possible. This option does not affect handling of fatal errors such as failure to generate output report files. |
path | Yes | empty string | Yes | one or more valid directory paths | Path to process. This flag may be repeated for each additional path to evaluate. |
recurse | No | false | No | true, false | Perform recursive search into subdirectories per provided path. |
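The path flag can be repeated. A common standard-library approach, discussed in the Stack Overflow thread listed under References, is a type that implements the flag.Value interface and appends each occurrence to a slice. The following is a minimal sketch rather than this project's implementation; multiValueFlag and parseReportFlags are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strings"
)

// multiValueFlag is a hypothetical flag.Value that appends each occurrence
// of a repeated flag (e.g., -path) to a slice.
type multiValueFlag []string

// String satisfies flag.Value; it renders the collected values.
func (m *multiValueFlag) String() string {
	if m == nil {
		return ""
	}
	return strings.Join(*m, ", ")
}

// Set satisfies flag.Value; the flag package calls it once per occurrence.
func (m *multiValueFlag) Set(value string) error {
	*m = append(*m, value)
	return nil
}

// parseReportFlags parses report-style arguments into the list of paths
// and the recurse setting.
func parseReportFlags(args []string) ([]string, bool, error) {
	fs := flag.NewFlagSet("report", flag.ContinueOnError)

	var paths multiValueFlag
	fs.Var(&paths, "path", "Path to process. This flag may be repeated for each additional path to evaluate.")
	recurse := fs.Bool("recurse", false, "Perform recursive search into subdirectories per provided path.")

	if err := fs.Parse(args); err != nil {
		return nil, false, err
	}
	return paths, *recurse, nil
}

func main() {
	paths, recurse, err := parseReportFlags(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, "ERROR:", err)
		os.Exit(1)
	}
	fmt.Println("paths:", paths, "recurse:", recurse)
}
```

With flag.ContinueOnError, an unrecognized flag such as -fake-flag is returned as an error whose text is "flag provided but not defined: -fake-flag", and the flag package also prints that message and the usage text to the flag set's output (standard error by default).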
Flags for the prune subcommand:
Option | Required | Default | Repeat | Possible | Description |
---|---|---|---|---|---|
h, help | No | false | No | h, help | Show Help text along with the list of supported flags. |
console | No | false | No | true, false | Dump (approximate) CSV file equivalent to console. |
dry-run | No | false | No | true, false | Don't actually remove files. Echo what would have been done to stdout. |
ignore-errors | No | false | No | true, false | Ignore minor errors whenever possible. This option does not affect handling of fatal errors such as failure to generate output report files. |
input-csvfile | Yes | empty string | No | valid file name characters | The fully-qualified path to a CSV file that this application should use for file removal decisions. |
backup-dir | No | empty string | No | valid directory path | The writable directory path where files should be relocated instead of removing them. The original path structure will be created starting with the specified path as the root. |
blank-line | No | false | No | true, false | Add a blank line between sets of matching files in console and file output. |
use-first-row | No | false | No | true, false | Attempt to use the first row of the input file. Normally this row is skipped since it is usually the header row and not duplicate file data. |
This example illustrates using the application to process a single path, recursively.
./bridge.exe report -recurse -path "/tmp/path1" -csvfile "path1-report.csv"
This example illustrates using the application to process multiple paths, without recursively evaluating any subdirectories.
./bridge.exe report -path "/tmp/path1" -path "/tmp/path2" -csvfile "report.csv"
Accidentally typing the wrong flag results in a message like this one:
$ ./bridge.exe report -fake-flag
DEBUG: subcommand 'report'
flag provided but not defined: -fake-flag
bridge x.y.z
https://github.com/atc0005/bridge
Usage of "bridge report":
-console
Dump (approximate) CSV file equivalent to console.
-csvfile string
The (required) fully-qualified path to a CSV file that this application should generate.
-duplicates int
Number of files of the same file size needed before duplicate validation logic is applied. (default 2)
-excelfile string
The (optional) fully-qualified path to an Excel file that this application should generate.
-ignore-errors
Ignore minor errors whenever possible. This option does not affect handling of fatal errors such as failure to generate output report files.
-path value
Path to process. This flag may be repeated for each additional path to evaluate.
-recurse
Perform recursive search into subdirectories per provided path.
-size int
File size limit (in bytes) for evaluation. Files smaller than this will be skipped. (default 1)
DEBUG: err returned from reportCmd.Parse(): flag provided but not defined: -fake-flag
ERROR: flag provided but not defined: -fake-flag
./bridge.exe prune -input-csvfile "report.csv" -dry-run -ignore-errors
Here we specify:
- the input CSV file (file previously generated by the report subcommand)
- dry-run mode (don't actually remove files, just simulate the process)
- ignore (minor) errors
Because the console flag wasn't specified, the output is minimal.
./bridge.exe prune -input-csvfile "report.csv" -dry-run -ignore-errors -console
Here we specify:
- the input CSV file (file previously generated by the report subcommand)
- dry-run mode (don't actually remove files, just simulate the process)
- ignore (minor) errors
- the console flag
  - enables printing a table of parsed CSV contents
  - enables printing a table of file removal candidates
Because the console flag was specified, the output is more verbose.
./bridge.exe prune -input-csvfile "report.csv" -backup-dir /tmp/tacos -dry-run -ignore-errors -console
Here we specify:
- the input CSV file (file previously generated by the report subcommand)
- the backup directory that files should be copied to (just before a file removal operation is attempted)
- dry-run mode (don't actually remove files, just simulate the process)
- ignore (minor) errors
- the console flag
  - enables printing a table of parsed CSV contents
  - enables printing a table of file removal candidates
Because the console flag was specified, the output is more verbose. This can make the removal process easier to troubleshoot due to the explicit listing of what would be removed and what actually occurred.
From the LICENSE file:
MIT License
Copyright (c) 2020 Adam Chalkley
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
The byte-size formatting utility functions are provided by Stefan Nilsson under the Attribution 3.0 Unported (CC BY 3.0) license. See the References section of this document for links to additional information.
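The byte-size formatting article listed under References describes converting raw byte counts into human-readable strings. A minimal sketch of the SI (base-1000) variant follows; the byteCountSI name and its output format are assumptions here and may not match the functions this project actually uses.

```go
package main

import "fmt"

// byteCountSI formats a byte count using SI (base-1000) units,
// e.g. 1500 -> "1.5 kB".
func byteCountSI(b int64) string {
	const unit = 1000
	if b < unit {
		return fmt.Sprintf("%d B", b)
	}
	// Find the largest unit (kB, MB, ...) that keeps the value >= 1.
	div, exp := int64(unit), 0
	for n := b / unit; n >= unit; n /= unit {
		div *= unit
		exp++
	}
	return fmt.Sprintf("%.1f %cB", float64(b)/float64(div), "kMGTPE"[exp])
}

func main() {
	for _, size := range []int64{999, 1500, 987654321} {
		fmt.Println(size, "=>", byteCountSI(size))
	}
}
```

An IEC (base-1024) variant would use 1024 as the unit and KiB/MiB-style suffixes.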
- https://yourbasic.org/golang/formatting-byte-size-to-human-readable-format/
- https://stackoverflow.com/questions/28322997/how-to-get-a-list-of-values-into-a-flag-in-golang
- https://stackoverflow.com/questions/50324612/merge-maps-in-golang/50325337#50325337
- https://www.digitalocean.com/community/tutorials/understanding-defer-in-go
- https://www.linode.com/docs/development/go/creating-reading-and-writing-files-in-go-a-tutorial/
- https://medium.com/@sebassegros/golang-dealing-with-maligned-structs-9b77bacf4b97
- https://goenning.net/2017/01/25/adding-custom-data-go-binaries-compile-time/
  - covers updating variables at build time, particularly sub-packages (GH-55)
- https://github.com/360EntSecGroup-Skylar/excelize