Identify duplicate files on a Linux filesystem and generate a script to hardlink them.
- Scans the source folder and records each file's name, size, and index node (inode) on the drive
- Checks the scan folder for files with an identical name and file size
- This detection method is not as reliable as comparing file hashes, but for this use case it is likely sufficient (see case study)
- The inode of the file in the scan folder is compared to the inode recorded in the map (from the source folder)
- If the inode matches, the file is already a hard link and is ignored
- If the inode does not match, the file is noted as a duplicate
- For each identified duplicate, de-duplication instructions are written to the output file, similar to the following:
rm 'scanFilename'
ln 'sourceFilename' 'scanFilename'
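The detection steps above can be sketched as follows. This is an illustrative Python outline of the logic, not Spacelink's actual implementation (which is a compiled binary); the function names are invented for the example:

```python
import os

def build_map(source):
    """Map (file name, size) -> (inode, full path) for every file in source."""
    mapping = {}
    for root, _dirs, files in os.walk(source):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            mapping[(name, st.st_size)] = (st.st_ino, path)
    return mapping

def find_duplicates(source, scan):
    """Yield (source_path, scan_path) pairs for duplicates not yet hardlinked."""
    mapping = build_map(source)
    for root, _dirs, files in os.walk(scan):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            hit = mapping.get((name, st.st_size))
            if hit is None:
                continue              # no same-name, same-size file in source
            inode, src_path = hit
            if inode == st.st_ino:
                continue              # already a hard link to the source file
            yield src_path, path      # duplicate: same name and size, different inode

def write_script(source, scan, output):
    """Write rm/ln de-duplication instructions for every duplicate found."""
    with open(output, "w") as out:
        for src_path, dup_path in find_duplicates(source, scan):
            out.write(f"rm '{dup_path}'\n")
            out.write(f"ln '{src_path}' '{dup_path}'\n")
```

Note that the generated script is only written, never executed, so the results can be reviewed before any file is removed.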
- Use CMake to generate the Makefile:
cmake .
- Use the make utility to build the binary:
make
- Run the binary:
./spacelink --source <path> --scan <path> --output <file name>
- The source folder contains the original files you want to keep
- The scan folder contains the duplicates you want to remove and replace with hardlinks
- The output file is the path where the de-duplication script is written. It can be anywhere on your filesystem and doesn't have to be inside the folders you're scanning.
- Do not include trailing slashes on source or scan arguments
- Transmission (a download client) downloaded large files to
/downloads
- Plex Media Server had a library on the folder
/tv
- A third application (Sonarr) detects new files in
/downloads
and hard-links them to /tv
- Due to permission issues, Sonarr copied the files from downloads rather than hard-linking them
- This resulted in unnecessary duplicate data being written to the drive
- After consolidating the files using Spacelink, over 47 GB of disk space was saved
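The wasted space in the case study comes down to inodes: a copied file gets a fresh inode and its own data blocks, while a hard link reuses the original's. A quick way to check whether two paths already share storage (illustrative helper, not part of Spacelink) is to compare their device and inode numbers:

```python
import os

def same_file(path_a, path_b):
    """Return True if the two paths are hard links to the same data.

    Two paths refer to the same file only when both the device ID and the
    inode number match; a byte-for-byte copy gets a new inode and therefore
    occupies its own disk space.
    """
    st_a, st_b = os.stat(path_a), os.stat(path_b)
    return (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)
```

Running such a check across /downloads and /tv before and after consolidation is one way to confirm the hard links were actually created.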