Finding duplicate files on Linux is a common task, and since you are already comfortable with the terminal, there are a few ways to do it, ranging from a quick manual check to specialized tools.
1. The Pro Way: Using fdupes or jdupes
This is the most efficient method. fdupes and jdupes (a faster fork of fdupes) first group files by size, then hash the candidates, and finally compare them byte by byte to confirm they are exact duplicates. fdupes uses MD5 signatures for the hashing step; jdupes uses a faster non-cryptographic hash.
Installation:
sudo apt update
sudo apt install jdupes
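How to use it:
To find and list duplicates:
jdupes /home/shermanchen/my_folder
To also show the size of the files in each duplicate set (use -m instead for a summary of how much space the duplicates occupy):
jdupes -S /home/shermanchen/my_folder
To delete duplicates automatically, without prompting (keeping the first one found):
jdupes -dN /home/shermanchen/my_folder
Note that jdupes only scans the directories you name. Add -r to include their subdirectories, and review the plain listing before running the delete command.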
2. The Built-in Way (No installation required)
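If you don't want to install new software, you can use a combination of standard Linux tools. This command creates an MD5 hash for every file, sorts the results, and prints the files whose hash appears more than once. Because it hashes every file regardless of size, it is slower than jdupes on large folders.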
find . -type f -exec md5sum {} + | sort | uniq -w32 -dD
Breakdown of what this does:
find . -type f: Finds all regular files in the current directory and its subdirectories.
-exec md5sum {} +: Calculates an MD5 "fingerprint" for every file.
sort: Groups identical fingerprints together.
uniq -w32 -dD: Compares only the first 32 characters of each line (the hash) and prints every line whose hash appears more than once.
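The one-liner prints all duplicates as one continuous list, which gets hard to read when there are many sets. As a minimal sketch, the same pipeline can be wrapped in a small function that takes the folder as an argument and puts a blank line between sets. The function name find_dupes is made up for this example, and the -w and --all-repeated options assume GNU coreutils, which is what Ubuntu ships.

```shell
#!/usr/bin/env bash
# find_dupes is a hypothetical helper name, not a standard command.
# Usage: find_dupes [directory]   (defaults to the current directory)
find_dupes() {
    local dir="${1:-.}"
    # md5sum prints a 32-character hex hash, two spaces, then the path.
    # sort brings identical hashes next to each other, and uniq compares
    # only the first 32 characters, printing every member of each
    # duplicate set with a blank line between sets (GNU uniq options).
    find "$dir" -type f -exec md5sum {} + \
        | sort \
        | uniq -w32 --all-repeated=separate
}
```

Calling find_dupes /home/shermanchen/my_folder lists each set of identical files as a block of "hash  path" lines. Nothing is deleted; the output is only a report to review.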
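3. Visual Way: Using FSlint or Czkawka
If you prefer a GUI (graphical user interface) so you can look at the files before deleting them, you can use FSlint.
sudo apt install fslint
Once installed, search for "FSlint" in your Ubuntu dashboard. It has a dedicated "Duplicates" tab that lets you search and select files to delete with a checkbox.
FSlint is no longer maintained and was dropped from the Ubuntu repositories as of 20.04, so on newer releases the apt command above will not find the package. Czkawka is an actively maintained alternative with a similar duplicate-finder view.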
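A Warning for Your Backup
Since you just ran an rsync backup, be very careful when running duplicate finders on your backup drive. If you scan the backup together with the original folder, every backed-up file will be reported as a duplicate, and an automatic delete such as -dN could remove the backup copies themselves.
If you use a tool like jdupes with the -L (link) option, it replaces duplicates with hard links. This saves space by making two file names point to the same data on the disk, so nothing is lost. The trade-off is that the names are no longer independent copies: editing the file through one name changes it under the other, and hard links only work within a single filesystem.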
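If you want to see how hard links behave before letting a tool create them, you can try it by hand in a throwaway directory. The sketch below is only an illustration: the file names file_a and file_b are made up, and stat -c assumes GNU stat, as found on Ubuntu.

```shell
#!/usr/bin/env bash
# Work in a temporary directory so no real files are touched.
demo=$(mktemp -d)
printf 'original\n' > "$demo/file_a"
ln "$demo/file_a" "$demo/file_b"    # hard link: a second name for the same data

# Both names report the same inode number and a link count of 2.
stat -c '%i %h %n' "$demo/file_a" "$demo/file_b"

# Appending through one name changes what the other name shows.
printf 'edited\n' >> "$demo/file_b"
cat "$demo/file_a"

# Removing one name leaves the data reachable through the other.
rm "$demo/file_b"
cat "$demo/file_a"
```

The second and third points are the ones that matter for a backup: hard-linked names are not independent copies, so a change made through one name shows up under every other name.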
Which directory are you planning to scan? If it's a massive media folder, I highly recommend jdupes for its speed.