Rsync to just delete files on destination when missing from source
I have a huge number of images (about 50 million, with 3-4 versions of each one), organized in a nested tree of directories.
This immense catalog of images is replicated from our internal “master” to a CDN-like box. Sometimes the replication gets out of sync, and some images are destroyed on the master but not on the slave.
Unsurprisingly, Rsync has the right set of options to deal with this:
$ rsync --recursive --delete --ignore-existing --existing --prune-empty-dirs --verbose src/ dst/
Let me explain each option.
--recursive will explore the whole directory tree, not just the first level.
--delete will remove files in dst that are not in src.
--ignore-existing will not update any file in dst.
--existing will not create any file in dst.
--prune-empty-dirs will also remove directories from dst when they are empty in src, not just the files.
--verbose will log what it does.
Because Rsync does not try to compare the files, this is much faster than a full sync, but of course it is only a cleanup, not a real synchronization.
You can also run this a first time with --dry-run to print each action instead of executing it, and verify that Rsync does what you want.
The complete list of options is available in the man page.