Rsync to just delete files on destination when missing from source
I have a huge number of images (about 50 million, with 3-4 versions of each one), organized in a nested tree of directories, like images/103/045/475/example-{format}.jpg.
This immense catalog of images is replicated from our internal “master” to a CDN-like box. Sometimes the replication gets out of sync, and some images are deleted on the master but not on the slave.
It’s no surprise that Rsync has the right set of options to deal with this:
$ rsync --recursive --delete --ignore-existing --existing --prune-empty-dirs --verbose src/ dst/
Let me explain each option.
--recursive will explore the whole directory tree, not just the first level.
--delete will remove files in dst that are not in src.
--ignore-existing will not update any file in dst.
--existing will not create any file in dst.
--prune-empty-dirs will remove empty directories in dst, not just delete files.
--verbose will log what it does.
Because Rsync never has to compare or transfer files here, this is much faster than a full run, but of course it’s only a cleanup, not a real synchronization.
You can also run this a first time with --dry-run, which prints each action instead of executing it, to verify that Rsync does what you want.
The complete list of options is available in the man page.