Rsync to just delete files on destination when missing from source
I have this situation where I have a huge number of images (about 50 million, with 3-4 versions of each one), organized in a nested tree of directories, like images/103/045/475/example-{format}.jpg.
This immense catalog of images is replicated from our internal “master” to a CDN-like box. Sometimes the replication gets out of sync, and some images are deleted on the master but not on the slave.
It’s no surprise that Rsync has the right set of options to deal with this:
$ rsync --recursive --delete --ignore-existing --existing --prune-empty-dirs --verbose src/ dst/
Let me explain each option.
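--recursive will explore the whole directory tree, not just the first level.
--delete will remove files in dst that are not in src.
--ignore-existing will not update any file that already exists in dst.
--existing will not create any file in dst that does not already exist there.
--prune-empty-dirs will drop empty directories from the file list; with --delete active, this also means directories that are empty in src get removed from dst, not just files.
--verbose will log what it does.
Combined, --ignore-existing and --existing mean that no file is transferred at all, so the only changes made to dst are deletions.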
Because Rsync never compares file contents, it’s much faster, but of course it’s only a cleanup, not a real synchronization.
You can also run this a first time with --dry-run to print each action instead of executing it, and verify that Rsync does what you want.
The complete list of options is available in the man page.