Rsync to just delete files on destination when missing from source

2014-08-21

I have a huge number of images (about 50 million, with 3-4 versions of each one), organized in a nested tree of directories, like images/103/045/475/example-{format}.jpg.

This immense catalog of images is replicated from our internal “master” to a CDN-like box. Sometimes the replication gets out of sync, and some images are destroyed on the master but not on the slave.

It’s no surprise that Rsync has the right set of options to deal with this:

$ rsync --recursive --delete --ignore-existing --existing --prune-empty-dirs --verbose src/ dst/

Let me explain each option.

  • --recursive will explore the whole directory tree, not just the first level.
  • --delete will remove files in dst that are not in src.
  • --ignore-existing will not update any file that already exists in dst.
  • --existing will not create any file in dst.
  • --prune-empty-dirs will drop directories that are empty in src from the file list, so that, combined with --delete, they are removed from dst as well, not just the files.
  • --verbose will log what it does.

Because Rsync does not try to compare or transfer any file, this is much faster than a full sync, but of course it’s only a cleanup, not a real synchronization.

You can also run this first with --dry-run, which prints each action instead of executing it, to verify that Rsync does what you want.

The complete list of options is available in the man page.