Retry after errors, with exponential backup (in Ruby)


There are situations where some errors can occur. Let’s say you connect to a remote service, like a database or an API over HTTP. An error raised by your client is not always permanent. It might be a network glitch or something else.

Here is an attempt (in Ruby) to retry on error, with a longer sleep time between attempts.

class WhateverException < StandardError; end

debug_counter = 0
sleep_times = [0.1, 0.2, 0.5, 1]

  fail WhateverException, "counter=#{debug_counter += 1}"
rescue WhateverException
  if time = sleep_times[(nb_retries ||= 0)]
    sleep time
    puts "retry #{nb_retries} after #{time}s"
    nb_retries += 1

The 2 first lines are just context ; an exception class and a counter for debugging purposes.

sleep_times = [0.1, 0.2, 0.5, 1] is an array of times in seconds that I want to wait at each attempt.

The begin/rescue block allow to rescue the exception when it occurs, but also the retry (see later).

When an expected exception occurs, Ruby executes the body of the rescue part. It takes the first sleep time, wait that long, puts a debug line of text (that you’ll want to remove or change to an audit log message), increments the number of attempts and executes the retry statement.

A retry statement rolls back to the previous begin block and executes it again, without any condition. That’s why we have to deal with a maximum number of attempts or it will loop forever.

If we reach the end of the sleep_times array of times, Ruby will return nil and the if condition will fail. The original exception is raised again, as is.

Here is the output of this “script” :

ruby ~/tmp/retry.rb
retry 0 after 0.1s
retry 1 after 0.2s
retry 2 after 0.5s
retry 3 after 1s
/Users/jlecour/tmp/retry.rb:6:in `': counter: 5 (WhateverException)

Remember that in Ruby raise and fail are exactly the same method, but as Jim Weirich was saying :

Because I use exceptions to indicate failures, I almost always use the “fail” keyword rather than the “raise” keyword in Ruby. Fail and raise are synonyms so there is no difference except that “fail” more clearly communcates that the method has failed. The only time I use “raise” is when I am catching an exception and re-raising it, because here I’m not failing, but explicitly and purposefully raising an exception.