Add a Jitter strategy to the Retry policy #245

CESARDELATORRE · 2017-05-04T22:51:52Z

I think this is not available in Polly's retry policy.
Basically, a regular Retry policy can negatively impact your system under high concurrency and high contention.
It would be great to have it in Polly, and it should not be very complicated to add jitter to the retry algorithm/policy.
It would improve the overall performance of the end-to-end system by adding randomness to the exponential backoff, spreading out the spikes when issues arise.

The problem is explained here:
https://brooker.co.za/blog/2015/03/21/backoff.html
https://www.awsarchitectureblog.com/2015/03/backoff.html

If this is already available in Polly, please tell me how to use it.
Thanks

reisenberger · 2017-05-04T23:08:28Z

hey @CESARDELATORRE . Great idea! (seen this in a Java resilience library)

It could already be achieved with Polly, by using one of the .WaitAndRetry(...) configuration overloads which allow you to specify a Func<..., TimeSpan> for the amount of wait. (Similar overloads exist for async.)

Something like:

Random jitterer = new Random(); 
Policy
  .Handle<HttpResponseException>() // etc
  .WaitAndRetry(5,  
      retryAttempt => TimeSpan.FromSeconds(Math.Pow(2, retryAttempt))  // exponential back-off
                    + TimeSpan.FromMilliseconds(jitterer.Next(0, 100)) // plus some jitter
  );

Does this cover it?

Such a great idea, I'll aim to add a wiki 'how to' page and/or blog it.

EDIT: Thx for the references! Reading the articles in detail, they explore a more sophisticated range of jitter algorithms than the small amount of jitter in my example above. However, the principle of how to do this with Polly is the same: using the Func<..., TimeSpan> overloads, you have complete control to adopt whatever randomness/jitter algorithm you like.
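As one illustration of a more sophisticated algorithm, here is a minimal sketch of the 'Full Jitter' variant described in the linked articles: sleep for a random time between zero and min(cap, base * 2^attempt). The class name, method name, and parameter values are hypothetical and chosen only for this example; it is a sketch of the idea, not a Polly API.

```csharp
using System;

public static class FullJitterSketch
{
    // Note: Random is not thread-safe; guard it if the delegate is shared across threads.
    private static readonly Random Jitterer = new Random();

    // 'Full Jitter': a random delay between 0 and min(cap, baseDelay * 2^retryAttempt).
    public static TimeSpan FullJitter(int retryAttempt, TimeSpan baseDelay, TimeSpan cap)
    {
        double ceilingMs = Math.Min(
            cap.TotalMilliseconds,
            baseDelay.TotalMilliseconds * Math.Pow(2, retryAttempt));
        return TimeSpan.FromMilliseconds(Jitterer.NextDouble() * ceilingMs);
    }

    public static void Main()
    {
        // Print one sample delay for each of five retry attempts.
        for (int attempt = 1; attempt <= 5; attempt++)
        {
            Console.WriteLine(FullJitter(attempt, TimeSpan.FromMilliseconds(100), TimeSpan.FromSeconds(10)));
        }
    }
}
```

A function like this could be supplied to the wait overload mentioned above, for example as retryAttempt => FullJitter(retryAttempt, baseDelay, cap).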

reisenberger · 2017-05-05T00:42:59Z

Alternative: I note that the 'Decorrelated Jitter' suggested in this article appears to use an Accumulator approach (next value depends on the preceding one).

To implement this with Polly, another alternative could be a WaitAndRetry() overload taking an IEnumerable<TimeSpan> sleepDurations, and use the standard LINQ Aggregate for the Accumulator?

CORRECTION: It needs a yield return type approach (see below).
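To illustrate why the correction is needed, a minimal sketch (class and method names are hypothetical, and the doubling step is only a stand-in for a jitter formula): LINQ Aggregate folds a sequence down to one final value, so it produces a single TimeSpan rather than the IEnumerable<TimeSpan> of sleep durations that the overload expects.

```csharp
using System;
using System.Linq;

public static class AggregateSketch
{
    // Folds five doubling steps into ONE value; the intermediate values are discarded.
    public static TimeSpan LastDelayOnly()
    {
        return Enumerable.Range(1, 5).Aggregate(
            TimeSpan.FromMilliseconds(100),
            (previous, step) => TimeSpan.FromMilliseconds(previous.TotalMilliseconds * 2));
    }

    public static void Main()
    {
        TimeSpan last = LastDelayOnly();
        // A single number is printed, not a sequence of delays.
        Console.WriteLine(last.TotalMilliseconds);
    }
}
```

An iterator method using yield return keeps the running value between iterations and emits every intermediate delay, which is what a sequence of sleep durations requires.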

KennethWKZ · 2017-08-04T01:25:01Z

@reisenberger I tried to use the decorrelated jitter code, but got an error about not being able to convert TimeSpan to IEnumerable.

Is there any way to solve this? I'm not very familiar with this.

reisenberger · 2017-08-04T16:30:30Z

@KennethWKZ I have revised the previous sketch to use a yield return type approach, per below. Please let us know if you need anything else on this.

public static IEnumerable<TimeSpan> DecorrelatedJitter(int maxRetries, TimeSpan seedDelay, TimeSpan maxDelay)
{
    Random jitterer = new Random();
    int attempt = 0;

    double seed = seedDelay.TotalMilliseconds;
    double max = maxDelay.TotalMilliseconds;
    double current = seed;

    while (attempt++ <= maxRetries) // EDIT: As pointed out in a later comment, this boundary check allows one more retry than prescribed.
    {
        current = Math.Min(max, Math.Max(seed, current * 3 * jitterer.NextDouble())); // adopting the 'Decorrelated Jitter' formula from https://www.awsarchitectureblog.com/2015/03/backoff.html.  Can be between seed and previous * 3.  Mustn't exceed max.
        yield return TimeSpan.FromMilliseconds(current);
    }
}

Used as (for example) :

    Policy retryWithDecorrelatedJitter = Policy
        .Handle<WhateverException>()
        .WaitAndRetry(DecorrelatedJitter(maxRetries, seedDelay, maxDelay));

KennethWKZ · 2017-08-04T18:07:56Z

@reisenberger Thanks!!! Now I understand better how it works!

KennethWKZ · 2017-08-05T09:21:45Z

@reisenberger

current = Math.Min(max, Math.Max(seed, current * 3 * jitterer.NextDouble())); // adopting the 'Decorrelated Jitter' formula from the quoted article. //Can be between seed and previous * 3. Mustn't exceed max.

May I ask about this part? Should I set the seed lower than max? For example: if max is 1000ms, would seed be 100ms?

reisenberger · 2017-08-05T13:45:58Z

@KennethWKZ Yes. As with a pure exponential-backoff strategy, the idea is that seed is a low-ish, starting value.

The algorithm will produce sleep values all of which fall between seed and max: seed should represent the minimum wait-before-retry that you want, max the max.
max >= 4 * seed works well (and bigger ratios are fine). If seed and max are any closer, values will tend to bunch at the min and max.

To see the kind of retry delays it generates, you can run up a small console app like this.
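For instance, a minimal sketch of such a console app. It reuses the DecorrelatedJitter formula from this thread, written with a strict '<' loop bound so that exactly maxRetries delays are produced; the class name and the 100ms/1000ms values are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;

public static class JitterDemo
{
    // Same formula as the DecorrelatedJitter sketch in this thread,
    // with a strict '<' bound so exactly maxRetries values are yielded.
    public static IEnumerable<TimeSpan> DecorrelatedJitter(int maxRetries, TimeSpan seedDelay, TimeSpan maxDelay)
    {
        Random jitterer = new Random();
        int attempt = 0;

        double seed = seedDelay.TotalMilliseconds;
        double max = maxDelay.TotalMilliseconds;
        double current = seed;

        while (attempt++ < maxRetries)
        {
            // Clamp each value to the range [seed, max].
            current = Math.Min(max, Math.Max(seed, current * 3 * jitterer.NextDouble()));
            yield return TimeSpan.FromMilliseconds(current);
        }
    }

    public static void Main()
    {
        TimeSpan seed = TimeSpan.FromMilliseconds(100);
        TimeSpan max = TimeSpan.FromMilliseconds(1000);

        int retry = 0;
        foreach (TimeSpan delay in DecorrelatedJitter(10, seed, max))
        {
            Console.WriteLine($"Retry {++retry}: {delay.TotalMilliseconds:F0} ms");
        }
    }
}
```

Running it several times with different seed and max values gives a feel for how the delays spread out, and how they bunch at the bounds when seed and max are close together.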

reisenberger · 2018-01-02T11:39:52Z

Closing. Detailed wiki page now created describing how to use Polly with jitter.

23W · 2018-01-02T16:56:30Z

Excuse me, why do you use a loop with a "<=" comparison in DecorrelatedJitter, i.e. while (attempt++ <= maxRetries)? I think it should be a strict "<", shouldn't it? So the correct code is:

public static IEnumerable<TimeSpan> DecorrelatedJitter(int maxRetries, TimeSpan seedDelay, TimeSpan maxDelay)
{
    Random jitterer = new Random();
    int attempt = 0;

    double seed = seedDelay.TotalMilliseconds;
    double max = maxDelay.TotalMilliseconds;
    double current = seed;

    while (attempt++ < maxRetries)
    {
        current = Math.Min(max, Math.Max(seed, current * 3 * jitterer.NextDouble()));
        yield return TimeSpan.FromMilliseconds(current);
    }
}

reisenberger · 2018-01-02T17:04:37Z

@23W I agree. I have annotated the above and corrected it in both the wiki page and github gist example. Thank you for catching this.

sahir · 2019-09-25T11:37:22Z

@CESARDELATORRE is there any way to perform 25 retries over approximately 21 days using a jitter strategy with the Retry policy?

reisenberger · 2019-09-25T19:14:14Z

Hi @sahir , thanks for the q.

Polly only runs in process, in memory, and has no persistent backing store for retry state. It is not designed for retry loops that might span several days. While it's theoretically possible, if the process running such a long retry loop with Polly were to crash, the retry state would be lost. The use cases Polly targets are instead short-lived, transient faults.

There are no plans to take Polly in the direction of having a backing store for long-lived retries, because there are already a number of solutions in the market for scheduled job engines with persistence. eg in Azure: Azure timer-triggered functions; Azure Durable functions with delay; or something built around timer-triggered Azure Logic Apps. In a Windows or web app: hangfire, quartz.net; Timer-driven invocations on a method in a background IHostedService in .NET Core - to give some examples. Some of these are only scheduled-job orchestrators with persistence - within that, you might have to write some code to check whether your task was done or needed further retrying, depending on your needs.

Hope that helps.

sahir · 2019-09-26T06:09:50Z

Hi @reisenberger, thanks for the answer.

@reisenberger I retry failures with an exponential backoff using the formula (retry_count ** 4) + 15 + (rand(30) * (retry_count + 1)) (i.e. 15, 16, 31, 96, 271, ... seconds + a random amount of time). I assume it will perform 25 retries over approximately 21 days. Can you please check the code?

The code has not been unit tested yet.

public static class RetryWithExponentialBackoff
{
    public static Policy GetRetryPolicyHandler()
    {
        Random random = new Random();
        double maxDelay = (Math.Pow(25,4) + 15 + (random.Next(30) * 25 + 1));
        TimeSpan seedDelay = TimeSpan.FromMilliseconds(100);
        Policy retryWithDecorrelatedJitter = Policy
            .Handle<Exception>()
            .WaitAndRetry(
                DecorrelatedJitter(25, seedDelay, maxDelay),
                onRetry: (e, t) => Console.WriteLine($"Retry delay: {t.TotalMilliseconds} ms."));

        return retryWithDecorrelatedJitter;
    }

    public static IEnumerable<TimeSpan> DecorrelatedJitter(int maxRetries, TimeSpan seedDelay, double maxDelay)
    {
        Random jitterer = new Random();
        int retries = 0;

        double seed = seedDelay.TotalMilliseconds;
        double max = maxDelay;
        double current = seed;

        while (++retries <= maxRetries)
        {
            current = Math.Min(max, Math.Max(seed, current * 3 * jitterer.NextDouble()));
            yield return TimeSpan.FromMilliseconds(current);
        }
    }
}
```
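For comparison, a minimal sketch that evaluates the quoted formula directly as a sequence of delays, which makes the total wait easy to check. It assumes retry_count starts at 0 (which matches the 15, 16, 31, 96, 271 series) and that rand(30) is an integer in [0, 30); the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PolynomialBackoffSketch
{
    // Delay before retry n (n = 0, 1, 2, ...), in seconds:
    // n^4 + 15 + rand(30) * (n + 1), with rand(30) an integer in [0, 30).
    public static IEnumerable<TimeSpan> Delays(int maxRetries, Random random)
    {
        for (int retryCount = 0; retryCount < maxRetries; retryCount++)
        {
            double seconds = Math.Pow(retryCount, 4) + 15 + random.Next(30) * (retryCount + 1);
            yield return TimeSpan.FromSeconds(seconds);
        }
    }

    public static void Main()
    {
        // Sum all 25 delays to see how long the whole retry sequence would take.
        TimeSpan total = Delays(25, new Random())
            .Aggregate(TimeSpan.Zero, (sum, delay) => sum + delay);
        Console.WriteLine($"Total wait over 25 retries: {total.TotalDays:F1} days");
    }
}
```

Under these assumptions the 25 delays add up to a little over 20 days, in line with the rough figure quoted above. The resulting sequence could be passed to a WaitAndRetry overload that takes the sleep durations as an IEnumerable<TimeSpan>.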

reisenberger · 2019-09-26T20:42:48Z

@sahir . As long as you are happy that the retry sequence will be lost if the process terminates, a Polly wait-and-retry policy (including jitter variants) can in principle permit waiting before each retry for as long as TimeSpan can be configured for (more than enough ;~).

@sahir This issue #245 is old and the code in this thread is no longer recommended. Our up-to-date jitter documentation is here. Awesome community contributors have helped bring forward new and refined jitter algorithms as part of the Polly.Contrib.WaitAndRetry package.

Let me know if there is anything else we can help with on this (edit: or open an issue on the Polly.Contrib.WaitAndRetry repo if it relates to code there). (Once we are done here, I will probably lock this issue and post a note to indicate clearly that this issue doesn't reflect current recommended jitter practice.)

reisenberger · 2019-09-26T21:38:59Z

@sahir I wanted to be sure to answer your question: "can you please check the code". Was the goal to assess the distribution of retry intervals the code would generate? (Edit: If not, please could you clarify?)

Most jitter formulae return an IEnumerable<TimeSpan> to use in the Polly policy. To explore typical retry delays generated (and experiment with the effect of changing parameter values), I would suggest starting by Console.WriteLine-ing from a small console app:

class Program
{
    static void Main(string[] args)
    {
        IEnumerable<TimeSpan> timeSpans = ... // get the enumerable you are trialling
        foreach (TimeSpan timeSpan in timeSpans)
        {
            Console.WriteLine(timeSpan.ToString()); // Use a timespan format string (https://docs.microsoft.com/en-us/dotnet/standard/base-types/standard-timespan-format-strings) or custom format string (https://docs.microsoft.com/en-us/dotnet/standard/base-types/custom-timespan-format-strings) if you want.
            // or use `Console.WriteLine(timeSpan.TotalSeconds)` or similar, to get pure numbers.
        }
    }
}

Of course, that only shows results one "run" at a time - you need to run it a few times to get a feel for the results.

If you need a more sophisticated approach, you can run a similar experiment a large number of times and aggregate the results. For example, for the new jitter algorithm in Polly.Contrib.WaitAndRetry, we aggregated data over 100000 runs and then graphed this data to assess the distribution.
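As a rough sketch of that kind of aggregation (not the code used for Polly.Contrib.WaitAndRetry; all names here are hypothetical), the following runs a delay generator many times and reports the mean delay at each retry position. The stand-in generator is simple exponential backoff plus up to 100ms of jitter, and every run is assumed to yield the same number of delays.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JitterDistributionSketch
{
    // Runs the generator 'runs' times and returns, for each retry position,
    // the mean delay in milliseconds across all runs.
    public static double[] MeanDelayPerRetry(Func<IEnumerable<TimeSpan>> generator, int runs)
    {
        List<double> sums = new List<double>();
        for (int run = 0; run < runs; run++)
        {
            int index = 0;
            foreach (TimeSpan delay in generator())
            {
                if (index == sums.Count) sums.Add(0);
                sums[index++] += delay.TotalMilliseconds;
            }
        }
        return sums.Select(sum => sum / runs).ToArray();
    }

    // Hypothetical stand-in: 2^n seconds plus up to 100 ms of jitter, for n = 1..5.
    public static IEnumerable<TimeSpan> SampleDelays(Random jitterer)
    {
        for (int retryAttempt = 1; retryAttempt <= 5; retryAttempt++)
        {
            yield return TimeSpan.FromSeconds(Math.Pow(2, retryAttempt))
                       + TimeSpan.FromMilliseconds(jitterer.Next(0, 100));
        }
    }

    public static void Main()
    {
        Random jitterer = new Random();
        double[] means = MeanDelayPerRetry(() => SampleDelays(jitterer), 100000);
        for (int i = 0; i < means.Length; i++)
        {
            Console.WriteLine($"Retry {i + 1}: mean delay {means[i]:F1} ms");
        }
    }
}
```

Swapping SampleDelays for the enumerable under trial, and collecting percentiles or a histogram instead of the mean, gives a fuller picture of the distribution.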

sahir · 2019-09-27T06:16:59Z

@reisenberger Thanks for the answer.

reisenberger · 2019-09-28T07:23:24Z

@sahir . If your goal is to generate a broadly exponential backoff (1, 2, 4, 8 seconds etc) with some additional randomness, try the new jitter algorithm in Polly.Contrib.WaitAndRetry; the new algorithm is much more strongly correlated to exponential backoff than the one originally in this thread.

Going to lock this thread now to make it clearer that the jitter strategies here are no longer recommended, but @sahir : if we can help further, please do open another issue on Polly, or on Polly.Contrib.WaitAndRetry if your question is more directly related to the formula there.

reisenberger · 2019-09-28T07:25:40Z

Notice: The jitter code in this thread is no longer the Polly recommendation for jitter. Please see our jitter documentation for latest information.

reisenberger added the how-to label May 4, 2017

CESARDELATORRE mentioned this issue May 5, 2017

Add a Jitter strategy to the Retry policy dotnet-architecture/eShopOnContainers#188

Closed

reisenberger closed this as completed Jan 2, 2018

App-vNext locked and limited conversation to collaborators Sep 28, 2019
