Add SLR indicator header #65

dannycohen · 2013-09-30T08:07:23Z

As Opie, I would like to be notified when there are many or increasing number of SLRs performed, so I can investigate further if this is an indication of system instability (and I will be able to take corrective actions before it deteriorates further).

Implementation proposal:
SecondLevelRetriesPerformed : true|{false}

Note:
In order to conserve header space, avoid sending the header when SLR were not performed (i.e. the very existence of the header is an indication of it being true; the value of false is optional).

andreasohlund · 2013-10-01T12:57:24Z

@johnsimons please go ahead with this one

indualagarsamy · 2013-11-06T08:55:49Z

@dannycohen - what is the associated issue in SP?

dannycohen · 2013-11-06T11:54:01Z

This is currently a bug affecting existing functionality in SI (Particular/ServiceInsight#28)

There are plans for a post V1 SP indicator for SLR requests, but its a low prio in the backlog. Why ?

johnsimons · 2013-11-06T12:13:38Z

I thought we came to the conclusion that we can't supply this information
because those headers are wiped by NSB.
Also those headers are an internal implementation concern that can change.
On top of it SLRs can be turned off by the user or/and completely custom
replaced.

On Wednesday, November 6, 2013, Danny Cohen wrote:

This is currently a bug affecting existing functionality in SI (
Particular/ServiceInsight#28 Particular/ServiceInsight#28
)

There are plans for a post V1 SP indicator for SLR requests, but its a low
prio in the backlog. Why ?

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-27866387
.

Regards
John Simons
NServiceBus

dannycohen · 2013-11-06T13:20:17Z

I thought we came to the conclusion that we can't supply this information
because those headers are wiped by NSB.

I was under the impression the number of SLR's is wiped, but that we can check whether or nor SLR's occurred (i.e. a boolean parameter).

Also those headers are an internal implementation concern that can change.

The fact that retries occurr repeatedly may indicate the system is somewhat unstable. A growing trend of retries is similarly significant to Opie and team.
Therefore, although it may be implemented as an internal feature, but it is of value and importance to users.

andreasohlund · 2013-11-06T13:20:45Z

Correct, we can provide a boolean parameter

On Wed, Nov 6, 2013 at 2:20 PM, Danny Cohen notifications@github.comwrote:

I thought we came to the conclusion that we can't supply this information
because those headers are wiped by NSB.

I was under the impression the number of SLR's is wipred, but that we cna
check whether or nor SLR's occurred (i.e. a boolean parameter).

Also those headers are an internal implementation concern that can change.

The fact that retries occurr repeatedly may indicate the system is
somewhat unstable. A growing trend of retries is similarly significant to
Opie and team.
Therefore, although it may be implemented as an internal feature, but it
is of value and importance to users.

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-27872361
.

indualagarsamy · 2013-11-13T00:41:49Z

@dannycohen @andreasohlund - After a discussion with @johnsimons and @SimonCropp, rather than add a header which is a boolean to say if SLR was invoked it makes more sense to add a header which provides the total number of retries that occured, and this information will be available on messages that were successfully processed. So, by looking at the audit message and looking at the number of retries, we can identify problems.

This count will be the net total (of FLR + SLR) that occured.

Will that satisfy the requirement?

johnsimons · 2013-11-13T01:39:26Z

This will allow the users to sort by this column in descending order and see which messages get retried most :)

dannycohen · 2013-11-13T07:06:34Z

@indualagarsamy / @johnsimons / @andreasohlund -

Having the SLR count is nice, and that was the original content of the existing header.
Due to a bug fix, we concluded with @johnsimons and @andreasohlund that this is not possible, and that a boolean value for SLR is the best remaining option.
- Details:
If you can pass it as a number - great.
If you can pass it as a boolean parameter - more than good enough.

Let me know.

johnsimons · 2013-11-13T07:33:26Z

What we are proposing is for us to have a new header that will contain the number of times that message was retried.

dannycohen · 2013-11-13T07:38:14Z

What we are proposing is for us to have a new header that will contain the number of times that message was retried.

There is such a header and it is actively removed in order to fix a bug (see Particular/ServiceInsight#20 (comment)).

Are you saying we can un-remove it ?

johnsimons · 2013-11-13T07:46:13Z

No, I'm proposing a completely new header :)

On 13 November 2013 18:38, Danny Cohen notifications@github.com wrote:

What we are proposing is for us to have a new header that will contain the
number of times that message was retried.

There is such a header and it is actively removed in order to fix a bug
(see Particular/ServiceInsight#20 (comment)Particular/ServiceInsight#20 (comment)
).

Are you saying we can un-remove it ?

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-28372929
.

Regards
John Simons
NServiceBus

dannycohen · 2013-11-13T07:59:04Z

@johnsimons - Whatever makes you happy... :-)

dannycohen · 2013-12-13T07:26:34Z

@indualagarsamy / @andreasohlund / @johnsimons - guys, where are we on this ?

This causes a bug in SI existing display. See Use NServiceBus.Retries.Timestamp header to indicate SLR attempts ServiceInsight#28 (comment)
We may want to hide the display to allow us to defer the retry header bug fix.
Either way - we need to decide (once and for all) what is to be done and when.

// CC @HEskandari

andreasohlund · 2013-12-13T07:42:32Z

This turned out to be nontrivial, can we bump this?

On Fri, Dec 13, 2013 at 8:26 AM, Danny Cohen notifications@github.comwrote:

@indualagarsamy https://github.com/indualagarsamy / @andreasohlundhttps://github.com/andreasohlund/
@johnsimons https://github.com/johnsimons - guys, where are we on this ?

This causes a bug in SI existing display. See Use NServiceBus.Retries.Timestamp header to indicate SLR attempts ServiceInsight#28
(comment)Use NServiceBus.Retries.Timestamp header to indicate SLR attempts ServiceInsight#28 (comment)

We may want to hide the display to allow us to defer the retry
header bug fix.

Either way - we need to decide (once and for all) what is to be done
and when.

// CC @HEskandari https://github.com/HEskandari

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-30491317
.

andreasohlund · 2013-12-13T07:42:57Z

@indualagarsamy https://github.com/indualagarsamy can you add the details
on the issues we found

On Fri, Dec 13, 2013 at 8:42 AM, Andreas Öhlund <
andreas.ohlund@particular.net> wrote:

This turned out to be nontrivial, can we bump this?

On Fri, Dec 13, 2013 at 8:26 AM, Danny Cohen notifications@github.comwrote:

@indualagarsamy https://github.com/indualagarsamy / @andreasohlundhttps://github.com/andreasohlund/
@johnsimons https://github.com/johnsimons - guys, where are we on this
?

This causes a bug in SI existing display. See Use NServiceBus.Retries.Timestamp header to indicate SLR attempts ServiceInsight#28
(comment)Use NServiceBus.Retries.Timestamp header to indicate SLR attempts ServiceInsight#28 (comment)

We may want to hide the display to allow us to defer the retry
header bug fix.

Either way - we need to decide (once and for all) what is to be
done and when.

// CC @HEskandari https://github.com/HEskandari

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-30491317
.

dannycohen · 2013-12-13T07:48:09Z

@andreasohlund -

can we bump this?

NP, but it would be good to record the diagnosis and the prognosis.

andreasohlund · 2013-12-13T08:05:41Z

Indu has the details

In short: We need the new pipeline hooks in NServiceBus 4.4.0 to pull this
off

On Fri, Dec 13, 2013 at 8:48 AM, Danny Cohen notifications@github.comwrote:

@andreasohlund https://github.com/andreasohlund -

can we bump this?

NP, but it would be good to record the diagnosis and the prognosis.

—
Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-30492040
.

dannycohen · 2013-12-13T09:21:08Z

In short: We need the new pipeline hooks in NServiceBus 4.4.0 to pull this
off

I'm fine with that.

johnsimons · 2013-12-15T23:58:46Z

This turned out to be nontrivial, can we bump this?

What was the issue @indualagarsamy ?

indualagarsamy · 2013-12-16T05:35:40Z

Trying to add the header, we realized that there is no easy way to count the FLRs the way it stands now.

johnsimons · 2013-12-16T05:53:14Z

What is wrong with adding it in https://github.com/Particular/NServiceBus/blob/develop/src/NServiceBus.Core/Unicast/Transport/FirstLevelRetries.cs#L49 ?

dannycohen · 2014-02-11T13:24:51Z

Assigning to backlog, pending SC / SP roadmap alignment

johnsimons · 2014-04-21T06:32:00Z

It can be reopen if we decide to do it

udidahan · 2014-08-11T09:43:31Z

There is another stakeholder here. As a salesperson for Particular Software, when talking to other departments at our larger customers, I would like to be able to tell them how many messages NServiceBus not only prevented from failing but handled those failures automatically without any manual intervention thus clarifying the value proposition versus the team using HTTP or rolling their own solution on top of a queue.

dannycohen · 2014-08-11T13:36:18Z

@udidahan - how would you prioritize it ? (post LM3, of course)

udidahan · 2014-08-11T14:33:52Z

That's a question to be answered via the Trello board, taking into account all the other things we have for LM4.

andreasohlund · 2015-08-27T11:12:33Z

I think the way to go here is that instead of a header we should create a plugin that hooks into the notifications, http://docs.particular.net/nservicebus/errors/subscribing-to-push-based-error-notifications. This way we can let the users set threshold for flr/slr per second and have the plugin notify SC when that happens?

Arguable this could be implemented and distributed as a custom check?

We could start by adding this as a sample to gauge interest?

johnsimons · 2016-02-17T00:56:51Z

@Particular/servicecontrol-maintainers
I propose we close this one until we ready to tackle it?

SimonCropp · 2016-02-17T02:58:29Z

@johnsimons if u close it how would distinguish it from all the other issues that are closed "as fixed"?

johnsimons · 2016-02-17T03:02:43Z

@SimonCropp no milestone associated

SimonCropp · 2016-02-17T03:04:17Z

@johnsimons so u regularly go through the 268 closed issues with no milestone as part of your backlog pruning? https://github.com/Particular/ServiceControl/issues?q=is%3Aissue+no%3Amilestone+is%3Aclosed

johnsimons · 2016-02-17T03:12:21Z

Not sure what you are talking about @SimonCropp.
If we close this issue and later this topic resurfaces, I am pretty sure we can find it and reopen.

andreasohlund · 2016-02-17T07:49:02Z

I have a demo going that shows how to do this without a header and instead using a custom check to "alert" in SP when SLR rates go up. How about I get that recorded and we close this one as won't fix?

andreasohlund · 2016-02-17T07:50:52Z

Still agree with @SimonCropp that we shouln't just close issues that we still might act on. Perhaps move it some where or add a bullet point to some more generic "crazy SC ideas" issue.

How about we discuss this in plat dev since this is relevant for all our repos?

mauroservienti · 2016-02-17T08:02:41Z

Still think agree with @SimonCropp that we shouln't just close issues that we still might act on.

Agree.

johnsimons · 2016-02-17T22:15:41Z

we shouln't just close issues that we still might act on

In my view, closing a feature request does not imply that we will not be acting on, it just means that right now the focus is elsewhere. It just help us concentrate on issues/features that are currently important. Having this massive backlog of open issues IMO makes it difficult to figure out where we going. The way I see it, is open feature requests that we do not intend to work on (right now), are just clutering the system, and also sending the wrong message to user out there, the fact is that we have no idea when we are going to tackle any of these or even if we ever going to do it at all.

I have a demo going that shows how to do this without a header and instead using a custom check to "alert" in SP when SLR rates go up

@andreasohlund that would be great, looking forward to it.

gbiellem · 2016-02-17T23:48:28Z

@johnsimons

As @andreasohlund said let's have the discussion about do we or don't we close issues in platform dev.

mauroservienti · 2016-02-18T06:22:04Z

yeah, let's discuss it. I don't think we should close issues, a closed issue is a lost one, IMO. If a huge backlog is putting pressure on us it's a tolling problem (and to be honest GitHUb issues suck) since we have no way (other than the waffle boards, that are not that much better) to prioritize and look only at the top N issues. Also given that we have no milestones set in stone priorities may change overtime and is very easy to lose track of closed issues that were closed because of lack of capacity or out of scope at that time.

johnsimons · 2016-02-18T06:47:58Z

@mauroservienti it is quite simple to keep track of feature requests that were closed.
This query gives u all issues that are closed and have no associated milestone and are not bugs
https://github.com/Particular/ServiceControl/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aclosed+no%3Amilestone+-label%3A%22Type%3A+Bug%22+

johnsimons · 2016-05-10T03:00:19Z

This is currently not prioritised to be done any time soon, I will close this one for now and if we ever go down this path we can reopen it

ghost assigned johnsimons Sep 30, 2013

dannycohen mentioned this issue Sep 30, 2013

Use NServiceBus.Retries.Timestamp header to indicate SLR attempts Particular/ServiceInsight#28

Closed

indualagarsamy pushed a commit to Particular/NServiceBus that referenced this issue Nov 12, 2013

Fixes Particular/ServiceControl#65

81b1edc

ghost assigned indualagarsamy Nov 19, 2013

This was referenced Dec 13, 2013

Remove "Second Level Retries" from Context menu Particular/ServiceInsight#160

Closed

Remove "Retries" property from property pane Particular/ServiceInsight#161

Closed

dannycohen added this to the Backlog milestone Feb 11, 2014

johnsimons closed this as completed Apr 21, 2014

johnsimons removed this from the Backlog milestone Apr 21, 2014

dannycohen mentioned this issue May 18, 2014

Publicly expose the current FLR and SLR count Particular/NServiceBus#2108

Closed

udidahan reopened this Aug 11, 2014

SimonCropp added the Feature label Oct 21, 2014

andreasohlund removed the enhancement label Nov 13, 2014

andreasohlund self-assigned this May 11, 2015

andreasohlund removed their assignment Aug 27, 2015

johnsimons closed this as completed May 10, 2016

Add SLR indicator header #65

Add SLR indicator header #65

Comments

dannycohen commented Sep 30, 2013

andreasohlund commented Oct 1, 2013

indualagarsamy commented Nov 6, 2013

dannycohen commented Nov 6, 2013

johnsimons commented Nov 6, 2013

dannycohen commented Nov 6, 2013

andreasohlund commented Nov 6, 2013

indualagarsamy commented Nov 13, 2013

johnsimons commented Nov 13, 2013

dannycohen commented Nov 13, 2013

johnsimons commented Nov 13, 2013

dannycohen commented Nov 13, 2013

johnsimons commented Nov 13, 2013

dannycohen commented Nov 13, 2013

dannycohen commented Dec 13, 2013

andreasohlund commented Dec 13, 2013

andreasohlund commented Dec 13, 2013

dannycohen commented Dec 13, 2013

andreasohlund commented Dec 13, 2013

dannycohen commented Dec 13, 2013

johnsimons commented Dec 15, 2013

indualagarsamy commented Dec 16, 2013

johnsimons commented Dec 16, 2013

dannycohen commented Feb 11, 2014

johnsimons commented Apr 21, 2014

udidahan commented Aug 11, 2014

dannycohen commented Aug 11, 2014

udidahan commented Aug 11, 2014

andreasohlund commented Aug 27, 2015

johnsimons commented Feb 17, 2016

SimonCropp commented Feb 17, 2016

johnsimons commented Feb 17, 2016

SimonCropp commented Feb 17, 2016

johnsimons commented Feb 17, 2016

andreasohlund commented Feb 17, 2016

andreasohlund commented Feb 17, 2016

mauroservienti commented Feb 17, 2016

johnsimons commented Feb 17, 2016

gbiellem commented Feb 17, 2016

mauroservienti commented Feb 18, 2016

johnsimons commented Feb 18, 2016

johnsimons commented May 10, 2016