Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SLR indicator header #65

Closed
dannycohen opened this issue Sep 30, 2013 · 41 comments
Closed

Add SLR indicator header #65

dannycohen opened this issue Sep 30, 2013 · 41 comments
Labels

Comments

@dannycohen
Copy link

As Opie, I would like to be notified when there are many or increasing number of SLRs performed, so I can investigate further if this is an indication of system instability (and I will be able to take corrective actions before it deteriorates further).

Implementation proposal:
SecondLevelRetriesPerformed : true|{false}

Note:
In order to conserve header space, avoid sending the header when SLR were not performed (i.e. the very existence of the header is an indication of it being true; the value of false is optional).

@andreasohlund
Copy link
Member

@johnsimons please go ahead with this one

@indualagarsamy
Copy link
Contributor

@dannycohen - what is the associated issue in SP?

@dannycohen
Copy link
Author

This is currently a bug affecting existing functionality in SI (Particular/ServiceInsight#28)

There are plans for a post V1 SP indicator for SLR requests, but its a low prio in the backlog. Why ?

@johnsimons
Copy link
Member

I thought we came to the conclusion that we can't supply this information
because those headers are wiped by NSB.
Also those headers are an internal implementation concern that can change.
On top of it SLRs can be turned off by the user or/and completely custom
replaced.

On Wednesday, November 6, 2013, Danny Cohen wrote:

This is currently a bug affecting existing functionality in SI (
Particular/ServiceInsight#28Particular/ServiceInsight#28
)

There are plans for a post V1 SP indicator for SLR requests, but its a low
prio in the backlog. Why ?


Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-27866387
.

Regards
John Simons
NServiceBus

@dannycohen
Copy link
Author

I thought we came to the conclusion that we can't supply this information
because those headers are wiped by NSB.

I was under the impression the number of SLR's is wiped, but that we can check whether or nor SLR's occurred (i.e. a boolean parameter).

Also those headers are an internal implementation concern that can change.

The fact that retries occurr repeatedly may indicate the system is somewhat unstable. A growing trend of retries is similarly significant to Opie and team.
Therefore, although it may be implemented as an internal feature, but it is of value and importance to users.

@andreasohlund
Copy link
Member

Correct, we can provide a boolean parameter

On Wed, Nov 6, 2013 at 2:20 PM, Danny Cohen notifications@github.comwrote:

I thought we came to the conclusion that we can't supply this information
because those headers are wiped by NSB.

I was under the impression the number of SLR's is wipred, but that we cna
check whether or nor SLR's occurred (i.e. a boolean parameter).

Also those headers are an internal implementation concern that can change.

The fact that retries occurr repeatedly may indicate the system is
somewhat unstable. A growing trend of retries is similarly significant to
Opie and team.
Therefore, although it may be implemented as an internal feature, but it
is of value and importance to users.


Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-27872361
.

indualagarsamy pushed a commit to Particular/NServiceBus that referenced this issue Nov 12, 2013
@indualagarsamy
Copy link
Contributor

@dannycohen @andreasohlund - After a discussion with @johnsimons and @SimonCropp, rather than add a header which is a boolean to say if SLR was invoked it makes more sense to add a header which provides the total number of retries that occured, and this information will be available on messages that were successfully processed. So, by looking at the audit message and looking at the number of retries, we can identify problems.

This count will be the net total (of FLR + SLR) that occured.

Will that satisfy the requirement?

@johnsimons
Copy link
Member

This will allow the users to sort by this column in descending order and see which messages get retried most :)

@dannycohen
Copy link
Author

@indualagarsamy / @johnsimons / @andreasohlund -

  1. Having the SLR count is nice, and that was the original content of the existing header.
  2. Due to a bug fix, we concluded with @johnsimons and @andreasohlund that this is not possible, and that a boolean value for SLR is the best remaining option.
  3. If you can pass it as a number - great.
  4. If you can pass it as a boolean parameter - more than good enough.

Let me know.

@johnsimons
Copy link
Member

What we are proposing is for us to have a new header that will contain the number of times that message was retried.

@dannycohen
Copy link
Author

What we are proposing is for us to have a new header that will contain the number of times that message was retried.

There is such a header and it is actively removed in order to fix a bug (see Particular/ServiceInsight#20 (comment)).

Are you saying we can un-remove it ?

@johnsimons
Copy link
Member

No, I'm proposing a completely new header :)

On 13 November 2013 18:38, Danny Cohen notifications@github.com wrote:

What we are proposing is for us to have a new header that will contain the
number of times that message was retried.

There is such a header and it is actively removed in order to fix a bug
(see Particular/ServiceInsight#20 (comment)Particular/ServiceInsight#20 (comment)
).

Are you saying we can un-remove it ?


Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-28372929
.

Regards
John Simons
NServiceBus

@dannycohen
Copy link
Author

@johnsimons - Whatever makes you happy... :-)

@ghost ghost assigned indualagarsamy Nov 19, 2013
@dannycohen
Copy link
Author

@indualagarsamy / @andreasohlund / @johnsimons - guys, where are we on this ?

// CC @HEskandari

@andreasohlund
Copy link
Member

This turned out to be nontrivial, can we bump this?

On Fri, Dec 13, 2013 at 8:26 AM, Danny Cohen notifications@github.comwrote:

@indualagarsamy https://github.com/indualagarsamy / @andreasohlundhttps://github.com/andreasohlund/
@johnsimons https://github.com/johnsimons - guys, where are we on this ?

// CC @HEskandari https://github.com/HEskandari


Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-30491317
.

@andreasohlund
Copy link
Member

@indualagarsamy https://github.com/indualagarsamy can you add the details
on the issues we found

On Fri, Dec 13, 2013 at 8:42 AM, Andreas Öhlund <
andreas.ohlund@particular.net> wrote:

This turned out to be nontrivial, can we bump this?

On Fri, Dec 13, 2013 at 8:26 AM, Danny Cohen notifications@github.comwrote:

@indualagarsamy https://github.com/indualagarsamy / @andreasohlundhttps://github.com/andreasohlund/
@johnsimons https://github.com/johnsimons - guys, where are we on this
?

// CC @HEskandari https://github.com/HEskandari


Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-30491317
.

@dannycohen
Copy link
Author

@andreasohlund -

can we bump this?

NP, but it would be good to record the diagnosis and the prognosis.

@andreasohlund
Copy link
Member

Indu has the details

In short: We need the new pipeline hooks in NServiceBus 4.4.0 to pull this
off

On Fri, Dec 13, 2013 at 8:48 AM, Danny Cohen notifications@github.comwrote:

@andreasohlund https://github.com/andreasohlund -

can we bump this?

NP, but it would be good to record the diagnosis and the prognosis.


Reply to this email directly or view it on GitHubhttps://github.com//issues/65#issuecomment-30492040
.

@dannycohen
Copy link
Author

In short: We need the new pipeline hooks in NServiceBus 4.4.0 to pull this
off

I'm fine with that.

@johnsimons
Copy link
Member

This turned out to be nontrivial, can we bump this?

What was the issue @indualagarsamy ?

@indualagarsamy
Copy link
Contributor

Trying to add the header, we realized that there is no easy way to count the FLRs the way it stands now.

@johnsimons
Copy link
Member

@dannycohen dannycohen added this to the Backlog milestone Feb 11, 2014
@dannycohen
Copy link
Author

Assigning to backlog, pending SC / SP roadmap alignment

@johnsimons
Copy link
Member

It can be reopen if we decide to do it

@udidahan
Copy link
Member

There is another stakeholder here. As a salesperson for Particular Software, when talking to other departments at our larger customers, I would like to be able to tell them how many messages NServiceBus not only prevented from failing but handled those failures automatically without any manual intervention thus clarifying the value proposition versus the team using HTTP or rolling their own solution on top of a queue.

@udidahan udidahan reopened this Aug 11, 2014
@dannycohen
Copy link
Author

@udidahan - how would you prioritize it ? (post LM3, of course)

@udidahan
Copy link
Member

That's a question to be answered via the Trello board, taking into account all the other things we have for LM4.

@andreasohlund
Copy link
Member

I think the way to go here is that instead of a header we should create a plugin that hooks into the notifications, http://docs.particular.net/nservicebus/errors/subscribing-to-push-based-error-notifications. This way we can let the users set threshold for flr/slr per second and have the plugin notify SC when that happens?

Arguable this could be implemented and distributed as a custom check?

We could start by adding this as a sample to gauge interest?

@andreasohlund andreasohlund removed their assignment Aug 27, 2015
@johnsimons
Copy link
Member

@Particular/servicecontrol-maintainers
I propose we close this one until we ready to tackle it?

@SimonCropp
Copy link
Contributor

@johnsimons if u close it how would distinguish it from all the other issues that are closed "as fixed"?

@johnsimons
Copy link
Member

@SimonCropp no milestone associated

@SimonCropp
Copy link
Contributor

@johnsimons so u regularly go through the 268 closed issues with no milestone as part of your backlog pruning? https://github.com/Particular/ServiceControl/issues?q=is%3Aissue+no%3Amilestone+is%3Aclosed

@johnsimons
Copy link
Member

Not sure what you are talking about @SimonCropp.
If we close this issue and later this topic resurfaces, I am pretty sure we can find it and reopen.

@andreasohlund
Copy link
Member

I have a demo going that shows how to do this without a header and instead using a custom check to "alert" in SP when SLR rates go up. How about I get that recorded and we close this one as won't fix?

@andreasohlund
Copy link
Member

Still agree with @SimonCropp that we shouln't just close issues that we still might act on. Perhaps move it some where or add a bullet point to some more generic "crazy SC ideas" issue.

How about we discuss this in plat dev since this is relevant for all our repos?

@mauroservienti
Copy link
Member

Still think agree with @SimonCropp that we shouln't just close issues that we still might act on.

Agree.

@johnsimons
Copy link
Member

we shouln't just close issues that we still might act on

In my view, closing a feature request does not imply that we will not be acting on, it just means that right now the focus is elsewhere. It just help us concentrate on issues/features that are currently important. Having this massive backlog of open issues IMO makes it difficult to figure out where we going. The way I see it, is open feature requests that we do not intend to work on (right now), are just clutering the system, and also sending the wrong message to user out there, the fact is that we have no idea when we are going to tackle any of these or even if we ever going to do it at all.

I have a demo going that shows how to do this without a header and instead using a custom check to "alert" in SP when SLR rates go up

@andreasohlund that would be great, looking forward to it.

@gbiellem
Copy link
Contributor

@johnsimons

As @andreasohlund said let's have the discussion about do we or don't we close issues in platform dev.

@mauroservienti
Copy link
Member

yeah, let's discuss it. I don't think we should close issues, a closed issue is a lost one, IMO. If a huge backlog is putting pressure on us it's a tolling problem (and to be honest GitHUb issues suck) since we have no way (other than the waffle boards, that are not that much better) to prioritize and look only at the top N issues. Also given that we have no milestones set in stone priorities may change overtime and is very easy to lose track of closed issues that were closed because of lack of capacity or out of scope at that time.

@johnsimons
Copy link
Member

@mauroservienti it is quite simple to keep track of feature requests that were closed.
This query gives u all issues that are closed and have no associated milestone and are not bugs
https://github.com/Particular/ServiceControl/issues?utf8=%E2%9C%93&q=is%3Aissue+is%3Aclosed+no%3Amilestone+-label%3A%22Type%3A+Bug%22+

@johnsimons
Copy link
Member

This is currently not prioritised to be done any time soon, I will close this one for now and if we ever go down this path we can reopen it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants