[service-bus] Reliability improvements and testing updates #15098

richardpark-msft · 2021-04-30T01:05:20Z

This PR has a few changes in it, primarily to improve our robustness and our reporting:

General reliability improvements:

Migrates to a workflow that treats subscription start as a retryable entity, rather than just link creation (which is what had previously).
It checks and throws exceptions on much more granular conditions, particularly in addCredit
Error checking and handling has been migrated to be in far fewer spots and to be more unconditional, which should hopefully eliminate any areas where an exception or error could occur but it never gets forwarded or seen.

SDK debugging:

Adds a new SDK only flag (forwardInternalErrors) which will make it so areas that used to eat errors now can forward them to processError. Prior to this the errors were only logged, but that meant they could be missed. Most of these would probably be considered cosmetic by customers so this is primarly for debugging purposes within the SDK itself.
The internal processInitialize handler has been split into two (still internal) handlers - preInitialize and postInitialize. preInitialize runs before init(), and postInitialize runs after init() but before addCredit. This lets us write more reliable tests. These are not exposed to customers.

…er (and associated code) are correct.

…my with internal functions. For the most part though the tests got simpler since there were fewer functions that needed mocking (subscribe now does initialization and subscription), etc.. There were also some interesting breakages when I changed assertThrow to chain on the promises returned (which made my test callstacks a bit nicer) in that one of the methods that's being tested (settleMessages) was not marked as async, causing it to throw exceptions that were not "catchable" by chaining to the returned promise. I've also enabled code coverage, even for unit tests. They go under the coverage/unit-tests folder, just to keep it all clean.

richardpark-msft · 2021-04-30T01:06:27Z

/azp run js - service-bus - tests

azure-pipelines · 2021-04-30T01:06:39Z

Azure Pipelines successfully started running 1 pipeline(s).

…rom a more relevant place) in the caller.

richardpark-msft · 2021-04-30T01:22:22Z

/azp run js - service-bus - tests

azure-pipelines · 2021-04-30T01:22:33Z

Azure Pipelines successfully started running 1 pipeline(s).

richardpark-msft · 2021-04-30T02:34:49Z

/azp run js - service-bus - tests

azure-pipelines · 2021-04-30T02:35:01Z

Azure Pipelines successfully started running 1 pipeline(s).

- Add in another 'debugging' event (preInitialize) that gets called before our internal _init() - Doing some rearrangement and testing

…g and debugging just work, etc.. - Fixing some other issues: - Revert sesions back over to the way it was before, with a resume call thrown in for now. NOTE: messagesession could use some work as it's falling behind some of the changes we're making to the non-session workflow.

richardpark-msft · 2021-05-01T03:54:34Z

/azp run js - service-bus - tests

azure-pipelines · 2021-05-01T03:54:46Z

Azure Pipelines successfully started running 1 pipeline(s).

richardpark-msft · 2021-05-01T04:27:50Z

/azp run js - service-bus - tests

azure-pipelines · 2021-05-01T04:28:02Z

Azure Pipelines successfully started running 1 pipeline(s).

richardpark-msft · 2021-05-01T07:23:15Z

/azp run js - service-bus - tests

azure-pipelines · 2021-05-01T07:23:26Z

Azure Pipelines successfully started running 1 pipeline(s).

richardpark-msft · 2021-05-03T20:38:19Z

/azp run js - service-bus - tests

azure-pipelines · 2021-05-03T20:38:33Z

Azure Pipelines successfully started running 1 pipeline(s).

sdk/servicebus/service-bus/package.json

chradek · 2021-04-30T22:29:59Z

sdk/servicebus/service-bus/src/core/receiverHelper.ts

   */
  async drain(): Promise<void> {
    const { receiver, logPrefix } = this._getCurrentReceiver();

    if (!this._isValidReceiver(receiver)) {
+      // TODO: should we throw?


I guess draining is technically done?

Yeah, this comment was me thinking "aloud".

I think we could consider it a candidate to throw on but it seemed less important. The spot where it's used would basically do nothing useful if the receiver was invalid anyways but it'd be nice to go through here again. I think we have some future work to add in a "drain with timeout" for another bug in our tests, so perhaps we could lump in the investigation there.

I would consider this a no-op and not an error scenario. The need for drain comes in order to avoid receiving messages when we are not ready for them. Absence of the receiver achieves that goal. So, I would combine the two checks here

if (!receiver || !received.isOpen() || receiver.credit === 0) { return; }

sdk/servicebus/service-bus/src/core/streamingReceiver.ts

sdk/servicebus/service-bus/src/core/receiverHelper.ts

sdk/servicebus/service-bus/src/core/streamingReceiver.ts

chradek · 2021-05-03T23:14:58Z

sdk/servicebus/service-bus/src/core/streamingReceiver.ts

+   *
+   * @returns A promise that will resolve when a link is created and we successfully add credits to it.
+   */
+  private async _subscribeImpl(caller: "detach" | "subscribe"): Promise<void> {


The caller parameter feels a little odd to me. Are you just trying to get away from using useNewName? Maybe something that's a bit more descriptive of what's happening where the values are something like "initialize" and "reconnect" would be a bit clearer without coupling us to the method names calling it?

I can change the parameter name but those are the names of the two logical flows I think.

However, I have thought about renaming 'detach' to 'reconnect' a few times. Maybe this is just a happy chance to do so?

sdk/servicebus/service-bus/src/core/receiverHelper.ts

…scribe() as a SessionLockLost error. This gives the user a hint that the link is definitely dead.

… in our version check.

sdk/servicebus/service-bus/src/core/receiverHelper.ts

Co-authored-by: Ramya Rao <ramya.rao.a@outlook.com>

sdk/servicebus/service-bus/src/session/messageSession.ts

- Send a nice standard error message that indicates the session lock has been lost when we throw for addCredit() in session receivers.

richardpark-msft · 2021-05-07T21:34:27Z

/azp run js - service-bus - tests

azure-pipelines · 2021-05-07T21:34:39Z

Azure Pipelines successfully started running 1 pipeline(s).

ramya-rao-a · 2021-05-07T22:09:56Z

sdk/servicebus/service-bus/src/receivers/receiverCommon.ts

+ *
+ * @internal
+ */
+export async function retryForever<T>(


Nit and not a blocker for this PR

retryForever is only used by the streaming receiver in non session case. So far other methods in this file "receiverCommon" are for cases that are indeed common between sessions and non sessions. So, perhaps not the right place for the code. Moving it can also re-enforce the understanding of one of the main differences in sessions and non sessions receiver which is the forever loop

Yeah, there are a few little naming changes that I didn't do but @HarshaNalluru has in one of his PRs (specifically receiverHelper is really only streamingReceiver helper, etc..). We should do a little cleanup of our file organization as it's a bit of a mess at the moment.

…es more on public APIs. It did point out a simple use-case where we shouldn't bug the user with an error - if they suspend the receiver (and thus addCredit throws AbortErrors) we can ignore those since the user has initiated the action that caused it and there'd be nothing useful about reporting that specific error. Also, I made the session "failed to add credit" message a bit more like the one we throw in non-session receivers.

richardpark-msft · 2021-05-08T00:07:19Z

/azp run js - service-bus - tests

azure-pipelines · 2021-05-08T00:07:30Z

Azure Pipelines successfully started running 1 pipeline(s).

richardpark-msft · 2021-05-08T00:11:09Z

@ramya-rao-a - made a slight change to MessageSession just now. I don't report the error to the user if it happens to be AbortError. That only happens if the error is being caused by a user suspending the receiver, so something like this:

const subscription = sessionReceiver.subscribe({
   processMessage: async () => {
      await subscription.close();
   }
   // and addCredit happens after processMessage and throws an AbortError
});

I think this sticks with the spirit of our original discussion - let's not report errors to the user if they shouldn't do anything or it's not useful (which this qualifies). All other errors are still reported.

For non-sessions we only report the error to our internal debugging-only processError so I've left that in place.

(BTW: found this because we had a test that was doing this, but in a really round-about way. I fixed that test as well so it looks like what I have above, which is legitimate)

richardpark-msft · 2021-05-08T00:18:02Z

/azp run js - service-bus - tests

azure-pipelines · 2021-05-08T00:18:13Z

Azure Pipelines successfully started running 1 pipeline(s).

check-enforcer · 2021-05-08T00:28:30Z

This pull request is protected by Check Enforcer.

What is Check Enforcer?

Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass.

Why am I getting this message?

You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged.

What should I do now?

If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows:
/check-enforcer evaluate
Typically evaulation only takes a few seconds. If you know that your pull request is not covered by a pipeline and this is expected you can override Check Enforcer using the following command:
/check-enforcer override
Note that using the override command triggers alerts so that follow-up investigations can occur (PRs still need to be approved as normal).

What if I am onboarding a new service?

Often, new services do not have validation pipelines associated with them, in order to bootstrap pipelines for a new service, you can issue the following command as a pull request comment:
/azp run prepare-pipelines
This will run a pipeline that analyzes the source tree and creates the pipelines necessary to build and validate your pull request. Once the pipeline has been created you can trigger the pipeline using the following comment:
/azp run js - [service] - ci

richardpark-msft · 2021-05-08T01:21:01Z

/azp run js - service-bus - tests

azure-pipelines · 2021-05-08T01:21:13Z

Azure Pipelines successfully started running 1 pipeline(s).

- Fixing a test that used to depend on only a single error happening on detach to handle the possibility of more than one. This might point to moving towards more like what we did in non-session-receiver.

richardpark-msft · 2021-05-08T02:28:58Z

/azp run js - service-bus - tests

azure-pipelines · 2021-05-08T02:29:11Z

Azure Pipelines successfully started running 1 pipeline(s).

Updating vm sizes property (Azure#15098) * updating vm sizes property * prettier fix * prettier fix Co-authored-by: Aaheli Chattopadhyay <aahelic@microsoft.com>

Richard Park added 2 commits April 29, 2021 11:56

WIP: tests aren't working but I think the changes to streaming receiv…

eef34bf

…er (and associated code) are correct.

ghost added the Service Bus label Apr 30, 2021

richardpark-msft requested a review from chradek April 30, 2021 01:05

Richard Park added 2 commits April 29, 2021 18:14

Make it so we don't accidentally allow null message handlers.

bb8cebb

Remove this check for now. There's a better one (since it can throw f…

bc9ebdd

…rom a more relevant place) in the caller.

Remove unused improt

f2612a2

Richard Park added 2 commits April 30, 2021 19:09

Some bigger changes (some tests are still failing, however):

23dcc40

- Add in another 'debugging' event (preInitialize) that gets called before our internal _init() - Doing some rearrangement and testing

Fixing a non-awaited promise

95c6912

formatting

393fcc6

richardpark-msft marked this pull request as ready for review May 3, 2021 20:38

richardpark-msft requested review from HarshaNalluru and ramya-rao-a as code owners May 3, 2021 20:38

richardpark-msft changed the title ~~[draft] service bus receiver streaming refactor~~ [service-bus] Reliability improvements and testing updates May 3, 2021

chradek reviewed May 3, 2021

View reviewed changes

chradek reviewed May 5, 2021

View reviewed changes

sdk/servicebus/service-bus/src/core/receiverHelper.ts Show resolved Hide resolved

Richard Park added 6 commits May 5, 2021 16:22

Remove nyc usage for unit tests

c61ef51

We'll release 7.0.6 from master now.

c592e36

Fixing missing versioning update

1bdad81

Be consistent about reporting an error thrown from MessageSession.sub…

01a175a

…scribe() as a SessionLockLost error. This gives the user a hint that the link is definitely dead.

Remove unused method

158f6a3

Comment on the reason we support two different paths to the same file…

4878a86

… in our version check.

ramya-rao-a reviewed May 7, 2021

View reviewed changes

sdk/servicebus/service-bus/src/core/receiverHelper.ts Outdated Show resolved Hide resolved

Leave no code suggestion behind!

c9a80fc

Co-authored-by: Ramya Rao <ramya.rao.a@outlook.com>

ramya-rao-a reviewed May 7, 2021

View reviewed changes

sdk/servicebus/service-bus/src/session/messageSession.ts Outdated Show resolved Hide resolved

- Changed the error message so I needed to fix the tests

6b5bea5

- Send a nice standard error message that indicates the session lock has been lost when we throw for addCredit() in session receivers.

ramya-rao-a approved these changes May 7, 2021

View reviewed changes

Changelog entry for fix.

4961578

- Make ServiceBusError thrown from Session receiver be non-retryable

d5699d2

- Fixing a test that used to depend on only a single error happening on detach to handle the possibility of more than one. This might point to moving towards more like what we did in non-session-receiver.

richardpark-msft merged commit e2b5a62 into Azure:master May 8, 2021

jeremymeng mentioned this pull request Oct 26, 2021

[Service Bus] Receivers not closed correctly when client is closed #17932

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[service-bus] Reliability improvements and testing updates #15098

[service-bus] Reliability improvements and testing updates #15098

richardpark-msft commented Apr 30, 2021 •

edited

Loading

richardpark-msft commented Apr 30, 2021

azure-pipelines bot commented Apr 30, 2021

richardpark-msft commented Apr 30, 2021

azure-pipelines bot commented Apr 30, 2021

richardpark-msft commented Apr 30, 2021

azure-pipelines bot commented Apr 30, 2021

richardpark-msft commented May 1, 2021

azure-pipelines bot commented May 1, 2021

richardpark-msft commented May 1, 2021

azure-pipelines bot commented May 1, 2021

richardpark-msft commented May 1, 2021

azure-pipelines bot commented May 1, 2021

richardpark-msft commented May 3, 2021

azure-pipelines bot commented May 3, 2021

chradek Apr 30, 2021

richardpark-msft May 5, 2021

ramya-rao-a May 7, 2021

chradek May 3, 2021

richardpark-msft May 4, 2021

richardpark-msft commented May 7, 2021

azure-pipelines bot commented May 7, 2021

ramya-rao-a May 7, 2021

richardpark-msft May 7, 2021

richardpark-msft commented May 8, 2021

azure-pipelines bot commented May 8, 2021

richardpark-msft commented May 8, 2021 •

edited

Loading

richardpark-msft commented May 8, 2021

azure-pipelines bot commented May 8, 2021

check-enforcer bot commented May 8, 2021

richardpark-msft commented May 8, 2021

azure-pipelines bot commented May 8, 2021

richardpark-msft commented May 8, 2021

azure-pipelines bot commented May 8, 2021

[service-bus] Reliability improvements and testing updates #15098

[service-bus] Reliability improvements and testing updates #15098

Conversation

richardpark-msft commented Apr 30, 2021 • edited Loading

richardpark-msft commented Apr 30, 2021

azure-pipelines bot commented Apr 30, 2021

richardpark-msft commented Apr 30, 2021

azure-pipelines bot commented Apr 30, 2021

richardpark-msft commented Apr 30, 2021

azure-pipelines bot commented Apr 30, 2021

richardpark-msft commented May 1, 2021

azure-pipelines bot commented May 1, 2021

richardpark-msft commented May 1, 2021

azure-pipelines bot commented May 1, 2021

richardpark-msft commented May 1, 2021

azure-pipelines bot commented May 1, 2021

richardpark-msft commented May 3, 2021

azure-pipelines bot commented May 3, 2021

chradek Apr 30, 2021

Choose a reason for hiding this comment

richardpark-msft May 5, 2021

Choose a reason for hiding this comment

ramya-rao-a May 7, 2021

Choose a reason for hiding this comment

chradek May 3, 2021

Choose a reason for hiding this comment

richardpark-msft May 4, 2021

Choose a reason for hiding this comment

richardpark-msft commented May 7, 2021

azure-pipelines bot commented May 7, 2021

ramya-rao-a May 7, 2021

Choose a reason for hiding this comment

richardpark-msft May 7, 2021

Choose a reason for hiding this comment

richardpark-msft commented May 8, 2021

azure-pipelines bot commented May 8, 2021

richardpark-msft commented May 8, 2021 • edited Loading

richardpark-msft commented May 8, 2021

azure-pipelines bot commented May 8, 2021

check-enforcer bot commented May 8, 2021

What is Check Enforcer?

Why am I getting this message?

What should I do now?

What if I am onboarding a new service?

richardpark-msft commented May 8, 2021

azure-pipelines bot commented May 8, 2021

richardpark-msft commented May 8, 2021

azure-pipelines bot commented May 8, 2021

richardpark-msft commented Apr 30, 2021 •

edited

Loading

richardpark-msft commented May 8, 2021 •

edited

Loading