Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[service-bus] Reliability improvements and testing updates #15098

Merged
merged 21 commits into from
May 8, 2021

Conversation

richardpark-msft
Copy link
Member

@richardpark-msft richardpark-msft commented Apr 30, 2021

This PR has a few changes in it, primarily to improve our robustness and our reporting:

General reliability improvements:

  • Migrates to a workflow that treats subscription start as a retryable entity, rather than just link creation (which is what had previously).
  • It checks and throws exceptions on much more granular conditions, particularly in addCredit
  • Error checking and handling has been migrated to be in far fewer spots and to be more unconditional, which should hopefully eliminate any areas where an exception or error could occur but it never gets forwarded or seen.

SDK debugging:

  • Adds a new SDK only flag (forwardInternalErrors) which will make it so areas that used to eat errors now can forward them to processError. Prior to this the errors were only logged, but that meant they could be missed. Most of these would probably be considered cosmetic by customers so this is primarly for debugging purposes within the SDK itself.
  • The internal processInitialize handler has been split into two (still internal) handlers - preInitialize and postInitialize. preInitialize runs before init(), and postInitialize runs after init() but before addCredit. This lets us write more reliable tests. These are not exposed to customers.

Fixes #14535

Richard Park added 2 commits April 29, 2021 11:56
…my with internal functions. For the most part though the tests got simpler since there were fewer functions that needed mocking (subscribe now does initialization and subscription), etc..

There were also some interesting breakages when I changed assertThrow to chain on the promises returned (which made my test callstacks a bit nicer) in that one of the methods that's being tested (settleMessages) was not marked as async, causing it to throw exceptions that were not "catchable" by chaining to the returned promise.

I've also enabled code coverage, even for unit tests. They go under the coverage/unit-tests folder, just to keep it all clean.
@ghost ghost added the Service Bus label Apr 30, 2021
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Richard Park added 2 commits April 30, 2021 19:09
- Add in another 'debugging' event (preInitialize) that gets called before our internal _init()
- Doing some rearrangement and testing
…g and debugging just work, etc..

- Fixing some other issues:
  - Revert sesions back over to the way it was before, with a resume call thrown in for now.

NOTE: messagesession could use some work as it's falling behind some of the changes we're making to the non-session workflow.
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft richardpark-msft marked this pull request as ready for review May 3, 2021 20:38
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft richardpark-msft changed the title [draft] service bus receiver streaming refactor [service-bus] Reliability improvements and testing updates May 3, 2021
sdk/servicebus/service-bus/package.json Outdated Show resolved Hide resolved
*/
async drain(): Promise<void> {
const { receiver, logPrefix } = this._getCurrentReceiver();

if (!this._isValidReceiver(receiver)) {
// TODO: should we throw?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess draining is technically done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this comment was me thinking "aloud".

I think we could consider it a candidate to throw on but it seemed less important. The spot where it's used would basically do nothing useful if the receiver was invalid anyways but it'd be nice to go through here again. I think we have some future work to add in a "drain with timeout" for another bug in our tests, so perhaps we could lump in the investigation there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider this a no-op and not an error scenario. The need for drain comes in order to avoid receiving messages when we are not ready for them. Absence of the receiver achieves that goal. So, I would combine the two checks here

if (!receiver || !received.isOpen() || receiver.credit === 0) {
   return;
   }

*
* @returns A promise that will resolve when a link is created and we successfully add credits to it.
*/
private async _subscribeImpl(caller: "detach" | "subscribe"): Promise<void> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller parameter feels a little odd to me. Are you just trying to get away from using useNewName? Maybe something that's a bit more descriptive of what's happening where the values are something like "initialize" and "reconnect" would be a bit clearer without coupling us to the method names calling it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change the parameter name but those are the names of the two logical flows I think.

However, I have thought about renaming 'detach' to 'reconnect' a few times. Maybe this is just a happy chance to do so?

Co-authored-by: Ramya Rao <ramya.rao.a@outlook.com>
- Send a nice standard error message that indicates the session lock has been lost when we throw for addCredit() in session receivers.
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

*
* @internal
*/
export async function retryForever<T>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit and not a blocker for this PR

retryForever is only used by the streaming receiver in non session case. So far other methods in this file "receiverCommon" are for cases that are indeed common between sessions and non sessions. So, perhaps not the right place for the code. Moving it can also re-enforce the understanding of one of the main differences in sessions and non sessions receiver which is the forever loop

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there are a few little naming changes that I didn't do but @HarshaNalluru has in one of his PRs (specifically receiverHelper is really only streamingReceiver helper, etc..). We should do a little cleanup of our file organization as it's a bit of a mess at the moment.

…es more on public APIs.

It did point out a simple use-case where we shouldn't bug the user with an error - if they suspend the receiver (and thus addCredit throws AbortErrors) we can ignore those since the user has initiated the action that caused it and there'd be nothing useful about reporting that specific error.

Also, I made the session "failed to add credit" message a bit more like the one we throw in non-session receivers.
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft
Copy link
Member Author

richardpark-msft commented May 8, 2021

@ramya-rao-a - made a slight change to MessageSession just now. I don't report the error to the user if it happens to be AbortError. That only happens if the error is being caused by a user suspending the receiver, so something like this:

const subscription = sessionReceiver.subscribe({
   processMessage: async () => {
      await subscription.close();
   }
   // and addCredit happens after processMessage and throws an AbortError
});

I think this sticks with the spirit of our original discussion - let's not report errors to the user if they shouldn't do anything or it's not useful (which this qualifies). All other errors are still reported.

For non-sessions we only report the error to our internal debugging-only processError so I've left that in place.

(BTW: found this because we had a test that was doing this, but in a really round-about way. I fixed that test as well so it looks like what I have above, which is legitimate)

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@check-enforcer
Copy link

check-enforcer bot commented May 8, 2021

This pull request is protected by Check Enforcer.

What is Check Enforcer?

Check Enforcer helps ensure all pull requests are covered by at least one check-run (typically an Azure Pipeline). When all check-runs associated with this pull request pass then Check Enforcer itself will pass.

Why am I getting this message?

You are getting this message because Check Enforcer did not detect any check-runs being associated with this pull request within five minutes. This may indicate that your pull request is not covered by any pipelines and so Check Enforcer is correctly blocking the pull request being merged.

What should I do now?

If the check-enforcer check-run is not passing and all other check-runs associated with this PR are passing (excluding license-cla) then you could try telling Check Enforcer to evaluate your pull request again. You can do this by adding a comment to this pull request as follows:
/check-enforcer evaluate
Typically evaulation only takes a few seconds. If you know that your pull request is not covered by a pipeline and this is expected you can override Check Enforcer using the following command:
/check-enforcer override
Note that using the override command triggers alerts so that follow-up investigations can occur (PRs still need to be approved as normal).

What if I am onboarding a new service?

Often, new services do not have validation pipelines associated with them, in order to bootstrap pipelines for a new service, you can issue the following command as a pull request comment:
/azp run prepare-pipelines
This will run a pipeline that analyzes the source tree and creates the pipelines necessary to build and validate your pull request. Once the pipeline has been created you can trigger the pipeline using the following comment:
/azp run js - [service] - ci

@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

- Fixing a test that used to depend on only a single error happening on detach to handle the possibility of more than one. This might point to moving towards more like what we did in non-session-receiver.
@richardpark-msft
Copy link
Member Author

/azp run js - service-bus - tests

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@richardpark-msft richardpark-msft merged commit e2b5a62 into Azure:master May 8, 2021
openapi-sdkautomation bot pushed a commit to AzureSDKAutomation/azure-sdk-for-js that referenced this pull request Jul 10, 2021
Updating vm sizes property (Azure#15098)

* updating vm sizes property

* prettier fix

* prettier fix

Co-authored-by: Aaheli Chattopadhyay <aahelic@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The topic receiver silently failed. isClosed does not reflect
4 participants