-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(cli): stack monitor reads complete stack history every 5 seconds #9795
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The stack monitor used to read all stack events on every tick (every 1-5 seconds). That's right, every 5 seconds it would page through all events that had ever happened to the stack, only to get to the last page to see if there were new events. We were quadratic in our `DescribeStackEvents` API call. This behavior has probably contributed to our users getting throttled by CloudFormation in the past, and only really shows pathological behavior on stacks with a long history (which we ourselves don't routinely manage using `cdk deploy`), which is why it had eluded detection for so long. The fix is to remember the token of the last page between calls, so that we don't have to read through all pages just to get to the last one. Fixes #9470.
Just going to leave this here... |
RomainMuller
approved these changes
Aug 18, 2020
Thank you for contributing! Your pull request will be updated from master and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork). |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
rix0rrr
added a commit
that referenced
this pull request
Aug 19, 2020
In #9795, I was under the impression that the `DescribeStackEvents` would list events in chronological order. This was a misunderstanding resulting from not reading the documentation-- in fact, the events are returned in reverse chronological order. The paging logic was therefore wrong and the events would be fed into the activity printer in the wrong order, leading to incorrect progress reporting. This is a critical bug and must be fixed immediately! While I was in there, fixed two other paper cuts around the printer at the same time: - Events occurring near the start of the monitor would be missed. This was because time-based filtering would be employed based on the client machine's local clock, which might be out of sync with the remote clock. Use the server's clock instead, by using the `CreationTime` of the ChangeSet as the start time. - Events occurring near the end of the monitor would be missed. This was because `DescribeStackStatus` might return a completion state before the monitor had polled the events that led up to the completion. Fix by doing a final poll after the stop call has been received, so we are guaranteed to have seen all events.
misterjoshua
pushed a commit
to misterjoshua/aws-cdk
that referenced
this pull request
Aug 19, 2020
…ws#9795) The stack monitor used to read all stack events on every tick (every 1-5 seconds). That's right, every 5 seconds it would page through all events that had ever happened to the stack, only to get to the last page to see if there were new events. We were quadratic in our `DescribeStackEvents` API call. This behavior has probably contributed to our users getting throttled by CloudFormation in the past, and only really shows pathological behavior on stacks with a long history (which we ourselves don't routinely manage using `cdk deploy`), which is why it had eluded detection for so long. The fix is to remember the token of the last page between calls, so that we don't have to read through all pages just to get to the last one. Fixes aws#9470. ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
mergify bot
pushed a commit
that referenced
this pull request
Aug 19, 2020
In #9795, I was under the impression that the `DescribeStackEvents` would list events in chronological order. This was a misunderstanding resulting from not reading the documentation-- in fact, the events are returned in reverse chronological order. The paging logic was therefore wrong and the events would be fed into the activity printer in the wrong order, leading to incorrect progress reporting. This is a critical bug and must be fixed immediately! While I was in there, fixed two other paper cuts around the printer at the same time: - Events occurring near the start of the monitor would be missed. This was because time-based filtering would be employed based on the client machine's local clock, which might be out of sync with the remote clock. Use the server's clock instead, by using the `CreationTime` of the ChangeSet as the start time. - Events occurring near the end of the monitor would be missed. This was because `DescribeStackStatus` might return a completion state before the monitor had polled the events that led up to the completion. Fix by doing a final poll after the stop call has been received, so we are guaranteed to have seen all events. The actual claim in that PR was incorrect. We were *not* quadratic (we wouldn't list all pages on every tick), we were just more linear than necessary (we *would* list all pages on the *first* tick... which is still a lot). The current PR properly fixes the issue by stopping paging not just if we've already seen the events, but also if the events are starting to fall outside the time window we are interested in. This *actually* fixes #9470. ---- *By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The stack monitor used to read all stack events on every tick (every 1-5
seconds).
That's right, every 5 seconds it would page through all events that had
ever happened to the stack, only to get to the last page to see if there
were new events. We were quadratic in our
DescribeStackEvents
APIcall.
This behavior has probably contributed to our users getting throttled by
CloudFormation in the past, and only really shows pathological
behavior on stacks with a long history (which we ourselves don't
routinely manage using
cdk deploy
), which is why it had eludeddetection for so long.
The fix is to remember the token of the last page between calls,
so that we don't have to read through all pages just to get to the last
one.
Fixes #9470.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license