Skip to content

Fix deploy with long poll delay events order and stop condition#3919

Merged
jfuss merged 5 commits intoaws:developfrom
BourgoisMickael:fix/deploy_stack_events
Jun 3, 2022
Merged

Fix deploy with long poll delay events order and stop condition#3919
jfuss merged 5 commits intoaws:developfrom
BourgoisMickael:fix/deploy_stack_events

Conversation

@BourgoisMickael
Copy link
Contributor

@BourgoisMickael BourgoisMickael commented May 30, 2022

Which issue(s) does this change fix?

Fixes #3910
Fixes #3911

Why is this change necessary?

When using the feature from #3500 and deploy with a large SAM_CLI_POLL_DELAY events for each poll iterations are in reverse order. And if another process update the stack while sleeping the loop won't stop and catch events from another process.

How does it address the issue?

  • Buffer events in a deque (push front) to print in chronological order
  • Remove DescribeStacks request and rely on first not_in_progress event on the root stack to exit the loop
  • In tests reverse mock events order to match DescribeStackEvents api reverse chronological order
  • Test print order

What side effects does this change have?

  • No more requests to DescribeStack
  • Print refresh delay (client_sleep) in table header

Mandatory Checklist

PRs will only be reviewed after checklist is complete

  • Add input/output type hints to new functions/methods
  • Write/update unit tests
  • make pr passes

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Replace describe_stack as the event has the stack status
Fix events order
Stop loop at first not in progress
@github-actions github-actions bot added pr/external stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at. labels May 30, 2022
@mingkun2020 mingkun2020 removed the stage/needs-triage Automatically applied to new issues and PRs, indicating they haven't been looked at. label May 30, 2022
@jfuss
Copy link
Contributor

jfuss commented May 31, 2022

@BourgoisMickael Thanks for the quick response and pushing more fixes!

Remove DescribeStacks request and rely on first not_in_progress event on the root stack to exit the loop

This seems reasonable. I am trying to figure out if there are any edge cases we introduce here. I assume we initially put this in place for "a reason", though not sure what that would be. So I just want to do some digging on this to see if the commit has any indications.

On the ordering:
I am a little confused. From my understanding it seems like the original ordering is consistent before and after your change, unless I am missing something. The reason we did it from top down, was so that we can stream the printing to console and you can read the changes as they come in. I think this change breaks that (from reading the code). Why do you expect the opposite here? What I am really trying to understand is what happened for the ordering to change, and if that was on us or something else changed.

@jfuss jfuss self-requested a review May 31, 2022 15:50
@jfuss jfuss self-assigned this May 31, 2022
@BourgoisMickael
Copy link
Contributor Author

Because DesciribeStackEvents reverse chronological order. So when you poll every 0.5s there will most likely by only one new event for each request and it's printed immediately.

When if a set a delay of 60s, during this time a lot of events will happen, so the DescribeStackEvents will return multiple new events with this latests on top, that's wy it needs reordering to print the olddest event first and the latest last

@jfuss
Copy link
Contributor

jfuss commented Jun 2, 2022

@BourgoisMickael I see. So it all has to do with the polling time. Sorry I did not connect those dots initially.

I hate diffs, makes changes harder to read sometimes. I see what is happening now. There is a while "stack is in progress" do thing going on. Which allows for the streaming. Or at least keeps the same thing we had before in how we "streamed".

Copy link
Contributor

@jfuss jfuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall changes look good to me. We appreciate the testing you did on nightly to catch this and provide a patch to address it!

I will ping some team members tomorrow when everyone is in the office to get a second reviewer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Feature request: remove unnecessary describe-stack calls during deploy Bug: deploy with large SAM_CLI_POLL_DELAY has a wrong display (nightly)

4 participants