Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[3.x] Pre-import cdk libraries at the beginning of cluster create and update #4778

Merged
merged 1 commit into from
Jan 5, 2023

Conversation

awslitvit
Copy link
Contributor

Description of changes

Import of cdk libraries at lambda cold start takes 12 to 16 seconds, sometimes causing lambda timeout.
Importing them in a separate thread, started at the beginning of create and update operations and executed concurrently with stack validation allows to reduce the time to complete the operation.
The amount of the execution time gain depends on how long the validation takes.
Longer is the validation more impactful this optimization gets.

Tests

Manual test via API

  • create-cluster dryrun
    • no import performed
  • create-cluster no dryrun
    • import starts 1 msec after the lambda execution starts
    • validation takes ~ 5.5 sec.
    • total time to import cdk: ~ 12 sec., waiting time post validation before starting cdk generation: ~ 6.5 sec.
  • update-cluster dryrun
    • no import performed
  • update-cluster no dryrun
    • import starts 1 msec after the lambda execution starts
    • validation takes ~ 6 sec.
    • total time to import cdk: ~ 12 sec., waiting time post validation before starting cdk generation: ~ 6 sec.
      Manual test via CLI
  • create-cluster in dryrun mode and real cluster creation worked as expected
  • update-cluster considered redundant

References

Checklist

  • Make sure you are pointing to the right branch and add a label in the PR title (i.e. 2.x vs 3.x)
  • Check all commits' messages are clear, describing what and why vs how.
  • Make sure to have added unit tests or integration tests to cover the new/modified code.
  • Check if documentation is impacted by this change.

Please review the guidelines for contributing and Pull Request Instructions.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@awslitvit awslitvit added the skip-changelog-update Disables the check that enforces changelog updates in PRs label Jan 4, 2023
@codecov
Copy link

codecov bot commented Jan 4, 2023

Codecov Report

Merging #4778 (b7cef99) into develop (550806e) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #4778      +/-   ##
===========================================
+ Coverage    89.17%   89.19%   +0.01%     
===========================================
  Files          163      164       +1     
  Lines        14369    14386      +17     
===========================================
+ Hits         12814    12831      +17     
  Misses        1555     1555              
Impacted Files Coverage Δ
cli/src/pcluster/models/cluster.py 78.69% <100.00%> (+0.10%) ⬆️
cli/src/pcluster/templates/import_cdk.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

Copy link
Contributor

@demartinofra demartinofra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 questions:

  1. what happens if the main thread reaches the cdk import before the warm up thread has completed?can it break due to some race condition?
  2. what happens if I exit the cli with ctrl+c while the background thread is running? will it hang?what is present in logs?

cli/src/pcluster/api/controllers/import_cdk.py Outdated Show resolved Hide resolved
@awslitvit
Copy link
Contributor Author

awslitvit commented Jan 5, 2023

what happens if the main thread reaches the cdk import before the warm up thread has completed?can it break due to some race condition?

That's exactly the case I have tested. It doesn't break, both threads continue in parallel and the import completes simultaneously in both. I ran this test several times and it never broke.

what happens if I exit the cli with ctrl+c while the background thread is running? will it hang?what is present in logs?

cli exits without hanging, logging:

2023-01-05 10:29:41,985 - INFO - import_cdk.py:10:import_cdk() - Start importing
2023-01-05 10:29:43,273 - INFO - entrypoint.py:244:main() - Received KeyboardInterrupt. Exiting.

Import of cdk libraries at lambda cold start takes 12 to 16 seconds, sometimes causing lambda timeout.
Importing them in a separate thread, started at the beginning of create and update operations
and executed concurrently with stack validation allows to reduce the time to complete the operation.
@awslitvit
Copy link
Contributor Author

Multiple concurrent cdk imports tested:

import time
from pcluster.templates.import_cdk import start

print("Starting")
threads = []
for i in range(1, 21):
    threads.append(start())
    time.sleep(0.2)

print("Waiting")

for t in threads:
    t.join()

print("Completed")

Output:

Starting
Start importing (x20)
Waiting
Import complete in 5.7333149909973145 seconds
Import complete in 5.53127908706665 seconds
Import complete in 5.330389976501465 seconds
Import complete in 5.128042936325073 seconds
Import complete in 4.927767992019653 seconds
Import complete in 4.726654052734375 seconds
Import complete in 4.526315689086914 seconds
Import complete in 4.3224451541900635 seconds
Import complete in 4.120244026184082 seconds
Import complete in 3.9184842109680176 seconds
Import complete in 3.7149658203125 seconds
Import complete in 3.5131521224975586 seconds
Import complete in 3.3125240802764893 seconds
Import complete in 3.1082780361175537 seconds
Import complete in 2.9072659015655518 seconds
Import complete in 2.7062909603118896 seconds
Import complete in 2.5047547817230225 seconds
Import complete in 2.301931619644165 seconds
Import complete in 2.101245880126953 seconds
Import complete in 1.9005579948425293 seconds
Completed

Same result with 1000 iterations and delay of 0.01 sec.

@awslitvit
Copy link
Contributor Author

Following code exited with ctrl+c without hanging:

from pcluster.templates.import_cdk import start

print("Starting")
thread = start()
print("Waiting")
thread.join()  # ctrl+c here
print("Completed")

@awslitvit awslitvit marked this pull request as ready for review January 5, 2023 10:33
@awslitvit awslitvit requested review from a team as code owners January 5, 2023 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
skip-changelog-update Disables the check that enforces changelog updates in PRs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants