@sanderegg sanderegg commented Nov 25, 2025

What do these changes do?

Currently, all nodes newly created via autoscaling get the same docker node labels (such as gpu, dynamic-sidecar). This causes problems when non-GPU machines are created but still receive these labels.
This PR adds the option to define custom node labels per EC2 type.

For example, t3.medium machines will no longer have the gpu label set.

This requires a change in how the EC2_INSTANCES_ALLOWED_TYPES environment variable is created.
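
To illustrate the intended behaviour, here is a minimal sketch in plain Python. It is not the code in this PR: `COMMON_LABELS`, `CUSTOM_NODE_LABELS_BY_TYPE`, `labels_for`, the label values, and the `g4dn.xlarge` entry are all hypothetical and only show how per-type labels keep `gpu` off non-GPU machines.

```python
# Hypothetical sketch, not the PR's implementation.
# Labels that every autoscaled node receives (assumed value format).
COMMON_LABELS: dict[str, str] = {"dynamic-sidecar": "true"}

# Custom labels defined per EC2 type (assumed shape and values).
CUSTOM_NODE_LABELS_BY_TYPE: dict[str, dict[str, str]] = {
    "g4dn.xlarge": {"gpu": "true"},  # GPU machine: gets the gpu label
    "t3.medium": {},                 # non-GPU machine: no gpu label
}


def labels_for(instance_type: str) -> dict[str, str]:
    """Merge the common labels with the custom labels of one EC2 type."""
    custom = CUSTOM_NODE_LABELS_BY_TYPE.get(instance_type, {})
    # Custom labels are applied last, so they win on key collisions.
    return {**COMMON_LABELS, **custom}


gpu_labels = labels_for("g4dn.xlarge")
cpu_labels = labels_for("t3.medium")
```

With this layout, `cpu_labels` contains only the common labels, while `gpu_labels` additionally carries `gpu`.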

Related issue/s

How to test

Dev-ops

@sanderegg sanderegg added this to the Imparable milestone Nov 25, 2025
@sanderegg sanderegg self-assigned this Nov 25, 2025
@sanderegg sanderegg added the a:autoscaling autoscaling service in simcore's stack label Nov 25, 2025

codecov bot commented Nov 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.26%. Comparing base (8b86539) to head (3e2d6fa).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8635      +/-   ##
==========================================
+ Coverage   87.55%   89.26%   +1.71%     
==========================================
  Files        2009     1349     -660     
  Lines       79039    56662   -22377     
  Branches     1382      227    -1155     
==========================================
- Hits        69200    50578   -18622     
+ Misses       9444     6021    -3423     
+ Partials      395       63     -332     
Flag Coverage Δ
integrationtests 63.87% <ø> (-0.03%) ⬇️
unittests 87.64% <100.00%> (+1.27%) ⬆️
Components Coverage Δ
pkg_aws_library 95.30% <100.00%> (+<0.01%) ⬆️
pkg_celery_library ∅ <ø> (∅)
pkg_dask_task_models_library ∅ <ø> (∅)
pkg_models_library ∅ <ø> (∅)
pkg_notifications_library ∅ <ø> (∅)
pkg_postgres_database ∅ <ø> (∅)
pkg_service_integration ∅ <ø> (∅)
pkg_service_library ∅ <ø> (∅)
pkg_settings_library ∅ <ø> (∅)
pkg_simcore_sdk 84.45% <ø> (-0.14%) ⬇️
agent 93.44% <ø> (ø)
api_server 91.37% <ø> (ø)
autoscaling ∅ <ø> (∅)
catalog 92.06% <ø> (ø)
clusters_keeper 99.14% <100.00%> (+<0.01%) ⬆️
dask_sidecar 91.72% <ø> (ø)
datcore_adapter 97.95% <ø> (ø)
director 75.72% <ø> (ø)
director_v2 91.21% <ø> (-0.05%) ⬇️
dynamic_scheduler 96.66% <ø> (ø)
dynamic_sidecar 90.82% <ø> (ø)
efs_guardian 89.83% <ø> (ø)
invitations 90.90% <ø> (ø)
payments 92.70% <ø> (ø)
resource_usage_tracker 92.00% <ø> (-0.11%) ⬇️
storage 86.74% <ø> (+0.23%) ⬆️
webclient ∅ <ø> (∅)
webserver 86.67% <ø> (-0.06%) ⬇️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data


mergify bot commented Nov 25, 2025

🧪 CI Insights

Here's what we observed from your CI run for 3e2d6fa.

❌ Job Failures

| Pipeline | Job | Health on master | Retries |
| --- | --- | --- | --- |
| CI | system-tests | Healthy | 0 |
| CI | unit-tests | Broken | 0 |

@sanderegg sanderegg force-pushed the add-type-specific-labels branch from 2dbe9f8 to 5a41131 Compare November 25, 2025 09:23
@sanderegg sanderegg changed the title ✨Autoscaling: allow to set EC2 type specific docker node labels ✨Autoscaling: allow to set EC2 type specific docker node labels (⚠️ DevOPS) Nov 25, 2025
@sanderegg sanderegg force-pushed the add-type-specific-labels branch from 7685bb6 to 476780c Compare November 25, 2025 17:08
@sanderegg sanderegg force-pushed the add-type-specific-labels branch from 476780c to 8d4f264 Compare November 28, 2025 08:06
@sanderegg sanderegg requested a review from Copilot November 28, 2025 08:11

Copilot AI left a comment

Pull request overview

This PR adds support for EC2 instance type-specific Docker node labels in the autoscaling system. Previously, all EC2 instances received the same Docker node labels regardless of their type, which caused issues when non-GPU machines incorrectly received GPU labels.

Key changes:

  • Extended EC2InstanceBootSpecific model to include custom_node_labels field for per-instance-type label configuration
  • Changed EC2_INSTANCES_ALLOWED_TYPES and related settings from plain types to Json[T] types for proper JSON parsing
  • Refactored tag key constants to use TypeAdapter validation and defined them as module-level constants
  • Updated tests to verify custom node labels are correctly applied to EC2 instances
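
As a rough illustration of what parsing the setting from a JSON string with validated labels amounts to, here is a standard-library sketch. It is an assumption-laden stand-in: the PR itself relies on pydantic's `Json[T]` annotations, while `parse_allowed_types`, the error messages, and the example payload below are hypothetical. Only the `custom_node_labels` field name comes from the PR.

```python
import json
from typing import Any


def parse_allowed_types(raw: str) -> dict[str, dict[str, Any]]:
    """Parse a JSON-encoded mapping of EC2 type -> boot-specific settings.

    Hypothetical helper: checks that each entry's optional
    ``custom_node_labels`` is a mapping of string keys to string values.
    """
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object keyed by EC2 instance type")
    for ec2_type, boot_specific in parsed.items():
        if not isinstance(boot_specific, dict):
            raise ValueError(f"{ec2_type}: expected a JSON object")
        labels = boot_specific.get("custom_node_labels", {})
        # JSON object keys are always strings, so only the values need checking.
        if not isinstance(labels, dict) or not all(
            isinstance(value, str) for value in labels.values()
        ):
            raise ValueError(f"{ec2_type}: custom_node_labels must map str to str")
    return parsed


# Example value as it could appear in the environment (assumed shape).
raw_env = json.dumps(
    {
        "t3.medium": {"custom_node_labels": {}},
        "g4dn.xlarge": {"custom_node_labels": {"gpu": "true"}},
    }
)
allowed_types = parse_allowed_types(raw_env)
```

A payload such as `{"t3.medium": {"custom_node_labels": {"gpu": 1}}}` is rejected by this sketch with a `ValueError`, which mirrors the kind of invalid-label case the updated settings test is described as covering.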

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 2 comments.

File Description
packages/aws-library/src/aws_library/ec2/_models.py Added custom_node_labels field to EC2InstanceBootSpecific model; migrated from TypeAlias to the type statement syntax (PEP 695, available since Python 3.12)
services/autoscaling/src/simcore_service_autoscaling/core/settings.py Changed EC2 settings fields to use Json[T] type annotations for proper JSON parsing
services/autoscaling/src/simcore_service_autoscaling/utils/utils_docker.py Updated get_new_node_docker_tags to merge custom node labels from EC2 instance type configuration
services/autoscaling/src/simcore_service_autoscaling/utils/utils_ec2.py Refactored tag key constants to use TypeAdapter validation
services/autoscaling/src/simcore_service_autoscaling/modules/cluster_scaling/_provider_computational.py Updated computational provider to include custom node labels
services/autoscaling/src/simcore_service_autoscaling/modules/cluster_scaling/_auto_scaling_core.py Changed to use get_application_settings helper function
services/autoscaling/tests/unit/test_utils_docker.py Added test coverage for custom node labels functionality
services/autoscaling/tests/unit/test_modules_cluster_scaling_dynamic.py Updated test expectations to include custom node labels
services/autoscaling/tests/unit/test_modules_cluster_scaling_computational.py Updated test expectations to include custom node labels
services/autoscaling/tests/unit/test_core_settings.py Added validation test for invalid custom node labels; corrected expected error input type
services/clusters-keeper/src/simcore_service_clusters_keeper/utils/ec2.py Refactored tag key constants and improved type validation using TypeAdapter
services/clusters-keeper/src/simcore_service_clusters_keeper/modules/clusters_management_core.py Updated to use TypeAdapter for tag value validation
services/clusters-keeper/src/simcore_service_clusters_keeper/modules/clusters.py Updated to use TypeAdapter for tag value validation
.pre-commit-config.yaml Upgraded pyupgrade from v3.21.1 to v3.21.2

sanderegg and others added 3 commits November 28, 2025 09:54
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@sanderegg sanderegg marked this pull request as ready for review November 28, 2025 09:43
@sanderegg sanderegg requested a review from pcrespov as a code owner November 28, 2025 13:21

@GitHK GitHK left a comment

👍

@mrnicegyu11 mrnicegyu11 left a comment

Very nice, thanks a lot!

@pcrespov pcrespov left a comment

thx!

Labels

a:autoscaling autoscaling service in simcore's stack

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Autoscaling: Allow to set specific node labels based on EC2 type

5 participants