Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS Capacity Reservation support #1977

Open
wants to merge 13 commits into
base: master
Choose a base branch
from

Conversation

solovyevt
Copy link

Closes #1155.

Implemented reservation profile parameter. It is not a final version wanted to check a few things first:

  • Wasn't sure if I should include any explicit validation on CLI client, such as verifying that spot_policy: spot and reservation: <id> are not provided simultaneously. It makes more sense to implicitly filter out spot: true offerings during the offer acquisition, but could not find a way to do it in the least intrusive way.

  • Couldn't squeeze in server-side capacity block validations inside AWSCompute without modifying the signatures of existing functions. For example, capacity blocks should certainly be validated in regards to their available capacity and instance type, and this is something that cannot extracted just from requirements alone. Is it okay if I expand the base class Compute with additional methods for reservation validation?

This is only a fleet implementation in the moment, some minor changes for dev environments are left out on purpose to avoid intrusive plumbing without checking up with you first. Thanks!

@solovyevt
Copy link
Author

One addition: from what it seems, Capacity Reservations and Capacity Blocks share the same API with negligible differences, so one can replace the other for testing, at least in this scenario.

Copy link
Collaborator

@jvstme jvstme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @solovyevt, thanks for this draft, everything looks good so far.

explicit validation on CLI client, such as verifying that spot_policy: spot and reservation: <id> are not provided simultaneously

I think validation in CLI is optional. The server will still need to filter out spot offers if the configuration specifies spot_policy: auto.

filter out spot: true offerings during the offer acquisition, but could not find a way to do it in the least intrusive way

As an option, this filtering can be done in AWSCompute.get_offers if you'll make this method aware of whether a capacity block is used or not.

Is it okay if I expand the base class Compute with additional methods for reservation validation?

Yes, feel free to add new methods or update the signatures of existing ones as you see fit. I think the validation could be done in AWSCompute.get_offers if you make it aware of capacity blocks, but that choice is up to you.

@solovyevt
Copy link
Author

@jvstme Ready for review

@solovyevt solovyevt changed the title [Draft] AWS Capacity Reservation support AWS Capacity Reservation support Nov 13, 2024
@@ -51,6 +53,7 @@ def get_fleets_table(fleets: List[Fleet], verbose: bool = False) -> Table:

row = [
fleet.name if i == 0 else "",
fleet.spec.configuration.reservation if i == 0 else "",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only include reservation value if verbose. Otherwise it shifts all the values without verbose:

✗ dstack fleet
 FLEET                  INSTANCE  BACKEND  RESOUR…  PRICE     STATUS  CREATED      
 popular-rattlesnake-1            0        aws      2xCPU,    $0.107  provis…  now 
                                           (eu-we…  8GB,                           
                                                    100.0GB                        
                                                    (disk) 

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 5647aff

instance_count: Optional[int] = 0,
instance_types: Optional[List[str]] = None,
is_capacity_block: Optional[bool] = False,
) -> Optional[str]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's not an str that's returned?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d48d3ec

Comment on lines 610 to 612
instance_count: Optional[int] = 0,
instance_types: Optional[List[str]] = None,
is_capacity_block: Optional[bool] = False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is a need to make instance_count or is_capacity_block optional – just having default values would do.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d48d3ec

@r4victor
Copy link
Collaborator

Pointed out some issues, but LGTM overall. Leaving it to @jvstme for a more thorough review.

Comment on lines 112 to 115
reservation: Annotated[
Optional[str],
Field(description="The existing reservation for the instances"),
] = None
Copy link
Collaborator

@r4victor r4victor Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introducing a new field breaks client backward compatibility since a new client will send a value to an old server that will reject it. The CLI can explicitly handle that (some examples https://github.com/dstackai/dstack/blob/a7410ac605a37bc4342c9b4f5a1a079c882674f9/src/dstack/api/server/_runs.py#L115-L127)

Probably @jvstme can help with that.

Copy link
Author

@solovyevt solovyevt Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've addressed backward compatibility in 2b93408, but would be great to have it validated. Tested myself against the latest server version, i.e. 0.18.25.

Comment on lines +103 to +106
reservation: Annotated[
Optional[str],
Field(description="The existing reservation for the instances"),
] = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just pointing out another client backward incompatibility that we need to address.

@r4victor
Copy link
Collaborator

@Solovyev, fix the tests please.

@solovyevt
Copy link
Author

solovyevt commented Nov 14, 2024

@r4victor Python tests are fixed. For the future, is it possible to run them locally? Couldn't find any info in contribution docs.

@r4victor
Copy link
Collaborator

@solovyevt, we somehow missed the section on tests in contributing. I updated it: https://github.com/dstackai/dstack/blob/master/CONTRIBUTING.md#run-tests

@solovyevt
Copy link
Author

solovyevt commented Nov 14, 2024

Please wait with the merge.

Suddenly realised that there might be a nasty bug related to resource quota filtering in get_offers() in AWSCompute class since Capacity Blocks - unlike Capacity Reservations - do not count into an account-wide on-demand instance quota. This might lead to reserved compute instances to be filtered out and fail with No offers available.

To validate the fix more thoroughly, I will test it on one of our upcoming Capacity Blocks next Monday. Sorry for the trouble.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature]: Support the Capacity Blocks for ML feature for AWS
4 participants