Skip to content

Conversation

@adamint
Copy link
Member

@adamint adamint commented Oct 23, 2024

Backport of #6368 to release/9.0

Customer Impact

  • Removes the ability to set HealthStatus directly on a resource, and removes the field from the resource service protocol
  • Fixes a bug where health reports will not be updated if the overall health of the resource does not change
  • Fixes a bug where the health status is not shown in resource details while waiting for health checks to initially run

Testing

Much manual testing, as well as additional unit tests for computing health status.

Risk

Medium. It is possible that this breaking change may impact existing projects that set HealthStatus, and may impact customer assumptions during resource updates - for example, resources now start as unhealthy if they are running and wait on a health check, instead of null, as it previously was.

Regression?

No

Microsoft Reviewers: Open in CodeFlow
Microsoft Reviewers: Open in CodeFlow

* Make resource HealthStatus computed from HealthReports
* Update health reports when they have changed but the status has not

---------

Co-authored-by: Adam Ratzman <adamratzman@microsoft.com>
Co-authored-by: Mitch Denny <midenn@microsoft.com>

(cherry picked from commit a4ef97a)
@davidfowl
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@joperezr
Copy link
Member

Thanks @adamint, I also understand this is something that ACA team needs. Before we mark it as approved though, do you mind expanding a bit on the risk? So as opposed to just saying Medium, elaborate a bit more on what the potential issues are with this?

[InlineData(KnownResourceState.Running, DiagnosticsHealthStatus.Healthy, new string[]{})]
[InlineData(KnownResourceState.Running, DiagnosticsHealthStatus.Healthy, new string?[] {"Healthy"})]
[InlineData(KnownResourceState.Running, DiagnosticsHealthStatus.Unhealthy, new string?[] {null})]
[InlineData(KnownResourceState.Running, DiagnosticsHealthStatus.Degraded, new string?[] {"Healthy", "Degraded"})]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need tests for other states like Exited, or FailedToStart, building etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t, all states other than Running are treated the same

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don’t, all states other than Running are treated the same

Maybe tests to check for that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitionally true, but I can add these tests. Assuming they just need to go into main?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added this in #6461

@radical
Copy link
Member

radical commented Oct 23, 2024

Fixes a bug where health reports will not be updated if the overall health of the resource does not change
Fixes a bug where the health status is not shown in resource details while waiting for health checks to initially run

Can we have tests for these?

@adamint
Copy link
Member Author

adamint commented Oct 23, 2024

Thanks @adamint, I also understand this is something that ACA team needs. Before we mark it as approved though, do you mind expanding a bit on the risk? So as opposed to just saying Medium, elaborate a bit more on what the potential issues are with this?

@joperezr sorry, the edit didn’t save. I’ve added this information now

@adamint
Copy link
Member Author

adamint commented Oct 23, 2024

Fixes a bug where health reports will not be updated if the overall health of the resource does not change

I've added this in #6461

Fixes a bug where the health status is not shown in resource details while waiting for health checks to initially run

There is already a test for this

@joperezr
Copy link
Member

Thanks for adding the additional info, @adamint . I'm trying to parse your risk assessment and specifically to understand who might be broken by this. Am I understanding correctly that in order to be in a broken state is if I am explicitly setting a status for a resource inside my AppHost code? If they wanted to continue to do that, how can they go about it? Also, assuming there is no way to do this in a non-breaking way?

@adamint
Copy link
Member Author

adamint commented Oct 23, 2024

Am I understanding correctly that in order to be in a broken state is if I am explicitly setting a status for a resource inside my AppHost code?

It's a breaking code change; previously, you were able to directly set the HealthStatus and HealthReports for a resource yourself, either through setting the initial state or publishing updates. Now, you are no longer able to do either. Instead, you must add health checks and then the AppHost will compute the health reports and status for you.

If they wanted to continue to do that, how can they go about it? Also, assuming there is no way to do this in a non-breaking way?

We have made the decision to disallow that behavior, so they cannot continue to do that. Now that the only component that can set the HealthStatus is the AppHost, there is only one source of truth.

@joperezr

@joperezr
Copy link
Member

Got it. Last question. Do we know how common is it for people to set their own Health status? Given this is a major version, it is technically ok to make this sort of breaking change, but it does feel like the type of change you don't want to take this late after snap as there will be less time (basically none) to react to feedback. I'm wondering if we considered quirking this, i.e. allowing the users to set an app context switch that allows them to continue setting the status and marking the API call as obsolete, and then fully removing it in the next major. This essentially is to reduce friction in adoption from 8 to 9.

@davidfowl
Copy link
Member

This change needs to go in as it has API changes to both the gRPC proto file and the app model. It also fixes bugs with the current health checks implementation that affect the new dashboard in ACA.

@joperezr
Copy link
Member

Any comments around whether or not we should quirk this? How concerned are we about breaking existing customers?

@davidfowl
Copy link
Member

No we're not quirking this. This should go in as is.

Copy link
Member

@joperezr joperezr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Merging as is, but let's keep an eye out for feedback on this.

@joperezr joperezr merged commit b1e81d7 into release/9.0 Oct 24, 2024
9 checks passed
@joperezr joperezr deleted the adamint/backport-computed-healthstatus branch October 24, 2024 00:28
joperezr added a commit to joperezr/aspire that referenced this pull request Nov 13, 2024
#### AI description  (iteration 1)
#### PR Classification
Merge branch `release/9.0` into `internal/release/9.0` to integrate recent changes and address specific work items.

#### PR Summary
This pull request integrates changes from the `release/9.0` branch into `internal/release/9.0`, addressing several work items related to Azure storage and network isolation.
- `/tests/Aspire.Hosting.Azure.Tests/AzureContainerAppsTests.cs`: Added a new test `ConfigureCustomDomainsMutatesIngress` to validate custom domain configuration for Azure Container Apps.
- `/src/Aspire.Hosting.Azure.AppContainers/ContainerAppExtensions.cs`: Introduced a new extension method `ConfigureCustomDomain` to simplify the process of assigning custom domains to Azure Container Apps.

Related work items: dotnet#6368, dotnet#6391, dotnet#6433, dotnet#6458, dotnet#6467
@github-actions github-actions bot locked and limited conversation to collaborators Nov 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants