
Add staging pipeline to PRs #45178

Merged · 2 commits · Nov 30, 2020
Conversation

@safern (Member) commented Nov 25, 2020:

This PR adds a staging pipeline which sets continueOnError: true on test steps. Also, we're moving Android legs that run on an emulator to this pipeline as they've been unstable for a long time.

I've validated the generated yml and continueOnError is set to true accordingly.

I tried to do this in a minimal way by defining a template expression variable in the common variable template, but continueOnError is evaluated so early during template expansion that the variable resolved to Null. So this was the best approach I could think of that still keeps the value in a central location.

I only added continueOnError: true to test steps, as we should protect the build verticals of new platforms from being broken by PRs, and verticals are usually stable even on new platforms. So a PR that adds a new platform should add a vertical that is green from the beginning.
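To make the effect concrete, here is a minimal sketch of how a generated test step differs in the staging pipeline (the step contents and script names are illustrative, not the actual runtime job templates):

```yaml
# Illustrative sketch only; the real generated yml comes from the shared
# job templates and is more involved.
steps:
- script: ./build.sh
  displayName: Build product
  # Build steps are unchanged: a build break still fails the job, so the
  # build verticals of new platforms stay protected.

- script: ./run-tests.sh
  displayName: Run tests
  # Set only on test steps in the staging pipeline: a test failure marks
  # the step "succeeded with issues" instead of failing the job.
  continueOnError: true
```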

The pipeline name is runtime-staging; I also considered runtime-onboarding. Let me know which you prefer.

Fixes: #44031

cc: @sunandabalu @markwilkie @jkotas @fanyang-mono @naricc

@safern safern requested a review from a team November 25, 2020 01:28
@Dotnet-GitSync-Bot (Collaborator) commented:

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@ghost commented Nov 25, 2020:

Tagging subscribers to this area: @ViktorHofer
See info in area-owners.md if you want to be subscribed.


Author: safern · Assignees: - · Labels: area-Infrastructure · Milestone: -

Review thread on the new pipeline definition:

```yaml
@@ -0,0 +1,110 @@
# The staging pipeline only makes sense to run it on PRs.
trigger: none
```

@safern (Member, Author):

I disabled batched rolling builds for the staging pipeline since it didn't make sense to have them. However, I did think it might be useful for people trying to get a new platform stable to have rolling builds that do fail when there are test failures.

Opinions on that?
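For reference, making a pipeline PR-only in Azure Pipelines boils down to a trigger configuration roughly like this (branch names are illustrative):

```yaml
# Illustrative: no batched rolling (CI) builds on push, PR validation only.
trigger: none

pr:
  branches:
    include:
    - master
    - release/*.*
```

Re-enabling rolling builds later would just mean replacing `trigger: none` with a branch list.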

@trylek (Member):

I must admit I'm not a big fan of this idea, for two reasons. First, subtle differences between PR and rolling builds are enormously cryptic for most developers; I myself have learned to appreciate and leverage them in the runtime repo, but only after having been involved in infrastructure for quite some time. Second, according to my understanding, the conceptual thinking is that failures in the staging runs are not "hard errors"; they are just indicators that are subsequently processed by a separate statistical mechanism to identify "true errors". I worry that having them manifest as "hard errors" in half of the runs would cause more harm than good. After all, [the few] developers standing up runs for new platforms are typically experienced engineers used to digging through logs to validate their efforts; it seems counter-intuitive to add a feature benefiting just them while potentially harming or confusing most other developers. Just my two cents.

@sunandabalu (Member):

Ideally, a staging pipeline should do exactly what the prod pipeline does, i.e., run as part of PRs and CI/rolling builds, on the same platforms (unless a new platform is being introduced), etc. That way, when a test running in staging passes consistently, it can be promoted with confidence to the prod pipeline. IMHO, we should have rolling builds along with PR builds for the staging pipeline.

@safern (Member, Author):

Thanks for your thoughts on this. @trylek, I completely agree with your points; however, the reason I was asking is that there are three points that made me raise this question:

1. @sunandabalu's point above.
2. In order to measure pass rate, we will need a rolling build with "failures" in it; PR builds of this pipeline will always be green even when there are test failures, so it will be harder to get that data.
3. If someone breaks a test on their PR because this build was green, having rolling builds that actually fail when running tests means we will know sooner rather than later, and it will be easier to track which change broke it.

Please let me know if that makes sense.

@trylek (Member):

Thanks Santi for the clarifications, they basically make sense to me. I think part of the problem is that, according to my recollections, in our past discussions on the subject we combined and maybe somewhat conflated two fundamentally different types of tests that cause trouble in PR / CI runs and should be moved or reclassified:

(a) Tests that exhibit some inherent failure rate due to relying on external factors, typically network availability (e.g. some of the HTTP or SSL tests);

(b) Tests being brought up, typically the Mono tests on devices, that exhibited some initial flakiness after they started running on a regular basis, but issues got filed for these and they ultimately became green, possibly modulo category (a).

For category (b) the staging pipeline is a trivial and natural solution similar to flaky test quarantine used by ASP.NET infra. Its application to (a) is more tricky because then your bullet 1 (sunandabalu's comment) becomes somewhat contradictory to the remaining bullets 2-3. Simply put, you cannot just "quarantine" all networking tests because they fail 1% of the time due to transient network failures.

You can move them to another pipeline so that you can track and measure them separately, but that doesn't change the fact that our mental model, based on Jared's initial analysis, is that their results are not "binary". Here we hit the problem that AzDO infra is built on tests having binary results. So, since we don't seem to have any easy leverage to make AzDO implement non-binary test outcomes in their core infra, we need some customization or bolt-on to make this work.

This customization basically boils down to one of three options: reporting individual failures as PR errors (which is what we have today and don't like); not reporting them as errors at all and using some custom side tooling a la runfo to monitor their long-term trends and raise red flags / file issues when their failure rate exceeds a certain threshold (which is what I suggested, but is not ideal either, as you point out); or reporting them as errors in the staging pipeline but not in the "real" PR pipeline (which seems to go against sunandabalu's point that the two pipelines should behave in a similar manner, letting tests freely flow between them; I also cannot help feeling it's somewhat counter-intuitive).

Maybe if AzDO at least supported three-state test outcomes, we could use some "unknown" or "yellow" state for these; but most likely it's a no-go. That seems to imply we're down to identifying the "lesser evil" and I think your pick is probably best from a practical perspective despite my reservations.

@safern (Member, Author) commented Nov 30, 2020:

Thanks, @trylek, for the points. While I completely agree with this, I think it would only apply if we were including bucket (a), which we agreed is out of scope at the moment, as we already have a mechanism for that: adding ActiveIssue to those tests.

So I think we should be able to enable CI runs for this pipeline, with real failures on the tests, to track pass rate and to track changes that might have broken something, since the intent of the pipeline is only bullet point (b) that you mention. Anyway, I will do this in a separate PR.

(Member):

I presume, too, that if there were additional tools to make the tests more resilient to transient issues (like network flakiness), that would be acceptable as well, @trylek?

@trylek (Member):

This is definitely a useful improvement compared to what we had before. I just wanted to point out that the question of "resiliency to transient issues" (whether that means resiliency of the tests themselves or of the test infra) is largely orthogonal to, and not fully addressed by, this change; this was not immediately obvious to me and may have caused some of my confusion.

@jkotas (Member) commented Nov 25, 2020:

> The pipeline name is runtime-staging

👍

@trylek (Member) left a review:

Looks great to me, thank you!


@stephentoub (Member):

Thanks, @safern.

@safern safern merged commit 9f92311 into master Nov 30, 2020
@safern safern deleted the AddStagingPipeline branch November 30, 2020 23:56
@ghost ghost locked as resolved and limited conversation to collaborators Dec 31, 2020
Successfully merging this pull request may close these issues.

Creating a separate pipeline to stage onboarding new test targets