Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update SuperPMI artifact logging #77685

Conversation

AaronRobinsonMSFT
Copy link
Member

@ghost
Copy link

ghost commented Oct 31, 2022

Tagging subscribers to this area: @hoyosjs
See info in area-owners.md if you want to be subscribed.

Issue Details

Fix suggested at dotnet/arcade#11447

/cc @michellemcdaniel @riarenas

Author: AaronRobinsonMSFT
Assignees: -
Labels:

area-Infrastructure-coreclr

Milestone: 8.0.0

@BruceForstall
Copy link
Member

If the System.JobAttempt works, then there is no need for "continueOnError: true" (because if we can't upload an artifact then it really is an error). Note that in the SuperPMI pipelines, this error isn't fatal (but is reported in the UI as an error).

However, the documentation for System.JobAttempt says that it is not available in templates:

https://learn.microsoft.com/en-us/azure/devops/pipelines/build/variables?view=azure-devops&tabs=yaml

and this YAML is in a template. So does it work?

Also, the other SuperPMI pipelines have the same issue, not just this one.

Finally, every other pipeline that uses "PublishPipelineArtifact@1" (which is all of them?) simply sets "continueOnError: true" which means they fail to upload artifacts for 2nd and subsequent runs, and ignore those failures. They could also benefit from a change like this that properly uploads artifacts, if it works.

@riarenas
Copy link
Member

However, the documentation for System.JobAttempt says that it is not available in templates:

That usually means the variable is not available at template evaluation time, like to resolve conditionals on whether certain parts of YAML should execute.

The source-build template follows the approach of using the attempt just like this suggested change:

artifactName: BuildLogs_SourceBuild_${{ parameters.platform.name }}_Attempt$(System.JobAttempt)

That template uploads its logs as:
image

@riarenas
Copy link
Member

riarenas commented Oct 31, 2022

Note that in the SuperPMI pipelines, this error isn't fatal (but is reported in the UI as an error).

That doesn't seem to be the case for this pipeline, so I would check others for the same bug. https://dev.azure.com/dnceng-public/public/_build/results?buildId=65930&view=results 's only failures were log upload failures, and they caused the build to fail.

@BruceForstall
Copy link
Member

That doesn't seem to be the case for this pipeline, so I would check others for the same bug. https://dev.azure.com/dnceng-public/public/_build/results?buildId=65930&view=results 's only failures were log upload failures, and they caused the build to fail.

Ok, my definition of "fail" here is a little different: the "real work" of the pipeline completes, and subsequent steps all get run, but the job still reports as failure.

That usually means the variable is not available at template evaluation time, like to resolve conditionals on whether certain parts of YAML should execute.

That makes sense.

But we should actually test it with (1) a successful run where the log file name is now different, and (2) a failing run where we force a retry and see two log files.

Then, we should change all SuperPMI pipelines (and other pipelines) to the same system.

@AaronRobinsonMSFT
Copy link
Member Author

I am going to change all uses of PublishPipelineArtifact@1 in the SuperPMI pipelines to append the _Attempt$(System.JobAttempt and mark them as continueOnError: true.

Any other changes? I'd prefer to avoid continually kicking the CI, so let's iterate on expectations before I submit an update.

@BruceForstall
Copy link
Member

The changes LGTM as-is.

I don't think the continueOnError: true is necessary, but since everybody else uses it for PublishPipelineArtifact@1 I guess it's fine. If included, it means that any failure to upload a log file will not show up as a job failure. Maybe that's ok, or maybe it will hide problems that should be investigated. Given that the upload failure shouldn't happen, it seems like it should be investigated. At least to make sure that we're generating the correct file name that we expect to upload. (If there's some transient network issue on upload, that's out of our control.)

@AaronRobinsonMSFT
Copy link
Member Author

Thanks @BruceForstall.

Can I get a sign off from someone?

@BruceForstall
Copy link
Member

Can I get a sign off from someone?

I'm happy to sign off after it is tested.

@AaronRobinsonMSFT
Copy link
Member Author

@BruceForstall Whats the best way to make SuperPMI fail?

@BruceForstall
Copy link
Member

@BruceForstall Whats the best way to make SuperPMI fail?

I think you first want to test the success case. Note that the runtime-coreclr superpmi-diffs pipeline and runtime-coreclr superpmi-replay pipelines did not run on this PR because they only run if a file in the JIT directory changes. You probably want to trigger them manually (azp run might not work).

As for failure, this will cause it for superpmi replays:

https://github.com/dotnet/runtime/compare/main...BruceForstall:FailSpmi?expand=1

@AaronRobinsonMSFT
Copy link
Member Author

/azp list

@azure-pipelines
Copy link

CI/CD Pipelines for this repository:

@AaronRobinsonMSFT
Copy link
Member Author

/azp run runtime-coreclr superpmi-replay
/azp run runtime-coreclr superpmi-diffs
/azp run runtime-coreclr superpmi-asmdiffs-checked-release

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@AaronRobinsonMSFT
Copy link
Member Author

@BruceForstall Can you help me trigger this? I am signed into AzDO, but am not being shown anything that allows me to trigger this run.

@BruceForstall
Copy link
Member

I triggered superpmi-replay manually by going to https://dev.azure.com/dnceng-public/public/_build?definitionId=150&_a=summary, hitting "Run Pipeline", then using "refs/pull/77685/head" as the "Branch".

There's no need, IMO, to run the others: it would be the same testing, and there's no need to burn all those cycles.

@BruceForstall
Copy link
Member

@AaronRobinsonMSFT AaronRobinsonMSFT added the NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) label Nov 1, 2022
@AaronRobinsonMSFT
Copy link
Member Author

@AaronRobinsonMSFT
Copy link
Member Author

I will revert my breaking trigger.

@runfoapp runfoapp bot mentioned this pull request Nov 1, 2022
Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing the testing.

@AaronRobinsonMSFT AaronRobinsonMSFT merged commit 0e6fa62 into dotnet:main Nov 1, 2022
@AaronRobinsonMSFT AaronRobinsonMSFT deleted the update_superpmi_artifact_logging branch November 1, 2022 18:07
@ghost ghost locked as resolved and limited conversation to collaborators Dec 1, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-Infrastructure-coreclr NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants