Build Image in Azure is unreliable #3615

Closed
charris-msft opened this issue Sep 16, 2022 · 3 comments


charris-msft commented Sep 16, 2022

Issue

I've run through the "Build Image in Azure..." process at least 10 times and it has failed at least 6 of those. I've seen failures after 122 seconds, 2 seconds, 1 second, etc.

I'm trying to build the container in Canada Central because I'm working with ACA and I know there is ACA capacity in that region.

I can run the command again with the exact same parameters and it may fail or may succeed.

  • The error message tells me to check the URL, but I didn't set a URL so I can't check it.
  • The error message tells me to check whether the credentials have expired, but I didn't specify any credentials so I can't check those.
  • The error message typically gets logged more than once; in the screenshot you can see it was logged three times for each attempt. This created more confusion when I was starting out, because the first time it was logged twice, so I assumed there was at least one automatic retry. Without any progress indication, I wasn't sure when it was actually done. On my second attempt the error was only logged once, so I waited 5-10 minutes before realizing it wasn't actually retrying the operation.

Here are some examples:
(screenshot: error output from several attempts, with the same error logged multiple times per attempt)

Expected behavior

Ideally there wouldn't be a failure, but failures happen. I expect:

  • the error message to provide more useful information given the context of where I am running the command
  • the error to be logged only once
  • the output to clearly indicate that the operation has failed so I'm not waiting and wondering

Video Repros

https://microsoft.sharepoint.com/:f:/t/JEM/EklDiahGaQtDigc5E2SlrYAB3YU7H2NvdoiyIo1AkQUZ-A?e=eCGRIh

@bwateratmsft
Collaborator

Related to #3616 and #3617. The error logs are being streamed from Azure so we don't control their content. Additionally, the service itself is encountering the failures, and since it doesn't happen all the time I'm inclined to believe it's a service issue.

I'll use #3616 and #3617 to track logging improvements but there's nothing we can do to improve service reliability.

@bwateratmsft closed this as not planned on Sep 19, 2022
@charris-msft
Author

@bwateratmsft -
I've got a couple of questions regarding your comment that "there's nothing we can do to improve service reliability":

  1. Can you implement some retry logic in the extension to try at least one extra time in the event of a service failure? (A rough sketch of what I mean is below.)
  2. Which team owns the service? I'd like to see if there is anything they can do to improve things on their end.
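
For illustration, here's a rough sketch of the kind of retry wrapper I'm imagining. `scheduleAcrBuild` is just a hypothetical stand-in for whatever the extension actually calls to kick off the remote build; none of the names or delays here are from the real codebase:

```typescript
// Hypothetical sketch only: retry the whole build request a fixed number of times.
async function withRetry<T>(operation: () => Promise<T>, attempts = 2, delayMs = 5000): Promise<T> {
    let lastError: unknown;
    for (let attempt = 1; attempt <= attempts; attempt++) {
        try {
            return await operation();
        } catch (error) {
            lastError = error;
            if (attempt < attempts) {
                // Brief pause before re-submitting the entire build request
                await new Promise<void>(resolve => setTimeout(resolve, delayMs));
            }
        }
    }
    throw lastError;
}

// Usage (hypothetical): wrap the whole "Build Image in Azure" request
// await withRetry(() => scheduleAcrBuild(context), 2);
```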

@bwateratmsft
Collaborator

bwateratmsft commented Sep 19, 2022

These errors are happening within the service itself. The blob containing source code has been uploaded to blob storage (IIRC the container registry has its own blob storage account for that, or something along those lines), but for some reason the image build machines are failing to download it. Adding retry to the blob upload wouldn't help, and we don't have any control over those image build machines.

The owners are probably the ACR team, but regrettably I don't know any contacts for that 😞

microsoft locked and limited conversation to collaborators on Nov 4, 2022