Some Taskcluster runs aren't uploaded to wpt.fyi successfully #604
Comments
The working theory is that my webhook takes too long to respond, sometimes close to 1 minute, which is well beyond GitHub's 10s timeout for webhooks. The client (GitHub's server) then closes the connection, which sometimes causes AppEngine to terminate the request-handling thread, leading to incomplete results. We should make sure the webhook responds within 10 seconds, which I think is entirely feasible given that there are quite a few HTTP requests we can parallelize.
@Hexcles would it work to download the results at the other end of the task queue, or is the contract that once the results receiver API has returned, the client doesn't need to keep storing the results for some unknown number of minutes/hours?
@foolip yeah that would also work, but I think the download should be well within 10s once parallelized.
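For illustration only, here is a minimal Python sketch of what parallelizing the downloads could look like. It is not the actual wpt.fyi webhook code. The names `fetch_all` and `slow_fetch` are hypothetical, and the HTTP call is replaced by a stand-in that sleeps, so that the effect of running requests concurrently is visible without any network access.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Fetch every URL concurrently and return the results in input order.

    `fetch` is any callable that takes a URL and returns its payload. It is
    injected (an assumption of this sketch) so the example does not depend
    on a particular HTTP client.
    """
    if not urls:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(urls))) as pool:
        # Executor.map preserves input order and re-raises a worker's
        # exception when its result is consumed.
        return list(pool.map(fetch, urls))


def slow_fetch(url):
    # Hypothetical stand-in for a real HTTP request: sleep to simulate
    # network latency, then return a fake payload.
    time.sleep(0.2)
    return "artifact for " + url


if __name__ == "__main__":
    urls = ["https://example.invalid/artifact/%d" % i for i in range(5)]
    start = time.monotonic()
    results = fetch_all(urls, slow_fetch)
    print(len(results), "artifacts in %.2fs" % (time.monotonic() - start))
```

With five simulated requests of 0.2s each, the sequential cost would be about 1s, while the concurrent version takes roughly as long as the slowest single request. Threads are a reasonable fit here because the work is I/O-bound.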
I did a little more digging into the logs. In addition to the potential timeout issue, the Taskcluster API itself also sometimes fails. Namely, the endpoint for downloading the test artifacts has failed a few times in the past week. I suspect we are hitting the endpoint too fast and/or too soon after a task finishes (the artifacts might not be available on their cloud storage yet). I'll add retries.
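A retry like the one described above could be sketched as follows. This is an assumption-laden illustration, not the code that was actually deployed: the helper name `retry`, the attempt count, and the exponential backoff delays are all hypothetical choices. Backing off between attempts is meant to give the artifacts time to appear in storage.

```python
import time


def retry(func, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries are used up.

    Waits base_delay, then 2 * base_delay, then 4 * base_delay, ... between
    tries. `sleep` is injectable so the delays can be observed in tests.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            # A real handler would likely catch only transient errors
            # (timeouts, 404/5xx responses); catching Exception keeps the
            # sketch short.
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_download():
        # Hypothetical download that fails twice before succeeding.
        calls["n"] += 1
        if calls["n"] < 3:
            raise IOError("artifact not available yet")
        return "artifact contents"

    print(retry(flaky_download, base_delay=0.1))
```

The total wait must still fit inside the 10s webhook budget, so the attempt count and base delay need to be chosen with that limit in mind.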
The latency of the effective webhook requests (the ones that actually upload results, not the no-op ones) has decreased from 85s to 5s, which is well within the GitHub timeout (10s). And with the retry mechanism built in, I believe this is largely solved. Now we just need to wait for a prod release.
Closing this unless it happens again.
https://wpt.fyi/test-runs currently shows stable and beta runs from Taskcluster, but not the dev run.
https://api.github.com/repos/web-platform-tests/wpt/commits/ee2e69bfb1d44c4013a8ce94ca6932f86d63aa31/statuses does include "TaskGroup: success" pointing to https://tools.taskcluster.net/groups/ecpaEJHuRfmPunkTiCmsKQ which appears to be in good condition.
This is the weekly run, which is why I noticed. It's possible that runs are being randomly dropped in a way that is less noticeable.
@Hexcles, can you investigate?