Regular windows test failures #2351
Comments
Another error that I have seen multiple times was
And from a quick Google search it seems like NVD might be rate limiting us. But this occurring only on Windows is very strange.
My current theory about why windows has NVD problems more often is
Neither of those is fixable by us directly and I'm probably not going to stand up non-shared runners, so I'm still thinking we either cache in a way that public PRs can use it or use synthetic test data. I need to go chat with our licensing folk about public caches...
That said, if this is an NVD timeout, we should see if we can detect it, throw a better error message, and let the rest of the tests run. So there's still something for us to do short-term to improve the Windows test experience. I think @anthonyharrison updated the code in there so I'm sort of surprised it's not firing here.
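A minimal sketch of the "detect it and throw a better error message" idea, assuming the download is an awaitable call. `NVDTimeoutError` and `fetch_with_clear_timeout` are hypothetical names for illustration and are not existing cve-bin-tool code.

```python
import asyncio


class NVDTimeoutError(Exception):
    """Hypothetical error type for this sketch; not an existing cve-bin-tool class."""


async def fetch_with_clear_timeout(fetch, timeout_seconds):
    """Await fetch() and turn a bare timeout into a descriptive error.

    `fetch` is any zero-argument coroutine function standing in for the
    real NVD download call.
    """
    try:
        return await asyncio.wait_for(fetch(), timeout=timeout_seconds)
    except asyncio.TimeoutError as exc:
        # Re-raise with a message that names NVD as the likely culprit.
        raise NVDTimeoutError(
            f"NVD did not respond within {timeout_seconds} seconds. "
            "This may point at rate limiting or an NVD outage rather "
            "than a bug in the code under test."
        ) from exc
```

A test setup step could catch this one exception type and skip only the NVD-dependent tests, so the rest of the suite still runs.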
@terriko I have been seeing a lot of NVD failures on Linux as well. I tried backing off longer (we currently back off for 3 seconds), to 10 seconds and then 30 seconds, to see if it helps (not much!). Watching a full download of the NVD data showed that after about 20 requests, the failures increased. I was wondering whether we should try limiting the number of parallel requests to a much smaller number to see if that helps when downloading a full copy of the database.
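One way to experiment with fewer parallel requests is a semaphore around each request, combined with the existing back-off. This is only a sketch: `fetch_all`, `fetch_page` and the default values are assumptions for illustration (the 3-second default mirrors the back-off mentioned above), not the tool's actual download code.

```python
import asyncio


async def fetch_all(fetch_page, page_count, max_parallel=4, retries=3,
                    backoff_seconds=3.0):
    """Fetch pages 0..page_count-1 with at most max_parallel in flight.

    `fetch_page(index)` is a stand-in coroutine function for one NVD
    request; it is expected to raise ConnectionError on failure.
    """
    semaphore = asyncio.Semaphore(max_parallel)

    async def fetch_one(index):
        # The slot is held while backing off, so a struggling server
        # sees fewer new requests rather than more.
        async with semaphore:
            for attempt in range(retries + 1):
                try:
                    return await fetch_page(index)
                except ConnectionError:
                    if attempt == retries:
                        raise
                    await asyncio.sleep(backoff_seconds)

    # gather() returns results in the same order as the page indices.
    return await asyncio.gather(*(fetch_one(i) for i in range(page_count)))
```

Lowering `max_parallel` step by step while forcing a full download would show whether the failures really start around a particular level of concurrency.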
That actually gives us a potential path forwards:
Worth noting: most of our test runners start with a single run of the tool that may hit NVD if the cache isn't available. We do in fact want to see that the tool can be installed and run on each platform, so if that turns out to be what's failing after 3 minutes on Windows, that may not help.
@terriko Looking at the RateLimiter code and the copy on GitHub, I note that the RATE variable is 1, not 10. I also note that just before RateLimiter is set up, the following code exists
19 seems a strange number! Should this be aligned with the number of tokens in the RateLimiter? Maybe we should introduce environment variables for RATE, MAX_TOKENS and LIMIT_PER_HOST and then see if we can track down the issue. I am sure having the 4 additional data sources, all with a RateLimiter, will also be having some impact.
** UPDATE ** I have found an issue :-) I deleted the database from the cache to force a reload of all of the data. I just got 404 errors from all of the requests using the API. Manually tried the URL and I got the error
So because we now default to incremental update, we look for the date of the database. If the database doesn't exist in the call to
** UPDATE ** Update to
** UPDATE ** Looked at the NVD website. I think we should be backing off at least 6 seconds. I am also seeing 'payload is not completed' as an error. There is a GitHub issue for this and it looks like it is still an active issue.
** FIX ** There was a little bug introduced when the NVD 2.0 API was added which prevents the API key being passed to the API. Fixing this gets a full download of the data in around 90 seconds. #2355 contains the fixes. There is now a new issue to be aware of: if we have a very old database which is more than 120 days out of date, incremental update won't work. (See #2356)
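The two cases described above (no database at all, and a database more than 120 days out of date) could be guarded with a check like the one below. This is a sketch only: `choose_update_mode` is a hypothetical helper, and falling back to a full download is an assumption about a sensible response, not a description of what #2355 or #2356 actually do.

```python
from datetime import datetime, timedelta, timezone

# Incremental update is reported above not to work past 120 days.
MAX_INCREMENTAL_AGE = timedelta(days=120)


def choose_update_mode(last_updated, now=None):
    """Return "full" or "incremental" for the next database refresh.

    `last_updated` is the timestamp of the cached database, or None if
    no database exists yet.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    if last_updated is None:
        # Nothing cached, so there is no date to update from.
        return "full"
    if now - last_updated > MAX_INCREMENTAL_AGE:
        # Too old for an incremental request (assumed fallback).
        return "full"
    return "incremental"
```

For example, `choose_update_mode(None)` picks a full download, while a database refreshed yesterday stays on the incremental path.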
Phew, thanks for debugging this @anthonyharrison!
Oh, re: RateLimiter. I believe the 19 was empirically defined (which is research paper speak for "someone experimented and that was the number that worked"). Given how much NVD has changed about the rate limits, I will not be shocked if it is no longer the correct number.
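If the numbers need re-tuning, the idea raised earlier in the thread of reading RATE, MAX_TOKENS and LIMIT_PER_HOST from environment variables might look roughly like this. The `NVD_` prefix and `load_rate_limit_settings` are hypothetical; the defaults of 1 and 19 are the values quoted in the thread (assuming 19 is the per-host limit), and the MAX_TOKENS default is a placeholder, since its real value is not given here.

```python
import os

# RATE=1 and 19 come from the thread; MAX_TOKENS=10 is only a placeholder.
DEFAULTS = {"RATE": 1, "MAX_TOKENS": 10, "LIMIT_PER_HOST": 19}


def load_rate_limit_settings(environ=None, prefix="NVD_"):
    """Read rate limiter settings from the environment, with defaults.

    Variable names such as NVD_RATE are assumptions made for this
    sketch, not existing cve-bin-tool options.
    """
    environ = os.environ if environ is None else environ
    settings = {}
    for name, default in DEFAULTS.items():
        raw = environ.get(prefix + name, "").strip()
        value = int(raw) if raw else default
        if value < 1:
            raise ValueError(
                f"{prefix}{name} must be a positive integer, got {value}"
            )
        settings[name] = value
    return settings
```

With something like this in place, a CI job could try `NVD_LIMIT_PER_HOST=5` for one run without a code change, which would make the empirical tuning repeatable.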
We're getting a lot of Windows test failures on PRs now, mostly with errors like the following:
Which honestly looks similar to what happens when a job times out, but it's happening after 3 minutes so that's not it. It doesn't look like our usual NVD problem where the rate limit gets exceeded, but it could be related to NVD in a different way, I don't really know yet.
Not sure what's up, but I'm filing the issue in case anyone else has any insights or recognizes the message, and to remind me that this needs further investigation.