Add GitLab scraping #20
Comments
Gitlab CE now supports filtering However, Gitlab uses Crater's use-case won't be quite that bad. If 5% of projects on Gitlab contain Rust, we'll need only 37.5 minutes of database time to scrape. If the result is cached, perhaps it's feasible to start running crater on Gitlab projects, but it's probably better to wait until keyset-based pagination is re-enabled.
That's great! Thanks for your work on this ❤️
Yeah, our results are cached, so it wouldn't be that bad. If you ping me after the API is deployed on GitLab.com I can look into it, and if the performance is still bad I'll contact GitLab support to make sure we're not hurting them with the scrape.
Will do!
Looks like this will be deployed as part of the 11.8 release on February 22.
If needed, we can add API calls to the
There's also the question of which instances to scrape.
Scraping GitLab repositories is required if we want to test them on Crater (and we do). This issue tracks the implementation of the scraper.
The API calls we would need to make to scrape are:
- /api/v4/projects, filtering only Rust ones
- /api/v4/projects/{id}/repository/tree, to check if the cargo files exist
This is currently blocked on:
- /projects API to support filtering only Rust repos (GitLab issue)
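The two API calls listed in this issue could be combined roughly as follows. This is only a minimal sketch, not Crater's actual scraper: it assumes the Rust filter is exposed as the `with_programming_language` query parameter, that "cargo files" means `Cargo.toml` and `Cargo.lock` at the repository root, and that gitlab.com is the instance being scraped (the thread leaves the choice of instances open). The helper names are hypothetical, and only page-based pagination is shown.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the instance to scrape; the issue does not settle this.
GITLAB_BASE = "https://gitlab.com"

# Assumption: "cargo files" means these two files at the repository root.
CARGO_FILES = {"Cargo.toml", "Cargo.lock"}


def projects_url(base, page, per_page=100):
    """Build the /api/v4/projects URL for one page of Rust projects."""
    query = urllib.parse.urlencode({
        # Assumption: the Rust filter is the with_programming_language parameter.
        "with_programming_language": "Rust",
        "order_by": "id",
        "sort": "asc",
        "page": page,
        "per_page": per_page,
    })
    return f"{base}/api/v4/projects?{query}"


def tree_url(base, project_id):
    """Build the repository tree URL for the root directory of one project."""
    return f"{base}/api/v4/projects/{project_id}/repository/tree?per_page=100"


def cargo_files_present(tree):
    """Return the cargo files found among the blob entries of a tree listing."""
    names = {entry["name"] for entry in tree if entry.get("type") == "blob"}
    return names & CARGO_FILES


def get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def scrape(base=GITLAB_BASE, max_pages=1):
    """Yield (project id, project path) for Rust projects with a Cargo.toml."""
    for page in range(1, max_pages + 1):
        projects = get_json(projects_url(base, page))
        if not projects:
            break
        for project in projects:
            try:
                tree = get_json(tree_url(base, project["id"]))
            except urllib.error.HTTPError:
                # Empty or inaccessible repositories are skipped.
                continue
            if "Cargo.toml" in cargo_files_present(tree):
                yield project["id"], project["path_with_namespace"]
```

Calling `list(scrape(max_pages=1))` would issue one project-list request plus one tree request per returned project, so caching the results (as mentioned in the comments) matters for anything beyond a small test run.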