Last year, I used this repository as part of my research analysing the release practices of all Java repositories on GitHub. In doing so, I discovered that it has a few issues, partially because it has not been updated in a while. I hope it is not too presumptuous of me to suggest a rework, but I think it would be worthwhile, and I am willing to take it on myself.
This is a tracking issue documenting all the problems I've found (and still remember).
As I encounter or remember more, I'll add them to this issue.
- Retrying is not done correctly (this is partly related to rate limits).
- Outdated dependencies and Rust edition
  - This includes relying on deprecated libraries such as `failure`.
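As a rough illustration of what retrying correctly could look like, here is a minimal Rust sketch. It is not the current `rust-repos` code. It assumes a hypothetical `ApiError` type that classifies each failed request as rate-limited (with the wait time taken from, e.g., GitHub's `retry-after` or `x-ratelimit-reset` response headers), transient (timeouts, 5xx responses), or fatal (e.g. 404). Rate-limited requests wait until the limit resets, transient failures use capped exponential backoff, and fatal errors are returned immediately. The `sleep` function is passed in so the waiting strategy can be swapped out or tested.

```rust
use std::time::Duration;

/// Hypothetical classification of a failed GitHub API call.
#[derive(Debug, Clone, PartialEq)]
pub enum ApiError {
    /// The rate limit was hit; `reset_after` is how long until it resets
    /// (assumed to be derived from the response headers).
    RateLimited { reset_after: Duration },
    /// A temporary failure (timeout, 5xx) that is worth retrying.
    Transient,
    /// A permanent failure (e.g. 404) that should not be retried.
    Fatal,
}

/// Calls `op` until it succeeds, returns a fatal error, or `max_attempts`
/// attempts have been made. `sleep` performs the actual waiting.
pub fn retry_with_backoff<T>(
    max_attempts: u32,
    base_delay: Duration,
    max_delay: Duration,
    mut op: impl FnMut() -> Result<T, ApiError>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, ApiError> {
    let mut attempt: u32 = 0;
    loop {
        attempt += 1;
        match op() {
            Ok(value) => return Ok(value),
            // Never retry permanent failures.
            Err(ApiError::Fatal) => return Err(ApiError::Fatal),
            // Out of attempts: surface the last error.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Wait exactly until the rate limit resets instead of guessing.
            Err(ApiError::RateLimited { reset_after }) => sleep(reset_after),
            Err(ApiError::Transient) => {
                // Exponential backoff: base * 2^(attempt - 1), capped at max_delay.
                let factor = 2u32.saturating_pow(attempt - 1);
                let delay = base_delay.saturating_mul(factor).min(max_delay);
                sleep(delay);
            }
        }
    }
}
```

In a real scraper, `std::thread::sleep` (or an async equivalent) would be passed as `sleep`. The main design point is treating rate limits separately from other errors: applying exponential backoff to a rate-limit response either waits longer than necessary or retries before the limit has actually reset.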
The final scraper I implemented for Java can be found here, specifically in `src/scraper`. I'd mostly want to port that code to `rust-repos`, as I've verified that it works and it should be largely applicable.
A natural issue I ran into when scraping millions of repositories is that scraping all of GitHub can take weeks when respecting the rate limits (even when using some tricks).
There are different solutions to this, but it is important to first find out how much of an issue this is for Rust, since there are far fewer Rust repositories than Java ones. This is also related to #65: in the current state it may simply not be feasible, but I can look into whether it is.
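A back-of-the-envelope estimate is one way to gauge how much of an issue this is. The sketch below assumes a single authenticated token under GitHub's documented primary REST rate limit of 5,000 requests per hour, plus a hypothetical, fixed number of API requests per repository. Neither the repository count nor the requests per repository used here are measured values.

```rust
/// Assumed primary rate limit for a single authenticated GitHub REST client.
const ASSUMED_REQUESTS_PER_HOUR: u64 = 5_000;

/// Hours needed to make `repos * requests_per_repo` requests at
/// `requests_per_hour`, rounded up because a partial hour still has to be waited out.
fn estimated_hours(repos: u64, requests_per_repo: u64, requests_per_hour: u64) -> u64 {
    assert!(requests_per_hour > 0, "rate limit must be positive");
    let total_requests = repos * requests_per_repo;
    // Integer ceiling division.
    (total_requests + requests_per_hour - 1) / requests_per_hour
}
```

For example, a hypothetical 1,000,000 repositories at 3 requests each would need 600 hours, or 25 days, with one token, which matches the observation that scraping can take weeks. Plugging in an actual count of Rust repositories would show whether this is a real problem for `rust-repos`.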