issues data dump #165
Comments
Right now 100% of the data on issues can be extracted from https://developer.github.com/v3/issues/. We may want to store data in a local db in the future, however (the only thing we store in the db is username and avatar logo URI) |
Related to @miketaylr comment |
I guess we could store a copy of all API responses from GitHub in a NoSQL db or something like PostgreSQL, which can do JSON. This could be how we implement what I described in #239 (comment)... maybe.
To do: Karl to explore doing the data dump using a GitHub repo.
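A minimal sketch of the "store a copy of every API response" idea, assuming SQLite from the Python standard library in place of a NoSQL store or PostgreSQL. The table name `raw_issues` and the helper names are hypothetical; `number` is the issue number field found in GitHub issue responses.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store holding one raw JSON payload per issue."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_issues ("
        "number INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def save_response(conn, issue):
    """Insert or overwrite the stored copy of one issue response (a dict)."""
    conn.execute(
        "INSERT OR REPLACE INTO raw_issues (number, payload) VALUES (?, ?)",
        (issue["number"], json.dumps(issue)),
    )
    conn.commit()


def load_response(conn, number):
    """Return the stored response as a dict, or None if it was never saved."""
    row = conn.execute(
        "SELECT payload FROM raw_issues WHERE number = ?", (number,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Keeping the payload as opaque JSON text means new fields in the API response need no schema change; the trade-off is that querying inside the payload is left to the application.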
https://github.com/webcompat/issue_parser/blob/master/dump_webcompat_to_db.py creates a database with fields id, summary, URL and fills it with all data from our existing issues. (It needs to be copied along with the extract_id_title_url.py script to somewhere it sees the webcompat module to import the 'engine' database pointer). The plan is to
Should we add a field having the full body to this DB? It's not hard. |
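For illustration, here is a rough sketch of what a dump table with the id, summary and URL fields plus an optional full-body column could look like. It is not the code from dump_webcompat_to_db.py: it uses the standard-library `sqlite3` module rather than the webcompat `engine` pointer, and the function names and the shape of the input records are assumptions.

```python
import sqlite3


def create_dump(conn):
    """Create the issues table; body is nullable so it can be filled in later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS issues ("
        "id INTEGER PRIMARY KEY, summary TEXT, url TEXT, body TEXT)"
    )


def dump_issues(conn, issues):
    """Write already-extracted issue records (dicts) and return how many were written."""
    rows = [
        (issue["id"], issue["summary"], issue["url"], issue.get("body"))
        for issue in issues
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO issues (id, summary, url, body) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)
```

Because the body column accepts NULL, records extracted without a body can still be dumped, and re-running the dump simply overwrites rows with the same id.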
Issue #165. Moving database connection stuff to a dedicated db module
This branch: It's tested locally and seems to work fine. It might lack a tiny bit of polish, I'll do pep8 and flake8 and such shortly before doing a pull request. It has a field called "url" but should perhaps have a "domain" field? The webhooks stuff on that branch covers new issues. Additionally, there's code here: Comments welcome, but feel free to ignore small issues that pep8/flake8 will complain about :) |
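On the "url" versus "domain" question: one option is to keep the full URL and derive the domain when needed. A small sketch using `urllib.parse` from the standard library; the helper name `domain_of` is hypothetical, and the fallback for URLs reported without a scheme is an assumption about how messy the data might be.

```python
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host part of a URL, or "" if none can be found."""
    # Reported URLs may lack a scheme; urlparse only finds the host after "//".
    if "://" not in url:
        url = "http://" + url
    return urlparse(url).hostname or ""
```

Deriving the domain this way keeps a single source of truth in the "url" field; storing it as its own column would mainly help if the DB needs to be grouped or indexed by site.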
Hmm. @hallvors took the ownership. Let me change that. |
Indeed I took over this, and I didn't even plan to, just happened to think that the stuff in issue_parser repo could be extended slightly to make a DB dump - and went from there.. :) Now, we're still not grabbing several details that matter: labels, comments.. It's probably easyish to do, but just as a sort of backup feature. The issues DB we'll be using to generate RSS and JSON data, but a comments DB would just be an "in case we want to move off GitHub" thing. So I don't think that's a priority.. |
(Of course one use case for more data in the DB is if we need to or ought to use GitHub's APIs less. This might also make stuff faster and make it easier to build more markup on the server and use less JS? We'd have potential syncing complications though..) |
Sure. I'll try to recognise the line between "exploring an idea" and "taking over a task" ;) |
Note that this issue was about creating an "issues data dump", which is not quite what @hallvors is trying to do here; that is a lot more complicated. The initial issue is
Somehow this is more but in a different way :) What @hallvors is trying to do (and is useful) would deserve a separate issue (something like Live Issues DB) and relates a lot more to my comment in #327 (comment) Right now the proposed code is one way.
Some cases for out of sync of the db:
With the Live Issues DB, we could indeed send templates with more information and resync if it's slightly out of date, etc. aka using the xhr stuff for its purpose. That would be neat. |
As for
nitpick. Recommendations:
;) |
[Edit: removing process discussion, let's keep this on topic]
(Given this state of affairs, though, this bug could be considered fixed by the dump_webcompat_to_db.py script. If someone wants a local DB with a dump of all the issues in webcompat/web-bugs/issues, all they need to do is clone that repo and run the script, AFAIK. @karlcow, thoughts?)
Closing this since code to create a local "data dump" has been written. |
@karlcow @hallvors I'm trying to run dump_webcompat_to_db.py to dump the data from GitHub to issues.db. I tried the steps below.
Am I missing anything here?
The error message says it: basically, a required module is not installed. BUT
It could be useful for statistical purposes to have a dump of the full DB of issues.
That way someone could do matching against a local db, or extract the information about their own system for importing.
Not urgent.