issues data dump #165

Closed
karlcow opened this issue Jul 8, 2014 · 20 comments

@karlcow
Member

karlcow commented Jul 8, 2014

It could be useful for statistical purposes to have a dump of the full DB of issues.
Someone could then do matching with a local DB, or extract the information about their own system for importing.

Not urgent.

@miketaylr
Member

Right now 100% of the data on issues can be extracted through the GitHub API: https://developer.github.com/v3/issues/. We may want to store data in a local DB in the future, however (the only thing we store in the DB is username and avatar logo URI).
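
For illustration, here is a minimal sketch of what extracting everything through that API could look like. It is not code from this project: the function names are made up for the example, it uses only the standard library, and it runs unauthenticated, so it would hit GitHub's rate limit quickly on a large repository. Note that this endpoint also returns pull requests alongside issues.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def issues_url(owner, repo, page, per_page=100):
    """Build the URL for one page of a repository's issues (open and closed)."""
    query = urllib.parse.urlencode(
        {"state": "all", "per_page": per_page, "page": page}
    )
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{query}"


def get_json(url):
    """Fetch a URL and decode its JSON body (unauthenticated request)."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_issues(owner, repo, fetch=get_json):
    """Request page after page until an empty page comes back."""
    issues = []
    page = 1
    while True:
        batch = fetch(issues_url(owner, repo, page))
        if not batch:
            return issues
        issues.extend(batch)
        page += 1
```

Calling `fetch_all_issues("webcompat", "web-bugs")` would then return a list of dictionaries, one per issue. The `fetch` parameter is only there so that the paging logic can be exercised without network access.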

@karlcow
Member Author

karlcow commented Aug 20, 2014

Related to @miketaylr's comment:
0a61cb0

@miketaylr
Member

I guess we could store a copy of all API responses from GitHub in a NoSQL DB or something like PostgreSQL, which can do JSON. This could be how we implement what I described in #239 (comment)... maybe.

@karlcow
Member Author

karlcow commented Sep 30, 2015

To Karl: explore doing the data dump using a GitHub repo.

@hallvors
Contributor

hallvors commented Oct 1, 2015

https://github.com/webcompat/issue_parser/blob/master/dump_webcompat_to_db.py creates a database with fields id, summary, URL and fills it with all data from our existing issues.

(It needs to be copied, along with the extract_id_title_url.py script, to a location from which it can see the webcompat module, because it imports the 'engine' database pointer from there.)
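
As a rough idea of the shape of such a dump, here is a sketch that is not the actual script. It assumes the records have already been extracted into dictionaries with `id`, `summary` and `url` keys, and it uses the standard library's `sqlite3` directly rather than the 'engine' pointer from the webcompat module.

```python
import sqlite3


def create_issues_table(connection):
    """Create the three-field table if it does not exist yet."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS issues ("
        "id INTEGER PRIMARY KEY, summary TEXT, url TEXT)"
    )


def dump_issues(connection, issues):
    """Store one row per issue, overwriting rows that already exist.

    Returns the number of rows written.
    """
    rows = [(issue["id"], issue["summary"], issue["url"]) for issue in issues]
    with connection:  # commits on success, rolls back on error
        connection.executemany(
            "INSERT OR REPLACE INTO issues (id, summary, url) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)
```

With `connection = sqlite3.connect("issues.db")`, calling `create_issues_table(connection)` and then `dump_issues(connection, records)` can be repeated safely, because `INSERT OR REPLACE` keeps a single row per issue id.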

The plan is to

@hallvors
Contributor

hallvors commented Oct 1, 2015

Should we add a field having the full body to this DB? It's not hard.

miketaylr pushed a commit that referenced this issue Oct 14, 2015
Issue #165. Moving database connection stuff to a dedicated db module
@hallvors
Contributor

This branch:
https://github.com/webcompat/webcompat.com/tree/dbdump
has code that dumps data about new bugs to a local SQLite database file called issues.db.

It's tested locally and seems to work fine. It might lack a tiny bit of polish; I'll run pep8, flake8 and such shortly before doing a pull request.

It has a field called "url" but should perhaps have a "domain" field?
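
If a "domain" field were added, it could be derived from the stored URL. A possible sketch, where `domain_of` is a hypothetical helper and not something that exists on the branch:

```python
from urllib.parse import urlsplit


def domain_of(url):
    """Return the lowercased host name of a URL, or None if it has none."""
    # Reported URLs sometimes lack a scheme; "//" makes urlsplit treat
    # the first component as the network location instead of a path.
    if "://" not in url:
        url = "//" + url
    return urlsplit(url).hostname
```

This keeps subdomains such as `www.` as they are. Whether those should be folded into one domain is a separate decision.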

The webhooks stuff on that branch covers new issues. Additionally, there's code here:
https://github.com/webcompat/issue_parser/blob/master/dump_webcompat_to_db.py
that can be run at a suitable point in time to grab all the data for existing issues and add it to the DB. This is more of a one-off operation: you would place that file and extract_id_title_url.py next to the webcompat folder and run.py, log in with ssh, run the dump_webcompat_to_db.py script, and then remove those two files again.

Comments welcome, but feel free to ignore small issues that pep8/flake8 will complain about :)

@karlcow karlcow assigned hallvors and unassigned karlcow Oct 20, 2015
@karlcow
Member Author

karlcow commented Oct 20, 2015

Hmm. @hallvors took the ownership. Let me change that.

@hallvors
Contributor

Indeed I took over this, and I didn't even plan to, just happened to think that the stuff in issue_parser repo could be extended slightly to make a DB dump - and went from there.. :)

Now, we're still not grabbing several details that matter: labels, comments.. It's probably easyish to do, but just as a sort of backup feature. The issues DB we'll be using to generate RSS and JSON data, but a comments DB would just be an "in case we want to move off GitHub" thing. So I don't think that's a priority..

@hallvors
Contributor

(Of course one use case for more data in the DB is if we need to or ought to use GitHub's APIs less. This might also make stuff faster and make it easier to build more markup on the server and use less JS? We'd have potential syncing complications though..)

@miketaylr
Member

@hallvors we assigned this to @karlcow in Paris. I'm sure he's got plenty of other things to do right now, but in the future can you give a heads up or ask if he's not in the middle of something before taking over? 🚔

@hallvors
Contributor

Sure. I'll try to recognise the line between "exploring an idea" and "taking over a task" ;)

@karlcow
Member Author

karlcow commented Oct 22, 2015

Note that this issue was about creating an "issues data dump", which is not totally related to what @hallvors is trying to do here. That work is a lot more complicated.

The initial issue is

It could be useful for statistical purposes to have a dump of the full DB of issues.
Someone could then do matching with a local DB, or extract the information about their own system for importing.

Somehow this does more, but in a different way :)
https://github.com/webcompat/issue_parser/blob/master/dump_webcompat_to_db.py
(which I didn't know about before the comment here).

What @hallvors is trying to do (and is useful) would deserve a separate issue (something like Live Issues DB) and relates a lot more to my comment in #327 (comment)

Right now the proposed code is one-way.

  1. someone enters data
  2. data gets pushed to a DB on the opened event through the webhook.

Some cases where the DB gets out of sync:

  • if someone fixes the data on GitHub
  • if someone deletes the issue on GitHub
  • if someone files the issue directly on GitHub
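
To make the sync problem concrete, here is a sketch of a handler that reacts to more than the opened event. It is an illustration under assumptions, not the code on the branch: the table layout is invented for the example, and it assumes the webhook is subscribed to GitHub's `issues` event and receives `edited` and `deleted` actions as well. Deliveries can still be missed, so an occasional full resync would remain useful.

```python
import sqlite3

# Hypothetical table layout for this example only.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS issues ("
    "id INTEGER PRIMARY KEY, summary TEXT, body TEXT)"
)


def apply_issue_event(connection, payload):
    """Apply one `issues` webhook payload to the local table.

    Returns True if the table was changed, False if the action was ignored.
    """
    action = payload["action"]
    issue = payload["issue"]
    with connection:  # commits on success, rolls back on error
        if action in ("opened", "edited"):
            # Insert new issues and overwrite edited ones.
            connection.execute(
                "INSERT OR REPLACE INTO issues (id, summary, body) "
                "VALUES (?, ?, ?)",
                (issue["number"], issue["title"], issue.get("body")),
            )
            return True
        if action == "deleted":
            connection.execute(
                "DELETE FROM issues WHERE id = ?", (issue["number"],)
            )
            return True
    return False
```

Because the handler keys on the issue number and overwrites existing rows, replaying the same delivery twice leaves the table unchanged.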

With the Live Issues DB, we could indeed send templates with more information and resync if it's slightly out of date, etc. aka using the xhr stuff for its purpose. That would be neat.

@karlcow
Member Author

karlcow commented Oct 22, 2015

As for

Sure. I'll try to recognise the line between "exploring an idea" and "taking over a task" ;)

nitpick. Recommendations:

  1. Read the initial comment of the issue
  2. Ask about it in the comments
  3. Discuss it in the issue through the comments.
  4. Nothing is really urgent, allow time for people to reply, think about it.

;)

@karlcow
Member Author

karlcow commented Oct 22, 2015

Replying to my own comment:

What @hallvors is trying to do (and is useful) would deserve a separate issue (something like Live Issues DB) and relates a lot more to my comment in #327 (comment)

This is partly #327 in fact.

@hallvors
Contributor

[Edit: removing process discussion, let's keep this on topic]

@hallvors
Contributor

(Given this state of affairs though - this bug could be considered fixed by the dump_webcompat_to_db.py script. If someone wants a local DB with a dump of all the issues in webcompat/web-bugs/issues all they need to do is to clone that repo and run the script AFAIK. @karlcow , thoughts?)

@hallvors
Contributor

Closing this since code to create a local "data dump" has been written.

@deepthivenkat
Member

@karlcow @hallvors I'm trying to run dump_webcompat_to_db.py to dump the data from GitHub to issues.db.

I tried the steps below.

  1. I have cloned the web-bugs, issue_parser and webcompat.com repos.
  2. As mentioned above, I have copies of dump_to_db.py and extract_id_title_url.py inside the webcompat.com repo.
  3. When I ran dump_to_db.py, I got the error below.

Am I missing anything here?

(screenshot of the error output)

@karlcow
Member Author

karlcow commented Jul 7, 2016

@deepthivenkat

The error message says it:
ImportError: no module named db.

Basically, the script needs a module which is not installed. BUT

  1. Avoid cluttering your current webcompat working space with other scripts that are not part of the project; you might end up adding and committing them to the project.
  2. Create a specific working environment for this script.
  3. If you want I can look at https://github.com/webcompat/issue_parser and straighten it a bit to make it more beginner friendly before you work on it. This could avoid future mistakes.
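
One way to catch this kind of problem before running a script is to check that its imports can be resolved in the current environment. A small sketch using only the standard library follows. `missing_modules` is a hypothetical helper, and the module names passed to it are up to the caller.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found on the import path."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # A dotted name whose parent package is absent raises
            # instead of returning None.
            found = False
        if not found:
            missing.append(name)
    return missing
```

For the error above, `missing_modules(["db"])` would return `["db"]` in an environment where that module is not importable, which points at the setup rather than at the script itself.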
