Fix #8 - Migration script to backup database file and restart the server #9
Conversation
# check if a previous version of a backup exists and delete it
print 'Backup database already exists. Removing older version of the backup'
call('rm -r backup_db', shell=True)
call('git rm -r backup_db/.', shell=True)
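The shell calls above shell out to `rm` and depend on the working directory. A minimal Python 3 sketch of the same backup step using only the standard library (paths and the function name are illustrative, not from the PR):

```python
import os
import shutil


def backup_database(db_path='issues.db', backup_dir='backup_db'):
    """Replace any previous backup of the database file with a fresh copy."""
    if os.path.isdir(backup_dir):
        # A previous backup exists; remove the older version first.
        print('Backup database already exists. Removing older version of the backup')
        shutil.rmtree(backup_dir)
    os.makedirs(backup_dir)
    # copy2 preserves file metadata (timestamps) along with the contents.
    shutil.copy2(db_path, os.path.join(backup_dir, os.path.basename(db_path)))
```

Using `shutil` instead of `call(..., shell=True)` avoids a shell dependency and fails loudly (with an exception) if the db file is missing.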
@@ -47,7 +60,7 @@ def main():
     # stuff data into database..
     for bug in data['bugs']:
         db_session.add(
-            Issue(bug['id'], bug['summary'], bug['url'], bug['body']))
+            Issue(bug['id'], bug['summary'], bug['url'], extract_domain_name(bug['url']), bug['body'], bug['state'], bug['creation_time'], bug['last_change_time']))
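The diff's new column is filled by an `extract_domain_name` helper whose body isn't shown in this conversation. A minimal sketch of what it might look like (only the name comes from the diff; the implementation here is an assumption using the standard library):

```python
from urllib.parse import urlparse


def extract_domain_name(url):
    """Return the host part of an issue URL, e.g. 'github.com'."""
    return urlparse(url).netloc
```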
@deepthivenkat can you summarize the strategy and design here, before addressing review comments? For example: what is the main entry point, by whom and when is this script run, and what are the expected outcomes? Right now it just looks like it copies issues.db and then puts fresh data into a new issues.db. Why are we doing this? Presumably something to do with migrating db schemas -- but I don't see any code to handle the actual schema changes. I'm also concerned about the idea of creating and populating a new db between killing and starting the app. What if GitHub is down, or slow? Or what happens when there are 10,000 or 100,000 issues? Do we hit API limits? Does this take 2 or 10 minutes, and does that mean the site is down until it's finished?
I also have concerns generally about basing this on extract_id_title_url.py -- it was written by Hallvord for a different project. Right now it's not up to date (for example, it doesn't know about …). Have you verified that the database created from the … ?
@miketaylr As you mentioned in IRC, a few lines in extract_id_title_url.py, like https://github.com/webcompat/issue_parser/blob/master/extract_id_title_url.py#L84, are currently unused. Should there be a separate script for extracting issues specifically for webcompat inside the webcompat.com root folder?
Summarising the strategy: the script would be run by the person who handles the fabfile for deployment. The script will update the db schema, create a new data dump for that schema, and back up the old db inside a .gitignored folder in the webcompat root repository. The entry point would be to use one of the following tools to handle the migration:
If I choose the second option, I will be able to use flask-migrate, which runs on alembic. The GitHub API rate limit is 5000 requests per hour for authenticated users, and it is tied to the GitHub account. The number of issues we currently have is already about half that limit, so if we run extract_id_title_url.py a couple of times we will end up with a 403 status code. It is therefore not viable to keep the app down while populating the new db. Suggestion: we can start the app as soon as the db schema migration script (yet to be written using flask-migrate) has finished. While the db schema changes are being made, the db will be locked for changes. Once the schema change is done, the webhooks we have in webhooks/__init__.py may try to insert a newly opened issue, which is ok. Solutions:
Suggestions for extracting issues: we can check the X-RateLimit-Limit: 5000 header, sleep until the X-RateLimit-Reset time, and then retrieve issues again. If the webhooks/__init__.py … we can edit the webhook code to handle this error and attempt dumping the data again after a time period. We can also move this discussion and these files to the webcompat repository and leave issue_parser in peace ^_^
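The sleep-until-reset idea above can be sketched as a small helper that reads GitHub's rate-limit headers from a response. The header names come from GitHub's API; the function, its defaults, and how it is called are illustrative assumptions:

```python
import time


def seconds_until_reset(headers, now=None):
    """How long to sleep before retrying, based on GitHub rate-limit headers.

    X-RateLimit-Remaining is the number of requests left in the window;
    X-RateLimit-Reset is the UTC epoch second when the window resets.
    """
    if now is None:
        now = time.time()
    remaining = int(headers.get('X-RateLimit-Remaining', 1))
    if remaining > 0:
        return 0  # still within the limit, no need to wait
    reset_at = int(headers.get('X-RateLimit-Reset', now))
    return max(0, reset_at - now)
```

A caller would check this after each API response and `time.sleep()` for the returned number of seconds before fetching the next page of issues.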
This current PR doesn't touch any db schema though, right? (Or am I missing something?)
I think we should start here. Doing a "back-up" is simple: it's literally copying a file and moving it somewhere. But we also want some mechanism to bootstrap a new database with current GitHub information. Right now, the issues.db is 100% unused (and doesn't even contain 100% of issues). If we made any schema changes today and re-built the DB, nothing would break. So I think it's probably too early to add the complexity of alembic, etc., when we don't need it yet.
I think this is what we should do first. It should live in the webcompat.com repo as well -- this issue_parser repo is a side-project that was not written with our needs in mind. See also: github-backup.
r? @miketaylr @karlcow