
Add support for full export #72

Open
scottohara opened this issue Feb 16, 2017 · 2 comments
scottohara commented Feb 16, 2017

For importing, we currently have two options:

  • Full import (truncates all local WebSql tables, fetches the complete data set from Couch, and inserts all rows into the local WebSql tables)
  • Incremental import (only fetches documents from Couch where the current device is listed as "pending")

New devices must register first, and the first import is always a "full" one (fast import slider is not available). Subsequent imports can be either full or fast.

For exporting, we only have one option:

  • Incremental export (only exports entries that are listed in the local Sync table)

Recently we had an issue with the DBaaS vendor hosting our Couch database: they inadvertently deleted all data as part of a 'routine maintenance' job, requiring a restore from backup.

The only 'backup' we had for this purpose was the data residing in the local WebSql tables on a device, so the restore process was to run a 'full export'.

As the export process only includes entries in the Sync table, achieving a full export required a way to mark every program/series/episode as 'dirty' (i.e. insert it into Sync). This was done by connecting the device via USB and, using the Safari Web Inspector, manually executing the following SQL statements:

```sql
INSERT INTO Sync (Type, ID, Action) SELECT 'Program', ProgramID, 'modified' FROM Program;
INSERT INTO Sync (Type, ID, Action) SELECT 'Series', SeriesID, 'modified' FROM Series;
INSERT INTO Sync (Type, ID, Action) SELECT 'Episode', EpisodeID, 'modified' FROM Episode;
```
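Rather than hand-typing each statement in the console, a small helper could generate them. This is just a sketch: the `markAllDirtySql` name is made up here, and it assumes the `<Type>ID` primary-key naming convention seen in the statements above.

```javascript
// Hypothetical helper: build the "mark everything as dirty" statements
// for the given entity types. Assumes each table's primary key follows
// the <Type>ID naming convention used in the statements above.
function markAllDirtySql(types) {
	return types.map(
		(type) =>
			`INSERT INTO Sync (Type, ID, Action) ` +
			`SELECT '${type}', ${type}ID, 'modified' FROM ${type}`
	);
}

// Produces the three statements that were run via the Web Inspector.
const statements = markAllDirtySql(["Program", "Series", "Episode"]);
```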

This experience has exposed some gaps in our data loss prevention:

  1. (Naive) assumption that, by using a hosted Couch provider, our DBaaS vendor would be responsible for keeping backups of our data, and that at any time we could restore (or request a restore to) an earlier snapshot of the data. As we are on a free hosting plan, there are no user-accessible backups (unlike, say, Heroku Postgres), so it is up to us to ensure that we periodically replicate our data to another location.
  2. Performing a full export of the data stored locally on a device is not possible without manual hacks.

To address 2, consider expanding the role of the "Fast import" slider so that, in addition to toggling between full/incremental imports, it also toggles between full/incremental exports.

This would involve:

  1. Change the slider label from "Fast import" to something like "Changes Only".
  2. When the slider is on, change the export behaviour so that it ignores the Sync table and exports ALL data (and then clears the Sync table).
  3. Consider what happens to pending data on the server side (e.g. documents that the client doesn't know about).
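The row selection driven by the slider could reduce to something like the following sketch; `rowsToExport`, `allRows` and `syncRows` are assumed names, with `fullExport` derived from the proposed slider state.

```javascript
// Sketch: decide which rows an export should include.
// fullExport = true  -> ignore the Sync table and export everything
//                       (the Sync table would then be cleared afterwards).
// fullExport = false -> incremental: export only the entries in Sync.
function rowsToExport(fullExport, allRows, syncRows) {
	return fullExport ? allRows : syncRows;
}
```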
@scottohara commented:

Also need to consider throttling export.

Currently the app immediately dispatches N HTTP POST requests when the export starts (where N is the number of records to be exported).

It was noticed in Web Inspector that these requests would be queued and processed in blocks, presumably of up to 6 at a time, given the typical browser limit of 6 concurrent connections per host.

After ~30s, requests that were still queued started failing. (This may explain why an export in the app sometimes shows failures, yet retrying the now much smaller export succeeds on the second attempt.)

It is currently unclear if the 30s timeout is:

  1. jQuery default timeout for POST requests (docs don't specify a default timeout though)
  2. Safari / WebInspector killing long waiting requests
  3. Heroku 30s request timeout (e.g. request is blocked on the server, perhaps a limitation on connections through to the Couch database?)

Either way, a better (and perhaps more memory-efficient) approach for the client would be to open up to a fixed number of Ajax requests, and use each request's completion callbacks (success/error) to start the next one(s).

e.g.

Request #1 --> success --> Request #7  --> etc.
Request #2 --> success --> Request #8  --> etc.
Request #3 --> error   --> Request #9  --> etc.
Request #4 --> success --> Request #10 --> etc.
Request #5 --> success --> Request #11 --> etc.
Request #6 --> success --> Request #12 --> etc.
(cap of 6 requests in flight at any time)
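The capped pipeline above could be sketched as follows, assuming each export request is wrapped in a promise-returning function (jQuery's `$.ajax` returns a then-able `jqXHR`, so it fits this shape); `runWithLimit` is an assumed name, not existing app code.

```javascript
// Sketch: run at most `limit` tasks concurrently. Each completion
// (success OR failure) starts the next queued task, matching the
// diagram above, and failures don't block the rest of the export.
function runWithLimit(tasks, limit) {
	return new Promise((resolve) => {
		if (tasks.length === 0) {
			resolve([]);
			return;
		}
		const results = [];
		let next = 0;
		let inFlight = 0;

		function startNext() {
			if (next >= tasks.length) {
				// No more queued tasks; resolve once the last one finishes.
				if (inFlight === 0) resolve(results);
				return;
			}
			const i = next++;
			inFlight++;
			Promise.resolve()
				.then(tasks[i])
				.then(
					(value) => { results[i] = { ok: true, value }; },
					(err) => { results[i] = { ok: false, err }; }
				)
				.then(() => {
					inFlight--;
					startNext(); // a freed slot starts the next queued request
				});
		}

		// Prime up to `limit` slots (e.g. 6, to match the browser cap).
		for (let k = 0; k < Math.min(limit, tasks.length); k++) startNext();
	});
}
```

With this in place, the export could queue one task per record and call `runWithLimit(tasks, 6)`, keeping memory bounded and avoiding the ~30s queue starvation.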
