TAR as new Export/Import format for speed and streamability #1729

Closed
hpk42 opened this issue Jul 17, 2020 · 8 comments · Fixed by #1749
hpk42 (Contributor) commented Jul 17, 2020

The current export/import mechanism is slow and not streamable. This issue follows up on and supersedes #1727 after discussions there and on IRC. Note that while this issue talks about streamability, it does not discuss how to get a trusted/encrypted/authenticated stream between two devices.

My measurements on desktop:

$ ls -la db.sqlite
-rw-rw-r-- 1 hpk hpk 56172544 Jul 17 15:02 db.sqlite
$ time tar cf ~/Downloads/DC-backup-hpk.tar db.sqlite db.sqlite-blobs/
real	0m17.269s
user	0m0.292s
sys	0m2.268s
$ ls -lah ~/Downloads/DC-backup-hpk.tar
-rw-rw-r-- 1 hpk hpk 1,8G Jul 17 15:14 /home/hpk/Downloads/DC-backup-hpk.tar

Exporting via Desktop took 68 seconds:

2020-07-17T13:15:24.001Z	core/event            	INFO		"DC_EVENT_INFO"	0	"src/imex.rs:370: Import/export process started."
[... housekeeping took 1 second, otherwise lots of file copies into the DB]
2020-07-17T13:16:32.766Z	core/event            	INFO		"DC_EVENT_INFO"	0	"src/imex.rs:394: IMEX successfully completed"
2020-07-17T13:16:32.767Z	core/event            	DEBUG		"DC_EVENT_IMEX_PROGRESS"	1000	0

Importing the Desktop-exported file took 48 seconds (or 58 seconds) and peak >1GB RSS RAM

Importing the tar file took 13 seconds on the same machine

So for export tar is 4 times faster, and on import tar is 3 times faster than sqlite.

Moreover, tar is streamable on both sides, does not need any extra disk space, and is RAM-efficient.
A suitable tar implementation might be https://docs.rs/async-tar/0.1.1/async_tar/
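The streamability point can be illustrated with a short sketch (Python's `tarfile` standing in for the Rust async-tar crate; the file names follow the measurement above, the in-memory payloads are placeholders): each entry is written to the output stream as soon as it is added, so the receiving side can start unpacking before the archive is complete and no intermediate on-disk copy is needed.

```python
import io
import tarfile

def stream_backup(dest, files):
    # mode "w|" writes to a non-seekable stream: entries are emitted one by
    # one, so dest could be a socket or pipe instead of a BytesIO buffer.
    with tarfile.open(fileobj=dest, mode="w|") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

sink = io.BytesIO()  # stand-in for a network connection in a real transfer
stream_backup(sink, [("db.sqlite", b"sqlite data"),
                     ("db.sqlite-blobs/avatar.jpg", b"jpeg bytes")])
```

Since nothing in this scheme needs random access, peak memory stays at roughly one entry's buffer, independent of the total backup size.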

Note that sqlar likely does not change this picture much -- and its lack of streamability means it is not a good candidate to justify the effort of introducing a new export/import format.

@hpk42 hpk42 changed the title New Export/Import format for speed and streamability TAR as new Export/Import format for speed and streamability Jul 17, 2020
@hpk42 hpk42 added the enhancement New feature or request label Jul 17, 2020
csb0730 commented Jul 18, 2020

@hpk42:

  • Streamability:
    What is the benefit of having this? Maybe you can explain why this is important? In the end, you can stream every file?

To all:

  • Memory efficiency:
    As I said in a former post, the command-line r-core (repl) needs a constant ~66 MB of memory while importing a 1.6 GB backup. Isn't that efficient?
  • General flexibility and key points:
    Please read carefully sections 2 (Advantages) and 3 (Disadvantages) of this page: https://sqlite.org/sqlar.html
  • Dependency on one more library
    to support another format than sqlite.
  • Speed:
    How often is the backup function needed? Is incremental access to the backup required? Maybe in the future?
  • Metadata:
    Is more information required in the backup than raw file data?

I mean: tar is a good proposal, but does it fulfill all requirements?

csb0730 commented Jul 18, 2020

Note that sqlar likely does not change this picture much -- and its lack of streamability means it is not a good candidate to justify the effort of introducing a new export/import format.

The effort of using sqlar (a sqlite-based backup, like now) is close to zero.

hpk42 (Contributor, Author) commented Jul 19, 2020 via email

csb0730 commented Jul 19, 2020

Regarding metadata: I am talking about additional data like we have now: file version, backup date, backup duration, ...
OK, you can keep this in a special file in the archive. But is it easily enough accessible? ... ;-)

Yes, I think all key arguments are on the table!

But I think you see that I'm still not convinced that tar or zip is the right way ;-)

csb0730 commented Jul 21, 2020

@hpk42

So for export tar is 4 times faster, and on import tar is 3 times faster than sqlite.

Comparison of speed measurements:

I think you used the current DC backup_blobs format in the sql file for the import. This means that you need to run VACUUM at the end, which is the big disadvantage for import! So I suspect there is no real speed advantage for tar on import when you use sqlar as the archive!

link2xt (Collaborator) commented Jul 23, 2020

I think you used the current DC backup_blobs format in the sql file for the import.

No, the plan is to pack blobdir into the tar archive as-is, blob table will not be created.
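A hedged sketch of the import side under this plan (again Python's `tarfile` as a stand-in for the Rust implementation; paths and names are illustrative only): the database file and blob entries are read sequentially from a possibly non-seekable stream and written straight to disk -- no blob table is created and no VACUUM step is needed.

```python
import io
import os
import tarfile
import tempfile

def import_backup(stream, target_dir):
    """Unpack a backup tar stream directly into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    # "r|" reads a non-seekable stream, so this also works when the archive
    # arrives over a network connection rather than from a local file.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            # each entry is written out as soon as it is read
            tar.extract(member, path=target_dir)

# Demo: build a tiny archive in memory, then import it.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("db.sqlite")
    payload = b"sqlite data"
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buf.seek(0)
dest = tempfile.mkdtemp()
import_backup(buf, dest)
```

Because the database arrives as a plain file inside the archive, import is a straight copy rather than a row-by-row extraction from a blobs table.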

@Hocuri Hocuri self-assigned this Jul 23, 2020
@Hocuri Hocuri mentioned this issue Jul 24, 2020
csb0730 commented Jul 31, 2020

I understand that you intend to pack the blobdir into the archive.

But my comment about the speed comparison pointed to the question of which format of sql file is used. I want to say: you are using the current DC backup format (with the db-blobs table in the database), not sqlar!
Right?

link2xt (Collaborator) commented Aug 1, 2020

No, there is no blobs table in the database anymore. The database is packed into the tar archive as is, without adding a blob table inside of it.

Hocuri added a commit that referenced this issue Aug 18, 2020
Fix #1729
Co-authored-by: holger krekel  <holger@merlinux.eu>
Co-authored-by: Alexander Krotov <ilabdsf@gmail.com>