CorruptError with backup #158

Closed · rogerbinns opened this issue May 30, 2024 · 10 comments

@rogerbinns

This uses a disk database testdb and sets an encryption key - dbmain

A memory database is then created and filled with gunk - dbtmp and filldb

The memory database is backed up into (overwriting) the disk database.

A SELECT on dbmain gives CorruptError.

If the upper limit of the range in filldb is changed to 3, the problem doesn't happen, so the corruption is size-dependent.

The disk database is permanently corrupt. If you open it separately again, set the key, and try to read the contents you get CorruptError.

Reproduction script

import apsw


def filldb(db):
    db.execute("create table a(x)")
    for i in range(1, 4):
        db.execute("insert into a values(?)", ("aaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" * i * 8192,))


dbmain = apsw.Connection("testdb")
dbmain.pragma("key", "hello world")

dbtmp = apsw.Connection("")
filldb(dbtmp)

with dbmain.backup("main", dbtmp, "main") as b:
    b.step()

dbmain.execute("select * from a").get
@utelle (Owner) commented May 31, 2024

> This uses a disk database testdb and sets an encryption key - dbmain

This step is ok, of course.

> A memory database is then created and filled with gunk - dbtmp and filldb
>
> The memory database is backed up into (overwriting) the disk database.

However, this is a problem in the context of encrypted databases.

One restriction of the sqlite3mc encryption extension (and actually of SEE and SQLCipher as well) is that memory databases and temporary databases are not encrypted.

Additionally, a restriction of the backup API is that you can't change page size and/or number of reserved bytes per page. That is, the page size and the number of reserved bytes per page of the source and target database need to be the same.

The default cipher is chacha20, which uses 32 reserved bytes per page. Memory and temporary databases are unencrypted and have zero reserved bytes. And that's why backup doesn't work properly.

To overcome this obstacle you can use one of the following approaches:

  1. Use a cipher with zero reserved bytes - namely aes128cbc or aes256cbc - for the main database (see the sketch after this list).
  2. Avoid using a memory or temporary database as the source or target of a backup. Use a normal database and enable encryption for it with the same cipher as the main database.
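
For illustration, here is a minimal sketch of approach 1 applied to the reproduction script. It assumes a freshly created target file, since an existing chacha20-encrypted database would first have to be re-encrypted:

import apsw


def filldb(db):
    db.execute("create table a(x)")
    for i in range(1, 4):
        db.execute("insert into a values(?)", ("a" * 8192 * i,))


# Select a cipher with zero reserved bytes *before* keying, so the
# page layout matches the unencrypted memory database.
dbmain = apsw.Connection("testdb")
dbmain.pragma("cipher", "aes128cbc")
dbmain.pragma("key", "hello world")

dbtmp = apsw.Connection("")
filldb(dbtmp)

with dbmain.backup("main", dbtmp, "main") as b:
    b.step()

print(dbmain.execute("select count(*) from a").get)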

> A SELECT on dbmain gives CorruptError.

If you use one of the above solutions the error will go away.

> If the upper limit of the range in filldb is changed to 3, the problem doesn't happen, so the corruption is size-dependent.

This can be explained. dbtmp is unencrypted with zero reserved bytes, so each of its pages can be filled with content up to the full page size. If the content of a page is bigger than page size minus reserved bytes, part of it is overwritten by the reserved bytes during encryption; with the default 4096-byte page and chacha20's 32 reserved bytes, anything stored in the last 32 bytes of a page is lost. Hence, the "restored" database is corrupted, because content data is partially missing.

> The disk database is permanently corrupt. If you open it separately again, set the key, and try to read the contents you get CorruptError.

See the explanation above.

@rogerbinns (Author)

I just did a test with two disk databases (no encryption) with different page sizes, and backed up one into the other. The content was definitely copied when examining it, but pragma page_size continued to report the pre-backup page sizes. Only after a vacuum did the reported page size change to that of the backup source.

Other tests where one of the disk databases is encrypted and the other isn't produce no errors, but the encrypted one ends up corrupt.

The bug here is not that there are combinations of memory, disk, plain, encrypted, different ciphers, etc. that cannot be backed up into each other, but that when you do so no error is returned: the backup API says everything succeeded, and you only get a corruption error later, when it is too late.

@utelle (Owner) commented Jun 1, 2024

> The bug here is not that there are combinations of memory, disk, plain, encrypted, different ciphers, etc. that cannot be backed up into each other, but that when you do so no error is returned: the backup API says everything succeeded, and you only get a corruption error later, when it is too late.

IMHO this is a deficiency of the backup API, and I currently see no solution, except documenting what works and what does not.

@rogerbinns (Author)

You are producing a modified version of SQLite, so can't you detect this via code modification and return errors?

Are you able to tell from within the VFS code if things like reserved bytes are correctly configured? Perhaps you could add a file control so the VFS level can be queried to check.

> except documenting what works and what does not

It would be desirable to have a code snippet to run to confirm the backup is ok. vacuum may be sufficient.

@utelle (Owner) commented Jun 1, 2024

> You are producing a modified version of SQLite, so can't you detect this via code modification and return errors?

In principle, you are right. But it means applying yet another patch to SQLite's code.

The SQLCipher devs prohibit use of the backup API unless source and target database are both plain databases or both encrypted.

I can introduce a similar check, so that sqlite3_backup_init returns NULL if source and target database are incompatible.

> Are you able to tell from within the VFS code if things like reserved bytes are correctly configured? Perhaps you could add a file control so the VFS level can be queried to check.

If the SQLite3MC VFS is used and a file is encrypted, it is possible to find out the number of reserved bytes (and other attributes of the cipher scheme).

That is, it would be possible to provide the necessary information to an application. However, a user could then still start a backup operation, even if source and target database do not match.

Therefore I think it is better to intercept the start of the backup operation in sqlite3_backup_init.

> > except documenting what works and what does not
>
> It would be desirable to have a code snippet to run to confirm the backup is ok. vacuum may be sufficient.

vacuum into will always work. This can be used for example to convert a database from one cipher scheme to another.
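
A minimal sketch of such a conversion; the target file name is illustrative, and passing cipher and key as URI parameters relies on sqlite3mc's URI parameter support:

import apsw

src = apsw.Connection("testdb")
src.pragma("key", "hello world")

# vacuum into rewrites every page from scratch, so page size and
# reserved bytes of source and target need not match; the new file
# takes its cipher scheme and key from the URI parameters.
src.execute("vacuum into 'file:converted.db?cipher=aes256cbc&key=new-passphrase'")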

@rogerbinns (Author)

> I can introduce a similar check, so that sqlite3_backup_init returns NULL if source and target database are incompatible.

That is by far the most desirable, and how I would expect things to work.

> vacuum into will always work.

What I meant was a check for the case where backup did not return an error, which is what caused this issue in the first place: it resulted in a silently unusable database. Running vacuum on the result database looks like it would detect the problem and turn the silent failure into an explicit one, unless there is a better way?
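
As a sketch of that idea, forcing every page to be read after the backup turns a silent failure into an explicit one; integrity_check is an alternative that visits all pages without rewriting them:

import apsw

con = apsw.Connection("testdb")
con.pragma("key", "hello world")

try:
    # vacuum rewrites the whole database, so every page gets read;
    # a corrupt page surfaces as apsw.CorruptError right here
    con.execute("vacuum")
    # integrity_check also reads all pages, without rewriting them
    assert con.pragma("integrity_check") == "ok"
except apsw.CorruptError:
    print("backup produced a corrupt database")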

It is probably worth having a separate Tips or similar page in the doc that suggests using vacuum into instead of backup, since it always works, occupies the least space, can change the cipher, etc.

You could also move the text relevant to #159 there instead of duplicating it in both the C API and pragma sections. The #153 stuff is probably appropriate too.

@utelle (Owner) commented Jun 1, 2024

> > I can introduce a similar check, so that sqlite3_backup_init returns NULL if source and target database are incompatible.
>
> That is by far the most desirable, and how I would expect things to work.

I have added a patch to perform this check. Commit utelle/apsw-sqlite3mc@b456b80 updates sqlite3.c to contain this patch. Please give it a try to check whether it is working as expected.
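
For a quick smoke test of the patched build, re-running the original memory-to-encrypted-disk pairing should now fail up front. This assumes apsw surfaces the NULL return from sqlite3_backup_init as an exception:

import apsw

dbmain = apsw.Connection("testdb")
dbmain.pragma("key", "hello world")

dbtmp = apsw.Connection("")
dbtmp.execute("create table a(x)")

try:
    # incompatible source/target: sqlite3_backup_init should now
    # return NULL, raising here instead of corrupting testdb later
    with dbmain.backup("main", dbtmp, "main") as b:
        b.step()
except apsw.Error as exc:
    print("backup refused:", exc)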

> > vacuum into will always work.
>
> What I meant was a check for the case where backup did not return an error, which is what caused this issue in the first place: it resulted in a silently unusable database. Running vacuum on the result database looks like it would detect the problem and turn the silent failure into an explicit one, unless there is a better way?

The check introduced in sqlite3_backup_init should spare you from performing vacuum.

> It is probably worth having a separate Tips or similar page in the doc that suggests using vacuum into instead of backup, since it always works, occupies the least space, can change the cipher, etc.

It certainly makes sense to add such a page.

> You could also move the text relevant to #159 there instead of duplicating it in both the C API and pragma sections. The #153 stuff is probably appropriate too.

Well, I think it is better to repeat the information in the pragma section. This saves a user from searching elsewhere. Nevertheless it can and probably should be added to a Tips page, too.

rogerbinns added a commit to utelle/apsw-sqlite3mc that referenced this issue Jun 2, 2024
@rogerbinns (Author)

Test code in apsw-sqlite3mc committed.

sqlite3_backup_init is reporting memory-to-disk errors as expected. I am also having no trouble backing up between databases with different reserved bytes, which is unexpected; the data does validate.

This is one case where the backup works but you get CorruptError on the vacuum.

import apsw

con = apsw.Connection("testdb")
con.pragma("cipher", "rc4")
con.pragma("key", "hello")
con.execute("create table x(y); insert into x values(randomblob(656536))")


con2 = apsw.Connection("testdb2")
con2.pragma("cipher", "aes128cbc")
con2.pragma("legacy", 1)
con2.pragma("legacy_page_size", 16384)
con2.pragma("key", "world")
con2.execute("create table x(y); insert into x values(randomblob(656536))")

with con2.backup("main", con, "main") as b:
    b.step()

con2.execute("vacuum")

utelle added a commit that referenced this issue Jun 3, 2024
- #158: add check to verify compatibility of source and target database in backup operation
- #160: fix accessing memory out of array bounds
- #162: fix loading/storing misaligned data
@utelle (Owner) commented Jun 3, 2024

> Test code in apsw-sqlite3mc committed.
>
> sqlite3_backup_init is reporting memory-to-disk errors as expected. I am also having no trouble backing up between databases with different reserved bytes, which is unexpected; the data does validate.

Yes, indeed, it is unexpected that a backup with different numbers of reserved bytes succeeds. It could be that it works when the cipher scheme of the target database requires fewer reserved bytes per page than the source database. But this is not officially supported.

sqlite3_backup_init checks whether the numbers of reserved bytes per page match, so it is surprising that no error is reported when they do not.

> This is one case where the backup works but you get CorruptError on the vacuum.
>
> [reproduction script quoted above]

Here the page sizes are different. AFAIK SQLite backup tries to set the page size of the target database to that of the source database, but this will not work for encrypted databases.

The added check does not yet look at the page sizes, so it needs to be extended.
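
Until then, an application could guard itself with a small pre-check before starting a backup. This helper and the connection setup are hypothetical, not part of apsw or sqlite3mc:

import apsw


def backup_compatible(dst: apsw.Connection, src: apsw.Connection) -> bool:
    # both sides must agree on page_size; mismatched reserved bytes
    # are already rejected by the patched sqlite3_backup_init
    return dst.pragma("page_size") == src.pragma("page_size")


con = apsw.Connection("testdb")
con.pragma("key", "hello")

con2 = apsw.Connection("testdb2")
con2.pragma("key", "world")

if backup_compatible(con2, con):
    with con2.backup("main", con, "main") as b:
        b.step()
else:
    print("page sizes differ - refusing to back up")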

@utelle (Owner) commented Jun 7, 2024

Commit 879f4c6 added a check for compatible page sizes in the backup function. Closing.
