Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

indices get corrupted if keys are too big #1

Open
JohannesMunk opened this issue May 17, 2015 · 2 comments
Open

indices get corrupted if keys are too big #1

JohannesMunk opened this issue May 17, 2015 · 2 comments

Comments

@JohannesMunk
Copy link

CREATE TABLE IF NOT EXISTS T1 (B TEXT, C TEXT);
CREATE INDEX IF NOT EXISTS T1_idx on T1 (C,B);
INSERT INTO T1 (B, C) VALUES ('A', 'tms_doc_cfg');
INSERT INTO T1 (B, C) VALUES (RANDOMBLOB(17000), 'tms_doc_cfg_element_trans_xslt');
SELECT * FROM T1 WHERE C = "tms_doc_cfg";

returns nothing.

SELECT C,C="tms_doc_cfg" FROM T1 WHERE C > "tms_doc_cfg";

returns 1 in the 2nd column on 'A' .. The index clearly fails..

If the key length is reduced, the select

CREATE TABLE IF NOT EXISTS T1 (B TEXT, C TEXT);
CREATE INDEX IF NOT EXISTS T1_idx on T1 (C,B);
INSERT INTO T1 (B, C) VALUES ('A', 'tms_doc_cfg');
INSERT INTO T1 (B, C) VALUES (RANDOMBLOB(1000), 'tms_doc_cfg_element_trans_xslt');
SELECT * FROM T1 WHERE C = "tms_doc_cfg";

correctly returns 'A', 'tms_doc_cfg'.

@hyc
Copy link
Contributor

hyc commented Jun 3, 2015

The underlying problem is a known issue - LMDB keys must be small enough to fit on a single DB page. SQLite uses complete values verbatim as its index keys, so very long values didn't work. The latest SQLightning code attempts to workaround this limit by replacing long values with a hash, to save space. The current solution is incomplete, however. SQLite expects to be able to read the original values from the index key, and since LMDB is only returning the hash the index lookup isn't what SQLite expected. This is the same reason for issue #4 . I am going to close that as a dup of this issue.

@hyc
Copy link
Contributor

hyc commented Aug 8, 2015

The hash approach is obviously a dead end. The new approach will be to adopt an OpenLDAP-style nested index. Long keys will be broken into chunks and chained together. This will handle arbitrarily long keys and will preserve the ordering as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants