-
Notifications
You must be signed in to change notification settings - Fork 262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
integrate cr-sqlite #147
Comments
Hi! Yeah, ext/ is the right place, although feel free to also hook it into the autotools build system, e.g. so that you can Ad 1, let's create a separate issue for the custom syntax, and we can bikeshed the exact syntax in there. /cc @penberg Ad 2, I think we can figure out the internal hooks later. It's ok for the first iteration to work with the original |
…sion link to libsql -- tursodatabase/libsql#147
…sion link to libsql -- tursodatabase/libsql#147
…sion link to libsql -- tursodatabase/libsql#147
@psarna - any docs (or just a code pointer) on how I might use the internal commit and update hooks? Also, here are some ideas on the custom syntax for defining CRDTs / CRRs. Table SyntaxEach table can be modeled as some sort of set CRDT. Grow only set, observe remove set, causal length set, etc. Given that, we could add a modified Example: CREATE [SetType] foo ( ... ); Other alternatives would be to keep CREATE TABLE foo ( ... ) AS [SetType]; -- option 1
CREATE TABLE foo ( ... ) CRDT = [SetType]; -- option 2 Where
Column SyntaxWithin a row, each column can be a specific type of CRDT. The type of CRDT chosen controls the semantics of how that given cell is merged. Some options for a column:
One idea is to add an extra type (the crdt type) to the column type. Example: CREATE TABLE foo (
id PRIMARY KEY,
views COUNTER,
content PERITEXT,
owner_id LWW INTEGER
) AS CausalLengthSet; Or generally: CREATE TABLE [table_name] (
[col_name] [CRDTType] [Type],
...
) AS [SetType]; where * Distributed Fractional Index would need to be parameterized. I can go into more detail on this one after we talk through what has been proposed so far. |
I'm curious: would it be possible to use virtual tables instead of implementing custom syntax? For example, your create virtual table foo using CausalLengthSet(
id PRIMARY KEY,
views COUNTER,
content PERITEXT,
owner_id LWW INTEGER
); The SQLite virtual table syntax is very relaxed, and you can enforce any syntax rules as you like. So your virtual table logic for Personally, I would think that an extension with virtual tables would be preferable to new syntax, but I might be missing something! |
Hey Alex. I think so but there were two problems we ran into when trying
this a ways back.
- you can’t alter a virtual table.
- creating a virtual table that just proxied a real table had a large perf
overhead on read.
For (2), maybe it’s possible for a create virtual table statement to create
a real table and not actually create the virtual one at all? In the end we
just want a real table + some triggers and metadata tables.
Solving (2) would go a long way but we’d still have some alter table
problems to overcome. Altering a crdt table requires also altering some
related tables and triggers. I currently handle this by requiring the user
to run ‘select crsql_begin_alter(‘tbl’)’ before issuing alter statements. I
don’t know of a better way to handle this other than a custom syntax or
plugging into the alter table code itself.
I’m keeping the base SQLite extension around and would like it to be as
ergonomic as possible so would love to hear any suggestions you may have on
these two problems.
…On Thu, Apr 6, 2023 at 12:30 AM Alex Garcia ***@***.***> wrote:
I'm curious: would it be possible to use virtual tables instead of
implementing custom syntax? For example, your foo example could be
represented as a virtual table like so:
create virtual table foo using CausalLengthSet(
id PRIMARY KEY,
views COUNTER,
content PERITEXT,
owner_id LWW INTEGER
);
The SQLite virtual table syntax is very relaxed, and you can enforce any
syntax rules as you like
<https://www.sqlite.org/vtab.html#:~:text=The%20format%20of,on%20the%20arguments.>.
So your virtual table logic for CausalLengthSet (or any other "virtual
table modules" that you define like GrowOnlySet or DeleteWinsSet) can
validate and enforce any CRDTType in column definitions.
Personally, I would think that an extension with virtual tables would be
preferable to new syntax, but I might be missing something!
—
Reply to this email directly, view it on GitHub
<#147 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAHWK2Z45ITNPZFHRTKOLHTW7ZBFJANCNFSM6AAAAAAV7OYVWU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
I did some experimentation and it looks mostly possible. A virtual table table is still created but maybe I can do something like drop the vtab in the commit hook such that the crr vtab just becomes a new syntax for creating real tables rather than creating vtabs. Another problem is that the virtual table gets the desired table name and the real table has to get some other name :( So the vtab route might look like:
where the user must suffix their table name with It does seem better than |
Alex made a really good point here: tursodatabase/libsql#147 (comment) this'll make `crr` creation a bit more ergonomic for the base extension
This seems odd to me - do you mind sharing the setup of this virtual table when you tried it out? If your virtual table is backed by a regular SQLite table, queries should be as fast as querying the underlying table. You have to jump through a few hoops, but any That being said, if you're doing analytical queries on a virtual table, then the current implementation isn't great at those. Any
Shadow tables could be a solution to this. You'd have a virtual table named Same with triggers, you could create those in the
Is there a reason why you would want the desired table name to be a real table, and not a virtual one? Even if you have a virtual table named
Yea that's the big issue here, you can't use Do you expect that people would alter columns in their crdt-backed tables a lot? One trick you might be able to use is similar to SQLite's FTS5 special insert commands, where they reserve a special hidden column for higher-level manipulation of the virtual table, like so: INSERT INTO ft(ft, rank) VALUES('automerge', 8);
INSERT INTO ft(ft, rank) VALUES('crisismerge', 16);
INSERT INTO ft(ft) VALUES('integrity-check'); Where So you could possibly do something like: insert into foo(foo, a, b) values ('drop-column', 1, 1); Would "drop" the a and b columns. You wouldn't be retroactively change the schema of the already-existing select a, b, c from foo; -- works fine
insert into foo(foo, a, b) values ('drop-column', 1, 1); -- drops columns a and b
select a, b, c from foo; -- fails with custom error "a and b were dropped"
select c from foo; -- works fine And for future connections, you can possibly store the "new' schema in a virtual table, and on This also has the added benefit of not altering tables with a scalar Happy to provide more examples if needed! |
It was a surprise to myself and @ivertom as well. We discovered that ~75% of query time for some queries is spent in the parsing and planning phase. Issuing a query against the virtual table then re-constructing a query for the base table was a significant hit since it involved running that phase twice. Once at the virtual layer, again when querying the real table from the vtab implementation. >4x slowdown in some cases. Unfortunately I don't think we ever bothered merging the vtab changes from Iver's fork to the main repo once the perf problems surfaced. @ivertom, any chance you still have the virtual table approach you took to creating CRRs?
mainly for the perf reasons. I don't want reads slowed down at all so querying the real table, rather than going through a virtual layer first, is the optimal situation.
Yeah, I'm aware. crsql_changes is currently a virtual table that is used to read and write patch sets out of and into the database. I'm also using virtual tables to provide a fractional indexing API so users can
This is interesting. I'll need to mull it over a bit.
I hadn't thought about this angle. Another good argument for moving all mutations to be insert driven. |
Alex made a really good point here: tursodatabase/libsql#147 (comment) this'll make `crr` creation a bit more ergonomic for the base extension
I'll have a look for the old code -- but chances are it's gone. I also
created some perf tests of a noop virtual tables, to see the overhead of
querying through virtual table, and compared it to a normal query. That
part should be pretty easy to reproduce if we want to revisit performance
of virtual tables.
tor. 6. apr. 2023, 19:12 skrev Matt ***@***.***>:
… "creating a virtual table that just proxied a real table had a large perf
overhead on read." - This seems odd to me
It was a surprise to myself and @ivertom <https://github.com/ivertom> as
well. We discovered that ~75% of query time for some queries is spent in
the parsing and planning phase. Issuing a query against the virtual table
then re-constructing a query for the base table was a significant hit. 3x
slowdown in some cases.
I'll try to track down the original code. I believe we started in a
different repository then moved things b/c I can't find it in the current
commit log. @ivertom <https://github.com/ivertom>, any chance you still
have the virtual table approach you took to creating CRRs?
Is there a reason why you would want the desired table name to be a real
table
mainly for the perf reasons. I don't want reads slowed down at all so
querying the real table, rather than going through a virtual layer first,
is the optimal situation.
Even if you have a virtual table named foo and the "real" table is stored
in "foo_data", user's dont have to worry about that - they can just query
and INSERT into the foo table, and never even care about "foo_data".
Yeah, I'm aware. crsql_changes
<https://vlcn.io/cr-sqlite/api-methods/crsql_changes> is currently a
virtual table that is used to read and write patch sets into/out of the
database. I'm also using virtual tables to provide a fractional indexing
API <https://vlcn.io/cr-sqlite/api-methods/crsql_fract_as_ordered> so
users can INSERT x AFTER Y and get a correctly generated fractional index
for the insertion.
Where ft is the special hidden column (same name as the ft virtual table
that it's a part of), and 'automerge' / 'crisismerge' / 'integrity-check'
are special values that perform specific tasks to the ft virtual table.
This is interesting. I'll need to mull it over a bit.
It also plays better with tools that allow arbritary read-only queries
like Datasette.
I hadn't thought about this angle. Another good argument for moving all
mutations to be insert driven.
—
Reply to this email directly, view it on GitHub
<#147 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AWFZ5225Q23YA2EPEG5MKW3W732QHANCNFSM6AAAAAAV7OYVWU>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Alex made a really good point here: tursodatabase/libsql#147 (comment) this'll make `crr` creation a bit more ergonomic for the base extension
Alex made a really good point here: tursodatabase/libsql#147 (comment) this'll make `crr` creation a bit more ergonomic for the base extension
Alex made a really good point here: tursodatabase/libsql#147 (comment) this'll make `crr` creation a bit more ergonomic for the base extension
…sion link to libsql -- tursodatabase/libsql#147
Alex made a really good point here: tursodatabase/libsql#147 (comment) this'll make `crr` creation a bit more ergonomic for the base extension
147: End to end testing r=MarinPostma a=MarinPostma This PR boostraps end-to-end testing for sqld. I haven't yet setup CI, and plan to do it in a subsequent PR. enough suffering for now, and I need to shift focus to other higher priority tasks. Co-authored-by: ad hoc <postma.marin@protonmail.com>
The process of syncing to libsql is fully automated via Well it still requires submitting PR so not quite fully. |
integrate cr-sqlite
Lmk if you'd like to track this some other way but --
Opening a high level task to track and discuss the integration.
I'm assuming we'd want to drop the extension in under the
ext
folder. If so I can start moving in that direction.A few issues were raised on the doc, however:
Should we work those out first?
The text was updated successfully, but these errors were encountered: