Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: access sqlite dbs from object storage #2772

Merged
merged 26 commits into from
Mar 19, 2024
Merged

Conversation

tychoish
Copy link
Contributor

I only wired this up for the function and not the
db/virtual-listing/create example, (which I think will fall out of a
refactoring: I just put the code in a weird place in the first pass.)

I want to run this through all of the CI tests, and want to check:

  • this thing I'm doing with using the tempdir's lifesycle to manage
    the cached image is correct.

  • all of the other plumbing on the sqlite side of things.

Copy link
Contributor

@vrongmeal vrongmeal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments. Looks good otherwise.

crates/protogen/proto/metastore/options.proto Show resolved Hide resolved
crates/datasources/src/sqlite/wrapper.rs Outdated Show resolved Hide resolved
crates/datasources/src/sqlite/mod.rs Outdated Show resolved Hide resolved
@tychoish
Copy link
Contributor Author

This is basically done, and I think we just need a test where we pull a database off of a URL.

Comment on lines +4 to +7
# Originally these operations failed because GlareDB validated that
# the table existed when the table was created in the catalog, but
# these should error at query time: they may fail to exist now but may
# exist later.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you create an external (anything) in (many? most?) sources, they attempt to connect to the provider to verify that the resource exists. This is cool, particularly for data sources where the "connect" option is relatively inexpensive, and means that we can catch some errors early (potentially,) but this check doesn't mean that the resource will still exist at query time, and everything could change (of course.)

Access a non-local sqlite database is comparatively expensive, and because we release the table providers at the end of a query, we'd (in the current formulation) download the entire database just to make sure that it existed. That seems bad, and being lazy here seems like the right answer.

(as I wrote this, I'm overcome with some doubt about this,)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, doubt resolved:

https://github.com/glaredb/glaredb/blob/tycho/sqlite-from-cloud/crates/sqlexec/src/planner/session_planner.rs#L698-L756

The SqliteAccess object is now the thing that downloads the remote files, and all of the create_external_ operations establish test connections and check that the resource exists. This is probably cheap enough to be fine (though for postgres if you were doing this often DBAs/admins would probably notice connection churn. We could do some generic object store access thing and check that there's exactly one file that matches the location, but files, credentials, paths, can all change between resource creation and first query.

The external mongodb table support doesn't create a test connection, but everything else seems to.

(Interestingly, of course, if you create-external-table-{gcs,s3,azure} for a .sqlite extension it'll test for the existence of the file and not download it.)

@tychoish tychoish requested a review from scsmithr March 19, 2024 13:01
@tychoish tychoish merged commit 589b9e3 into main Mar 19, 2024
25 checks passed
@tychoish tychoish deleted the tycho/sqlite-from-cloud branch March 19, 2024 15:03
@tychoish
Copy link
Contributor Author

resolved: #2643

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants