feat: Data source cache #2164

Merged
merged 13 commits into from
Dec 5, 2023

scsmithr commented Nov 24, 2023

Adds caching of external database table info.

> CREATE EXTERNAL DATABASE my_pg
::: FROM postgres OPTIONS (
:::   host = 'pg.demo.glaredb.com',
:::   port = '5432',
:::   user = 'demo',
:::   password = 'demo',
:::   database = 'postgres',
::: );
Database created
> select * from cache_external_database_tables();
┌────────────────────────────────┬─────────────────────────┬─────────────────────────┐
│ system_operation_name          │ started_at              │ finished_at             │
│ ──                             │ ──                      │ ──                      │
│ Utf8                           │ Date64                  │ Date64                  │
╞════════════════════════════════╪═════════════════════════╪═════════════════════════╡
│ cache_external_database_tables │ 2023-12-05T03:10:17.086 │ 2023-12-05T03:10:42.756 │
└────────────────────────────────┴─────────────────────────┴─────────────────────────┘
> select * from glare_catalog.cached_external_database_tables;
┌──────────────┬─────────────┬──────────────────┬───────────────────────┬────────────────────────────────────┐
│ database_oid │ schema_name │ table_name       │ column_name           │ data_type                          │
│           ── │ ──          │ ──               │ ──                    │ ──                                 │
│        Int32 │ Utf8        │ Utf8             │ Utf8                  │ Utf8                               │
╞══════════════╪═════════════╪══════════════════╪═══════════════════════╪════════════════════════════════════╡
│        20000 │ pg_catalog  │ pg_stat_archiver │ archived_count        │ Int64                              │
│        20000 │ pg_catalog  │ pg_stat_archiver │ last_archived_wal     │ Utf8                               │
│        20000 │ pg_catalog  │ pg_stat_archiver │ last_archived_time    │ Timestamp(Nanosecond, Some("UTC")) │
│        20000 │ pg_catalog  │ pg_stat_archiver │ failed_count          │ Int64                              │
│        20000 │ pg_catalog  │ pg_stat_archiver │ last_failed_wal       │ Utf8                               │
│        20000 │ pg_catalog  │ pg_stat_archiver │ last_failed_time      │ Timestamp(Nanosecond, Some("UTC")) │
│        20000 │ pg_catalog  │ pg_stat_archiver │ stats_reset           │ Timestamp(Nanosecond, Some("UTC")) │
│        20000 │ pg_catalog  │ pg_stat_bgwriter │ checkpoints_timed     │ Int64                              │
│        20000 │ pg_catalog  │ pg_stat_bgwriter │ checkpoints_req       │ Int64                              │
│        20000 │ pg_catalog  │ pg_stat_bgwriter │ checkpoint_write_time │ Float64                            │
│            … │ …           │ …                │ …                     │ …                                  │
│        20000 │ public      │ orders           │ o_comment             │ Utf8                               │
│        20000 │ public      │ part             │ p_partkey             │ Int32                              │
│        20000 │ public      │ part             │ p_name                │ Utf8                               │
│        20000 │ public      │ part             │ p_mfgr                │ Utf8                               │
│        20000 │ public      │ part             │ p_brand               │ Utf8                               │
│        20000 │ public      │ part             │ p_type                │ Utf8                               │
│        20000 │ public      │ part             │ p_size                │ Int32                              │
│        20000 │ public      │ part             │ p_container           │ Utf8                               │
│        20000 │ public      │ part             │ p_retailprice         │ Decimal128(38, 9)                  │
│        20000 │ public      │ part             │ p_comment             │ Utf8                               │
└──────────────┴─────────────┴──────────────────┴───────────────────────┴────────────────────────────────────┘
 131 rows (20 shown)
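
The cached table above stores one row per column, so a consumer has to regroup rows to recover each table's schema. A minimal sketch of that regrouping follows. The `CachedColumn` struct, the `TableKey` alias, and `group_by_table` are hypothetical names for illustration only and are not taken from this PR.

```rust
use std::collections::BTreeMap;

/// One row of `glare_catalog.cached_external_database_tables`, mirroring the
/// columns in the output above. The struct itself is an assumption.
#[derive(Debug, Clone, PartialEq)]
pub struct CachedColumn {
    pub database_oid: i32,
    pub schema_name: String,
    pub table_name: String,
    pub column_name: String,
    pub data_type: String,
}

/// Identifies one external table: (database_oid, schema_name, table_name).
pub type TableKey = (i32, String, String);

/// Groups per-column rows into per-table lists of (column_name, data_type),
/// preserving the row order within each table.
pub fn group_by_table(rows: &[CachedColumn]) -> BTreeMap<TableKey, Vec<(String, String)>> {
    let mut tables: BTreeMap<TableKey, Vec<(String, String)>> = BTreeMap::new();
    for row in rows {
        let key = (
            row.database_oid,
            row.schema_name.clone(),
            row.table_name.clone(),
        );
        tables
            .entry(key)
            .or_default()
            .push((row.column_name.clone(), row.data_type.clone()));
    }
    tables
}
```

Keying on the database OID as well as the schema and table name keeps tables from different external databases apart when they share a name.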

///
/// This means we have ~3600 OIDs to play with for builtin objects. Note that
/// once a builtin object is given a stable OID, it **must not** be changed ever
/// (unless you're the person willing to write a migration system).
Contributor

which we should do in a while?

Member Author

Likely, just want to try to delay it as much as possible to better understand what needs to happen for a system like that.

Contributor

1000% agree.

/// First glaredb builtin OID: 16384
/// First user object OID: 20000
///
/// This means we have ~3600 OIDs to play with for builtin objects. Note that
Contributor

why do they have to be sequentially grouped?

Member Author

They don't, it's just easier to track what the next OID should be when adding a table/schema/whatever. If we have more than 16 builtin schemas (unlikely), we can just jump forward to some other available OID.
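
The OID arithmetic in the doc comment under review can be checked with a short sketch. The two boundary values come from that comment; the constant and function names are assumptions for illustration and may not match the identifiers used in the codebase.

```rust
/// First OID reserved for GlareDB builtin objects (value from the doc comment).
pub const FIRST_GLAREDB_BUILTIN_OID: u32 = 16384;

/// First OID assigned to user objects (value from the doc comment).
pub const FIRST_USER_OBJECT_OID: u32 = 20000;

/// Number of OIDs available to builtin objects: 20000 - 16384 = 3616,
/// which is the "~3600" quoted in the comment.
pub const BUILTIN_OID_COUNT: u32 = FIRST_USER_OBJECT_OID - FIRST_GLAREDB_BUILTIN_OID;

/// Returns true if `oid` falls in the half-open builtin range
/// [FIRST_GLAREDB_BUILTIN_OID, FIRST_USER_OBJECT_OID).
pub fn is_glaredb_builtin_oid(oid: u32) -> bool {
    (FIRST_GLAREDB_BUILTIN_OID..FIRST_USER_OBJECT_OID).contains(&oid)
}

/// Returns true if `oid` belongs to a user-created object.
pub fn is_user_object_oid(oid: u32) -> bool {
    oid >= FIRST_USER_OBJECT_OID
}
```

Under these boundaries, the `database_oid` of 20000 shown for `my_pg` in the PR description is the first OID in the user range.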

tychoish commented Dec 4, 2023

What's the relationship of this PR to #2201?

scsmithr commented Dec 4, 2023

> What's the relationship of this PR to #2201?

I expect Cloud to use the simple query interface to call the cache function(s) in this PR for updating the cache, but they're two independent features.

///
/// This should be focused on operations that do not require user interactions
/// (e.g. background operations like optimizing tables or running caching jobs).
pub trait SystemOperation: Sync + Send {
Contributor

Would we ever have an unknown number of SystemOperations, or custom operations? Since we are dealing with an exhaustive list of operations, I'm wondering if it'd make sense to use an enum here instead?

Member Author

Fair, will change
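
For reference, the enum-based shape suggested in this thread might look roughly like the sketch below. It is an illustration only, not the code that was merged: the variant, the `ALL` constant, and the `name` and `from_name` methods are assumptions. Only the operation name string is taken from the `system_operation_name` column in the PR description.

```rust
/// Hypothetical closed set of system operations, replacing a trait object.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SystemOperation {
    /// Refresh the cache of external database table info.
    CacheExternalDatabaseTables,
}

impl SystemOperation {
    /// Every known operation; adding a variant means extending this list.
    pub const ALL: [SystemOperation; 1] = [SystemOperation::CacheExternalDatabaseTables];

    /// Stable name of the operation. The match is exhaustive, so the
    /// compiler flags this function whenever a new variant is added.
    pub fn name(&self) -> &'static str {
        match self {
            SystemOperation::CacheExternalDatabaseTables => "cache_external_database_tables",
        }
    }

    /// Looks an operation up by name, returning `None` for unknown names.
    pub fn from_name(name: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|op| op.name() == name)
    }
}
```

The trade-off is the one raised in the review: an enum fits when the list of operations is known up front, while a trait leaves room for operations defined elsewhere.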

impl ConstBuiltinFunction for CacheExternalDatabaseTables {
const NAME: &'static str = "cache_external_database_tables";
const DESCRIPTION: &'static str = "Cache tables from external databases.";
const EXAMPLE: &'static str = "select * from cache_external_database_tables();";
Contributor

Something for another day, but it'd be cool to support the CALL syntax for this.

While valid, select * from cache_external_database_tables(); seems weird to me, as I'm selecting from an action.

CALL seems fitting for something that is just executing a function call.

CALL cache_external_database_tables();

Member Author

I agree, I originally wanted to use CALL, but sqlparser doesn't support it.

scsmithr marked this pull request as ready for review December 5, 2023 03:08
scsmithr enabled auto-merge (squash) December 5, 2023 15:05
scsmithr disabled auto-merge December 5, 2023 15:30
scsmithr commented Dec 5, 2023

Force merging, CI failure due to #2168

@scsmithr scsmithr merged commit 82b9d81 into main Dec 5, 2023
8 of 9 checks passed
@scsmithr scsmithr deleted the sean/cache branch December 5, 2023 15:30