Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create data cube index tables in a schema. #519

Merged
merged 4 commits into from
Sep 15, 2024
Merged

Create data cube index tables in a schema. #519

merged 4 commits into from
Sep 15, 2024

Conversation

jheer
Copy link
Member

@jheer jheer commented Sep 13, 2024

This PR redesigns the data cube indexer to write index tables into a named schema (default 'mosaic'). This allows data cube index tables to be managed in a largely isolated environment and to be reused across users and sessions. Temporary index tables are no longer supported, as temporary tables can not be created within a named schema.

  • Breaking: Remove support for temporary tables for data cube index tables.
  • Breaking: Remove data cube indexer enabled method, instead use get/set properties.
  • Add data cube indexer schema get/set properties.
  • Add data cube indexer dropIndexTables() method. This method issues a query to remove the entire data cube index table schema. It should be called if base tables are updated, causing index tables to become stale and inaccurate.
  • Update Jupyter widget, remove temp_indexes property and add dataCubeSchema property.
  • Update types, method signatures in Coordinator and DataCubeIndexer.
  • Update Coordinator API documentation.

@jheer jheer requested a review from domoritz September 13, 2024 17:13
@jheer
Copy link
Member Author

jheer commented Sep 13, 2024

@domoritz Can you please spot check the changes to the Jupyter widget? Is the model state initialization (setting the default data cube index table schema) correct? Also, in this context should we prefer camelCase (dataCubeSchema) or underscore_case (data_cube_schema) for model property names?

@domoritz
Copy link
Member

domoritz commented Sep 14, 2024

It wasn't quite correct. The idea with temp_indexes was that you can set whether indexes should be created as temp indexes or not from the python side.

I updated the code to let you get and set the index schema from the widget. We should prefer data_cube_schema since the variable is accessible from Python (where users interact with it).

Screenshot 2024-09-14 at 18 39 07

Copy link
Member

@domoritz domoritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I think the idea schema functionality in the python widget is for advanced use cases. Right now, we don't automatically call dropIndexTables which means indexes will not be cleaned up when the user switches where indexes are created. I think that makes sense as default behavior.

@jheer jheer merged commit 09d292c into main Sep 15, 2024
3 checks passed
@jheer jheer deleted the jh/data-cube-schema branch September 15, 2024 14:22
@willium
Copy link
Member

willium commented Sep 18, 2024

Exciting to see dropIndexTables()! thanks for adding that.

I believe the most straightforward way for MotherDuck to know that you want some table to live in the browser/WASM (rather than upstream in the warehouse) is by specifying it as a TEMP table. The alternative is mounting a local db. I wonder if this approach will have performance implications when MotherDuck is the server.

@jheer
Copy link
Member Author

jheer commented Sep 18, 2024

We decided to move away from temporary tables in part to also support persistence of index cubes across multiples users and sessions. So that is another aspect to weigh among the trade-offs and design space here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants