[doc] First draft of PARTITION BY documentation #30485

Open
wants to merge 1 commit into
base: main
from

bkirwi
Contributor

@bkirwi bkirwi commented Nov 14, 2024

Motivation

A first draft of public documentation for https://github.com/MaterializeInc/database-issues/issues/7188.

Tips for reviewer

To be merged only when the feature hits private preview.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@bkirwi bkirwi force-pushed the partition-by-docs branch 2 times, most recently from 9e9442c to fb753f7 Compare November 14, 2024 21:11
@kay-kim
Contributor

kay-kim commented Nov 19, 2024

Hi Ben -- just double-checking -- for things in DRAFT status, do we hold off reviewing until things are marked as ready for review? (Just wanted to make sure you weren't waiting).

@bkirwi
Contributor Author

bkirwi commented Nov 19, 2024

do we hold off reviewing until things are marked as ready for review?

Yes that's right! Normally folks only review drafts on special request. Will definitely tag you in when it's ready for a look, though... hopefully soon!


## Syntax

The option `PARTITION BY <column list>` declares that a [materialized view](/sql/create-materialized-view/#with_options), [table](/sql/create-table/#with_options), or source table should be ordered by the listed columns. For example, a table that stores an append-only collection of events may look like:
Contributor Author

I've updated the MV docs as part of this PR as well.

I have not updated the table docs yet: those do not have a separate section for "with options" yet. I'm happy to add such a section and move the retain-history option to it if that sounds good to you.

AFAICT `create table ... from source` is not documented anywhere yet, so I have nothing to link to here. I'm open to suggestions!

Internally, Materialize stores these durable collections in an [LSM-tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree)-like structure. Each collection is made up of a set of **runs** of data. Each run is sorted and then split into individual **parts**; those parts are written to object storage and retrieved only when necessary to satisfy a query. Materialize also periodically **compacts** the data it stores to consolidate small parts into larger ones or discard deleted rows.
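
As a rough mental model only, the run/part/compaction structure described above can be sketched in a few lines of Python. This is not Materialize's implementation: `write_run`, `compact`, the `(event_ts, body)` row shape, and the fixed `part_size` are all hypothetical names chosen for illustration, and the sketch does not model deleted rows.

```python
from typing import List, Tuple

# Hypothetical row shape for illustration: (event_ts, body).
Row = Tuple[int, str]
Part = List[Row]
Run = List[Part]


def write_run(rows: List[Row], part_size: int) -> Run:
    """Sort a batch of rows, then split the sorted run into parts."""
    ordered = sorted(rows)
    return [ordered[i:i + part_size] for i in range(0, len(ordered), part_size)]


def compact(runs: List[Run], part_size: int) -> Run:
    """Merge several runs into one sorted run with larger parts.

    Assumption: deletions are not modeled here; this only consolidates.
    """
    merged = [row for run in runs for part in run for row in part]
    return write_run(merged, part_size)


# Two small runs written separately, then consolidated into one run.
run_a = write_run([(3, "c"), (1, "a"), (2, "b")], part_size=2)
run_b = write_run([(5, "e"), (4, "d")], part_size=2)
compacted = compact([run_a, run_b], part_size=4)
print(len(run_a) + len(run_b), "parts before,", len(compacted), "parts after")
```

In this toy model, compaction reduces three small parts to two larger ones while keeping the rows in sorted order.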

Materialize lets you specify the ordering it will use to sort these runs of data internally. A well-chosen sort order can unlock optimizations like [filter pushdown](#filter-pushdown), which in turn can make queries and other operations more efficient.
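
To illustrate why the sort order matters, here is a minimal Python sketch of the general idea behind filter pushdown, assuming each part carries min/max statistics for a timestamp column. The `Part` class, `make_parts`, `scan_newer_than`, and the sample data are hypothetical and are not Materialize APIs; the real system's statistics and pruning logic may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical row shape for illustration: (event_ts, user).
Row = Tuple[int, str]


@dataclass
class Part:
    rows: List[Row]
    min_ts: int  # smallest event_ts in this part
    max_ts: int  # largest event_ts in this part


def make_parts(rows: List[Row], key: Callable[[Row], tuple], part_size: int) -> List[Part]:
    """Sort rows by `key`, split into parts, and record per-part ts bounds."""
    ordered = sorted(rows, key=key)
    parts = []
    for i in range(0, len(ordered), part_size):
        chunk = ordered[i:i + part_size]
        ts_values = [r[0] for r in chunk]
        parts.append(Part(chunk, min(ts_values), max(ts_values)))
    return parts


def scan_newer_than(parts: List[Part], cutoff: int) -> Tuple[List[Row], int]:
    """Return rows with event_ts > cutoff and the number of parts fetched."""
    fetched = 0
    result: List[Row] = []
    for part in parts:
        if part.max_ts <= cutoff:
            continue  # skipped using statistics alone; the part is never read
        fetched += 1
        result.extend(r for r in part.rows if r[0] > cutoff)
    return result, fetched


rows = [(ts, f"user{ts % 3}") for ts in range(12)]
by_ts = make_parts(rows, key=lambda r: (r[0], r[1]), part_size=4)
by_user = make_parts(rows, key=lambda r: (r[1], r[0]), part_size=4)

rows_ts, fetched_ts = scan_newer_than(by_ts, cutoff=9)
rows_user, fetched_user = scan_newer_than(by_user, cutoff=9)
print("ordered by ts:", fetched_ts, "part(s) fetched")
print("ordered by user:", fetched_user, "part(s) fetched")
```

Both layouts return the same rows, but when the data is ordered by the filtered column, recent timestamps cluster into fewer parts, so more parts can be skipped without being fetched.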
Contributor Author

It seems like the "patterns" section is where we put this sort of doc that covers features that aren't specific to a single type and involve some discussion of implementation details, but let me know if you'd like me to move it!

```mzsql
EXPLAIN FILTER PUSHDOWN FOR
SELECT * FROM events WHERE event_ts + '2 minutes' > mz_now();
```
Contributor Author

Ideally we'd link to EXPLAIN FILTER PUSHDOWN, but that's not documented anywhere to my knowledge yet. That may make sense as a separate PR?

@@ -0,0 +1,143 @@
---
title: "PARTITION BY"
Contributor Author

This doc covers both the PARTITION BY syntax and the filter pushdown optimization, which are strictly speaking two separate things but likely to be used together I think?

@bkirwi bkirwi marked this pull request as ready for review November 21, 2024 21:20
@bkirwi bkirwi requested a review from a team as a code owner November 21, 2024 21:20
@bkirwi bkirwi requested a review from kay-kim November 21, 2024 21:20
@bkirwi
Contributor Author

bkirwi commented Nov 21, 2024

Okay, I think this is ready for a first look?

I've left comments on some places where I'm particularly uncertain about my choices, but I'd also be happy for general feedback on "does this make sense" etc.

2 participants