-
Notifications
You must be signed in to change notification settings - Fork 465
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[doc] First draft of PARTITION BY documentation #30485
base: main
Are you sure you want to change the base?
Conversation
9e9442c
to
fb753f7
Compare
Hi Ben -- just double-checking -- for things in |
Yes that's right! Normally folks only review drafts on special request. Will definitely tag you in when it's ready for a look, though... hopefully soon! |
fb753f7
to
38f5f40
Compare
Okay, I think this is ready for a first look? I've left comments on some places where I'm particularly uncertain about my choices, but I'd also be happy for general feedback on "does this make sense" etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks ever so much for this ❤️ ❤️ ❤️ Left some suggestions. Feel free to pick and choose.
4543b22
to
9c5b076
Compare
@kay-kim - Thank you for the detailed review! I've reworked the doc in line with your suggestions... where I was pretty confident in the changes I've marked those suggestions resolved, but I've left a couple open where I was less sure, in case you have any follow-on thoughts. (And feel free to unresolve anything you think I got wrong of course!) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm mod some nits.
| Field | Value | Description | | ||
|-------------------------------------------|---------------------| ----------------------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| **ASSERT NOT NULL** _col_ident_ | `text` | The column identifier for which to create a [non-null assertion](#non-null-assertions). To specify multiple columns, use the option multiple times. | | ||
| **PARTITION BY _columns_** | `(ident [, ident]*)` | ***Private preview.** This option has known performance or stability issues and is under active development.* The key by which Materialize should internally order this durable collection. See the [partitioning](/transform-data/patterns/partition-by/) docs for details. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just wondering if we should also include the restrictions about the prefix and the types of columns on this page as well since it is the reference page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried this, but it was hard to summarize in a way that fit comfortably in the table. I've added some stronger language encouraging users to follow the link to read the restrictions, though, like we have for the REFRESH
option.
Thanks @kay-kim! Followed up on your suggestions. I don't know if we firmly settled on the approach to the table docs... but for now I've just restructured them to match the MV docs more closely and give it an easy place to slot in, as a separate commit. I'll leave the PR open until EOD for any last thoughts... and of course happy to iterate on this as it sits in private preview. |
Thanks for this, @bkirwi!!! QQ: is the feature currently in the "Private Preview" state? Wasn't sure if this draft was in prep for getting things into private preview status or if it is in Private Preview. If currently in private preview, we probably should add a mention in Releases/Previews page in the docs . |
Great point! I probably have a couple more boxes to check before we can officially declare this Private Preview. I'll leave it unmerged until that's settled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The structure and content makes sense. I've added a few comments.
|
||
A few types of Materialize collections are durably written to storage: [materialized views](/sql/create-materialized-view/), [tables](/sql/create-table), and [sources](/sql/create-source). | ||
|
||
Internally, Materialize stores these durable collections in an [LSM-tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree)-like structure. Each collection is made up of a set of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is technically correct, but I wonder if we should simplify the explanation? If we just skipped the first 1 and 1/2 sencence and said something like
Data is sorted and then partitioned up into individual parts, ...
and then later
Using the
PARTITION BY
option, you can specify the internal ordering that Materialize will use to sort, partition, and store the data.
we would could avoid talking about some internals that don't seem relevant for this particular topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing I worry about is questions like: "I know my query only matches a single record; if all the data is sorted, why does my query need to fetch more than one part?" Really hard to explain without the concept of "runs" here.
I'm definitely sensitive to this point - so far these are implementation details we haven't spoken much about publicly, and it makes sense to keep the description as light as possible. But I don't want to give the impression that everything is one sorted set of parts...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Would it make sense to add a tiny example at the end that helps clarify the concepts and shows how filter pushdown can skip certain runs?
ca6e9a8
to
38e959d
Compare
Motivation
A first draft of public documentation for https://github.com/MaterializeInc/database-issues/issues/7188.
Tips for reviewer
To be merged only when the feature hits private preview.
Checklist
$T ⇔ Proto$T
mapping (possibly in a backwards-incompatible way), then it is tagged with aT-proto
label.