Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[doc] First draft of PARTITION BY documentation #30485

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

bkirwi
Copy link
Contributor

@bkirwi bkirwi commented Nov 14, 2024

Motivation

A first draft of public documentation for https://github.com/MaterializeInc/database-issues/issues/7188.

Tips for reviewer

To be merged only when the feature hits private preview.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@bkirwi bkirwi force-pushed the partition-by-docs branch 2 times, most recently from 9e9442c to fb753f7 Compare November 14, 2024 21:11
@kay-kim
Copy link
Contributor

kay-kim commented Nov 19, 2024

Hi Ben -- just double-checking -- for things in DRAFT status, do we hold off reviewing until things are marked as ready for review? (Just wanted to make sure you weren't waiting).

@bkirwi
Copy link
Contributor Author

bkirwi commented Nov 19, 2024

do we hold off reviewing until things are marked as ready for review?

Yes that's right! Normally folks only review drafts on special request. Will definitely tag you in when it's ready for a look, though... hopefully soon!

@bkirwi bkirwi marked this pull request as ready for review November 21, 2024 21:20
@bkirwi bkirwi requested a review from a team as a code owner November 21, 2024 21:20
@bkirwi bkirwi requested a review from kay-kim November 21, 2024 21:20
@bkirwi
Copy link
Contributor Author

bkirwi commented Nov 21, 2024

Okay, I think this is ready for a first look?

I've left comments on some places where I'm particularly uncertain about my choices, but I'd also be happy for general feedback on "does this make sense" etc.

Copy link
Contributor

@kay-kim kay-kim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ever so much for this ❤️ ❤️ ❤️ Left some suggestions. Feel free to pick and choose.

doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
@bkirwi bkirwi force-pushed the partition-by-docs branch from 4543b22 to 9c5b076 Compare December 2, 2024 22:56
@bkirwi
Copy link
Contributor Author

bkirwi commented Dec 2, 2024

@kay-kim - Thank you for the detailed review! I've reworked the doc in line with your suggestions... where I was pretty confident in the changes I've marked those suggestions resolved, but I've left a couple open where I was less sure, in case you have any follow-on thoughts. (And feel free to unresolve anything you think I got wrong of course!)

Copy link
Contributor

@kay-kim kay-kim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm mod some nits.

| Field | Value | Description |
|-------------------------------------------|---------------------| ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **ASSERT NOT NULL** _col_ident_ | `text` | The column identifier for which to create a [non-null assertion](#non-null-assertions). To specify multiple columns, use the option multiple times. |
| **PARTITION BY _columns_** | `(ident [, ident]*)` | ***Private preview.** This option has known performance or stability issues and is under active development.* The key by which Materialize should internally order this durable collection. See the [partitioning](/transform-data/patterns/partition-by/) docs for details.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering if we should also include the restrictions about the prefix and the types of columns on this page as well since it is the reference page.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried this, but it was hard to summarize in a way that fit comfortably in the table. I've added some stronger language encouraging users to follow the link to read the restrictions, though, like we have for the REFRESH option.

doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
@bkirwi
Copy link
Contributor Author

bkirwi commented Dec 3, 2024

Thanks @kay-kim! Followed up on your suggestions.

I don't know if we firmly settled on the approach to the table docs... but for now I've just restructured them to match the MV docs more closely and give it an easy place to slot in, as a separate commit.

I'll leave the PR open until EOD for any last thoughts... and of course happy to iterate on this as it sits in private preview.

@kay-kim
Copy link
Contributor

kay-kim commented Dec 3, 2024

Thanks for this, @bkirwi!!!

QQ: is the feature currently in the "Private Preview" state? Wasn't sure if this draft was in prep for getting things into private preview status or if it is in Private Preview. If currently in private preview, we probably should add a mention in Releases/Previews page in the docs .

@morsapaes
Copy link
Contributor

I'll take a peep later tonight or tomorrow morning, if you don't mind, @bkirwi! I think @sthm would also have valuable feedback. We can take this from here if you just want closure and not wait around for clicking the merge button, though!

@bkirwi
Copy link
Contributor Author

bkirwi commented Dec 3, 2024

Wasn't sure if this draft was in prep for getting things into private preview status or if it is in Private Preview. If currently in private preview, we probably should add a mention in Releases/Previews page in the docs.

Great point! I probably have a couple more boxes to check before we can officially declare this Private Preview. I'll leave it unmerged until that's settled.

Copy link
Contributor

@sthm sthm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The structure and content makes sense. I've added a few comments.


A few types of Materialize collections are durably written to storage: [materialized views](/sql/create-materialized-view/), [tables](/sql/create-table), and [sources](/sql/create-source).

Internally, Materialize stores these durable collections in an [LSM-tree](https://en.wikipedia.org/wiki/Log-structured_merge-tree)-like structure. Each collection is made up of a set of
Copy link
Contributor

@sthm sthm Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is technically correct, but I wonder if we should simplify the explanation? If we just skipped the first 1 and 1/2 sencence and said something like

Data is sorted and then partitioned up into individual parts, ...

and then later

Using the PARTITION BY option, you can specify the internal ordering that Materialize will use to sort, partition, and store the data.

we would could avoid talking about some internals that don't seem relevant for this particular topic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing I worry about is questions like: "I know my query only matches a single record; if all the data is sorted, why does my query need to fetch more than one part?" Really hard to explain without the concept of "runs" here.

I'm definitely sensitive to this point - so far these are implementation details we haven't spoken much about publicly, and it makes sense to keep the description as light as possible. But I don't want to give the impression that everything is one sorted set of parts...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Would it make sense to add a tiny example at the end that helps clarify the concepts and shows how filter pushdown can skip certain runs?

doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
doc/user/content/transform-data/patterns/partition-by.md Outdated Show resolved Hide resolved
@bkirwi bkirwi force-pushed the partition-by-docs branch from ca6e9a8 to 38e959d Compare December 5, 2024 19:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants