docs: Draft docs for self-hosted (WIP) #30491

Draft
wants to merge 6 commits into
base: main
Contributor

@kay-kim kay-kim commented Nov 15, 2024

Draft. Just placing here to facilitate some specific questions as I work on this.
Added patch for general information architecture changes + updated self-hosted -> self-managed:
Since not part of the left-hand nav:

@kay-kim kay-kim requested a review from a team as a code owner November 15, 2024 00:49
@kay-kim kay-kim marked this pull request as draft November 15, 2024 00:51
Contributor Author

kay-kim commented Nov 15, 2024

Every single time I want to mark as Draft ... I fat finger to PR. drat.

1. Set up the Materialize operator Helm repository.

a. <red>TBD whether this is needed</red>. Add Helm to install charts that are
hosted in the Materialize operator Helm repository:
Contributor Author

I wasn't sure if we'd be having people add materialize or we expect people to just have the repo so that they can just install.

Contributor

right now the helm chart is just in the materialize repo... at some point soon we'll host it on a separate endpoint. for now I think we just tell people to download the materialize repo, check out a specific tag, and add the local path

This command removes all the Kubernetes components associated with the chart and
deletes the release.

## Deploying Materialize Environments
Contributor Author

Have no idea what is meant by "deploy a Materialize environment"

Contributor

good point -- we should scrub "environment" from the docs. that's what we use internally to describe an individual customer region within our cloud product, but is a pretty overloaded term

with the Helm chart / in public facing docs, I think we can just talk about "Deploying Materialize" and the "Materialize CR" (custom resource)

parameters:
  - parameter: clusterd.nodeSelector
    description: |
      <red>Replace with description content here.</red>
Contributor Author

Will need devs to:

  1. Specify which of these parameters should be user-facing.
  2. For the user-facing ones, update the descriptions.
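
To make the request above concrete, here is a rough sketch of what one filled-in entry might look like. It reuses the `parameters` / `parameter` / `description` layout quoted in the diff. The description wording is a placeholder assumption based only on the general Kubernetes meaning of a node selector, not confirmed chart behavior, so the developers would still need to verify it.

```yaml
parameters:
  - parameter: clusterd.nodeSelector
    # Placeholder wording (assumption), to be confirmed by the developers.
    description: |
      Node selector (label key/value pairs) used to schedule
      clusterd pods onto matching nodes.
```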

@kay-kim kay-kim force-pushed the docs-self-hosted-draft-for-meeting branch from feea0d6 to d82a1c0 Compare November 15, 2024 01:01
title: "Materialize Kubernetes Operator"
description: ""

---
Contributor Author

@kay-kim kay-kim Nov 15, 2024

We could also make this an overview page.
And move the content to an Install on AWS page (and possibly an Install locally on kind -- since that's how I'm running through the steps anyhow).

But, put the PR up quickly, so that at least you can get an idea about the content.


You can configure the Materialize operator chart. For example:

- **RBAC**
Contributor Author

Below where I have the ## Parameters section, I could section off the parameters ... so that we don't need to "For example" here.

That is, in the parameters section, I could create subsections

## Parameters

### Network policies

### Observability

### RBAC

@@ -0,0 +1,242 @@
---
title: "Materialize Kubernetes Operator"

@@ -0,0 +1,61 @@
---
title: "Materialize Operator Configuration"

@@ -0,0 +1,21 @@
---
title: "Troubleshooting"


This tutorial uses `kubectl`. To install it, refer to the [`kubectl` documentation](https://kubernetes.io/docs/tasks/tools/).

### Kubernetes Storage Configuration
Contributor

just noting that this is lacking context. we should have an explainer on why local storage is valuable (spilling to disk, operating on datasets larger than main memory, more graceful degradation rather than OOMing), and also note that this is optional (though highly recommended)

requestRollout: 22222222-2222-2222-2222-222222222222
forceRollout: 33333333-3333-3333-3333-333333333333
inPlaceRollout: false
backendSecretName: materialize-backend
Contributor

this is how Materialize gets the connection string to talk to its metadata database (postgres or crdb)... which makes me realize we have no mention of that whole set up here :)

noting that we'll need sections about setting up blob storage + metadata database
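
As a sketch of the linkage described above: the `backendSecretName` field names the Kubernetes Secret that holds the metadata database connection string. The `spec` fields below are quoted from the diff context; the `apiVersion`, `kind`, and `metadata.name` values are assumptions, since the diff does not show them.

```yaml
# Assumed header: apiVersion and kind are not shown in this diff.
apiVersion: materialize.cloud/v1alpha1
kind: Materialize
metadata:
  name: example-materialize  # hypothetical name, for illustration only
spec:
  # Fields below are quoted from the diff context.
  requestRollout: 22222222-2222-2222-2222-222222222222
  forceRollout: 33333333-3333-3333-3333-333333333333
  inPlaceRollout: false
  # Must match metadata.name of the Secret that carries the
  # metadata database connection string.
  backendSecretName: materialize-backend
```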

Contributor

pH14 commented Nov 15, 2024

Thanks for getting this off the ground @kay-kim !

@kay-kim kay-kim force-pushed the docs-self-hosted-draft-for-meeting branch from d82a1c0 to fa0f19e Compare November 20, 2024 01:47
kubectl apply -f misc/helm-charts/testing/minio.yaml
```

1. Install the following metrics service:
Contributor Author

I realize that the testing/readme.md says we need this, but this errors for me on my mac.

Contributor

metrics-server should be optional... it's needed for some system table metrics to work, but not blocking to testing materialize

service/mzfhj38ptdjs-console NodePort 10.96.97.5 <none> 9000:30847/TCP
```

1. Forward the Materialize console service to your local machine:
Contributor Author

@kay-kim kay-kim Nov 20, 2024

I had to forward on my mac.

Contributor

We discussed this some more on the Cloud team, and likely will keep forwarding as the solution here, and generally let users decide what ingress strategy they want

@kay-kim kay-kim force-pushed the docs-self-hosted-draft-for-meeting branch from fa0f19e to 10a65be Compare November 20, 2024 01:56
  name: materialize-backend
  namespace: materialize-environment
stringData:
  metadata_backend_url: "postgres://materialize_user:materialize_pass@postgres.materialize.svc.cluster.local:5432/materialize_db?sslmode=disable"
Contributor Author

@kay-kim kay-kim Nov 20, 2024

Wasn't sure if we think we'd have people use postgres as the metadata db (at least in the beginning).
If not ... then, we can make this more general.

(ditto for the blob storage)

Contributor

👍 this seems reasonable. we could add a comment that these params match if using the sample yamls
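
A possible shape for that suggestion, written as a complete Secret manifest. The `name`, `namespace`, and `metadata_backend_url` values come from the diff; the `apiVersion: v1` / `kind: Secret` header is the standard Kubernetes form and is assumed to be what the truncated snippet sits under. The comment wording is only a suggestion.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: materialize-backend
  namespace: materialize-environment
stringData:
  # These connection parameters match the sample testing YAML files.
  # If you run your own Postgres, adjust the user, password, host,
  # port, and database name accordingly.
  metadata_backend_url: "postgres://materialize_user:materialize_pass@postgres.materialize.svc.cluster.local:5432/materialize_db?sslmode=disable"
```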

## Deploying Materialize

### Set up the metadata database

Contributor Author

Let me know if you want me to stub this with our testing yaml file.

Contributor Author

Will add generic content here tomorrow. But wanted to push up quickly the information architecture changes along with the self-hosted -> self-managed nomenclature changes.



### Set up blob storage

Contributor Author

Let me know if you want me to stub this with our testing yaml file.

Contributor

seems fine to point to our yaml files for setting up basic metadata db/blob storage

Contributor

@pH14 pH14 Nov 20, 2024

ohoh, now I understand this is on the generic page. hm... maybe we can add an explainer about what's needed for each of these (postgres or cockroach for metadata + s3-compatible blob storage)? and have the more specific install in kind, install on aws pages fill in with more detailed recommendations

apiVersion: v1
kind: Namespace
metadata:
  name: materialize-environment
Contributor Author

I know we had the discussion about scrubbing "Environment" -- not sure if we actually want to do that with our namespace and such.

Contributor

I vote we do... IMO it's worth removing the term entirely. From the discussion earlier this week, we can just reference "Materialize" or the Materialize custom resource when possible (and omit "environment"), and "Materialize instance" when we must refer to the resources associated with a specific Materialize CR.

Contributor Author

Since we already have a materialize namespace for the operator, is there a preference for this?
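
For the sake of discussion, one option would be a namespace named after the "Materialize instance" term suggested above. The name below is purely hypothetical; nothing in this thread has settled on it.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  # Hypothetical name: distinct from the operator's existing
  # `materialize` namespace, and avoids the term "environment".
  name: materialize-instance
```

Any Secret or other resource that currently sets `namespace: materialize-environment` would need the same rename.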

---
title: "Appendix: Install locally on kind"
description: ""
---

kind: InitConfiguration
nodeRegistration:
  kubeletExtraArgs:
    node-labels: "materialize.cloud/disk=true,workload=materialize-instance"
Contributor

these shouldn't be needed now
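
For context, a sketch of where that patch would typically sit in a kind cluster config, assuming the standard kind `Cluster` layout. The surrounding file is not shown in the diff, so the node role and overall structure here are assumptions. If the labels are no longer needed, the whole `kubeadmConfigPatches` entry could be dropped:

```yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    # Earlier draft (per the review, likely no longer needed):
    # kubeadmConfigPatches:
    #   - |
    #     kind: InitConfiguration
    #     nodeRegistration:
    #       kubeletExtraArgs:
    #         node-labels: "materialize.cloud/disk=true,workload=materialize-instance"
```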

@kay-kim kay-kim force-pushed the docs-self-hosted-draft-for-meeting branch from 929ef2a to 1571ceb Compare November 22, 2024 03:01