Add Dekaf integrations to docs (#1614)
danthelion authored Sep 6, 2024
1 parent fd860a8 commit 5d3502e
Showing 11 changed files with 172 additions and 35 deletions.
10 changes: 3 additions & 7 deletions site/docs/guides/dekaf_reading_collections_from_kafka.md
@@ -35,18 +35,14 @@ To connect to Estuary Flow via Dekaf, you need the following connection details:
- **Security Protocol**: `SASL_SSL`
- **SASL Mechanism**: `PLAIN`
- **SASL Username**: `{}`
-- **SASL Password**: Estuary Refresh Token (Generate your token in
-  the [Estuary Admin Dashboard](https://dashboard.estuary.dev/admin/api))
+- **SASL Password**: Estuary Refresh Token ([Generate a refresh token](/guides/how_to_generate_refresh_token) in
+  the dashboard)
- **Schema Registry Username**: `{}`
- **Schema Registry Password**: The same Estuary Refresh Token as above
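
These settings map directly onto a standard Kafka client configuration. As a sketch (assuming the kafka-python client and a hypothetical `ESTUARY_REFRESH_TOKEN` environment variable):

```python
import os

# Dekaf connection settings from the list above, expressed as a
# kafka-python-style consumer config. The ESTUARY_REFRESH_TOKEN
# environment variable name is an assumption for this example.
def dekaf_consumer_config(token: str) -> dict:
    return {
        "bootstrap_servers": "dekaf.estuary.dev:9092",
        "security_protocol": "SASL_SSL",
        "sasl_mechanism": "PLAIN",
        "sasl_plain_username": "{}",   # literally the string "{}"
        "sasl_plain_password": token,  # your Estuary refresh token
    }

config = dekaf_consumer_config(os.environ.get("ESTUARY_REFRESH_TOKEN", ""))
# e.g. KafkaConsumer("/my-organization/my-collection", **config)
```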

## How to Connect to Dekaf

-### 1. Generate an Estuary Refresh Token:
-
-1. Log in to the Estuary Admin Dashboard.
-2. Navigate to the section where you can generate tokens.
-3. Generate a new refresh token and note it down securely.
+### 1. [Generate an Estuary Flow refresh token](/guides/how_to_generate_refresh_token)

### 2. Set Up Your Kafka Client

4 changes: 2 additions & 2 deletions site/docs/guides/flowctl/create-derivation.md
@@ -70,7 +70,7 @@ You'll write your derivation using GitPod, a cloud development environment integ

When you first connect to GitPod, you will already be authenticated with Flow, but if you leave GitPod open for too long, you may need to reauthenticate. To do this:

-1. Go to the [CLI-API tab of the web app](https://dashboard.estuary.dev/admin/api) and copy your access token.
+1. [Generate an Estuary Flow refresh token](/guides/how_to_generate_refresh_token).

2. Run `flowctl auth token --token <paste-token-here>` in the GitPod terminal.
:::
@@ -226,7 +226,7 @@ Creating a derivation locally is largely the same as using GitPod, but has some

1. Authorize flowctl.

-1. Go to the [CLI-API tab of the web app](https://dashboard.estuary.dev/admin/api) and copy your access token.
+1. [Generate an Estuary Flow refresh token](/guides/how_to_generate_refresh_token).

2. Run `flowctl auth token --token <paste-token-here>` in your local environment.

2 changes: 1 addition & 1 deletion site/docs/guides/flowctl/edit-draft-from-webapp.md
@@ -35,7 +35,7 @@ Drafts aren't currently visible in the Flow web app, but you can get a list with

1. Authorize flowctl.

-1. Go to the [CLI-API tab of the web app](https://dashboard.estuary.dev/admin/api) and copy your access token.
+1. [Generate an Estuary Flow refresh token](/guides/how_to_generate_refresh_token).

2. Run `flowctl auth token --token <paste-token-here>`

9 changes: 9 additions & 0 deletions site/docs/guides/how_to_generate_refresh_token.md
@@ -0,0 +1,9 @@
# How to generate an Estuary Flow Refresh Token

To generate a Refresh Token, navigate to the Admin page, then head over to the CLI-API section.

Press the **Generate token** button to bring up a modal where you can give your token a name.
Choose a name that makes it easy to identify which service the token grants access to.

![Export Dekaf Access Token](https://storage.googleapis.com/estuary-marketing-strapi-uploads/uploads//Group_22_95a85083d4/Group_22_95a85083d4.png)

3 changes: 3 additions & 0 deletions site/docs/reference/Connectors/dekaf/README.md
@@ -9,3 +9,6 @@ functionality enables integrations with the Kafka ecosystem.
- [Tinybird](/reference/Connectors/dekaf/dekaf-tinybird)
- [Materialize](/reference/Connectors/dekaf/dekaf-materialize)
- [StarTree](/reference/Connectors/dekaf/dekaf-startree)
+- [SingleStore](/reference/Connectors/dekaf/dekaf-singlestore)
+- [Imply](/reference/Connectors/dekaf/dekaf-imply)
+- [Bytewax](/reference/Connectors/dekaf/dekaf-bytewax)
67 changes: 67 additions & 0 deletions site/docs/reference/Connectors/dekaf/dekaf-bytewax.md
@@ -0,0 +1,67 @@
# Bytewax

This guide demonstrates how to use Estuary Flow to stream data to Bytewax using the Kafka-compatible Dekaf API.

[Bytewax](https://bytewax.io/) is a Python framework for building scalable dataflow applications, designed for
high-throughput, low-latency data processing tasks.

## Connecting Estuary Flow to Bytewax

1. [Generate a refresh token](/guides/how_to_generate_refresh_token) for the Bytewax connection from the Estuary Admin
Dashboard.

2. Install Bytewax with its Kafka connector (the connector uses confluent-kafka under the hood):

```
pip install "bytewax[kafka]"
```

3. Create a Python script for your Bytewax dataflow, using the following template:

```python
import json
import os

import bytewax.operators as op
from bytewax.connectors.kafka import KafkaSource
from bytewax.connectors.stdio import StdOutSink
from bytewax.dataflow import Dataflow

# Estuary Flow Dekaf configuration
KAFKA_BOOTSTRAP_SERVERS = ["dekaf.estuary.dev:9092"]
KAFKA_TOPIC = "/full/nameof/your/collection"

src = KafkaSource(
    brokers=KAFKA_BOOTSTRAP_SERVERS,
    topics=[KAFKA_TOPIC],
    add_config={
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        "sasl.username": "{}",
        # Your Estuary refresh token, read from an environment variable
        "sasl.password": os.getenv("DEKAF_TOKEN"),
    },
)

# Parse incoming messages
def parse_message(msg):
    data = json.loads(msg.value)
    # Process your data here
    return data

flow = Dataflow("estuary_dekaf")
stream = op.input("input", flow, src)
parsed = op.map("parse_message", stream, parse_message)
# Add more processing steps as needed
op.output("output", parsed, StdOutSink())

# Run with: python -m bytewax.run your_dataflow_script
```

4. Replace `"/full/nameof/your/collection"` with your actual collection name from Estuary Flow.

5. Run your Bytewax dataflow:

```
python -m bytewax.run your_dataflow_script
```

6. Your Bytewax dataflow is now processing data from Estuary Flow in real-time.
40 changes: 40 additions & 0 deletions site/docs/reference/Connectors/dekaf/dekaf-imply.md
@@ -0,0 +1,40 @@
# Imply Polaris

This guide demonstrates how to use Estuary Flow to stream data to Imply Polaris using the Kafka-compatible Dekaf API.

[Imply Polaris](https://imply.io/polaris) is a fully managed, cloud-native Database-as-a-Service (DBaaS) built on Apache
Druid, designed for real-time analytics on streaming and batch data.

## Connecting Estuary Flow to Imply Polaris

1. [Generate a refresh token](/guides/how_to_generate_refresh_token) for the Imply Polaris connection from the Estuary
Admin Dashboard.

2. Log in to your Imply Polaris account and navigate to your project.

3. In the left sidebar, click on "Tables" and then "Create Table".

4. Choose "Kafka" as the input source for your new table.

5. In the Kafka configuration section, enter the following details:

- **Bootstrap Servers**: `dekaf.estuary.dev:9092`
- **Topic**: Your Estuary Flow collection name (e.g., `/my-organization/my-collection`)
- **Security Protocol**: `SASL_SSL`
- **SASL Mechanism**: `PLAIN`
- **SASL Username**: `{}`
- **SASL Password**: Your generated Estuary refresh token

6. For the "Input Format", select "avro".

7. Configure the Schema Registry settings:
- **Schema Registry URL**: `https://dekaf.estuary.dev`
- **Schema Registry Username**: `{}` (same as SASL Username)
- **Schema Registry Password**: The same Estuary refresh token as above

8. In the "Schema" section, Imply Polaris should automatically detect the schema from your Avro data. Review and adjust
the column definitions as needed.

9. Review and finalize your table configuration, then click "Create Table".

10. Your Imply Polaris table should now start ingesting data from Estuary Flow.
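
For reference, the schema-registry credentials in step 7 follow the usual `user:password` basic-auth convention used by Kafka schema-registry clients. A minimal sketch, assuming a confluent-kafka-style client config and a hypothetical `ESTUARY_REFRESH_TOKEN` environment variable:

```python
import os

# Dekaf schema-registry settings from step 7, in the config shape accepted
# by confluent-kafka's SchemaRegistryClient. The environment variable name
# is an assumption for this example.
def dekaf_schema_registry_config(token: str) -> dict:
    return {
        "url": "https://dekaf.estuary.dev",
        # username is the literal "{}", password is the refresh token
        "basic.auth.user.info": "{}:" + token,
    }

sr_config = dekaf_schema_registry_config(os.environ.get("ESTUARY_REFRESH_TOKEN", ""))
```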
11 changes: 2 additions & 9 deletions site/docs/reference/Connectors/dekaf/dekaf-materialize.md
@@ -5,17 +5,10 @@ In this guide, you'll learn how to use Materialize to ingest data from Estuary F
[Materialize](https://materialize.com/) is an operational data warehouse for real-time analytics that uses standard SQL
for defining transformations and queries.

-## Prerequisites
-
-- An [Estuary Flow](https://dashboard.estuary.dev/register) account & collection
-- A Materialize account

## Connecting Estuary Flow to Materialize

-1. **Create a new access token** to use for the Materialize connection. You can generate this token from the Estuary
-   Admin Dashboard.
-
-   ![Export Dekaf Access Token](https://storage.googleapis.com/estuary-marketing-strapi-uploads/uploads//Group_22_95a85083d4/Group_22_95a85083d4.png)
+1. [Generate a refresh token](/guides/how_to_generate_refresh_token) to use for the Materialize connection. You can
+   generate this token from the Estuary Admin Dashboard.

2. In your Materialize dashboard, use the SQL shell to create a new secret and connection using the Kafka source
connector. Use the following SQL commands to configure the connection to Estuary Flow:
40 changes: 40 additions & 0 deletions site/docs/reference/Connectors/dekaf/dekaf-singlestore.md
@@ -0,0 +1,40 @@
# SingleStore (Cloud)

This guide demonstrates how to use Estuary Flow to stream data to SingleStore using the Kafka-compatible Dekaf API.

[SingleStore](https://www.singlestore.com/) is a distributed SQL database designed for data-intensive applications,
offering high performance for both transactional and analytical workloads.

## Connecting Estuary Flow to SingleStore

1. [Generate a refresh token](/guides/how_to_generate_refresh_token) for the SingleStore connection from the Estuary
Admin Dashboard.

2. In the SingleStore Cloud Portal, navigate to the SQL Editor section of the Data Studio.

3. Execute the following script to create a table and an ingestion pipeline to hydrate it.

This example will ingest data from the demo wikipedia collection in Estuary Flow.

```sql
CREATE TABLE test_table (id NUMERIC, server_name VARCHAR(255), title VARCHAR(255));

CREATE PIPELINE test AS
LOAD DATA KAFKA "dekaf.estuary.dev:9092/demo/wikipedia/recentchange-sampled"
CONFIG '{
"security.protocol":"SASL_SSL",
"sasl.mechanism":"PLAIN",
"sasl.username":"{}",
"broker.address.family": "v4",
"schema.registry.username": "{}",
"fetch.wait.max.ms": "2000"
}'
CREDENTIALS '{
"sasl.password": "ESTUARY_ACCESS_TOKEN",
"schema.registry.password": "ESTUARY_ACCESS_TOKEN"
}'
INTO TABLE test_table
FORMAT AVRO SCHEMA REGISTRY 'https://dekaf.estuary.dev'
( id <- id, server_name <- server_name, title <- title );
```
4. Run `START PIPELINE test;` to begin ingestion. Your pipeline should now start ingesting data from Estuary Flow into SingleStore.
11 changes: 2 additions & 9 deletions site/docs/reference/Connectors/dekaf/dekaf-startree.md
@@ -5,17 +5,10 @@ In this guide, you'll learn how to use Estuary Flow to push data streams to Star
[StarTree](https://startree.ai/) is a real-time analytics platform built on Apache Pinot, designed for performing fast,
low-latency analytics on large-scale data.

-## Prerequisites
-
-- An Estuary Flow account & collection
-- A StarTree account

## Connecting Estuary Flow to StarTree

-1. **Create a new access token** to use for the StarTree connection. You can generate this token from the Estuary Admin
-   Dashboard.
-
-   ![Export Dekaf Access Token](https://storage.googleapis.com/estuary-marketing-strapi-uploads/uploads//Group_22_95a85083d4/Group_22_95a85083d4.png)
+1. [Generate a refresh token](/guides/how_to_generate_refresh_token) to use for the StarTree connection. You can
+   generate this token from the Estuary Admin Dashboard.

2. In the StarTree UI, navigate to the **Data Sources** section and choose **Add New Data Source**.

10 changes: 3 additions & 7 deletions site/docs/reference/Connectors/dekaf/dekaf-tinybird.md
@@ -4,15 +4,11 @@ In this guide, you'll learn how to use Estuary Flow to push data streams to Tiny

[Tinybird](https://www.tinybird.co/) is a data platform for user-facing analytics.

-## Prerequisites
+## Connecting Estuary Flow to Tinybird

-- An Estuary Flow account & collection
-- A Tinybird account & Workspace
+1. [Generate a refresh token](/guides/how_to_generate_refresh_token) to use for the Tinybird connection. You can do this
+   from the Estuary Admin Dashboard.

-# Connecting Estuary Flow to Tinybird
-
-1. Create a new access token to use for the Tinybird connection. You can do this from the Estuary Admin Dashboard.
-   ![Export Dekaf Access Token](https://storage.googleapis.com/estuary-marketing-strapi-uploads/uploads//Group_22_95a85083d4/Group_22_95a85083d4.png)
2. In your Tinybird Workspace, create a new Data Source and use the Kafka Connector.
![Configure Estuary Flow Data Source](https://storage.googleapis.com/estuary-marketing-strapi-uploads/uploads//Screenshot_2024_08_23_at_15_16_39_35b06dad77/Screenshot_2024_08_23_at_15_16_39_35b06dad77.png)

