From 7081d9d302aee59872b396371a4e7d1486c5290f Mon Sep 17 00:00:00 2001 From: Dani Palma Date: Fri, 6 Sep 2024 10:45:49 -0300 Subject: [PATCH] Add Dekaf integrations to docs --- .../dekaf_reading_collections_from_kafka.md | 10 +-- site/docs/guides/flowctl/create-derivation.md | 4 +- .../guides/flowctl/edit-draft-from-webapp.md | 2 +- .../guides/how_to_generate_refresh_token.md | 9 +++ .../docs/reference/Connectors/dekaf/README.md | 3 + .../Connectors/dekaf/dekaf-bytewax.md | 67 +++++++++++++++++++ .../reference/Connectors/dekaf/dekaf-imply.md | 40 +++++++++++ .../Connectors/dekaf/dekaf-materialize.md | 11 +-- .../Connectors/dekaf/dekaf-singlestore.md | 40 +++++++++++ .../Connectors/dekaf/dekaf-startree.md | 11 +-- .../Connectors/dekaf/dekaf-tinybird.md | 10 +-- 11 files changed, 172 insertions(+), 35 deletions(-) create mode 100644 site/docs/guides/how_to_generate_refresh_token.md create mode 100644 site/docs/reference/Connectors/dekaf/dekaf-bytewax.md create mode 100644 site/docs/reference/Connectors/dekaf/dekaf-imply.md create mode 100644 site/docs/reference/Connectors/dekaf/dekaf-singlestore.md diff --git a/site/docs/guides/dekaf_reading_collections_from_kafka.md b/site/docs/guides/dekaf_reading_collections_from_kafka.md index aa56d40fdb..d09371ba63 100644 --- a/site/docs/guides/dekaf_reading_collections_from_kafka.md +++ b/site/docs/guides/dekaf_reading_collections_from_kafka.md @@ -35,18 +35,14 @@ To connect to Estuary Flow via Dekaf, you need the following connection details: - **Security Protocol**: `SASL_SSL` - **SASL Mechanism**: `PLAIN` - **SASL Username**: `{}` -- **SASL Password**: Estuary Refresh Token (Generate your token in - the [Estuary Admin Dashboard](https://dashboard.estuary.dev/admin/api)) +- **SASL Password**: Estuary Refresh Token ([Generate a refresh token](/guides/how_to_generate_refresh_token) in + the dashboard) - **Schema Registry Username**: `{}` - **Schema Registry Password**: The same Estuary Refresh Token as above ## How to Connect to Dekaf -### 1. Generate an Estuary Refresh Token: - -1. Log in to the Estuary Admin Dashboard. -2. Navigate to the section where you can generate tokens. -3. Generate a new refresh token and note it down securely. +### 1. [Generate an Estuary Flow refresh token](/guides/how_to_generate_refresh_token) ### 2. Set Up Your Kafka Client diff --git a/site/docs/guides/flowctl/create-derivation.md b/site/docs/guides/flowctl/create-derivation.md index cd8637c8fd..2a261b90ae 100644 --- a/site/docs/guides/flowctl/create-derivation.md +++ b/site/docs/guides/flowctl/create-derivation.md @@ -70,7 +70,7 @@ You'll write your derivation using GitPod, a cloud development environment integ When you first connect to GitPod, you will have already authenticated Flow, but if you leave GitPod opened for too long, you may have to reauthenticate Flow. To do this: -1. Go to the [CLI-API tab of the web app](https://dashboard.estuary.dev/admin/api) and copy your access token. +1. [Generate an Estuary Flow refresh token](/guides/how_to_generate_refresh_token). 2. Run `flowctl auth token --token ` in the GitPod terminal. ::: @@ -226,7 +226,7 @@ Creating a derivation locally is largely the same as using GitPod, but has some 1. Authorize flowctl. - 1. Go to the [CLI-API tab of the web app](https://dashboard.estuary.dev/admin/api) and copy your access token. + 1. [Generate an Estuary Flow refresh token](/guides/how_to_generate_refresh_token). 2. Run `flowctl auth token --token ` in your local environment. 
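The Dekaf connection details listed in the reading-collections guide above (bootstrap server, SASL settings, and refresh token) can be exercised with any Kafka client before wiring up a downstream system. Below is a minimal smoke-test sketch using the `confluent-kafka` Python package; the collection name, consumer group id, and the `ESTUARY_REFRESH_TOKEN` environment variable are placeholders for illustration, not values defined by this patch.

```python
# Minimal Dekaf smoke test (sketch): pip install confluent-kafka
import os

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "dekaf.estuary.dev:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": "{}",
    "sasl.password": os.environ["ESTUARY_REFRESH_TOKEN"],  # your refresh token
    "group.id": "dekaf-smoke-test",                        # placeholder group id
    "auto.offset.reset": "earliest",
})

# The topic is the full Estuary Flow collection name (placeholder below).
consumer.subscribe(["/your-org/your-collection"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            print("consumer error:", msg.error())
            continue
        # Just confirm documents are flowing; decoding (e.g. Avro via the
        # schema registry) is covered in the integration guides below.
        print(msg.topic(), msg.offset(), len(msg.value()))
finally:
    consumer.close()
```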
diff --git a/site/docs/guides/flowctl/edit-draft-from-webapp.md b/site/docs/guides/flowctl/edit-draft-from-webapp.md index 999ae3df55..61f3fe60ef 100644 --- a/site/docs/guides/flowctl/edit-draft-from-webapp.md +++ b/site/docs/guides/flowctl/edit-draft-from-webapp.md @@ -35,7 +35,7 @@ Drafts aren't currently visible in the Flow web app, but you can get a list with 1. Authorize flowctl. - 1. Go to the [CLI-API tab of the web app](https://dashboard.estuary.dev/admin/api) and copy your access token. + 1. [Generate an Estuary Flow refresh token](/guides/how_to_generate_refresh_token). 2. Run `flowctl auth token --token `

diff --git a/site/docs/guides/how_to_generate_refresh_token.md b/site/docs/guides/how_to_generate_refresh_token.md new file mode 100644 index 0000000000..20cabc29a8 --- /dev/null +++ b/site/docs/guides/how_to_generate_refresh_token.md @@ -0,0 +1,9 @@ +# How to generate an Estuary Flow Refresh Token + +To generate a Refresh Token, navigate to the Admin page and open the CLI-API section. + +Press the Generate token button to bring up a modal where you can give your token a name. +Choose a name that identifies which service the token will grant access to. + +![Export Dekaf Access Token](https://storage.googleapis.com/estuary-marketing-strapi-uploads/uploads//Group_22_95a85083d4/Group_22_95a85083d4.png) +

diff --git a/site/docs/reference/Connectors/dekaf/README.md b/site/docs/reference/Connectors/dekaf/README.md index e45fd8a5a2..15914af5bf 100644 --- a/site/docs/reference/Connectors/dekaf/README.md +++ b/site/docs/reference/Connectors/dekaf/README.md @@ -9,3 +9,6 @@ functionality enables integrations with the Kafka ecosystem. - [Tinybird](/reference/Connectors/dekaf/dekaf-tinybird) - [Materialize](/reference/Connectors/dekaf/dekaf-materialize) - [StarTree](/reference/Connectors/dekaf/dekaf-startree) +- [SingleStore](/reference/Connectors/dekaf/dekaf-singlestore) +- [Imply](/reference/Connectors/dekaf/dekaf-imply) +- [Bytewax](/reference/Connectors/dekaf/dekaf-bytewax) \ No newline at end of file

diff --git a/site/docs/reference/Connectors/dekaf/dekaf-bytewax.md b/site/docs/reference/Connectors/dekaf/dekaf-bytewax.md new file mode 100644 index 0000000000..422511e50d --- /dev/null +++ b/site/docs/reference/Connectors/dekaf/dekaf-bytewax.md @@ -0,0 +1,67 @@ +# Bytewax + +This guide demonstrates how to use Estuary Flow to stream data to Bytewax using the Kafka-compatible Dekaf API. + +[Bytewax](https://bytewax.io/) is a Python framework for building scalable dataflow applications, designed for +high-throughput, low-latency data processing tasks. + +## Connecting Estuary Flow to Bytewax + +1. [Generate a refresh token](/guides/how_to_generate_refresh_token) for the Bytewax connection from the Estuary Admin + Dashboard. + +2. Install Bytewax and its Kafka connector: + + ``` + pip install "bytewax[kafka]" + ``` + +3.
Create a Python script for your Bytewax dataflow, using the following template (a sketch written against the Bytewax 0.19+ `bytewax.operators` API):

    ```python
    import json
    import os

    import bytewax.operators as op
    from bytewax.connectors.kafka import KafkaSource
    from bytewax.connectors.stdio import StdOutSink
    from bytewax.dataflow import Dataflow

    # Estuary Flow Dekaf configuration
    KAFKA_BOOTSTRAP_SERVERS = ["dekaf.estuary.dev:9092"]
    KAFKA_TOPIC = "/full/nameof/your/collection"

    # Parse incoming messages (assumes JSON-encoded values)
    def parse_message(msg):
        data = json.loads(msg.value)
        # Process your data here
        return data

    # Kafka source pointed at Dekaf, authenticated with your refresh token
    src = KafkaSource(
        brokers=KAFKA_BOOTSTRAP_SERVERS,
        topics=[KAFKA_TOPIC],
        add_config={
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "PLAIN",
            "sasl.username": "{}",
            "sasl.password": os.getenv("DEKAF_TOKEN"),
        },
    )

    # Define your dataflow
    flow = Dataflow("estuary_dekaf_bytewax")
    stream = op.input("input", flow, src)
    parsed = op.map("parse_message", stream, parse_message)
    # Add more processing steps as needed
    op.output("output", parsed, StdOutSink())
    ```

+4. Replace `"/full/nameof/your/collection"` with your actual collection name from Estuary Flow. + +5. Run your Bytewax dataflow: + + ``` + python -m bytewax.run your_dataflow_script:flow + ``` + +6. Your Bytewax dataflow is now processing data from Estuary Flow in real-time.

diff --git a/site/docs/reference/Connectors/dekaf/dekaf-imply.md b/site/docs/reference/Connectors/dekaf/dekaf-imply.md new file mode 100644 index 0000000000..626f09da91 --- /dev/null +++ b/site/docs/reference/Connectors/dekaf/dekaf-imply.md @@ -0,0 +1,40 @@ +# Imply Polaris + +This guide demonstrates how to use Estuary Flow to stream data to Imply Polaris using the Kafka-compatible Dekaf API. + +[Imply Polaris](https://imply.io/polaris) is a fully managed, cloud-native Database-as-a-Service (DBaaS) built on Apache +Druid, designed for real-time analytics on streaming and batch data. + +## Connecting Estuary Flow to Imply Polaris + +1. [Generate a refresh token](/guides/how_to_generate_refresh_token) for the Imply Polaris connection from the Estuary + Admin Dashboard. + +2. Log in to your Imply Polaris account and navigate to your project. + +3. In the left sidebar, click on "Tables" and then "Create Table". + +4. Choose "Kafka" as the input source for your new table. + +5. In the Kafka configuration section, enter the following details: + + - **Bootstrap Servers**: `dekaf.estuary.dev:9092` - **Topic**: Your Estuary Flow collection name (e.g., `/my-organization/my-collection`) - **Security Protocol**: `SASL_SSL` - **SASL Mechanism**: `PLAIN` - **SASL Username**: `{}` - **SASL Password**: `Your generated Estuary Access Token` + +6. For the "Input Format", select "avro". + +7. Configure the Schema Registry settings: - **Schema Registry URL**: `https://dekaf.estuary.dev` - **Schema Registry Username**: `{}` (same as SASL Username) - **Schema Registry Password**: `The same Estuary Access Token as above` + +8. In the "Schema" section, Imply Polaris should automatically detect the schema from your Avro data. Review and adjust + the column definitions as needed. + +9. Review and finalize your table configuration, then click "Create Table". + +10. Your Imply Polaris table should now start ingesting data from Estuary Flow.
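Since the Polaris table reads Avro and resolves schemas through Dekaf's schema registry, it can help to confirm the registry responds to your credentials before configuring the table. The sketch below assumes Dekaf's registry follows the standard Confluent-compatible REST layout (a `GET /subjects` endpoint) and uses the same `{}` username and refresh-token password described above; the `ESTUARY_REFRESH_TOKEN` environment variable is a placeholder.

```python
# Sketch: verify the Dekaf schema registry responds with your credentials.
import os

import requests

resp = requests.get(
    "https://dekaf.estuary.dev/subjects",
    auth=("{}", os.environ["ESTUARY_REFRESH_TOKEN"]),  # username "{}", password = refresh token
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # expect subjects corresponding to your collections
```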
diff --git a/site/docs/reference/Connectors/dekaf/dekaf-materialize.md b/site/docs/reference/Connectors/dekaf/dekaf-materialize.md index c8262f74f7..d7028651b0 100644 --- a/site/docs/reference/Connectors/dekaf/dekaf-materialize.md +++ b/site/docs/reference/Connectors/dekaf/dekaf-materialize.md @@ -5,17 +5,10 @@ In this guide, you'll learn how to use Materialize to ingest data from Estuary F [Materialize](https://materialize.com/) is an operational data warehouse for real-time analytics that uses standard SQL for defining transformations and queries. -## Prerequisites - -- An [Estuary Flow](https://dashboard.estuary.dev/register) account & collection -- A Materialize account - ## Connecting Estuary Flow to Materialize -1. **Create a new access token** to use for the Materialize connection. You can generate this token from the Estuary - Admin Dashboard. - - ![Export Dekaf Access Token](https://storage.googleapis.com/estuary-marketing-strapi-uploads/uploads//Group_22_95a85083d4/Group_22_95a85083d4.png) +1. [Generate a refresh token](/guides/how_to_generate_refresh_token) to use for the Materialize connection. You can + generate this token from the Estuary Admin Dashboard. 2. In your Materialize dashboard, use the SQL shell to create a new secret and connection using the Kafka source connector. Use the following SQL commands to configure the connection to Estuary Flow: diff --git a/site/docs/reference/Connectors/dekaf/dekaf-singlestore.md b/site/docs/reference/Connectors/dekaf/dekaf-singlestore.md new file mode 100644 index 0000000000..cad4e37baa --- /dev/null +++ b/site/docs/reference/Connectors/dekaf/dekaf-singlestore.md @@ -0,0 +1,40 @@ +# SingleStore (Cloud) + +This guide demonstrates how to use Estuary Flow to stream data to SingleStore using the Kafka-compatible Dekaf API. + +[SingleStore](https://www.singlestore.com/) is a distributed SQL database designed for data-intensive applications, +offering high performance for both transactional and analytical workloads. + +## Connecting Estuary Flow to SingleStore + +1. [Generate a refresh token](/guides/how_to_generate_refresh_token) for the SingleStore connection from the Estuary + Admin Dashboard. + +2. In the SingleStore Cloud Portal, navigate to the SQL Editor section of the Data Studio. + +3. Execute the following script to create a table and an ingestion pipeline to hydrate it. + + This example will ingest data from the demo wikipedia collection in Estuary Flow. + + ```sql + CREATE TABLE test_table (id NUMERIC, server_name VARCHAR(255), title VARCHAR(255)); + + CREATE PIPELINE test AS + LOAD DATA KAFKA "dekaf.estuary.dev:9092/demo/wikipedia/recentchange-sampled" + CONFIG '{ + "security.protocol":"SASL_SSL", + "sasl.mechanism":"PLAIN", + "sasl.username":"{}", + "broker.address.family": "v4", + "schema.registry.username": "{}", + "fetch.wait.max.ms": "2000" + }' + CREDENTIALS '{ + "sasl.password": "ESTUARY_ACCESS_TOKEN", + "schema.registry.password": "ESTUARY_ACCESS_TOKEN" + }' + INTO table test_table + FORMAT AVRO SCHEMA REGISTRY 'https://dekaf.estuary.dev' + ( id <- id, server_name <- server_name, title <- title ); + ``` +4. Your pipeline should now start ingesting data from Estuary Flow into SingleStore. 
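If the SingleStore pipeline does not begin ingesting on its own (pipelines are typically created in a stopped state), you can start it and check that rows are arriving over SingleStore's MySQL-compatible endpoint. The sketch below uses `pymysql`; the host, credentials, and database name are placeholders for your own workspace, not values from this guide.

```python
# Sketch: start the pipeline created above and confirm rows are arriving.
import time

import pymysql

conn = pymysql.connect(
    host="your-workspace-host.singlestore.com",  # placeholder workspace host
    port=3306,
    user="admin",                                # placeholder credentials
    password="your-password",
    database="your_database",
)

with conn.cursor() as cur:
    cur.execute("START PIPELINE test")           # begin ingesting into test_table
    time.sleep(10)                               # give the pipeline a moment to run
    cur.execute("SELECT COUNT(*) FROM test_table")
    print("rows ingested so far:", cur.fetchone()[0])

conn.close()
```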
diff --git a/site/docs/reference/Connectors/dekaf/dekaf-startree.md b/site/docs/reference/Connectors/dekaf/dekaf-startree.md index f2ce8f45a9..c74d90f599 100644 --- a/site/docs/reference/Connectors/dekaf/dekaf-startree.md +++ b/site/docs/reference/Connectors/dekaf/dekaf-startree.md @@ -5,17 +5,10 @@ In this guide, you'll learn how to use Estuary Flow to push data streams to Star [StarTree](https://startree.ai/) is a real-time analytics platform built on Apache Pinot, designed for performing fast, low-latency analytics on large-scale data. -## Prerequisites - -- An Estuary Flow account & collection -- A StarTree account - ## Connecting Estuary Flow to StarTree -1. **Create a new access token** to use for the StarTree connection. You can generate this token from the Estuary Admin - Dashboard. - - ![Export Dekaf Access Token](https://storage.googleapis.com/estuary-marketing-strapi-uploads/uploads//Group_22_95a85083d4/Group_22_95a85083d4.png) +1. [Generate a refresh token](/guides/how_to_generate_refresh_token) to use for the StarTree connection. You can + generate this token from the Estuary Admin Dashboard. 2. In the StarTree UI, navigate to the **Data Sources** section and choose **Add New Data Source**. diff --git a/site/docs/reference/Connectors/dekaf/dekaf-tinybird.md b/site/docs/reference/Connectors/dekaf/dekaf-tinybird.md index ddf4ad003d..3554eab643 100644 --- a/site/docs/reference/Connectors/dekaf/dekaf-tinybird.md +++ b/site/docs/reference/Connectors/dekaf/dekaf-tinybird.md @@ -4,15 +4,11 @@ In this guide, you'll learn how to use Estuary Flow to push data streams to Tiny [Tinybird](https://www.tinybird.co/) is a data platform for user-facing analytics. -## Prerequisites +## Connecting Estuary Flow to Tinybird -- An Estuary Flow account & collection -- A Tinybird account & Workspace +1. [Generate a refresh token](/guides/how_to_generate_refresh_token) to use for the Tinybird connection. You can do this + from the Estuary Admin Dashboard. -# Connecting Estuary Flow to Tinybird - -1. Create a new access token to use for the Tinybird connection. You can do this from the Estuary Admin Dashboard. - ![Export Dekaf Access Token](https://storage.googleapis.com/estuary-marketing-strapi-uploads/uploads//Group_22_95a85083d4/Group_22_95a85083d4.png) 2. In your Tinybird Workspace, create a new Data Source and use the Kafka Connector. ![Configure Estuary Flow Data Source](https://storage.googleapis.com/estuary-marketing-strapi-uploads/uploads//Screenshot_2024_08_23_at_15_16_39_35b06dad77/Screenshot_2024_08_23_at_15_16_39_35b06dad77.png)