diff --git a/website/www/site/content/en/documentation/io/managed-io.md b/website/www/site/content/en/documentation/io/managed-io.md
index 48cf2a28addb..8847e3704abf 100644
--- a/website/www/site/content/en/documentation/io/managed-io.md
+++ b/website/www/site/content/en/documentation/io/managed-io.md
@@ -59,31 +59,25 @@ and Beam SQL is invoked via the Managed API under the hood.
The tables below list, for each connector, the supported configuration parameters, their types, and descriptions.

| Connector | Read Configuration | Write Configuration |
|---|---|---|
| BIGQUERY | kms_key (str), query (str), row_restriction (str), fields (list[str]), table (str) | table (str), drop (list[str]), keep (list[str]), kms_key (str), only (str), triggering_frequency_seconds (int64) |
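These parameters map directly onto the Managed API: you pass the connector identifier plus a configuration dictionary whose keys match the tables on this page. Below is a minimal sketch in Python, assuming an SDK version whose `apache_beam.transforms.managed` module exposes a `BIGQUERY` identifier; the project, dataset, and table names are placeholders.

```python
import apache_beam as beam
from apache_beam.transforms import managed

with beam.Pipeline() as p:
    # Read rows from BigQuery through the Managed API. The config keys
    # ("table", "fields", "kms_key", ...) come from the BIGQUERY read table.
    rows = p | "ReadBQ" >> managed.Read(
        managed.BIGQUERY,  # assumed available in your SDK version
        config={
            "table": "my-project:my_dataset.source_table",  # placeholder
            "fields": ["col1", "col2"],
        })

    # Write the rows back out; the keys come from the BIGQUERY write table.
    rows | "WriteBQ" >> managed.Write(
        managed.BIGQUERY,
        config={
            "table": "my-project:my_dataset.dest_table",  # placeholder
        })
```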
**ICEBERG read configuration**

| Configuration | Type | Description |
|---|---|---|
| table | str | Identifier of the Iceberg table. |
| catalog_name | str | Name of the catalog containing the table. |
| catalog_properties | map[str, str] | Properties used to set up the Iceberg catalog. |
| config_properties | map[str, str] | Properties passed to the Hadoop Configuration. |
| drop | list[str] | A subset of column names to exclude from reading. If null or empty, all columns will be read. |
| filter | str | SQL-like predicate to filter data at scan time. Example: "id > 5 AND status = 'ACTIVE'". Uses Apache Calcite syntax: https://calcite.apache.org/docs/reference.html |
| from_snapshot | int64 | Starts reading from this snapshot ID (inclusive). |
| from_timestamp | int64 | Starts reading from the first snapshot (inclusive) that was created after this timestamp (in milliseconds). |
| keep | list[str] | A subset of column names to read exclusively. If null or empty, all columns will be read. |
| poll_interval_seconds | int32 | The interval at which to poll for new snapshots. Defaults to 60 seconds. |
| starting_strategy | str | The source's starting strategy. Valid options are: "earliest" or "latest". Can be overridden by setting a starting snapshot or timestamp. Defaults to earliest for batch, and latest for streaming. |
| streaming | boolean | Enables streaming reads, where the source continuously polls for new snapshots. |
| to_snapshot | int64 | Reads up to this snapshot ID (inclusive). |
| to_timestamp | int64 | Reads up to the latest snapshot (inclusive) created before this timestamp (in milliseconds). |
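As a sketch of how these keys are used, the snippet below reads an Iceberg table through the Managed API in Python; the catalog name, warehouse path, and table identifier are placeholder values, and the Hadoop-style catalog properties are assumptions for illustration.

```python
import apache_beam as beam
from apache_beam.transforms import managed

with beam.Pipeline() as p:
    rows = p | managed.Read(
        managed.ICEBERG,
        config={
            "table": "db.users",           # placeholder table identifier
            "catalog_name": "local",       # placeholder catalog name
            "catalog_properties": {
                "type": "hadoop",                          # assumed catalog type
                "warehouse": "gs://my-bucket/warehouse",   # placeholder path
            },
            # Optional: prune columns and filter at scan time.
            "keep": ["id", "status"],
            "filter": "status = 'ACTIVE'",
        })
    rows | beam.Map(print)
```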
**KAFKA read configuration**

| Configuration | Type | Description |
|---|---|---|
| bootstrap_servers | str | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping; this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form `host1:port1,host2:port2,...` |
| topic | str | n/a |
| allow_duplicates | boolean | Whether the Kafka read allows duplicates. |
| confluent_schema_registry_subject | str | n/a |
| confluent_schema_registry_url | str | n/a |
| consumer_config_updates | map[str, str] | A list of key-value pairs that act as configuration parameters for Kafka consumers. Most of these configurations will not be needed, but if you need to customize your Kafka consumer, you may use this. See a detailed list: https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html |
| file_descriptor_path | str | The path to the Protocol Buffer File Descriptor Set file. This file is used for schema definition and message serialization. |
| format | str | The encoding format for the data stored in Kafka. Valid options are: RAW, STRING, AVRO, JSON, PROTO |
| message_name | str | The name of the Protocol Buffer message to be used for schema extraction and data conversion. |
| offset_deduplication | boolean | Whether the redistribute uses offset deduplication mode. |
| redistribute_by_record_key | boolean | Whether the redistribute keys by the Kafka record key. |
| redistribute_num_keys | int32 | The number of keys for redistributing Kafka inputs. |
| redistributed | boolean | Whether the Kafka read should be redistributed. |
| schema | str | The schema in which the data is encoded in the Kafka topic. For AVRO data, this is a schema defined with AVRO schema syntax (https://avro.apache.org/docs/1.10.2/spec.html#schemas). For JSON data, this is a schema defined with JSON-schema syntax (https://json-schema.org/). If a URL to Confluent Schema Registry is provided, then this field is ignored, and the schema is fetched from Confluent Schema Registry. |
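A minimal sketch of a streaming read from Kafka via the Managed API in Python; the broker addresses, topic, and inline JSON schema are placeholders chosen for illustration.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import managed

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    rows = p | managed.Read(
        managed.KAFKA,
        config={
            "bootstrap_servers": "broker1:9092,broker2:9092",  # placeholder hosts
            "topic": "orders",                                 # placeholder topic
            "format": "JSON",
            # Illustrative JSON schema for the records on the topic.
            "schema": '{"type":"object","properties":'
                      '{"id":{"type":"integer"},"status":{"type":"string"}}}',
        })
    rows | beam.Map(print)
```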
**KAFKA write configuration**

| Configuration | Type | Description |
|---|---|---|
| bootstrap_servers | str | A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping; this list only impacts the initial hosts used to discover the full set of servers. Format: `host1:port1,host2:port2,...` |
| format | str | The encoding format for the data stored in Kafka. Valid options are: RAW, JSON, AVRO, PROTO |
| topic | str | n/a |
| file_descriptor_path | str | The path to the Protocol Buffer File Descriptor Set file. This file is used for schema definition and message serialization. |
| message_name | str | The name of the Protocol Buffer message to be used for schema extraction and data conversion. |
| producer_config_updates | map[str, str] | A list of key-value pairs that act as configuration parameters for Kafka producers. Most of these configurations will not be needed, but if you need to customize your Kafka producer, you may use this. See a detailed list: https://docs.confluent.io/platform/current/installation/configuration/producer-configs.html |
| schema | str | n/a |
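And a sketch of writing schema-aware rows to Kafka with the write configuration above; the broker and topic names are placeholders, and the managed module is assumed available as in the earlier examples.

```python
import apache_beam as beam
from apache_beam.transforms import managed

with beam.Pipeline() as p:
    # Create a small schema-aware PCollection of Beam Rows.
    rows = p | beam.Create([
        beam.Row(id=1, status="ACTIVE"),
        beam.Row(id=2, status="INACTIVE"),
    ])
    rows | managed.Write(
        managed.KAFKA,
        config={
            "bootstrap_servers": "broker1:9092",  # placeholder host
            "topic": "orders_out",                # placeholder topic
            "format": "JSON",
        })
```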
**JDBC write configuration**

| Configuration | Type | Description |
|---|---|---|
| jdbc_url | str | Connection URL for the JDBC sink. |
| autosharding | boolean | If true, enables using a dynamically determined number of shards to write. |
| batch_size | int64 | n/a |
| connection_init_sql | list[str] | Sets the connection init SQL statements used by the Driver. Only MySQL and MariaDB support this. |
| connection_properties | str | Used to set connection properties passed to the JDBC driver not already defined as standalone parameters (e.g. username and password can be set using the parameters above). Format of the string must be "key1=value1;key2=value2;". |
| driver_class_name | str | Name of a Java Driver class to use to connect to the JDBC source. For example, "com.mysql.jdbc.Driver". |
| driver_jars | str | Comma-separated path(s) for the JDBC driver jar(s). This can be a local path or GCS (gs://) path. |
| jdbc_type | str | Type of JDBC source. When specified, an appropriate default Driver will be packaged with the transform. One of mysql, postgres, oracle, or mssql. |
| location | str | Name of the table to write to. |
| password | str | Password for the JDBC source. |
| username | str | Username for the JDBC source. |
| write_statement | str | SQL query used to insert records into the JDBC sink. |
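The JDBC sink follows the same pattern. The sketch below is illustrative only: the "jdbc" identifier string is an assumption (check which identifiers your SDK's Managed API actually exposes), and the URL, table name, and credentials are placeholders.

```python
import apache_beam as beam
from apache_beam.transforms import managed

# NOTE: the "jdbc" identifier below is an assumption for illustration;
# consult the Managed API for the identifiers your SDK version supports.
with beam.Pipeline() as p:
    rows = p | beam.Create([beam.Row(id=1, status="ACTIVE")])
    rows | managed.Write(
        "jdbc",
        config={
            "jdbc_url": "jdbc:postgresql://localhost:5432/mydb",  # placeholder URL
            "jdbc_type": "postgres",
            "location": "orders",   # placeholder table name
            "username": "user",     # placeholder credentials
            "password": "secret",
        })
```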
**BIGQUERY read configuration**

| Configuration | Type | Description |
|---|---|---|
| kms_key | str | Use this Cloud KMS key to encrypt your data. |
| query | str | The SQL query to be executed to read from the BigQuery table. |
| row_restriction | str | Read only rows that match this filter, which must be compatible with Google standard SQL. This is not supported when reading via query. |
| fields | list[str] | Read only the specified fields (columns) from a BigQuery table. Fields may not be returned in the order specified. If no value is specified, then all fields are returned. Example: "col1, col2, col3" |
| table | str | The fully-qualified name of the BigQuery table to read from. Format: [${PROJECT}:]${DATASET}.${TABLE} |
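A sketch of a query-based BigQuery read, again assuming an SDK version whose managed module exposes a BIGQUERY identifier; the project, dataset, and table names in the query are placeholders.

```python
import apache_beam as beam
from apache_beam.transforms import managed

with beam.Pipeline() as p:
    rows = p | managed.Read(
        managed.BIGQUERY,  # assumed available in your SDK version
        config={
            # Query-based read; alternatively set "table" (optionally with
            # "fields" and "row_restriction") instead of "query".
            "query": "SELECT id, status FROM `my-project.my_dataset.orders`",
        })
    rows | beam.Map(print)
```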
**BIGQUERY write configuration**

| Configuration | Type | Description |
|---|---|---|
| table | str | The BigQuery table to write to. Format: [${PROJECT}:]${DATASET}.${TABLE} |
| drop | list[str] | A list of field names to drop from the input record before writing. Is mutually exclusive with 'keep' and 'only'. |
| keep | list[str] | A list of field names to keep in the input record. All other fields are dropped before writing. Is mutually exclusive with 'drop' and 'only'. |
| kms_key | str | Use this Cloud KMS key to encrypt your data. |
| only | str | The name of a single record field that should be written. Is mutually exclusive with 'keep' and 'drop'. |
| triggering_frequency_seconds | int64 | Determines how often to 'commit' progress into BigQuery. Default is every 5 seconds. |
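Finally, a sketch of a BigQuery write that drops a field before writing and sets the commit frequency; as above, managed.BIGQUERY is assumed to be available in your SDK version and the table name is a placeholder.

```python
import apache_beam as beam
from apache_beam.transforms import managed

with beam.Pipeline() as p:
    rows = p | beam.Create([beam.Row(id=1, status="ACTIVE", debug_info="x")])
    rows | managed.Write(
        managed.BIGQUERY,
        config={
            "table": "my-project:my_dataset.orders_out",  # placeholder
            "drop": ["debug_info"],              # exclude this field before writing
            "triggering_frequency_seconds": 10,  # commit interval for streaming writes
        })
```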