Add descriptions #641

Merged 1 commit on Sep 18, 2024
18 changes: 14 additions & 4 deletions docs/modules/trino/pages/concepts.adoc
@@ -1,11 +1,18 @@
= Concepts
:description: Trino connects to diverse data sources via connectors and catalogs, enabling efficient distributed queries across multiple data stores.
:what-trino-is: https://trino.io/docs/current/overview/use-cases.html#what-trino-is
:trino-connector: https://trino.io/docs/current/connector.html

== [[connectors]]Connectors

{what-trino-is}[Trino] is a tool designed to efficiently query vast amounts of data using distributed queries.
It is not a database with its own storage but rather interacts with many different data stores.
Trino connects to these data stores - or data sources - via {trino-connector}[connectors].
Each connector enables access to a specific underlying data source such as a Hive warehouse, a PostgreSQL database or a Druid instance.

A Trino cluster comprises two roles: the Coordinator, responsible for managing and monitoring workloads, and the Worker, which is responsible for executing the specific tasks that together make up a workload.
The workers fetch data from the connectors, execute tasks and share intermediate results.
The coordinator collects and consolidates these results for the end-user.

== [[catalogs]]Catalogs

@@ -24,9 +31,12 @@ Currently, the following connectors are supported:

== Catalog references

Within Stackable a `TrinoCatalog` consists of one or more (mandatory or optional) components which are specific to that catalog.
A catalog should be re-usable within multiple Trino clusters.
Catalogs are referenced by Trino clusters with labels and label selectors: this is consistent with the Kubernetes paradigm and keeps the definitions simple and flexible.

The following diagram illustrates this.
Two Trino catalogs - each an instance of a particular connector - are declared with labels that are used to match them to a Trino cluster:

image::catalogs.drawio.svg[A TrinoCluster referencing two catalogs by label matching]
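
To make the pattern concrete, here is a hedged sketch of the label matching described above; the resource names, the `tpch` connector stanza and the `catalogLabelSelector` field are illustrative assumptions, so verify them against the operator's CRD reference before use:

[source,yaml]
----
# Sketch: a catalog carrying a label that a cluster can select on.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: tpch            # illustrative name
  labels:
    trino: simple-trino # the label used for matching
spec:
  connector:
    tpch: {}            # illustrative connector
---
# Sketch: a cluster selecting all catalogs with that label.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino
spec:
  clusterConfig:
    catalogLabelSelector:
      matchLabels:
        trino: simple-trino
----

Because the reference is a selector rather than a hard-coded list, the same catalog can be picked up by several clusters, and a new catalog joins a cluster simply by carrying the right label.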

18 changes: 13 additions & 5 deletions docs/modules/trino/pages/getting_started/first_steps.adoc
@@ -1,10 +1,13 @@
= First steps
:description: Deploy and verify a Trino cluster with Stackable Operator. Access via CLI or web interface, and clean up after testing.

After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, you will now deploy a Trino cluster and the required dependencies.

Afterward, you can <<_verify_that_it_works, verify that it works>> by running some queries against Trino or visiting the Trino web interface.

== Set up Trino

A working Trino cluster and its web interface require only the commons, secret and listener operators to work.
Simple tests are possible without an external data source (e.g. PostgreSQL, Hive or S3), as internal data can be used.
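
Once the cluster defined below is running, such a test could look like this; the first statement works against any Trino cluster, while the `tpch` catalog in the second is an illustrative assumption and not something this guide sets up:

[source,sql]
----
-- Always available: list the catalogs known to the cluster
SHOW CATALOGS;
-- Only works if a tpch catalog is configured (illustrative):
SELECT name, regionkey FROM tpch.sf1.nation LIMIT 5;
----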

Create a file named `trino.yaml` with the following content:

@@ -54,7 +57,9 @@

=== Access the Trino cluster via CLI tool

We use the https://trino.io/download.html[Trino CLI tool] to access the Trino cluster.
This link points to the latest Trino version.
In this guide we keep Trino cluster and client versions in sync and download the CLI tool from the https://repo.stackable.tech/[Stackable repository]:

[source,bash]
----
@@ -100,9 +105,12 @@

=== Access the Trino web interface

With the port-forward still active, you can connect to the Trino web interface.
Enter `https://localhost:8443/ui` in your browser and log in with the username `admin`.
Since no authentication is enabled you do not need to enter a password.

WARNING: Your browser will probably show a security risk warning because it does not trust the self-generated TLS certificates.
Just ignore that and continue.

After logging in you should see the Trino web interface:

4 changes: 3 additions & 1 deletion docs/modules/trino/pages/getting_started/index.adoc
@@ -1,6 +1,8 @@
= Getting started
:description: Get started with Trino on Kubernetes using the Stackable Operator. Follow steps for installation, setup, and resource recommendations.

This guide will get you started with Trino using the Stackable Operator.
It will guide you through the installation of the operator and its dependencies and setting up your first Trino cluster.

== Prerequisites

8 changes: 5 additions & 3 deletions docs/modules/trino/pages/getting_started/installation.adoc
@@ -1,4 +1,5 @@
= Installation
:description: Install the Stackable Operator for Trino using stackablectl or Helm. Includes optional setup for Hive, S3, and OPA integration.

On this page you will install the Stackable Operator for Trino as well as the commons, secret and listener operators, which are required by all Stackable Operators.
@@ -50,8 +51,8 @@ include::example$getting_started/code/getting_started.sh[tag=helm-install-operat

== Optional installation steps

Some Trino connectors like `hive` or `iceberg` work together with the Apache Hive metastore and S3 buckets.
For these components extra steps are required.

* a Stackable Hive metastore
* an accessible S3 bucket
@@ -70,7 +71,8 @@ Please refer to the S3 provider.

=== Hive operator

Please refer to the xref:hive:index.adoc[Hive Operator] docs.
Both Hive and Trino need the same S3 authentication.

=== OPA operator

2 changes: 1 addition & 1 deletion docs/modules/trino/pages/index.adoc
@@ -1,5 +1,5 @@
= Stackable Operator for Trino
:description: Manage Trino clusters on Kubernetes with the Stackable operator, featuring resource management, demos, and support for custom Trino versions.
:keywords: Stackable operator, Trino, Kubernetes, k8s, operator, data science, data exploration
:trino: https://trino.io/
:github: https://github.com/stackabletech/trino-operator/
26 changes: 17 additions & 9 deletions docs/modules/trino/pages/usage-guide/configuration.adoc
@@ -1,4 +1,5 @@
= Configuration
:description: Configure Trino clusters with properties, environment variables, and resource requests. Customize settings for performance and storage efficiently.

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

@@ -8,11 +9,11 @@ IMPORTANT: Do not override port numbers. This will lead to faulty installations.

For a role or role group, at the same level of `config`, you can specify `configOverrides` for:

* `config.properties`
* `node.properties`
* `log.properties`
* `password-authenticator.properties`
* `security.properties`

For a list of possible configuration properties consult the https://trino.io/docs/current/admin/properties.html[Trino Properties Reference].
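
As a hedged sketch of the shape such an override takes (the property name comes from the Trino reference, the value and role-group placement are purely illustrative):

[source,yaml]
----
workers:
  configOverrides:
    config.properties:
      query.max-memory-per-node: "2GB" # example property; override values must be strings
----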

@@ -46,9 +47,13 @@ All override property values must be strings. The properties will be passed on w

=== The security.properties file

The `security.properties` file is used to configure JVM security properties.
It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.

The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved.
Some products of the Stackable platform are very sensitive to the contents of these caches, and their performance is heavily affected by them.
As of version 414, Trino performs poorly if the positive cache is disabled.
To cache resolved host names, and thus speed up queries, you can configure the TTL of entries in the positive cache like this:

[source,yaml]
----
@@ -124,7 +129,9 @@ workers:
capacity: 3Gi
----

In the above example, all Trino workers in the default group will store data (the location of the `--data-dir` property) on a `3Gi` volume.
Additional role groups not specifying any resources will inherit the config provided on the role level (`2Gi` volume).
This works the same for memory or CPU requests.

By default, in case nothing is configured in the custom resource for a certain role group, each Pod will have a `2Gi` large local volume mount for the data location containing mainly logs.

@@ -168,4 +175,5 @@ spec:
capacity: '1Gi'
----

WARNING: The default values are _most likely_ not sufficient to run a proper cluster in production.
Please adapt according to your requirements.
8 changes: 6 additions & 2 deletions docs/modules/trino/pages/usage-guide/connect_to_trino.adoc
@@ -1,4 +1,5 @@
= Connecting to Trino
:description: Learn how to connect to Trino using trino-cli, DBeaver, or Python. Includes setup for SSL/TLS, OpenID Connect, and basic authentication.

:trino-jdbc: https://trino.io/docs/current/client/jdbc.html
:starburst-odbc: https://docs.starburst.io/data-consumer/clients/odbc.html
@@ -29,7 +30,9 @@ The `--insecure` flag ignores the server TLS certificate and is required in this
$ java -jar ~/Downloads/trino-cli-403-executable.jar --server https://85.215.195.29:8443 --user admin --password --insecure
----

TIP: In case you are using OpenID Connect, use `--external-authentication` instead of `--password`.
A browser window will be opened, which might require you to log in.
Please note that you still need to pass the `--user` argument because of https://github.com/trinodb/trino/issues/11547[this Trino issue].

== Connect with DBeaver

@@ -53,7 +56,8 @@

As the last step you can click on _Finish_ and start using the Trino connection.

TIP: In case you are using OpenID Connect, set the `externalAuthentication` property to `true` and don't provide a username or password.
A browser window will be opened, which might require you to log in.

== Connect with Python

Expand Down
10 changes: 5 additions & 5 deletions docs/modules/trino/pages/usage-guide/log_aggregation.adoc
@@ -1,7 +1,7 @@
= Log aggregation
:description: The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent

The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
@@ -19,7 +19,7 @@ spec:
level: INFO
----

Currently, the logs are collected only for `server.log`.
Logging for `http-request.log` is disabled by default.

Further information on how to configure logging can be found in xref:concepts:logging.adoc[].
5 changes: 3 additions & 2 deletions docs/modules/trino/pages/usage-guide/monitoring.adoc
@@ -1,4 +1,5 @@
= Monitoring
:description: The managed Trino instances are automatically configured to export Prometheus metrics.

The managed Trino instances are automatically configured to export Prometheus metrics.
See xref:operators:monitoring.adoc[] for more details.
4 changes: 3 additions & 1 deletion docs/modules/trino/pages/usage-guide/query.adoc
@@ -1,6 +1,8 @@
= Testing Trino with Hive and S3
:description: Test Trino with Hive and S3 by creating a schema and table for Iris data in Parquet format, then querying the dataset.

Create a schema and a table for the Iris data located in S3 and query the data.
This assumes that the Iris data set is available in the S3 bucket in the `PARQUET` format; it can be downloaded https://www.kaggle.com/gpreda/iris-dataset/version/2?select=iris.parquet[here].

== Create schema
[source,sql]
1 change: 1 addition & 0 deletions docs/modules/trino/pages/usage-guide/s3.adoc
@@ -1,4 +1,5 @@
= Connecting Trino to S3
:description: Configure S3 connections in Trino either inline within the TrinoCatalog or via an external S3Connection resource for centralized management.

You can specify S3 connection details directly inside the TrinoCatalog specification or by referring to an external S3Connection custom resource.
This mechanism is used across the whole Stackable Data Platform; read the xref:concepts:s3.adoc[S3 concepts page] to learn more.
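
The two styles can be sketched as follows; the field names and the connection name are illustrative assumptions, so verify them against the S3 concepts page:

[source,yaml]
----
# Style 1: inline connection details inside the TrinoCatalog (sketch)
s3:
  inline:
    host: minio.example.svc.cluster.local
    port: 9000
----

[source,yaml]
----
# Style 2: reference a centrally managed S3Connection resource (sketch)
s3:
  reference: my-s3-connection
----

The reference style keeps endpoints and credentials in one place, so several catalogs (or other products) can share the same connection definition.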