Add descriptions #641

Merged 1 commit on Sep 18, 2024
18 changes: 14 additions & 4 deletions docs/modules/trino/pages/concepts.adoc
@@ -1,11 +1,18 @@
= Concepts
:description: Trino connects to diverse data sources via connectors and catalogs, enabling efficient distributed queries across multiple data stores.
:what-trino-is: https://trino.io/docs/current/overview/use-cases.html#what-trino-is
:trino-connector: https://trino.io/docs/current/connector.html

== [[connectors]]Connectors

{what-trino-is}[Trino] is a tool designed to efficiently query vast amounts of data using distributed queries.
It is not a database with its own storage but rather interacts with many different data stores.
Trino connects to these data stores - or data sources - via {trino-connector}[connectors].
Each connector enables access to a specific underlying data source such as a Hive warehouse, a PostgreSQL database or a Druid instance.

A Trino cluster comprises two roles: the Coordinator, responsible for managing and monitoring workloads, and the Worker, which is responsible for executing the specific tasks that together make up a workload.
The workers fetch data from the connectors, execute tasks and share intermediate results.
The coordinator collects and consolidates these results for the end-user.

== [[catalogs]]Catalogs

@@ -24,9 +31,12 @@ Currently, the following connectors are supported:

== Catalog references

Within Stackable a `TrinoCatalog` consists of one or more (mandatory or optional) components which are specific to that catalog.
A catalog should be re-usable within multiple Trino clusters.
Catalogs are referenced by Trino clusters with labels and label selectors: this is consistent with the Kubernetes paradigm and keeps the definitions simple and flexible.

The following diagram illustrates this.
Two Trino catalogs - each an instance of a particular connector - are declared with labels that are used to match them to a Trino cluster:

image::catalogs.drawio.svg[A TrinoCluster referencing two catalogs by label matching]
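
To make the pattern concrete, here is a hedged sketch of the label matching described above; the resource names, the `tpch` connector stanza and the `catalogLabelSelector` field are illustrative assumptions, so verify them against the operator's CRD reference before use:

[source,yaml]
----
# Sketch: a catalog carrying a label that a cluster can select on.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: tpch            # illustrative name
  labels:
    trino: simple-trino # the label used for matching
spec:
  connector:
    tpch: {}            # illustrative connector
---
# Sketch: a cluster selecting all catalogs with that label.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino
spec:
  clusterConfig:
    catalogLabelSelector:
      matchLabels:
        trino: simple-trino
----

Because the reference is a selector rather than a hard-coded list, the same catalog can be picked up by several clusters, and a new catalog joins a cluster simply by carrying the right label.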

18 changes: 13 additions & 5 deletions docs/modules/trino/pages/getting_started/first_steps.adoc
@@ -1,10 +1,13 @@
= First steps
:description: Deploy and verify a Trino cluster with Stackable Operator. Access via CLI or web interface, and clean up after testing.

After going through the xref:getting_started/installation.adoc[] section and having installed all the operators, you will now deploy a Trino cluster and the required dependencies.

Afterward, you can <<_verify_that_it_works, verify that it works>> by running some queries against Trino or visiting the Trino web interface.

== Set up Trino

A working Trino cluster and its web interface require only the commons, secret and listener operators to work.
Simple tests are possible without an external data source (e.g. PostgreSQL, Hive or S3), as internal data can be used.
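
Once the cluster defined below is running, such a test could look like this; the first statement works against any Trino cluster, while the `tpch` catalog in the second is an illustrative assumption and not something this guide sets up:

[source,sql]
----
-- Always available: list the catalogs known to the cluster
SHOW CATALOGS;
-- Only works if a tpch catalog is configured (illustrative):
SELECT name, regionkey FROM tpch.sf1.nation LIMIT 5;
----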

Create a file named `trino.yaml` with the following content:

@@ -54,7 +57,9 @@

=== Access the Trino cluster via CLI tool

We use the https://trino.io/download.html[Trino CLI tool] to access the Trino cluster.
This link points to the latest Trino version.
In this guide we keep Trino cluster and client versions in sync and download the CLI tool from the https://repo.stackable.tech/[Stackable repository]:

[source,bash]
----
@@ -100,9 +105,12 @@

=== Access the Trino web interface

With the port-forward still active, you can connect to the Trino web interface.
Enter `https://localhost:8443/ui` in your browser and log in with the username `admin`.
Since no authentication is enabled you do not need to enter a password.

WARNING: Your browser will probably show a security risk warning because it does not trust the self-generated TLS certificates.
Just ignore that and continue.

After logging in you should see the Trino web interface:

4 changes: 3 additions & 1 deletion docs/modules/trino/pages/getting_started/index.adoc
@@ -1,6 +1,8 @@
= Getting started
:description: Get started with Trino on Kubernetes using the Stackable Operator. Follow steps for installation, setup, and resource recommendations.

This guide will get you started with Trino using the Stackable Operator.
It will guide you through the installation of the operator and its dependencies and setting up your first Trino cluster.

== Prerequisites

8 changes: 5 additions & 3 deletions docs/modules/trino/pages/getting_started/installation.adoc
@@ -1,4 +1,5 @@
= Installation
:description: Install the Stackable Operator for Trino using stackablectl or Helm. Includes optional setup for Hive, S3, and OPA integration.

On this page you will install the Stackable Operator for Trino as well as the commons, secret and listener operators, which are required by all Stackable Operators.
@@ -50,8 +51,8 @@ include::example$getting_started/code/getting_started.sh[tag=helm-install-operat

== Optional installation steps

Some Trino connectors like `hive` or `iceberg` work together with the Apache Hive metastore and S3 buckets.
For these components extra steps are required.

* a Stackable Hive metastore
* an accessible S3 bucket
@@ -70,7 +71,8 @@ Please refer to the S3 provider.

=== Hive operator

Please refer to the xref:hive:index.adoc[Hive Operator] docs.
Both Hive and Trino need the same S3 authentication.

=== OPA operator

2 changes: 1 addition & 1 deletion docs/modules/trino/pages/index.adoc
@@ -1,5 +1,5 @@
= Stackable Operator for Trino
:description: Manage Trino clusters on Kubernetes with the Stackable operator, featuring resource management, demos, and support for custom Trino versions.
:keywords: Stackable operator, Trino, Kubernetes, k8s, operator, data science, data exploration
:trino: https://trino.io/
:github: https://github.com/stackabletech/trino-operator/
26 changes: 17 additions & 9 deletions docs/modules/trino/pages/usage-guide/configuration.adoc
@@ -1,4 +1,5 @@
= Configuration
:description: Configure Trino clusters with properties, environment variables, and resource requests. Customize settings for performance and storage efficiently.

The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).

@@ -8,11 +9,11 @@ IMPORTANT: Do not override port numbers. This will lead to faulty installations.

For a role or role group, at the same level of `config`, you can specify `configOverrides` for:

* `config.properties`
* `node.properties`
* `log.properties`
* `password-authenticator.properties`
* `security.properties`

For a list of possible configuration properties consult the https://trino.io/docs/current/admin/properties.html[Trino Properties Reference].
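
As a hedged sketch of the shape such an override takes (the property name comes from the Trino reference, the value and role-group placement are purely illustrative):

[source,yaml]
----
workers:
  configOverrides:
    config.properties:
      query.max-memory-per-node: "2GB" # example property; override values must be strings
----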

@@ -46,9 +47,13 @@ All override property values must be strings. The properties will be passed on w

=== The security.properties file

The `security.properties` file is used to configure JVM security properties.
It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.

The JVM manages its own cache of successfully resolved host names as well as a cache of host names that cannot be resolved.
Some products of the Stackable platform are very sensitive to the contents of these caches, and their performance is heavily affected by them.
As of version 414, Trino performs poorly if the positive cache is disabled.
To cache resolved host names, and thus speed up queries, you can configure the TTL of entries in the positive cache like this:

[source,yaml]
----
@@ -124,7 +129,9 @@ workers:
capacity: 3Gi
----

In the above example, all Trino workers in the default group will store data (the location of the `--data-dir` property) on a `3Gi` volume.
Additional role groups not specifying any resources will inherit the config provided on the role level (`2Gi` volume).
This works the same for memory or CPU requests.

By default, in case nothing is configured in the custom resource for a certain role group, each Pod will have a `2Gi` large local volume mount for the data location containing mainly logs.

@@ -168,4 +175,5 @@ spec:
capacity: '1Gi'
----

WARNING: The default values are _most likely_ not sufficient to run a proper cluster in production.
Please adapt according to your requirements.
8 changes: 6 additions & 2 deletions docs/modules/trino/pages/usage-guide/connect_to_trino.adoc
@@ -1,4 +1,5 @@
= Connecting to Trino
:description: Learn how to connect to Trino using trino-cli, DBeaver, or Python. Includes setup for SSL/TLS, OpenID Connect, and basic authentication.

:trino-jdbc: https://trino.io/docs/current/client/jdbc.html
:starburst-odbc: https://docs.starburst.io/data-consumer/clients/odbc.html
@@ -29,7 +30,9 @@ The `--insecure` flag ignores the server TLS certificate and is required in this
$ java -jar ~/Downloads/trino-cli-403-executable.jar --server https://85.215.195.29:8443 --user admin --password --insecure
----

TIP: In case you are using OpenID Connect, use `--external-authentication` instead of `--password`.
A browser window will be opened, which might require you to log in.
Please note that you still need to pass the `--user` argument because of https://github.com/trinodb/trino/issues/11547[this Trino issue].

== Connect with DBeaver

@@ -53,7 +56,8 @@

As the last step you can click on _Finish_ and start using the Trino connection.

TIP: In case you are using OpenID Connect, set the `externalAuthentication` property to `true` and don't provide a username or password.
A browser window will be opened, which might require you to log in.

== Connect with Python

Expand Down
10 changes: 5 additions & 5 deletions docs/modules/trino/pages/usage-guide/log_aggregation.adoc
@@ -1,7 +1,7 @@
= Log aggregation
:description: The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent

The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent:

[source,yaml]
----
@@ -19,7 +19,7 @@ spec:
level: INFO
----

Currently, the logs are collected only for `server.log`.
Logging for `http-request.log` is disabled by default.

Further information on how to configure logging can be found in xref:concepts:logging.adoc[].
5 changes: 3 additions & 2 deletions docs/modules/trino/pages/usage-guide/monitoring.adoc
@@ -1,4 +1,5 @@
= Monitoring
:description: The managed Trino instances are automatically configured to export Prometheus metrics.

The managed Trino instances are automatically configured to export Prometheus metrics.
See xref:operators:monitoring.adoc[] for more details.
4 changes: 3 additions & 1 deletion docs/modules/trino/pages/usage-guide/query.adoc
@@ -1,6 +1,8 @@
= Testing Trino with Hive and S3
:description: Test Trino with Hive and S3 by creating a schema and table for Iris data in Parquet format, then querying the dataset.

Create a schema and a table for the Iris data located in S3 and query the data.
This assumes that the Iris data set is available in the S3 bucket in the `PARQUET` format; it can be downloaded https://www.kaggle.com/gpreda/iris-dataset/version/2?select=iris.parquet[here].

== Create schema
[source,sql]
1 change: 1 addition & 0 deletions docs/modules/trino/pages/usage-guide/s3.adoc
@@ -1,4 +1,5 @@
= Connecting Trino to S3
:description: Configure S3 connections in Trino either inline within the TrinoCatalog or via an external S3Connection resource for centralized management.

You can specify S3 connection details directly inside the TrinoCatalog specification or by referring to an external S3Connection custom resource.
This mechanism is used across the whole Stackable Data Platform; read the xref:concepts:s3.adoc[S3 concepts page] to learn more.
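
The two styles can be sketched as follows; the field names and the connection name are illustrative assumptions, so verify them against the S3 concepts page:

[source,yaml]
----
# Style 1: inline connection details inside the TrinoCatalog (sketch)
s3:
  inline:
    host: minio.example.svc.cluster.local
    port: 9000
----

[source,yaml]
----
# Style 2: reference a centrally managed S3Connection resource (sketch)
s3:
  reference: my-s3-connection
----

The reference style keeps endpoints and credentials in one place, so several catalogs (or other products) can share the same connection definition.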