diff --git a/.reuse/dep5 b/.reuse/dep5 index 3a75fe3d7..43c31215a 100644 --- a/.reuse/dep5 +++ b/.reuse/dep5 @@ -48,10 +48,6 @@ Files: */pnpm-lock.yaml Copyright: 2024 The ORT Server Authors (See ) License: Apache-2.0 -Files: */README.adoc -Copyright: 2022 The ORT Server Authors (See ) -License: Apache-2.0 - Files: */README.md Copyright: 2022 The ORT Server Authors (See ) License: Apache-2.0 @@ -60,10 +56,6 @@ Files: */tsconfig.*json Copyright: 2024 The ORT Server Authors (See ) License: Apache-2.0 -Files: config/*.adoc -Copyright: 2022 The ORT Server Authors (See ) -License: Apache-2.0 - Files: config/*.txt Copyright: 2022 The ORT Server Authors (See ) License: Apache-2.0 diff --git a/config/README.adoc b/config/README.adoc deleted file mode 100644 index e425d5476..000000000 --- a/config/README.adoc +++ /dev/null @@ -1,82 +0,0 @@ -= Advanced Configuration Access - -This document describes the configuration abstraction used within the ORT server to allow access to configuration data stored externally. - -== Purpose -While simple configuration properties in ORT Server are accessed via the https://github.com/lightbend/config[Typesafe Config library], there is other configuration information which needs to be treated in a special way. Reasons for this could be that the information is confidential (e.g. credentials for accessing specific infrastructure services) and is therefore managed by a secret storage. Or the size of the configuration data is bigger, so that it is not feasible to inject it via environment variables, but an external storage is needed (maybe a version control system). - -According to the philosophy of ORT Server, it should be flexible to be integrated with different mechanisms for accessing such special configuration data. A concrete runtime environment may offer specific services that are suitable for the use cases at hand, for instance key vaults for storing credentials. By implementing the corresponding interfaces, it should be possible to integrate such mechanisms with ORT server. - -The use cases addressed by this abstraction, that go beyond simple configuration properties, are the following: - -* Loading files. Especially for the configuration and customization of workers, files are sometimes needed. Examples include rule sets for the Evaluator, template files for generating reports, or other special-purpose ORT configuration files. Since this data can affect the results produced by ORT runs, it is often desired to keep it under version control, so that changes with undesired effects can be rolled back if necessary. -* Reading secrets for infrastructure services. This is required when accessing external services from a worker, such as external advisor services or remote scanners. Note that this is not related to the secrets ORT Server manages on behalf of the users to access source code or artifact repositories. The secrets in this context are managed centrally by the administrators of ORT Server. - -== Service Provider Interfaces -This section describes the interfaces that need to be implemented in order to obtain configuration data from special sources. There are dedicated interfaces for different use cases that are discussed in their own subsections. - -=== Access to Configuration Files -One service provider interface deals with loading configuration files from an external storage. It is used for instance to read template or script files. 
- -==== Access Interface -For dealing with configuration files, the abstraction defines a basic interface, link:spi/src/main/kotlin/ConfigFileProvider.kt[ConfigFileProvider]. It defines operations for obtaining the content of a configuration file as a stream, for checking whether a specific file exists, and for listing the configuration files under a specific path. - -Paths to configuration files are represented by a special value class named `Path`. The interpretation of such a path is implementation-specific. Typically, it will reference some kind of relative path below a root folder. - -In order to uniquely identify a specific version of a configuration file, there is another property involved, the so-called _context_. In terms of the interface, this is another string-based value class whose meaning depends on a concrete implementation. The idea behind this property is that there could be multiple sets of configuration files that could change over time (e.g. when they are stored in a version control system) or apply to different (staging) environments. A concrete implementation can assign a suitable semantic to this string value. For instance, if configuration data is loaded from a version control system, the context could be interpreted as the revision. The interface has a `resolveContext()` function that allows transforming a given context value into a normalized or resolved form. More information about the intention of this function can be found in the <> section. - -Regarding error handling, an implementation is free to throw arbitrary exceptions. They are caught by the wrapper class and rethrown as standard `ConfigException` exceptions. - -==== Factory Interface -The factory interface for the configuration file provider abstraction is defined by the link:spi/src/main/kotlin/ConfigFileProviderFactory.kt[ConfigFileProviderFactory] interface. It works analogously to typical factory interfaces for other abstractions used within ORT Server. - -This means that instances are loaded via the _service loader_ mechanism from the classpath. The interface defines a `name` property that is used to select a specific factory. It has a `createProvider()` function to create the actual provider object based on passed in configuration. - -=== Access to Secrets -Another abstraction allows reading the values of secrets from the configuration that can be used for instance to access external systems ORT Server has to interact with. - -==== Access Interface -The service provider interface is defined by the link:spi/src/main/kotlin/ConfigSecretProvider.kt[ConfigSecretProvider] interface. It is quite simple and contains only a single function to query the value of a secret. The secret in question is identified using the `Path` value type, which is also used by other configuration abstractions. The function returns the value of the secret as a String. It should throw an exception if the requested secret does not exist or cannot be resolved. - -Other functionality, like contains checks or listing secrets, is not required. In typical use cases, the secrets to be looked up are directly referenced by their names; so it is sufficient to resolve these names and obtain the corresponding values. This also simplifies concrete implementations, which could for instance lookup the values from environment variables, from files, or from a specific storage for secrets. 
- -==== Factory Interface -The factory interface for creating concrete `ConfigSecretProvider` instances has exactly the same structure as the one already discussed for `ConfigFileProvider`. So, everything mentioned there applies to this interface as well. - -[#config_using] -== Using Advanced Configuration -As usual, the configuration abstraction provides a facade class that is responsible for loading the configured provider implementations and that simplifies the interactions with them. This is the link:spi/src/main/kotlin/ConfigManager.kt[ConfigManager] class. - -Instances are created via the `create()` function of the companion object. The function expects the application configuration, so that it can determine the provider implementations to load and their specific configuration settings. In addition, the `create()` function requires a `Context` object as argument. The context is stored and used for interactions with the `ConfigFileProvider` object. Hence, it is not necessary to deal with this parameter manually. - -But except for convenience, there is another reason for storing the context: it should remain constant during a whole ORT run to warrant consistency. Consider the case that configuration data is stored in a version control system. The context could then reference a branch that contains the configuration files. This branch may change, however, while an ORT run is in progress, so that a worker executed later may see a different configuration than a one started earlier. To address this issue, the `ConfigFileProvider` interface defines a `resolveContext()` function that expects a `Context` argument and returns a normalized or resolved context. When constructing a `ConfigManager` instance, a flag can be passed whether this function should be called and the resolved context should be stored. In the example of the version control system, the provider implementation could in its `resolveContext` operation replace a branch name by the corresponding commit ID to pinpoint the configuration files. To support such constellations, at the beginning of an ORT run the context should be resolved once and then stored in the database. Workers started later in the pipeline should obtain it from there. - -The interface of the `ConfigManager` class is similar to the ones of the wrapped provider interfaces with a few convenience functions. The class supports error handling by catching all the exceptions thrown by providers and wrapping them in a standard `ConfigException`. - -The configuration passed to the `create()` function must contain a section named `configManager` that at least defines the names of the supported provider implementations to be loaded. Those are specified by the following properties: - -.Mandatory properties to define config providers -[cols="1,3",options=header] -|=== -|Property -|Description - -|fileProvider -|The name of the provider factory implementation for creating the `ConfigFileProvider` instance. - -|secretProvider -|The name of the provider factory implementation for creating the `ConfigSecretProvider` instance. -|=== - -In addition, the section can then contain further, provider-specific properties. 
The following fragment gives an example:
-
-[source]
-----
-configManager {
-  fileProvider = gitHub
-  fileProviderRepository = ort-server-config
-  fileProviderDefaultRevision = main
-
-  secretProvider = env
-}
-----
diff --git a/config/README.md b/config/README.md
new file mode 100644
index 000000000..fadf8c548
--- /dev/null
+++ b/config/README.md
@@ -0,0 +1,128 @@
+# Advanced Configuration Access
+
+This document describes the configuration abstraction used within ORT Server to allow access to configuration data stored externally.
+
+## Purpose
+
+While simple configuration properties in ORT Server are accessed via the [Typesafe Config library](https://github.com/lightbend/config), there is other configuration information which needs to be treated in a special way.
+Reasons for this could be that the information is confidential (e.g. credentials for accessing specific infrastructure services) and is therefore managed by a secret storage.
+Or the configuration data is too large to be injected via environment variables, so that an external storage is needed (maybe a version control system).
+
+According to the philosophy of ORT Server, it should be flexible enough to integrate with different mechanisms for accessing such special configuration data.
+A concrete runtime environment may offer specific services that are suitable for the use cases at hand, for instance key vaults for storing credentials.
+By implementing the corresponding interfaces, it should be possible to integrate such mechanisms with ORT Server.
+
+The use cases addressed by this abstraction, which go beyond simple configuration properties, are the following:
+
+- **Loading files:**
+  Especially for the configuration and customization of workers, files are sometimes needed.
+  Examples include rule sets for the Evaluator, template files for generating reports, or other special-purpose ORT configuration files.
+  Since this data can affect the results produced by ORT runs, it is often desirable to keep it under version control, so that changes with undesired effects can be rolled back if necessary.
+- **Reading secrets for infrastructure services:**
+  This is required when accessing external services from a worker, such as external advisor services or remote scanners.
+  Note that this is not related to the secrets ORT Server manages on behalf of the users to access source code or artifact repositories.
+  The secrets in this context are managed centrally by the administrators of ORT Server.
+
+## Service Provider Interfaces
+
+This section describes the interfaces that need to be implemented in order to obtain configuration data from special sources.
+There are dedicated interfaces for different use cases that are discussed in their own subsections.
+
+### Access to Configuration Files
+
+One service provider interface deals with loading configuration files from an external storage.
+It is used, for instance, to read template or script files.
+
+#### Access Interface
+
+For dealing with configuration files, the abstraction defines a basic interface, [ConfigFileProvider](spi/src/main/kotlin/ConfigFileProvider.kt).
+It defines operations for obtaining the content of a configuration file as a stream, for checking whether a specific file exists, and for listing the configuration files under a specific path.
+
+Paths to configuration files are represented by a special value class named `Path`.
+The interpretation of such a path is implementation-specific.
+Typically, it will reference some kind of relative path below a root folder.
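+
+The following sketch illustrates the rough shape of this contract.
+It is only an illustration: the authoritative definition is the linked Kotlin source file, and the `Context` parameter is explained in the next paragraph.
+
+```kotlin
+import java.io.InputStream
+
+// Simplified stand-ins for the string-based value classes described in this document.
+@JvmInline value class Path(val path: String)
+@JvmInline value class Context(val name: String)
+
+// Sketch of the provider contract; the real interface lives in
+// spi/src/main/kotlin/ConfigFileProvider.kt and may differ in details.
+interface ConfigFileProvider {
+    // Normalize a context, e.g. map a branch name to a commit ID (see below).
+    fun resolveContext(context: Context): Context
+
+    // Return the content of the configuration file at the given path as a stream.
+    fun getFile(context: Context, path: Path): InputStream
+
+    // Check whether a configuration file exists at the given path.
+    fun contains(context: Context, path: Path): Boolean
+
+    // List the configuration files available under the given path.
+    fun listFiles(context: Context, path: Path): Set<Path>
+}
+```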
+ +In order to uniquely identify a specific version of a configuration file, there is another property involved, the so-called *context*. +In terms of the interface, this is another string-based value class whose meaning depends on a concrete implementation. +The idea behind this property is that there could be multiple sets of configuration files that could change over time (e.g. when they are stored in a version control system) or apply to different (staging) environments. +A concrete implementation can assign a suitable semantic to this string value. +For instance, if configuration data is loaded from a version control system, the context could be interpreted as the revision. +The interface has a `resolveContext()` function that allows transforming a given context value into a normalized or resolved form. +More information about the intention of this function can be found in the [Using Advanced Configuration](#using-advanced-configuration) section. + +Regarding error handling, an implementation is free to throw arbitrary exceptions. +They are caught by the wrapper class and rethrown as standard `ConfigException` exceptions. + +#### Factory Interface + +The factory interface for the configuration file provider abstraction is defined by the [ConfigFileProviderFactory](spi/src/main/kotlin/ConfigFileProviderFactory.kt) interface. +It works analogously to typical factory interfaces for other abstractions used within ORT Server. + +This means that instances are loaded via the *service loader* mechanism from the classpath. +The interface defines a `name` property that is used to select a specific factory. +It has a `createProvider()` function to create the actual provider object based on passed in configuration. + +### Access to Secrets + +Another abstraction allows reading the values of secrets from the configuration that can be used for instance to access external systems ORT Server has to interact with. + +#### Access Interface + +The service provider interface is defined by the [ConfigSecretProvider](spi/src/main/kotlin/ConfigSecretProvider.kt) interface. +It is quite simple and contains only a single function to query the value of a secret. +The secret in question is identified using the `Path` value type, which is also used by other configuration abstractions. +The function returns the value of the secret as a String. +It should throw an exception if the requested secret does not exist or cannot be resolved. + +Other functionality, like contains checks or listing secrets, is not required. +In typical use cases, the secrets to be looked up are directly referenced by their names; so it is sufficient to resolve these names and obtain the corresponding values. +This also simplifies concrete implementations, which could for instance lookup the values from environment variables, from files, or from a specific storage for secrets. + +#### Factory Interface + +The factory interface for creating concrete `ConfigSecretProvider` instances has exactly the same structure as the one already discussed for `ConfigFileProvider`. +So, everything mentioned there applies to this interface as well. + +## Using Advanced Configuration + +As usual, the configuration abstraction provides a facade class that is responsible for loading the configured provider implementations and that simplifies the interactions with them. +This is the [ConfigManager](spi/src/main/kotlin/ConfigManager.kt) class. + +Instances are created via the `create()` function of the companion object. 
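+
+A minimal usage sketch from a worker's perspective could look like the following fragment.
+The file and secret names, as well as the read functions used here, are illustrative; the actual convenience functions are defined in the linked class.
+
+```kotlin
+import com.typesafe.config.ConfigFactory
+
+// Illustrative sketch: create a ConfigManager for the context of the current ORT run
+// and read a configuration file and a secret through the same facade.
+// ConfigManager, Context, and Path come from the configuration SPI module.
+val config = ConfigFactory.load()
+val configManager = ConfigManager.create(config, Context("main"))
+
+// Reads use the stored context, so it does not have to be passed again.
+val evaluatorRules = configManager.getFile(Path("evaluator/rules.kts")).bufferedReader().readText()
+val advisorToken = configManager.getSecret(Path("advisorApiToken"))
+```
+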
+The function expects the application configuration, so that it can determine the provider implementations to load and their specific configuration settings. +In addition, the `create()` function requires a `Context` object as argument. +The context is stored and used for interactions with the `ConfigFileProvider` object. +Hence, it is not necessary to deal with this parameter manually. + +But except for convenience, there is another reason for storing the context: +it should remain constant during a whole ORT run to warrant consistency. +Consider the case that configuration data is stored in a version control system. +The context could then reference a branch that contains the configuration files. +This branch may change, however, while an ORT run is in progress, so that a worker executed later may see a different configuration than a one started earlier. +To address this issue, the `ConfigFileProvider` interface defines a `resolveContext()` function that expects a `Context` argument and returns a normalized or resolved context. +When constructing a `ConfigManager` instance, a flag can be passed whether this function should be called and the resolved context should be stored. +In the example of the version control system, the provider implementation could in its `resolveContext` operation replace a branch name by the corresponding commit ID to pinpoint the configuration files. +To support such constellations, at the beginning of an ORT run the context should be resolved once and then stored in the database. +Workers started later in the pipeline should obtain it from there. + +The interface of the `ConfigManager` class is similar to the ones of the wrapped provider interfaces with a few convenience functions. +The class supports error handling by catching all the exceptions thrown by providers and wrapping them in a standard `ConfigException`. + +The configuration passed to the `create()` function must contain a section named `configManager` that at least defines the names of the supported provider implementations to be loaded. +Those are specified by the following properties: + +| Property | Description | +|----------------|---------------------------------------------------------------------------------------------------| +| fileProvider | The name of the provider factory implementation for creating the `ConfigFileProvider` instance. | +| secretProvider | The name of the provider factory implementation for creating the `ConfigSecretProvider` instance. | + +In addition, the section can then contain further, provider-specific properties. +The following fragment gives an example: + +``` +configManager { + fileProvider = gitHub + fileProviderRepository = ort-server-config + fileProviderDefaultRevision = main + + secretProvider = env +} +``` diff --git a/config/github/README.adoc b/config/github/README.adoc deleted file mode 100644 index 6d2dc5eed..000000000 --- a/config/github/README.adoc +++ /dev/null @@ -1,65 +0,0 @@ -= GitHub config file provider - -This module provides an implementation of the `ConfigFileProvider` interface defined by the link:../README.adoc[Configuration abstraction] that reads configuration files from GitHub. - -== Synopsis -The implementation is located in the link:src/main/kotlin/GitHubConfigFileProvider.kt[GitHubConfigFileProvider] class. The provider is using GitHub REST API to access the configuration files stored in repositories. - -The implementation is using the provided context to make sure that the branch specified in it is present in the repository. 
If the branch is present, the `resolveContext` function returns the SHA-1 ID of the last commit in the branch, which can later be utilized to make sure that the same configuration is used for the whole ORT run. - -The `listFiles` function can be used to get the list of all the objects of type `file` located in the given path. It requires the provided path to refer to a directory, otherwise a `ConfigException` exception is thrown. - -In order to make sure that a configuration file is present in the given path, the `contains` function can be used. It accepts a branch name and a path to a file and returns `true` if the file is present or `false` if the returned object is not a file, or if the specified path does not exist in the given repository at all. - -The `getFile` function allows to download a file from the provided path and branch. This function sends a GET request to GitHub API with the header `Accept` set to GitHub's custom content type `application/vnd.github.raw` in order to receive a raw content of the referenced file. If the provided path refers a directory, GitHub API will ignore the `Accept` header and return a JSON array with the directory content. In this case, as well as in the case when the returned 'Content Type' header is neither one of `application/vnd.github.raw` or `application/json`, or it is missing, a [ConfigException] is thrown with the description of the cause. - -== Configuration -In order to activate `GitHubConfigFileProvider`, the application configuration must contain a section `configManager` with a property `fileProvider` set to the value "github-config". In addition, there are several configuration properties required by the provider. The fragment below shows an example: - -[source] ----- -configManager { - fileProvider = "github-config" - gitHubApiUrl = "https://api.github.com" - gitHubRepositoryOwner = "ownername" - gitHubRepositoryName = "reponame" - gitHubDefaultBranch = "config" -} ----- - -Table <> contains a description of the supported configuration properties: - -[#tab_github_config] -.Supported configuration options -[cols="1,3,1,1",options=header] -|=== -|Property |Description |Default |Secret - -|gitHubApiUrl -|Defines the base URL of the GitHub REST API. Typically, this property does not need to be specified, since the default value should work. -|https://api.github.com -|no - -|gitHubRepositoryOwner -|The name of the owner of the repository that contains the configuration information. This corresponds to the `OWNER` parameter of the GitHub REST API. -|none -|no - -|gitHubRepositoryName -|The name of the repository that contains the configuration information. Together with the `gitHubRepositoryOwner` property, the repository is uniquely identified. This corresponds to the `REPO` parameter of the GitHub REST API. -|none -|no - -|gitHubDefaultBranch -|The default branch in the repository that contains the configuration information. Users can select a specific branch by passing a corresponding `Context` to the `resolveContext()` function. If here the default context is provided, this provider implementation uses the configured default branch. -|main -|no - -|gitHubApiToken -|The personal access token to authorize against the GitHub REST API. -|none -|yes - -|=== - -The provider implementation is using the Bearer Token authorization. The token is obtained from the `ConfigSecretProvider` via the `gitHubApiToken` path. 
For the details on GitHub API authorization see the link:https://docs.github.com/en/rest/overview/authenticating-to-the-rest-api?apiVersion=2022-11-28[Documentation on Authenticating to the GitHub REST API]. diff --git a/config/github/README.md b/config/github/README.md new file mode 100644 index 000000000..0fb99e0b0 --- /dev/null +++ b/config/github/README.md @@ -0,0 +1,52 @@ +# GitHub config file provider + +This module provides an implementation of the `ConfigFileProvider` interface defined by the [Configuration abstraction](../README.md) that reads configuration files from GitHub. + +## Synopsis + +The implementation is located in the [GitHubConfigFileProvider](src/main/kotlin/GitHubConfigFileProvider.kt) class. +The provider is using GitHub REST API to access the configuration files stored in repositories. + +The implementation is using the provided context to make sure that the branch specified in it is present in the repository. +If the branch is present, the `resolveContext` function returns the SHA-1 ID of the last commit in the branch, which can later be utilized to make sure that the same configuration is used for the whole ORT run. + +The `listFiles` function can be used to get the list of all the objects of type `file` located in the given path. +It requires the provided path to refer to a directory, otherwise a `ConfigException` exception is thrown. + +In order to make sure that a configuration file is present in the given path, the `contains` function can be used. +It accepts a branch name and a path to a file and returns `true` if the file is present or `false` if the returned object is not a file, or if the specified path does not exist in the given repository at all. + +The `getFile` function allows to download a file from the provided path and branch. +This function sends a GET request to GitHub API with the header `Accept` set to GitHub’s custom content type `application/vnd.github.raw` in order to receive a raw content of the referenced file. +If the provided path refers a directory, GitHub API will ignore the `Accept` header and return a JSON array with the directory content. +In this case, as well as in the case when the returned 'Content Type' header is neither one of `application/vnd.github.raw` or `application/json`, or it is missing, a \[ConfigException\] is thrown with the description of the cause. + +## Configuration + +In order to activate `GitHubConfigFileProvider`, the application configuration must contain a section `configManager` with a property `fileProvider` set to the value "github-config". +In addition, there are several configuration properties required by the provider. +The fragment below shows an example: + +``` +configManager { + fileProvider = "github-config" + gitHubApiUrl = "https://api.github.com" + gitHubRepositoryOwner = "ownername" + gitHubRepositoryName = "reponame" + gitHubDefaultBranch = "config" +} +``` + +This table contains a description of the supported configuration properties: + +| Property | Description | Default | Secret | +|-----------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------|--------| +| gitHubApiUrl | Defines the base URL of the GitHub REST API. Typically, this property does not need to be specified, since the default value should work. 
| | no | +| gitHubRepositoryOwner | The name of the owner of the repository that contains the configuration information. This corresponds to the `OWNER` parameter of the GitHub REST API. | none | no | +| gitHubRepositoryName | The name of the repository that contains the configuration information. Together with the `gitHubRepositoryOwner` property, the repository is uniquely identified. This corresponds to the `REPO` parameter of the GitHub REST API. | none | no | +| gitHubDefaultBranch | The default branch in the repository that contains the configuration information. Users can select a specific branch by passing a corresponding `Context` to the `resolveContext()` function. If here the default context is provided, this provider implementation uses the configured default branch. | main | no | +| gitHubApiToken | The personal access token to authorize against the GitHub REST API. | none | yes | + +The provider implementation is using the Bearer Token authorization. +The token is obtained from the `ConfigSecretProvider` via the `gitHubApiToken` path. +For the details on GitHub API authorization see the [Documentation on Authenticating to the GitHub REST API](https://docs.github.com/en/rest/overview/authenticating-to-the-rest-api?apiVersion=2022-11-28). diff --git a/config/secret-file/README.adoc b/config/secret-file/README.adoc deleted file mode 100644 index a8a246702..000000000 --- a/config/secret-file/README.adoc +++ /dev/null @@ -1,42 +0,0 @@ -= File-based config secret provider - -This module provides an implementation of the `ConfigSecretProvider` interface defined by the link:../README.adoc[Configuration abstraction] that reads secret values from files. - -== Synopsis -The implementation is located in the link:src/main/kotlin/ConfigSecretFileProvider.kt[ConfigSecretFileProvider] class. An instance is created with a collection of files that contain secrets. Each file is expected to contain key-value pairs separated by newline characters. Empty lines or lines starting with a hash character ('#') are ignored. So the format is close to typical properties files, e.g.: - -.Example of a file with secrets -[source] ----- -# Database credentials -dbUser=scott -dbPassword=tiger - -# Messaging credentials -rabbitMqUser=xxxx -rabbitMqPassword=yyyy ----- - -When receiving a query for a specific secret, the provider implementation reads the files in the order they have been provided line by line until it finds a key that matches the requested secret path. It then returns the corresponding value. If all files have been processed without finding a matching key, the provider throws an exception. This implementation has the following consequences: - -* Since no caching is performed, changes on secret files (via an external mechanism) are directly visible the next time the value of a secret is queried. -* If multiple secret files are configured, secrets in one file can override the values of secrets in other files. The values with the highest priority just have to be defined in the file that is listed first. This is useful for instance to support different testing environments. It can be achieved by having a file with default secret values; but before this file, another file is placed which overrides selected secrets that are specific for a test environment. - -The `ConfigSecretFileProvider` implementation is applicable in different scenarios. It can be used in a local test setup where files with secrets are stored on the local hard drive; here the absolute paths to the secret files have to be provided. 
- -It can also be used in production with mechanisms that inject secrets as files into containers. One example of such a mechanism is the https://developer.hashicorp.com/vault/docs/platform/k8s/injector[Agent sidecar injector] of HashiCorp Vault, which can create volume mounts for Kubernetes pods containing files with secrets. The content of such files, which secrets to export and in which format, can be configured using annotations in interested pods. - -== Configuration -In order to activate `ConfigSecretFileProvider`, the application configuration must contain a section `configManager` with a property `secretProvider` set to the value "secret-files". In addition, there is only one configuration property supported for the list of files with secrets to be consumed by the provider. Here the full paths must be specified in a comma-delimited list; this is a mandatory property. The fragment below shows an example: - -[source] ----- -configManager { - secretProvider = "secret-file" - configSecretFileList = "/mount/secrets/ort-server-dev, /mount/secrets/ort-server" -} ----- - -In this example, two files with secrets are configured. The file _ort-server-dev_ may contain specific values for a dev environment. Since it is listed first, these values override secrets with the same keys defined in the second file, _ort-server_. - -The `configSecretFileList` property can alternatively be set via the `SECRET_FILE_LIST` environment variable. diff --git a/config/secret-file/README.md b/config/secret-file/README.md new file mode 100644 index 000000000..ac22a956b --- /dev/null +++ b/config/secret-file/README.md @@ -0,0 +1,58 @@ +# File-based config secret provider + +This module provides an implementation of the `ConfigSecretProvider` interface defined by the [Configuration abstraction](../README.md) that reads secret values from files. + +## Synopsis + +The implementation is located in the [ConfigSecretFileProvider](src/main/kotlin/ConfigSecretFileProvider.kt) class. +An instance is created with a collection of files that contain secrets. +Each file is expected to contain key-value pairs separated by newline characters. +Empty lines or lines starting with a hash character ('\#') are ignored. +So the format is close to typical properties files, e.g.: + +``` +# Database credentials +dbUser=scott +dbPassword=tiger + +# Messaging credentials +rabbitMqUser=xxxx +rabbitMqPassword=yyyy +``` + +When receiving a query for a specific secret, the provider implementation reads the files in the order they have been provided line by line until it finds a key that matches the requested secret path. +It then returns the corresponding value. +If all files have been processed without finding a matching key, the provider throws an exception. +This implementation has the following consequences: + +- Since no caching is performed, changes on secret files (via an external mechanism) are directly visible the next time the value of a secret is queried. + +- If multiple secret files are configured, secrets in one file can override the values of secrets in other files. + The values with the highest priority just have to be defined in the file that is listed first. + This is useful for instance to support different testing environments. + It can be achieved by having a file with default secret values; but before this file, another file is placed which overrides selected secrets that are specific for a test environment. + +The `ConfigSecretFileProvider` implementation is applicable in different scenarios. 
+It can be used in a local test setup where files with secrets are stored on the local hard drive; here the absolute paths to the secret files have to be provided. + +It can also be used in production with mechanisms that inject secrets as files into containers. +One example of such a mechanism is the [Agent sidecar injector](https://developer.hashicorp.com/vault/docs/platform/k8s/injector) of HashiCorp Vault, which can create volume mounts for Kubernetes pods containing files with secrets. +The content of such files, which secrets to export and in which format, can be configured using annotations in interested pods. + +## Configuration + +In order to activate `ConfigSecretFileProvider`, the application configuration must contain a section `configManager` with a property `secretProvider` set to the value "secret-files". +In addition, there is only one configuration property supported for the list of files with secrets to be consumed by the provider. +Here the full paths must be specified in a comma-delimited list; this is a mandatory property. +The fragment below shows an example: + +``` +configManager { + secretProvider = "secret-file" + configSecretFileList = "/mount/secrets/ort-server-dev, /mount/secrets/ort-server" +} +``` + +In this example, two files with secrets are configured. The file *ort-server-dev* may contain specific values for a dev environment. Since it is listed first, these values override secrets with the same keys defined in the second file, *ort-server*. + +The `configSecretFileList` property can alternatively be set via the `SECRET_FILE_LIST` environment variable. diff --git a/logaccess/README.adoc b/logaccess/README.adoc deleted file mode 100644 index 2aba857cc..000000000 --- a/logaccess/README.adoc +++ /dev/null @@ -1,48 +0,0 @@ -= Access to External Logging Systems - -This document describes the log file abstraction used within the ORT server to allow the download of log files from external logging systems. - -== Purpose -In order to diagnose problems with ORT runs or other subsystems of ORT Server, it is essential to have access to the logs generated by the affected components. For complex deployments of ORT Server, it is therefore expected that an external log management and aggregation tool is used to have logs over a longer timeframe available with sophisticated search and filter capabilities. - -While this functionality and tooling is typically used by operations, it is not necessarily available to end users of ORT Server who trigger ORT runs on their repositories. Those users need access to the logs for their own runs on a reasonable level of detail. This information is available in the logging system, but users need an easy way to obtain it, which does not require to log into another system and to learn the syntax of search queries. - -This use case is handled by the log file abstraction. It defines an interface that allows downloading log files for specific ORT runs and workers from an external log management system. ORT Server offers an endpoint that exposes this functionality, so users can retrieve their log files via the API. Behind the scenes, the concrete implementation in use fetches the data from an external system or whatever other source. - -== Service Provider Interfaces -This section describes the interfaces that need to be implemented in order to integrate a source of log information with ORT Server. 
- -=== Access Interface -The interaction with the log source is done via the link:spi/src/main/kotlin/LogFileProvider.kt[LogFileProvider] interface. To reduce the effort required for a concrete implementation, the interface defines a single method only to retrieve a log file for a specific ORT run and a single worker. The function expects the following parameters: - -* The ID of the affected ORT run. -* A constant defining the worker for which the logs are to be retrieved. -* A set of log levels to include into the result. -* The time range for which logs are to be fetched. This is intended as a hint for an implementation. Some log systems may need a time range in order to perform efficient queries. Since the caller has access to the ORT Server database, it is straight-forward to obtain the start and end time for the affected ORT run. - -Since log data can become large, especially when including the level _DEBUG_, it makes sense to use a single worker as granularity for retrieving data. The logic to bundle the logs of a whole run can then be implemented by the caller. - -For the log sources (i.e. the workers) of a run and the log levels, enum classes are defined. Filtering based on a log level typically means that all logs of this level or higher levels should be included. This logic does not have to be implemented by the log file provider, as it is passed the complete set of levels to retrieve; so it can do a strict comparison of levels. - -An implementation is free to throw arbitrary exceptions if something goes wrong. They are caught by the abstraction and wrapped into a standard exception. - -=== Factory Interface -The abstraction defines a typical factory interface for creating `LogFileProvider` instances based on the service loader mechanism: link:spi/src/main/kotlin/LogFileProviderFactory.kt[LogFileProviderFactory]. - -The factory function is passed a `ConfigManager` as parameter. So the provider instance can be configured for the external system it has to access. Credentials that may be required to interact with the system can be obtained from the `ConfigManager` as well. - -== Using the Log File Abstraction -With link:spi/src/main/kotlin/LogFileService.kt[LogFileService], the Log File Abstraction provides a facade class that takes care of the creation of the underlying `LogFileProvider` and offers advanced functionality. - -While `LogFileProvider` supports downloading only a single log file at once, `LogFileService` can be queried for a set of log sources for an ORT run. So, the logs of all workers could be retrieved in a single step. The downloaded log files are automatically added to a Zip archive, which can then be sent to the caller. - -When creating a `LogFileService` instance via the static `create()` factory function, a `ConfigManager` object has to be provided. The configuration wrapped in this object must have a section named `logFileService`. This section defines the provider implementation to be used in a property named `name`. `LogFileService` reads this property and then searches on the classpath for a `LogFileProviderFactory` implementation with this name. The sub config manager for the `logFileService` section is then passed to the matched factory; thus it can contain additional properties to be evaluated by the concrete factory implementation. The listing below shows an example configuration: - -[source] ----- -logFileService { - name = loki - url = https://loki.example.org/ - ... 
-} ----- diff --git a/logaccess/README.md b/logaccess/README.md new file mode 100644 index 000000000..f83758ad2 --- /dev/null +++ b/logaccess/README.md @@ -0,0 +1,76 @@ +# Access to External Logging Systems + +This document describes the log file abstraction used within the ORT server to allow the download of log files from external logging systems. + +## Purpose + +In order to diagnose problems with ORT runs or other subsystems of ORT Server, it is essential to have access to the logs generated by the affected components. +For complex deployments of ORT Server, it is therefore expected that an external log management and aggregation tool is used to have logs over a longer timeframe available with sophisticated search and filter capabilities. + +While this functionality and tooling is typically used by operations, it is not necessarily available to end users of ORT Server who trigger ORT runs on their repositories. +Those users need access to the logs for their own runs on a reasonable level of detail. +This information is available in the logging system, but users need an easy way to obtain it, which does not require to log into another system and to learn the syntax of search queries. + +This use case is handled by the log file abstraction. +It defines an interface that allows downloading log files for specific ORT runs and workers from an external log management system. +ORT Server offers an endpoint that exposes this functionality, so users can retrieve their log files via the API. +Behind the scenes, the concrete implementation in use fetches the data from an external system or whatever other source. + +## Service Provider Interfaces + +This section describes the interfaces that need to be implemented in order to integrate a source of log information with ORT Server. + +### Access Interface + +The interaction with the log source is done via the [LogFileProvider](spi/src/main/kotlin/LogFileProvider.kt) interface. +To reduce the effort required for a concrete implementation, the interface defines a single method only to retrieve a log file for a specific ORT run and a single worker. +The function expects the following parameters: + +- The ID of the affected ORT run. +- A constant defining the worker for which the logs are to be retrieved. +- A set of log levels to include into the result. +- The time range for which logs are to be fetched. + This is intended as a hint for an implementation. + Some log systems may need a time range in order to perform efficient queries. + Since the caller has access to the ORT Server database, it is straight-forward to obtain the start and end time for the affected ORT run. + +Since log data can become large, especially when including the level *DEBUG*, it makes sense to use a single worker as granularity for retrieving data. +The logic to bundle the logs of a whole run can then be implemented by the caller. + +For the log sources (i.e. the workers) of a run and the log levels, enum classes are defined. +Filtering based on a log level typically means that all logs of this level or higher levels should be included. +This logic does not have to be implemented by the log file provider, as it is passed the complete set of levels to retrieve; so it can do a strict comparison of levels. + +An implementation is free to throw arbitrary exceptions if something goes wrong. +They are caught by the abstraction and wrapped into a standard exception. 
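+
+Conceptually, the contract therefore boils down to a single download operation, roughly as sketched below.
+The type and parameter names are illustrative; the actual definition is in the linked source file.
+
+```kotlin
+import java.io.File
+import java.time.Instant
+
+// Illustrative stand-ins for the enum classes mentioned above.
+enum class LogSource { ANALYZER, ADVISOR, SCANNER, EVALUATOR, REPORTER }
+enum class LogLevel { DEBUG, INFO, WARN, ERROR }
+
+// Sketch of the single operation offered by the provider; see
+// spi/src/main/kotlin/LogFileProvider.kt for the real definition.
+interface LogFileProvider {
+    fun downloadLogFile(
+        ortRunId: Long,         // the ID of the affected ORT run
+        source: LogSource,      // the worker whose logs are requested
+        levels: Set<LogLevel>,  // the exact set of levels to include
+        startTime: Instant,     // hint for the provider: start of the run
+        endTime: Instant        // hint for the provider: end of the run
+    ): File                     // the downloaded log file
+}
+```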
+ +### Factory Interface + +The abstraction defines a typical factory interface for creating `LogFileProvider` instances based on the service loader mechanism: [LogFileProviderFactory](spi/src/main/kotlin/LogFileProviderFactory.kt). + +The factory function is passed a `ConfigManager` as parameter. +So the provider instance can be configured for the external system it has to access. +Credentials that may be required to interact with the system can be obtained from the `ConfigManager` as well. + +## Using the Log File Abstraction + +With [LogFileService](spi/src/main/kotlin/LogFileService.kt), the Log File Abstraction provides a facade class that takes care of the creation of the underlying `LogFileProvider` and offers advanced functionality. + +While `LogFileProvider` supports downloading only a single log file at once, `LogFileService` can be queried for a set of log sources for an ORT run. +So, the logs of all workers could be retrieved in a single step. +The downloaded log files are automatically added to a Zip archive, which can then be sent to the caller. + +When creating a `LogFileService` instance via the static `create()` factory function, a `ConfigManager` object has to be provided. +The configuration wrapped in this object must have a section named `logFileService`. +This section defines the provider implementation to be used in a property named `name`. +`LogFileService` reads this property and then searches on the classpath for a `LogFileProviderFactory` implementation with this name. +The sub config manager for the `logFileService` section is then passed to the matched factory; thus it can contain additional properties to be evaluated by the concrete factory implementation. +The listing below shows an example configuration: + +``` +logFileService { + name = loki + url = https://loki.example.org/ + ... +} +``` diff --git a/logaccess/loki/README.adoc b/logaccess/loki/README.adoc deleted file mode 100644 index 93f590d93..000000000 --- a/logaccess/loki/README.adoc +++ /dev/null @@ -1,76 +0,0 @@ -= Grafana Loki Log Access Implementation - -This module provides an implementation of the link:../README.adoc[Log access abstraction] based on -https://grafana.com/oss/loki/[Grafana Loki]. - -== Synopsis -The link:src/main/kotlin/LokiLogFileProvider.kt[LokiLogFileProvider] class provided by this module sends requests against the https://grafana.com/docs/loki/latest/reference/api/[HTTP API] of a configured Grafana Loki instance to retrieve the logs of a specific ORT run. - -In order to obtain the logs of a specific ORT run step, the provider sends a single https://grafana.com/docs/loki/latest/query/[LogQL] query to the server which selects the ORT run by its ID, a configurable (Kubernetes) namespace, and the log source - which corresponds to the worker responsible for this step. Typically, it is not possible to fetch all log statements in a single call. The responses from Loki are rather verbose and therefore consume a certain amount of memory. To prevent issues with excessive memory consumption, the Loki API always applies a limit to query results. The limit is set to 100 (log lines) per default, but can be overridden by a parameter. This module allows configuring this limit. 
The `LokiLogFileProvider` class passes the configured limit to the Loki API and evaluates the result size to determine whether more logs need to be fetched: if the number of returned log lines is greater than or equal to the specified limit, the provider fetches another chunk of data; otherwise, the log is considered complete. - -According to the documentation about https://grafana.com/docs/loki/latest/operations/authentication/[Authentication], Grafana Loki does not support authentication mechanisms by itself. Instead, the system can be secured via a reverse proxy which can offer different authentication schemes. This module currently implements support for an optional Basic Authentication: If username and password are specified in the configuration, the requests sent against the Loki API contain a corresponding `Authorization` header for Basic Authentication. - -Loki supports a https://grafana.com/docs/loki/latest/operations/multi-tenancy/[multi-tenancy] mode. When using this mode, every request must contain a special header that determines the current organization ID. The configuration of this module supports such a property: if it is defined, the header is added automatically. - -== Configuration -As defined by the Log Access SPI module, the configuration takes place in a section named `logFileProvider`. Here a -number of properties specific to this module can be set as shown in the listing below. Mandatory properties are the server URL and the namespace; the other properties are optional. - -.Configuration of the Loki log file provider -[source] ----- -logFileProvider { - name = "loki" - lokiServerUrl = https://loki.example.org/ - lokiNamespace = prod - lokiQueryLimit = 1500 - lokiUsername = scott - lokiPassword = tiger - lokiTenantId = 42 -} ----- - -Table <> contains a description of the supported configuration properties: - -[#tab_loki_config] -.Supported configuration options -[cols="1,1,3,1,1",options=header] -|=== -|Property |Variable |Description |Default |Secret - -|lokiServerUrl -|LOKI_SERVER_URL -|The URL under which the Loki HTTP API can be reached. This is just the base URL; the path for the endpoint (including `/loki/api/v1`) is appended automatically. -|mandatory -|no - -|lokiNamespace -|LOKI_NAMESPACE -|The name of the namespace in Kubernetes in which the worker pods are running. The namespace is added to the query sent to the Loki API to reduce the amount of data to search for. -|mandatory -|no - -|lokiQueryLimit -|LOKI_QUERY_LIMIT -|The value to be passed as `limit` parameter to the Loki query API. It determines the number of log lines that can be retrieved in a single call. If more logs are available, the provider sends another request. -|1000 -|no - -|lokiUserName -|LOKI_USER_NAME -|An optional username for Basic Auth authentication. -|undefined -|no - -|lokiPassword -|LOKI_PASSWORD -|An optional password for Basic Auth authentication. If credentials are defined, the provider implementation adds an `Authorization` header for Basic Auth to requests to the query API. -|undefined -|yes - -|lokiTenantId -|LOKI_TENANT_ID -|The ID of the tenant if Loki is running in multi-tenancy mode. -|undefined -|no -|=== diff --git a/logaccess/loki/README.md b/logaccess/loki/README.md new file mode 100644 index 000000000..28c909312 --- /dev/null +++ b/logaccess/loki/README.md @@ -0,0 +1,52 @@ +# Grafana Loki Log Access Implementation + +This module provides an implementation of the [Log access abstraction](../README.adoc) based on [Grafana Loki](https://grafana.com/oss/loki/). 
+ +## Synopsis + +The [LokiLogFileProvider](src/main/kotlin/LokiLogFileProvider.kt) class provided by this module sends requests against the [HTTP API](https://grafana.com/docs/loki/latest/reference/api/) of a configured Grafana Loki instance to retrieve the logs of a specific ORT run. + +In order to obtain the logs of a specific ORT run step, the provider sends a single [LogQL](https://grafana.com/docs/loki/latest/query/) query to the server which selects the ORT run by its ID, a configurable (Kubernetes) namespace, and the log source - which corresponds to the worker responsible for this step. +Typically, it is not possible to fetch all log statements in a single call. +The responses from Loki are rather verbose and therefore consume a certain amount of memory. +To prevent issues with excessive memory consumption, the Loki API always applies a limit to query results. +The limit is set to 100 (log lines) per default, but can be overridden by a parameter. +This module allows configuring this limit. +The `LokiLogFileProvider` class passes the configured limit to the Loki API and evaluates the result size to determine whether more logs need to be fetched: if the number of returned log lines is greater than or equal to the specified limit, the provider fetches another chunk of data; otherwise, the log is considered complete. + +According to the documentation about [Authentication](https://grafana.com/docs/loki/latest/operations/authentication/), Grafana Loki does not support authentication mechanisms by itself. +Instead, the system can be secured via a reverse proxy which can offer different authentication schemes. +This module currently implements support for an optional Basic Authentication: If username and password are specified in the configuration, the requests sent against the Loki API contain a corresponding `Authorization` header for Basic Authentication. + +Loki supports a [multi-tenancy](https://grafana.com/docs/loki/latest/operations/multi-tenancy/) mode. +When using this mode, every request must contain a special header that determines the current organization ID. +The configuration of this module supports such a property: if it is defined, the header is added automatically. + +## Configuration + +As defined by the Log Access SPI module, the configuration takes place in a section named `logFileProvider`. +Here a number of properties specific to this module can be set as shown in the listing below. +Mandatory properties are the server URL and the namespace; the other properties are optional. + +``` +logFileProvider { + name = "loki" + lokiServerUrl = https://loki.example.org/ + lokiNamespace = prod + lokiQueryLimit = 1500 + lokiUsername = scott + lokiPassword = tiger + lokiTenantId = 42 +} +``` + +This table contains a description of the supported configuration properties: + +| Property | Variable | Description | Default | Secret | +|----------------|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|--------| +| lokiServerUrl | LOKI\_SERVER\_URL | The URL under which the Loki HTTP API can be reached. This is just the base URL; the path for the endpoint (including `/loki/api/v1`) is appended automatically. | mandatory | no | +| lokiNamespace | LOKI\_NAMESPACE | The name of the namespace in Kubernetes in which the worker pods are running. 
The namespace is added to the query sent to the Loki API to reduce the amount of data to search for. | mandatory | no | +| lokiQueryLimit | LOKI\_QUERY\_LIMIT | The value to be passed as `limit` parameter to the Loki query API. It determines the number of log lines that can be retrieved in a single call. If more logs are available, the provider sends another request. | 1000 | no | +| lokiUserName | LOKI\_USER\_NAME | An optional username for Basic Auth authentication. | undefined | no | +| lokiPassword | LOKI\_PASSWORD | An optional password for Basic Auth authentication. If credentials are defined, the provider implementation adds an `Authorization` header for Basic Auth to requests to the query API. | undefined | yes | +| lokiTenantId | LOKI\_TENANT\_ID | The ID of the tenant if Loki is running in multi-tenancy mode. | undefined | no | diff --git a/secrets/README.adoc b/secrets/README.adoc deleted file mode 100644 index df584b9b9..000000000 --- a/secrets/README.adoc +++ /dev/null @@ -1,107 +0,0 @@ -= Access to Secrets - -This document describes the secrets storage abstraction used within the ORT server to allow the integration with -different secret storage products. - -== Purpose -In order to access source code and artifact repositories for doing analysis runs, the ORT server must have correct -credentials. The infrastructure to be accessed is defined dynamically by users - by setting up the hierarchical -structures for organizations, products, and repositories. While doing this, the corresponding credentials must be -provided as well. This implies that an API is available to create, read, modify, and update secrets or credentials. -With such an API in place, users are enabled to fully manage the credentials required for their infrastructure -themselves - without needing support from server administrators. - -NOTE: This document treats the terms _secrets_ and _credentials_ as synonyms. - -This means that the ORT server needs to store secrets on behalf of its users. There is, however, a difference between -secrets and other entities managed by users: Secrets have to be kept strictly confidential. To achieve this, they are -typically stored in dedicated secret storages, and not in the database like other data. - -Analogously to the link:../transport/README.adoc[Transport layer abstraction], the ORT server should not set on a -specific secret storage product, but be agnostic to the environment it is running on. To support arbitrary products, -again an abstraction for a secret storage service has to be defined. - -== Service Provider Interfaces -This section describes the interfaces that need to be implemented in order to integrate a concrete secret storage -product. - -=== Access Interface -The secrets abstraction layer defines a basic interface, link:spi/src/main/kotlin/SecretsProvider.kt[SecretsProvider], -with CRUD operations on secrets. This interface has to be implemented in order to integrate a concrete storage -product. To simplify potential implementations, the interface is reduced to a bare minimum and just offers functions -for the basic use cases: - -* read secrets -* write secrets (create new ones or update existing ones) -* remove secrets -* list available secrets - -Secrets are identified by paths which are basically strings. This is the least common denominator over various -concrete secret storage products. While some of them (e.g. 
https://www.vaultproject.io/[HashiCorp Vault]) support a -hierarchical organization of secrets, others are quite restricted in this regard (for instance, -https://azure.microsoft.com/en-us/products/key-vault[Azure Key Vault] only offers a key-value storage with a limited -length of keys). So, the scope of the secrets storage abstraction lies only in storing the secret value under an -arbitrary (maybe even synthetic) key. Additional metadata that will be required to actually use the secret - such as a -human-readable name, a description, or the information to which organization/product/repository it is assigned - need -to be stored separately. - -There are a few further assumptions taken by the abstraction layer implementation to simplify concrete implementations -of the `SecretsProvider` interface: - -* When querying a secret for a non-existing path an implementation should return *null*. This result can be interpreted - by the abstraction, and a concrete implementation does not need to bother with throwing specific exceptions. -* A concrete implementation can throw arbitrary, proprietary exceptions. These are caught by the abstraction and - wrapped into a standard exception class. - -=== Factory Interface -The creation of a concrete `SecretsProvider` instance lies in the responsibility of a factory defined by the -link:spi/src/main/kotlin/SecretsProviderFactory.kt[SecretsProviderFactory] interface. - -The factories available are looked up via the Java ServiceLoader mechanism. Each factory is assigned a unique name by -which they can be identified in the application configuration; thus it can be configured easily which secrets storage -implementation to be used. - -There is one factory method that expects a configuration object and returns a `SecretsProvider` instance. The idea -here is that the properties required by a specific implementation can also be set in the application configuration; -they are then passed through to the factory, which can initialize the provider instance accordingly. - -== Using Secrets -Using the `SecretsProvider` interface directly would be rather inconvenient, due to its limited functionality and the -implicit assumptions described in the previous section. Therefore, the abstraction offers a different entry point in -form of the link:spi/src/main/kotlin/SecretStorage.kt[SecretStorage] class. - -`SecretStorage` is first a factory for creating and initializing a concrete `SecretsProvider` implementation. For this -purpose, it offers a `createStorage()` function in its companion object. The function does the following: - -* It reads the name of the secret storage implementation to be used from the application configuration. -* It uses a _service loader_ to obtain all the registered `SecretsProviderFactory` implementations available on the - classpath. -* It searches for the factory implementation with the configured name (and fails if it cannot be found). -* It invokes the factory function of this factory implementation passing in the application configuration to obtain a - `SecretsProvider` instance. -* It returns a new `SecretStorage` implementation that wraps this provider instance. - -The secrets abstraction consumes a section named `secretsProvider` from the application configuration. It has the -following structure: - -[source] ----- -secretsProvider { - name = - - # Properties specific to the selected secret storage implementation - ... -} ----- - -A `SecretStorage` instance then allows convenient interaction with the wrapped `SecretsProvider`. 
It offers a richer
-interface for operations on secrets. Basically, it adds the following functionality on top of that provided by
-`SecretsProvider`:
-
-* Provider-specific exceptions are caught and wrapped in generic `SecretStorageException` objects. So, client code
-  only has to handle this exception type.
-* For all operations, there are variants returning a Kotlin `Result` instance instead of throwing an exception. They
-  can be used if a more functional style for exception handling is preferred. In case of a failure, the `Result` also
-  contains a `SecretStorageException` that wraps the original exception from the underlying provider.
-* For querying secrets, there are functions that require the secret in question to exist and throw an exception or
-  return a failure `Result` if this is not the case.
diff --git a/secrets/README.md b/secrets/README.md
new file mode 100644
index 000000000..961476303
--- /dev/null
+++ b/secrets/README.md
@@ -0,0 +1,100 @@
+# Access to Secrets
+
+This document describes the secrets storage abstraction used within the ORT server to allow the integration with
+different secret storage products.
+
+## Purpose
+
+In order to access source code and artifact repositories for doing analysis runs, the ORT server must have correct credentials.
+The infrastructure to be accessed is defined dynamically by users - by setting up the hierarchical structures for organizations, products, and repositories.
+While doing this, the corresponding credentials must be provided as well.
+This implies that an API is available to create, read, modify, and update secrets or credentials.
+With such an API in place, users are enabled to fully manage the credentials required for their infrastructure
+themselves - without needing support from server administrators.
+
+> [!NOTE]
+> This document treats the terms *secrets* and *credentials* as synonyms.
+
+This means that the ORT server needs to store secrets on behalf of its users.
+There is, however, a difference between secrets and other entities managed by users:
+Secrets have to be kept strictly confidential.
+To achieve this, they are typically stored in dedicated secret storages, and not in the database like other data.
+
+Analogously to the [Transport layer abstraction](../transport/README.md), the ORT server should not be tied to a specific secret storage product, but be agnostic to the environment it is running on.
+To support arbitrary products, again an abstraction for a secret storage service has to be defined.
+
+## Service Provider Interfaces
+
+This section describes the interfaces that need to be implemented in order to integrate a concrete secret storage product.
+
+### Access Interface
+
+The secrets abstraction layer defines a basic interface, [SecretsProvider](spi/src/main/kotlin/SecretsProvider.kt), with CRUD operations on secrets.
+This interface has to be implemented in order to integrate a concrete storage product.
+To simplify potential implementations, the interface is reduced to a bare minimum and just offers functions for the basic use cases:
+
+- read secrets
+- write secrets (create new ones or update existing ones)
+- remove secrets
+- list available secrets
+
+Secrets are identified by paths which are basically strings.
+This is the least common denominator over various concrete secret storage products.
+While some of them (e.g.
[HashiCorp Vault](https://www.vaultproject.io/)) support a hierarchical organization of secrets, others are quite restricted in this regard (for instance, [Azure Key Vault](https://azure.microsoft.com/en-us/products/key-vault) only offers a key-value storage with a limited length of keys). +So, the scope of the secrets storage abstraction lies only in storing the secret value under an arbitrary (maybe even synthetic) key. +Additional metadata that will be required to actually use the secret - such as a human-readable name, a description, or the information to which organization/product/repository it is assigned - need to be stored separately. + +There are a few further assumptions taken by the abstraction layer implementation to simplify concrete implementations of the `SecretsProvider` interface: + +- When querying a secret for a non-existing path an implementation should return **null**. + This result can be interpreted by the abstraction, and a concrete implementation does not need to bother with throwing specific exceptions. + +- A concrete implementation can throw arbitrary, proprietary exceptions. + These are caught by the abstraction and wrapped into a standard exception class. + +### Factory Interface + +The creation of a concrete `SecretsProvider` instance lies in the responsibility of a factory defined by the [SecretsProviderFactory](spi/src/main/kotlin/SecretsProviderFactory.kt) interface. + +The factories available are looked up via the Java ServiceLoader mechanism. +Each factory is assigned a unique name by which they can be identified in the application configuration; thus it can be configured easily which secrets storage implementation to be used. + +There is one factory method that expects a configuration object and returns a `SecretsProvider` instance. +The idea here is that the properties required by a specific implementation can also be set in the application configuration; they are then passed through to the factory, which can initialize the provider instance accordingly. + +## Using Secrets + +Using the `SecretsProvider` interface directly would be rather inconvenient, due to its limited functionality and the implicit assumptions described in the previous section. +Therefore, the abstraction offers a different entry point in form of the [SecretStorage](spi/src/main/kotlin/SecretStorage.kt) class. + +`SecretStorage` is first a factory for creating and initializing a concrete `SecretsProvider` implementation. +For this purpose, it offers a `createStorage()` function in its companion object. The function does the following: + +- It reads the name of the secret storage implementation to be used from the application configuration. +- It uses a *service loader* to obtain all the registered `SecretsProviderFactory` implementations available on the classpath. +- It searches for the factory implementation with the configured name (and fails if it cannot be found). +- It invokes the factory function of this factory implementation passing in the application configuration to obtain a `SecretsProvider` instance. +- It returns a new `SecretStorage` implementation that wraps this provider instance. + +The secrets abstraction consumes a section named `secretsProvider` from the application configuration. +It has the following structure: + +``` +secretsProvider { + name = + + # Properties specific to the selected secret storage implementation + ... +} +``` + +A `SecretStorage` instance then allows convenient interaction with the wrapped `SecretsProvider`. 
+It offers a richer interface for operations on secrets. +Basically, it adds the following functionality on top of that provided by `SecretsProvider`: + +- Provider-specific exceptions are caught and wrapped in generic `SecretStorageException` objects. + So, client code only has to handle this exception type. +- For all operations, there are variants returning a Kotlin `Result` instance instead of throwing an exception. + They can be used if a more functional style for exception handling is preferred. + In case of a failure, the `Result` also contains a `SecretStorageException` that wraps the original exception from the underlying provider. +- For querying secrets, there are functions that require the secret in question to exist and throw an exception or return a failure `Result` if this is not the case. diff --git a/secrets/vault/README.adoc b/secrets/vault/README.adoc deleted file mode 100644 index b715b9c0b..000000000 --- a/secrets/vault/README.adoc +++ /dev/null @@ -1,110 +0,0 @@ -= HashiCorp Vault Secret Provider Implementation - -This module provides an implementation of the link:../README.adoc[Secrets Abstraction Layer] based on -https://www.vaultproject.io/[HashiCorp Vault]. - -== Synopsis -The link:src/main/kotlin/VaultSecretsProvider.kt[VaultSecretsProvider] class implemented here communicates with the -REST API of HashiCorp Vault to access secrets managed by this service. - -The interaction with the Vault service is done via the -https://developer.hashicorp.com/vault/api-docs/secret/kv/kv-v2[KV Secrets Engine Version 2], which means that -versioning of secrets is available. - -For authentication, the https://developer.hashicorp.com/vault/api-docs/auth/approle[AppRole] authentication method is -used. The ORT Server application must be assigned a role with a policy that grants the required access rights to the -secrets to be managed. The ID of this role and a corresponding _secret Id_ must be provided as credentials. Based on -this, the provider implementation can obtain an access token from the Vault service. Refer to the -https://developer.hashicorp.com/vault/tutorials/auth-methods/approle[AppRole Pull Authentication Tutorial] for further -details. - -The Secrets Abstraction Layer operates on plain keys for secrets and does not support any hierarchical relations -between keys. To map those keys to specific paths in Vault, the provider implementation can be configured with a -_root path_ that is simply prefixed to the passed in paths for accessing secrets. Via this mechanism, it is possible -for instance that different provider instances (e.g. for production or test) access different parts of the Vault -storage. - -Another difference between the abstraction layer and Vault is that secrets in Vault can have an arbitrary number of -key value pairs stored under the secret's path, while the abstraction layer assigns only a single value to the secret. -This implementation handles this by using a default key internally. So, when writing a secret, `VaultSecretsProvider` -actually writes a secret at the path specified that has a specific key with the given value. Analogously, this default -key is read when querying the value of a secret. This speciality has to be taken into account when creating or updating -secrets directly in Vault that should be accessible by the Vault abstraction implementation. - -== Configuration -As defined by the Secrets SPI module, the configuration takes place in a section named `secretsProvider`. 
Here a -number of Vault-specific properties can be set as shown by the fragment below. The service URI and the credentials -are mandatory. - -.Configuration of the Vault secrets provider -[source] ----- -secretsProvider { - name = "vault" - vaultUri = "https://vault-service-uri.io" - vaultRoleId = "" - vaultSecretId = "" - vaultRootPath = "path/to/my/secrets" - vaultPrefix = "customPrefix" - vaultNamespace = "custom/namespace" -} ----- - -Table <> contains a description of the supported configuration properties: - -[#tab_vault_config] -.Supported configuration options -[cols="1,1,3,1,1",options=header] -|=== -|Property |Variable |Description |Default |Secret - -|vaultUri -|VAULT_URI -|The URI under which the Vault service can be reached. Here only the part up to the host name and optional port is -expected; no further URL paths. -|mandatory -|no - -|vaultRoleId -|VAULT_ROLE_ID -|The implementation uses the https://developer.hashicorp.com/vault/docs/auth/approle[AppRole] authentication method. -With this property the ID of the configured role is specified. -|mandatory -|yes - -|vaultSecretId -|VAULT_SECRET_ID -|The secret ID required for the https://developer.hashicorp.com/vault/docs/auth/approle[AppRole] authentication method. -|mandatory -|yes - -|vaultRootPath -|VAULT_ROOT_PATH -|Allows configuring a root path that is prepended to the paths provided to the secrets provider. Using this mechanism -makes it possible to store the managed secrets under a specific subtree of Vault's hierarchical structure. -|empty string -|no - -|vaultPrefix -|VAULT_PREFIX -|The different secret engines supported by Vault are mapped to specific paths that need to be specified in API -requests. There are default paths, but Vault allows custom configurations for secret engines leading to different -paths. In an environment with such a custom configuration, this property can be set accordingly. In case of the KV -secret engine, version 2, that is supported by this implementation, the default path for requests is -`/v1/secret/data/`. By setting the `vaultPrefix` to something different, e.g. `very-secret`, the URL in requests -changes to `/v1/very-secret/data/`. -|"secret" -|no - -|vaultNamespace -|VAULT_NAMESPACE -|The Enterprise version of HashiCorp Vault supports the -https://developer.hashicorp.com/vault/docs/enterprise/namespaces[namespaces] feature. It allows the clear separation -of secrets from multiple tenants. If namespaces are enabled, requests to the Vault API must contain a specific header -to select the current namespace. If this property is defined, the corresponding header is added. -|*null* -|no -|=== - -Since the role ID and the secret ID are actually credentials to access the Vault service, they are obtained as secrets -from the link:../../config/README.adoc[ConfigManager] under the same keys as listed in table <>. diff --git a/secrets/vault/README.md b/secrets/vault/README.md new file mode 100644 index 000000000..30746140d --- /dev/null +++ b/secrets/vault/README.md @@ -0,0 +1,58 @@ +# HashiCorp Vault Secret Provider Implementation + +This module provides an implementation of the [Secrets Abstraction Layer](../README.md) based on [HashiCorp Vault](https://www.vaultproject.io/). + +## Synopsis + +The [VaultSecretsProvider](src/main/kotlin/VaultSecretsProvider.kt) class implemented here communicates with the REST API of HashiCorp Vault to access secrets managed by this service. 
+ +The interaction with the Vault service is done via the [KV Secrets Engine Version 2](https://developer.hashicorp.com/vault/api-docs/secret/kv/kv-v2), which means that versioning of secrets is available. + +For authentication, the [AppRole](https://developer.hashicorp.com/vault/api-docs/auth/approle) authentication method is used. +The ORT Server application must be assigned a role with a policy that grants the required access rights to the secrets to be managed. +The ID of this role and a corresponding *secret Id* must be provided as credentials. +Based on this, the provider implementation can obtain an access token from the Vault service. +Refer to the [AppRole Pull Authentication Tutorial](https://developer.hashicorp.com/vault/tutorials/auth-methods/approle) for further details. + +The Secrets Abstraction Layer operates on plain keys for secrets and does not support any hierarchical relations between keys. +To map those keys to specific paths in Vault, the provider implementation can be configured with a *root path* that is simply prefixed to the passed in paths for accessing secrets. +Via this mechanism, it is possible for instance that different provider instances (e.g. for production or test) access different parts of the Vault storage. + +Another difference between the abstraction layer and Vault is that secrets in Vault can have an arbitrary number of key value pairs stored under the secret’s path, while the abstraction layer assigns only a single value to the secret. +This implementation handles this by using a default key internally. +So, when writing a secret, `VaultSecretsProvider` actually writes a secret at the path specified that has a specific key with the given value. +Analogously, this default key is read when querying the value of a secret. +This speciality has to be taken into account when creating or updating secrets directly in Vault that should be accessible by the Vault abstraction implementation. + +## Configuration + +As defined by the Secrets SPI module, the configuration takes place in a section named `secretsProvider`. +Here a number of Vault-specific properties can be set as shown by the fragment below. +The service URI and the credentials are mandatory. + +This example shows the configuration of the Vault secrets provider: + +``` +secretsProvider { + name = "vault" + vaultUri = "https://vault-service-uri.io" + vaultRoleId = "" + vaultSecretId = "" + vaultRootPath = "path/to/my/secrets" + vaultPrefix = "customPrefix" + vaultNamespace = "custom/namespace" +} +``` + +This table contains a description of the supported configuration properties: + +| Property | Variable | Description | Default | Secret | +|----------------|-------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------|--------| +| vaultUri | VAULT\_URI | The URI under which the Vault service can be reached. Here only the part up to the host name and optional port is expected; no further URL paths. 
| mandatory | no | +| vaultRoleId | VAULT\_ROLE\_ID | The implementation uses the [AppRole](https://developer.hashicorp.com/vault/docs/auth/approle) authentication method. With this property the ID of the configured role is specified. | mandatory | yes | +| vaultSecretId | VAULT\_SECRET\_ID | The secret ID required for the [AppRole](https://developer.hashicorp.com/vault/docs/auth/approle) authentication method. | mandatory | yes | +| vaultRootPath | VAULT\_ROOT\_PATH | Allows configuring a root path that is prepended to the paths provided to the secrets provider. Using this mechanism makes it possible to store the managed secrets under a specific subtree of Vault’s hierarchical structure. | empty string | no | +| vaultPrefix | VAULT\_PREFIX | The different secret engines supported by Vault are mapped to specific paths that need to be specified in API requests. There are default paths, but Vault allows custom configurations for secret engines leading to different paths. In an environment with such a custom configuration, this property can be set accordingly. In case of the KV secret engine, version 2, that is supported by this implementation, the default path for requests is `/v1/secret/data/`. By setting the `vaultPrefix` to something different, e.g. `very-secret`, the URL in requests changes to `/v1/very-secret/data/`. | "secret" | no | +| vaultNamespace | VAULT\_NAMESPACE | The Enterprise version of HashiCorp Vault supports the [namespaces](https://developer.hashicorp.com/vault/docs/enterprise/namespaces) feature. It allows the clear separation of secrets from multiple tenants. If namespaces are enabled, requests to the Vault API must contain a specific header to select the current namespace. If this property is defined, the corresponding header is added. | **null** | no | + +Since the role ID and the secret ID are actually credentials to access the Vault service, they are obtained as secrets from the [ConfigManager](../../config/README.md) under the same keys as listed in the table above. diff --git a/storage/README.adoc b/storage/README.adoc deleted file mode 100644 index 0b2eddb49..000000000 --- a/storage/README.adoc +++ /dev/null @@ -1,115 +0,0 @@ -= Storage Abstraction - -This document describes the storage abstraction used within the ORT Server to provide a generic mechanism to store -arbitrary data temporarily or for a longer time. - -== Purpose -Some components of ORT Server have the requirement to store (potentially large) data for different purposes. One -prominent example is storing the reports produced by the reporter. They need to be persisted somewhere for a -certain amount of time, so that they can be queried and accessed by users. This use case requires a permanent storage -for files, but there are other use cases as well where some data needs to be cached temporarily, e.g. storing -license texts obtained by the scanner, so that they can be referenced in reports later, or caching metadata of -packages for some package managers, so that it does not have to be retrieved on each analyzer run. - -These use cases have in common that arbitrary data has to be stored under a specific key. A suitable abstraction would -therefore be a generic storage interface oriented on a key/value storage. 
With regard to potential implementations, -there are different options that depend on the concrete storage characteristics of specific data, such as - -* which data needs to be stored -* how big is it -* how long is the data to be stored (short-term caching vs long-term persistence) - -Bringing this together, there can be a generic, key-value-based storage interface defining operations to access and -manipulate data on a storage. For this interface, different implementations exist. This is in-line with other -abstraction layer implementations used within ORT Server, e.g. for storing secrets or passing messages. It allows -integrating various storage mechanisms supported by the platform on which the ORT Server application is running. In -addition, a single application instance can be configured to use multiple storage implementations for different -kinds of data, so that a high flexibility can be achieved. - -== Service Provider Interfaces -This section describes the interfaces that need to be implemented in order to integrate a concrete storage -product. - -=== Access Interface -The storage abstraction layer defines a basic interface, link:spi/src/main/kotlin/StorageProvider.kt[StorageProvider], -allowing to associate data with (string-based) keys. The data is represented by streams, taking the fact into account -that it can become potentially large. (For instance, for the use case of storing report files, some of the files -will have sizes with more than 10 MB.) So, it can be accessed without loading it completely into memory. The interface -defines only a minimum set of CRUD operations to simplify its implementation. - -A dedicated value class has been introduced to represent storage keys. The function to query data returns a -`StorageEntry` object which contains the `InputStream` with the actual data and a string with its content-type. The -idea behind this is that the ORT Server API may in some cases provide direct access to stored data, as is the case for -serving report files. The corresponding endpoint can then easily set the content-type header accordingly. A future -extension could be adding support for other/arbitrary metadata properties. - -The `StorageProvider` interface defines the `write` operation to create and update entries in the storage. It -expects the key and the stream with the data to store. Additionally, the (optional) content-type can be provided. -Another mandatory parameter is the length of the data. This information is required by some storage products, for -instance by Azure Storage Accounts. Since the size of the data cannot be obtained easily from the passed in stream, -it has to be provided explicitly. - -The interface defines further operations to remove an entry from the storage and to check whether an entry with a -given key exists. For the time being, there is no operation to list all existing keys. The expectation is that each -use case implemented with a storage defines a specific convention how to construct keys and how they are supposed -to be interpreted. - -There are a few further assumptions taken by the abstraction layer implementation to simplify concrete implementations -of the `StorageProvider` interface: - -* A concrete implementation can throw arbitrary, proprietary exceptions. These are caught by the abstraction and - wrapped into a standard `StorageException` instance. -* The `read()` operation should throw an exception when the passed in key does not exist. 
Clients should use the - `contains()` operation to make sure that the desired data is actually available. - -=== Factory Interface -As is typical for the abstraction layers used within ORT Server, the storage abstraction defines a factory interface -to create a specific `StorageProvider` instance based on the application configuration: -link:spi/src/main/kotlin/StorageProviderFactory.kt[StorageProviderFactory]. - -The interface defines a `name` property, which is used to look up a specific factory from the classpath - as usual, -the available factories are obtained via the Java ServiceLoader mechanism, and the selected one is matched by its -name. - -The factory function expects a configuration object as parameter. A concrete implementation can define and -evaluate custom configuration options which are accessible from this object. - -== Using a Storage -With the link:spi/src/main/kotlin/Storage.kt[Storage] class, the storage abstraction defines a facade class that -handles the creation and initialization of a concrete `StorageProvider` instance and simplifies the usage of the -storage API by providing a number of convenience functions. In order to access a storage for a specific use case, -a `Storage` instance has to be created using the `create()` function from the companion object. The function expects -an object with the current application configuration and a string defining the current use case. As mentioned earlier, -different storage implementations can be configured for different data to be stored. To resolve the desired -implementation for the current use case, the `create()` function searches for a configuration section under the -given identifier. This section must at least contain a `name` property referencing the `StorageProviderFactory` of -the selected implementation. Further, implementation-specific properties can be present in this section which the -factory can evaluate. With this information, `create()` can do the usual lookup and instantiate the correct -`StorageProvider`. - -To give a concrete example, we assume that a storage for report files should be configured. The configuration could -look as follows: - -[source] ----- -reportStorage { - name = database - namespace = reports -} ----- - -This fragment basically tells that the report storage is provided by an implementation with the name _database_. -The `namespace` property is evaluated by this implementation. Given this configuration, a `Storage` object for -storing report files can now be obtained in the following way: - -[source,kotlin] ----- -val config = ConfigFactory.load() - -val reportStorage = Storage.create("reportStorage", config) ----- - -The `Storage` class provides functionality that simplifies dealing with data that can be held in memory in form of -strings or byte arrays. Such objects can be read and written directly without having to deal with streams. It is -also responsible for catching all proprietary exceptions thrown by a `StorageProvider` implementation and wrapping -them inside `StorageException` objects. diff --git a/storage/README.md b/storage/README.md new file mode 100644 index 000000000..17e4686fe --- /dev/null +++ b/storage/README.md @@ -0,0 +1,105 @@ +# Storage Abstraction + +This document describes the storage abstraction used within the ORT Server to provide a generic mechanism to store arbitrary data temporarily or for a longer time. + +## Purpose + +Some components of ORT Server have the requirement to store (potentially large) data for different purposes. 
+One prominent example is storing the reports produced by the reporter. +They need to be persisted somewhere for a certain amount of time, so that they can be queried and accessed by users. +This use case requires a permanent storage for files, but there are other use cases as well where some data needs to be cached temporarily, e.g. storing license texts obtained by the scanner, so that they can be referenced in reports later, or caching metadata of packages for some package managers, so that it does not have to be retrieved on each analyzer run. + +These use cases have in common that arbitrary data has to be stored under a specific key. +A suitable abstraction would therefore be a generic storage interface oriented on a key/value storage. +With regard to potential implementations, there are different options that depend on the concrete storage characteristics of specific data, such as + +- which data needs to be stored +- how big is it +- how long is the data to be stored (short-term caching vs long-term persistence) + +Bringing this together, there can be a generic, key-value-based storage interface defining operations to access and manipulate data on a storage. +For this interface, different implementations exist. +This is in-line with other abstraction layer implementations used within ORT Server, e.g. for storing secrets or passing messages. +It allows integrating various storage mechanisms supported by the platform on which the ORT Server application is running. +In addition, a single application instance can be configured to use multiple storage implementations for different kinds of data, so that a high flexibility can be achieved. + +## Service Provider Interfaces + +This section describes the interfaces that need to be implemented in order to integrate a concrete storage product. + +### Access Interface + +The storage abstraction layer defines a basic interface, [StorageProvider](spi/src/main/kotlin/StorageProvider.kt), allowing to associate data with (string-based) keys. +The data is represented by streams, taking the fact into account that it can become potentially large. +(For instance, for the use case of storing report files, some of the files will have sizes with more than 10 MB.) +So, it can be accessed without loading it completely into memory. +The interface defines only a minimum set of CRUD operations to simplify its implementation. + +A dedicated value class has been introduced to represent storage keys. +The function to query data returns a `StorageEntry` object which contains the `InputStream` with the actual data and a string with its content-type. +The idea behind this is that the ORT Server API may in some cases provide direct access to stored data, as is the case for serving report files. +The corresponding endpoint can then easily set the content-type header accordingly. +A future extension could be adding support for other/arbitrary metadata properties. + +The `StorageProvider` interface defines the `write` operation to create and update entries in the storage. +It expects the key and the stream with the data to store. +Additionally, the (optional) content-type can be provided. +Another mandatory parameter is the length of the data. +This information is required by some storage products, for instance by Azure Storage Accounts. +Since the size of the data cannot be obtained easily from the passed in stream, it has to be provided explicitly. 
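+
+The following sketch illustrates the shape of such a `write` call as described above; it is only an illustration, assuming a value class named `Key` for storage keys and the parameter order shown here - the actual signatures in the ORT Server code may differ:
+
+``` kotlin
+import java.io.File
+
+// Hypothetical helper - names and signatures are illustrative, not the exact ORT Server API.
+fun storeReport(provider: StorageProvider, reportFile: File) {
+    // The key structure is use-case specific; this path-like form is just an example.
+    val key = Key("runs/42/scan-report.html")
+
+    reportFile.inputStream().use { stream ->
+        // The length is passed explicitly because it cannot be derived from the stream.
+        provider.write(key, stream, reportFile.length(), "text/html")
+    }
+}
+```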
+ +The interface defines further operations to remove an entry from the storage and to check whether an entry with a given key exists. +For the time being, there is no operation to list all existing keys. +The expectation is that each use case implemented with a storage defines a specific convention how to construct keys and how they are supposed to be interpreted. + +There are a few further assumptions taken by the abstraction layer implementation to simplify concrete implementations of the `StorageProvider` interface: + +- A concrete implementation can throw arbitrary, proprietary exceptions. + These are caught by the abstraction and wrapped into a standard `StorageException` instance. +- The `read()` operation should throw an exception when the passed in key does not exist. + Clients should use the `contains()` operation to make sure that the desired data is actually available. + +### Factory Interface + +As is typical for the abstraction layers used within ORT Server, the storage abstraction defines a factory interface to create a specific `StorageProvider` instance based on the application configuration: +[StorageProviderFactory](spi/src/main/kotlin/StorageProviderFactory.kt). + +The interface defines a `name` property, which is used to look up a specific factory from the classpath - as usual, the available factories are obtained via the Java ServiceLoader mechanism, and the selected one is matched by its name. + +The factory function expects a configuration object as parameter. +A concrete implementation can define and evaluate custom configuration options which are accessible from this object. + +## Using a Storage + +With the [Storage](spi/src/main/kotlin/Storage.kt) class, the storage abstraction defines a facade class that handles the creation and initialization of a concrete `StorageProvider` instance and simplifies the usage of the storage API by providing a number of convenience functions. +In order to access a storage for a specific use case, a `Storage` instance has to be created using the `create()` function from the companion object. +The function expects an object with the current application configuration and a string defining the current use case. +As mentioned earlier, different storage implementations can be configured for different data to be stored. +To resolve the desired implementation for the current use case, the `create()` function searches for a configuration section under the given identifier. +This section must at least contain a `name` property referencing the `StorageProviderFactory` of the selected implementation. +Further, implementation-specific properties can be present in this section which the factory can evaluate. +With this information, `create()` can do the usual lookup and instantiate the correct `StorageProvider`. + +To give a concrete example, we assume that a storage for report files should be configured. +The configuration could look as follows: + +``` +reportStorage { + name = database + namespace = reports +} +``` + +This fragment basically tells that the report storage is provided by an implementation with the name *database*. +The `namespace` property is evaluated by this implementation. 
+Given this configuration, a `Storage` object for storing report files can now be obtained in the following way: + +``` kotlin +val config = ConfigFactory.load() + +val reportStorage = Storage.create("reportStorage", config) +``` + +The `Storage` class provides functionality that simplifies dealing with data that can be held in memory in form of strings or byte arrays. +Such objects can be read and written directly without having to deal with streams. +It is also responsible for catching all proprietary exceptions thrown by a `StorageProvider` implementation and wrapping them inside `StorageException` objects. diff --git a/storage/database/README.adoc b/storage/database/README.adoc deleted file mode 100644 index 1e9d4156e..000000000 --- a/storage/database/README.adoc +++ /dev/null @@ -1,74 +0,0 @@ -= Database Storage Implementation - -This module provides an implementation of the link:../README.adoc[Storage Abstraction] that is backed by a database -table. Arbitrary data is stored using the https://jdbc.postgresql.org/documentation/binary-data/[Large Objects] -mechanism of PostgreSQL. - -== Synopsis -This implementation of the Storage Abstraction does not require any external services, but makes use of the ORT Server -database to store data. It accesses the default database provided by Exposed; therefore, it does not require a -dedicated database configuration. - -One goal of this implementation is to support data that is potentially large and should therefore not be loaded into -memory. With a PostgreSQL database, this is currently only possible by using large objects. Here the storage table -only contains a reference to a large object (a long object ID) holding the actual data. The object is stored -separately and needs to be accessed and manipulated by a PostgreSQL-specific API. - -NOTE: PostgreSQL also supports the `bytea` datatype, which is easier to use; but this type is only suitable for data -of limited size because the full data is always read into memory. - -One restriction of the large objects mechanism is that all access to data is only possible within an active -transaction. This does not fit well to the `StorageProvider` interface, which hands over a stream to the client which -is consumed later - at that time, the transaction is already gone. This implementation solves this problem in the -following way: - -* Small data (whose size is below a configurable threshold) is loaded into memory and passed to the client as a - `ByteArrayInputStream`. -* Larger data is copied from the database into a temporary file. Then the provider hands over a special stream to the - client that deletes the file when it gets closed. Thus, the data is accessible outside the transaction. (But care - should be taken that the stream is always closed.) - -Other than that, the implementation is a rather straight-forward mapping from the `StorageProvider` interface to a -database table. The table can be shared between different -link:src/main/kotlin/DatabaseStorageProvider.kt[DatabaseStorageProvider] instances storing different kinds of data. -To make this possible, it contains a discriminator column named `namespace`. The namespace to use must be specified in -the configuration. - -== Configuration -When creating a `StorageProvider` instance via the `Storage` class the _storage type_ must be provided. This allows -using different kinds of storages for different data in ORT Server. The provider-specific configuration is then -expected in a configuration section whose name matches the storage type. 
-
-The following fragment shows an example configuration for the database storage provider. It assumes that the provider
-is used for storing reports; so the provider-specific configuration is located below the `reports` element:
-
-.Example configuration of the database storage provider
-[source]
-----
-reports {
-  name = "database"
-  namespace = "reports"
-  inMemoryLimit = 65536
-}
-----
-
-The `name` property selects the provider implementation. It must be set to _database_ to select the database storage
-provider implementation. The other properties are specific to this implementation and are explained in the table below:
-
-.Supported configuration options
-[cols="1,3",options=header]
-|===
-|Property
-|Description
-
-|namespace
-|Defines the namespace to be used for the data stored by this provider instance. This allows distinguishing different
-kinds of data that are all managed by the database storage implementation. This is somewhat redundant to the
-_storage type_ which determines the configuration section. Since this property is not available in the configuration
-passed to the storage provider implementation, a dedicated property is needed. Its value must be unique for all
-database storage provider instances in use.
-
-|inMemoryLimit
-|Defines the size of data (in bytes) that can be loaded into memory. Data that exceeds this size is buffered in a
-temporary file when it is accessed.
-|===
diff --git a/storage/database/README.md b/storage/database/README.md
new file mode 100644
index 000000000..1f2c3ac49
--- /dev/null
+++ b/storage/database/README.md
@@ -0,0 +1,58 @@
+# Database Storage Implementation
+
+This module provides an implementation of the [Storage Abstraction](../README.md) that is backed by a database table.
+Arbitrary data is stored using the [Large Objects](https://jdbc.postgresql.org/documentation/binary-data/) mechanism of PostgreSQL.
+
+## Synopsis
+
+This implementation of the Storage Abstraction does not require any external services, but makes use of the ORT Server database to store data.
+It accesses the default database provided by Exposed; therefore, it does not require a dedicated database configuration.
+
+One goal of this implementation is to support data that is potentially large and should therefore not be loaded into memory.
+With a PostgreSQL database, this is currently only possible by using large objects.
+Here the storage table only contains a reference to a large object (a long object ID) holding the actual data.
+The object is stored separately and needs to be accessed and manipulated by a PostgreSQL-specific API.
+
+> [!NOTE]
+> PostgreSQL also supports the `bytea` datatype, which is easier to use; but this type is only suitable for data of limited size because the full data is always read into memory.
+
+One restriction of the large objects mechanism is that all access to data is only possible within an active transaction.
+This does not fit well with the `StorageProvider` interface, which hands over a stream to the client which is consumed later - at that time, the transaction is already gone.
+This implementation solves this problem in the following way:
+
+- Small data (whose size is below a configurable threshold) is loaded into memory and passed to the client as a `ByteArrayInputStream`.
+- Larger data is copied from the database into a temporary file.
+  Then the provider hands over a special stream to the client that deletes the file when it gets closed.
+  Thus, the data is accessible outside the transaction.
+ (But care should be taken that the stream is always closed.) + +Other than that, the implementation is a rather straight-forward mapping from the `StorageProvider` interface to a database table. +The table can be shared between different [DatabaseStorageProvider](src/main/kotlin/DatabaseStorageProvider.kt) instances storing different kinds of data. +To make this possible, it contains a discriminator column named `namespace`. +The namespace to use must be specified in the configuration. + +## Configuration + +When creating a `StorageProvider` instance via the `Storage` class the *storage type* must be provided. +This allows using different kinds of storages for different data in ORT Server. +The provider-specific configuration is then expected in a configuration section whose name matches the storage type. + +The following fragment shows an example configuration for the database storage provider. +It assumes that the provider is used for storing reports; so the provider-specific configuration is located below the `reports` element: + +``` +reports { + name = "database" + namespace = "reports" + inMemoryLimit = 65536 +} +``` + +The `name` property selects the provider implementation. +It must be set to *database* to select the database storage provider implementation. +The other properties are specific to this implementation and are explained in the table below: + +| Property | Description | +|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| namespace | Defines the namespace to be used for the data stored by this provider instance. This allows distinguishing different kinds of data that are all managed by the database storage implementation. This is somewhat redundant to the *storage type* which determines the configuration section. Since this property is not available in the configuration passed to the storage provider implementation, a dedicated property is needed. Its value must be unique for all database storage provider instances in use. | +| inMemoryLimit | Defines the size of data (in bytes) that can be loaded into memory. Data that exceeds this size is buffered in a temporary file when it is accessed. | diff --git a/transport/README.adoc b/transport/README.md similarity index 71% rename from transport/README.adoc rename to transport/README.md index 738386d7f..fd1be27f7 100644 --- a/transport/README.adoc +++ b/transport/README.md @@ -1,35 +1,36 @@ -= Transport +# Transport This document describes the transport abstraction layer used within the ORT server to be independent of the concrete environment in which the server is running. -This folder contains the _spi_ module defining the basic Service Provider Interfaces of the transport abstraction layer. +This folder contains the *spi* module defining the basic Service Provider Interfaces of the transport abstraction layer. Then there are modules providing concrete implementations of these interfaces. The latter have their own documentation. -This document focuses on the _spi_ module and the concepts it introduces. 
+This document focuses on the *spi* module and the concepts it introduces. -== Purpose +## Purpose In the ORT server, there are multiple components that need to communicate with each other. For instance, when a new request to trigger an analysis run is received via the REST API, it has to be forwarded to the Orchestrator. The Orchestrator then sends a message to the Analyzer worker to start the analysis of the affected repository. The results produced during this analysis need then to be passed back to the Orchestrator. -The concrete mechanisms used for exchanging such messages depend on the environment in which a server instance is running: An ORT server hosted on AWS may use different messaging services than one on Azure; and a local installation may look completely different. +The concrete mechanisms used for exchanging such messages depend on the environment in which a server instance is running: +An ORT server hosted on AWS may use different messaging services than one on Azure; and a local installation may look completely different. To abstract away the differences of specific transport mechanisms, the abstraction layer declares a set of classes and interfaces that define a generic protocol to send or receive messages. All ORT components use these interfaces exclusively to exchange messages and are therefore agnostic of the underlying infrastructure. At runtime, a specific implementation - suitable to the current environment - is selected based on configuration properties. This implementation wires the ORT components together using a platform-specific messaging mechanism. -== Service Provider Interfaces +## Service Provider Interfaces -This section describes the classes and interfaces defined by the _Service Provider Interfaces_ (SPI) module and the underlying concepts. +This section describes the classes and interfaces defined by the *Service Provider Interfaces* (SPI) module and the underlying concepts. Concrete transport implementations have to implement the interfaces defined here. -=== Endpoints and Messages +### Endpoints and Messages Messages are always sent to specific ORT components that are then responsible for their processing. -Since the number of ORT components is finite, this is also the case for the potential message receivers or _endpoints_. -They can therefore be represented by a number of constants - the subclasses of the sealed link:spi/src/main/kotlin/Endpoint.kt[Endpoint] class. +Since the number of ORT components is finite, this is also the case for the potential message receivers or *endpoints*. +They can therefore be represented by a number of constants - the subclasses of the sealed [Endpoint](spi/src/main/kotlin/Endpoint.kt) class. Each subclass defines some metadata about the endpoint that is evaluated when setting up the communication infrastructure. An endpoint can process messages of a specific type. @@ -37,35 +38,36 @@ If an endpoint handles multiple messages, they are organized in a hierarchy of s This makes it possible to define type-safe interfaces for sending and receiving messages without having to deal with component-specific protocols. For instance, the message sender interface has a single method to send a message of the base type to the target endpoint, instead of multiple methods for the different use cases supported by the receiving component. -Messages are represented by the link:spi/src/main/kotlin/Message.kt[Message] class. +Messages are represented by the [Message](spi/src/main/kotlin/Message.kt) class. 
It consists of -* a message header defining some metadata properties -* the actual message payload whose type is derived from the target endpoint. +- a message header defining some metadata properties +- the actual message payload whose type is derived from the target endpoint. -In the message header, a map with properties to be evaluated by the transport implementation is contained. The map is populated from the labels passed to the current ORT run; so it basically stems from the caller. Using this mechanism, it is possible to customize the behavior of the transport for a specific run. The concept is described in detail at link:../docs/architecture/different_tool_versions.adoc[Support for Different Tool Versions]. +In the message header, a map with properties to be evaluated by the transport implementation is contained. +The map is populated from the labels passed to the current ORT run; so it basically stems from the caller. +Using this mechanism, it is possible to customize the behavior of the transport for a specific run. +The concept is described in detail at [Support for Different Tool Versions](../docs/architecture/different_tool_versions.adoc). -[#_factories] -=== Factories +### Factories In order to send or receive messages, the infrastructure for message exchange must have been properly set up. This is done via the factory interfaces `MessageSenderFactory` and `MessageReceiverFactory`. Both interfaces provide static `create` functions that can be used to create sender or receiver instances compatible with the current environment. They work as follows: -* The target endpoint for sending or receiving messages has to be provided. -* From the configuration, the factory function looks up the transport implementation configured for this endpoint. -* The factory function uses a `ServiceLoader` to find available implementations. -Each existing implementation is assigned a unique name which is matched against the name obtained from the configuration. -* The implementation determined this way is invoked to create the actual sender or receiver object. +- The target endpoint for sending or receiving messages has to be provided. +- From the configuration, the factory function looks up the transport implementation configured for this endpoint. +- The factory function uses a `ServiceLoader` to find available implementations. + Each existing implementation is assigned a unique name which is matched against the name obtained from the configuration. +- The implementation determined this way is invoked to create the actual sender or receiver object. The factory functions rely on the presence of certain configuration properties to determine the correct transport implementation. In theory, each endpoint could be reached via a different transport implementation; therefore, the configuration is endpoint-specific. The `Endpoint` classes define a prefix for configuration keys; the configuration for a specific endpoint is located under this key. The general configuration looks as follows: -[source] ----- +``` analyzer { receiver { type = "transportName" @@ -79,7 +81,7 @@ orchestrator { otherProperty = "other value" } } ----- +``` This fragment shows an example configuration for the Analyzer component (which is configured as message receiver). Here `analyzer` is the configuration prefix defined for the Analyzer endpoint. @@ -92,32 +94,30 @@ Its structure is analogous, but as it is used for sending messages, the transpor The following sections contain examples how to use this mechanism in practice. 
-=== Sending Messages +### Sending Messages -In order to send a message to a specific endpoint, one has to obtain a link:spi/src/main/kotlin/MessageSender.kt[MessageSender] from a link:spi/src/main/kotlin/MessageSenderFactory.kt[MessageSenderFactory]. -Based on the example configuration contained at <<_factories>>, this fragment shows how a message to the Orchestrator can be sent: +In order to send a message to a specific endpoint, one has to obtain a [MessageSender](spi/src/main/kotlin/MessageSender.kt) from a [MessageSenderFactory](spi/src/main/kotlin/MessageSenderFactory.kt). +Based on the example configuration contained at [Factories](#factories), this fragment shows how a message to the Orchestrator can be sent: -[source,kotlin] ----- +``` kotlin val payload = AnalyzeResult(42) val header = MessageHeader(token = "1234567890", traceId = "dick.tracy") val message = Message(header, payload) val sender = MessageSenderFactory.createSender(OrchestratorEndpoint, config) sender.send(message) ----- +``` Message senders should be obtained once, probably at component startup, and can then be reused during the lifetime of the component. Note that the interface is typesafe; you can only send messages to an endpoint that it can process. -=== Receiving messages +### Receiving messages A component that can handle messages should set up a corresponding receiver when it starts. -This is done via the link:spi/src/main/kotlin/MessageReceiverFactory.kt[MessageReceiverFactory] interface and involves specifying a handler function or lambda that is invoked for the incoming messages. +This is done via the [MessageReceiverFactory](spi/src/main/kotlin/MessageReceiverFactory.kt) interface and involves specifying a handler function or lambda that is invoked for the incoming messages. The example fragment below shows how the initialization code of the Orchestrator might look like: -[source,kotlin] ----- +``` kotlin // Message handler function fun handler(message: Message) { // Message handling code @@ -125,21 +125,21 @@ fun handler(message: Message) { // Install receiver MessageReceiverFactory.createReceiver(OrchestratorEndpoint, config, ::handler) ----- +``` The `createReceiver` call is blocking. It enters the message loop, which will wait for new messages and dispatch them to the handler function. -== Testing support +## Testing support -To simplify testing of message exchange between ORT server components, this module exposes a test transport implementation as a https://docs.gradle.org/current/userguide/java_testing.html#sec:java_test_fixtures[test fixture]. +To simplify testing of message exchange between ORT server components, this module exposes a test transport implementation as a [test fixture](https://docs.gradle.org/current/userguide/java_testing.html#sec:java_test_fixtures). It can be enabled in the configuration of an endpoint like regular transport implementations using the name "testMessageTransport"; so a test class could create a special test configuration that refers to the testing transport. The implementation consists of the two factory classes `MessageSenderFactoryForTesting` and `MessageReceiverFactoryForTesting`. Both provide companion objects that can be used to interact with message senders and receivers in a controlled way: -* With `MessageSenderFactoryForTesting.expectMessage()`, it can be tested whether the code under test has sent a message to a specific endpoint; this message is returned and can be further inspected. 
-* `MessageReceiverFactoryForTesting.receive()` allows simulating an incoming message to an endpoint. -The function passes the provided message to the `EndpointHandler` function used by the owning endpoint. +- With `MessageSenderFactoryForTesting.expectMessage()`, it can be tested whether the code under test has sent a message to a specific endpoint; this message is returned and can be further inspected. +- `MessageReceiverFactoryForTesting.receive()` allows simulating an incoming message to an endpoint. + The function passes the provided message to the `EndpointHandler` function used by the owning endpoint. These test implementations allow an end-to-end test of an ORT server endpoint: from an incoming request to the response(s) sent to other endpoints. diff --git a/transport/activemqartemis/README.adoc b/transport/activemqartemis/README.md similarity index 84% rename from transport/activemqartemis/README.adoc rename to transport/activemqartemis/README.md index fb8a8760b..fa6c4d479 100644 --- a/transport/activemqartemis/README.adoc +++ b/transport/activemqartemis/README.md @@ -1,8 +1,8 @@ -= ActiveMQ Artemis Transport implementation +# ActiveMQ Artemis Transport implementation -This module provides an implementation of the transport abstraction layer based on https://activemq.apache.org/components/artemis/[Apache ActiveMQ Artemis]. +This module provides an implementation of the transport abstraction layer based on [Apache ActiveMQ Artemis](https://activemq.apache.org/components/artemis/). -== Synopsis +## Synopsis The module allows message exchange via ActiveMQ message queues. It assumes that the queues in use are already configured via an external mechanism. @@ -14,15 +14,14 @@ Metadata from the message header is represented by JMS message properties. In order to use this module, the `type` property in the transport configuration must be set to `activeMQ`. -== Configuration +## Configuration The configuration for message senders and receivers is identical. Both require the URI to the message broker server and the name of the address to send messages to or receive messages from. The following fragment shows the general structure: -[source] ----- +``` endpoint { sender/receiver: { type = "activeMQ" @@ -30,4 +29,4 @@ endpoint { queueName = "my_message_queue" } } ----- +``` diff --git a/transport/kubernetes-jobmonitor/README.adoc b/transport/kubernetes-jobmonitor/README.adoc deleted file mode 100644 index 7d552bd5e..000000000 --- a/transport/kubernetes-jobmonitor/README.adoc +++ /dev/null @@ -1,95 +0,0 @@ -= Kubernetes Job Monitor Component - -This module is an add-on to the link:../kubernetes/README.adoc[Kubernetes Transport] implementation. It implements -robust job handling and cleanup of completed jobs. - -== Synopsis -Workers spawned by the Orchestrator report their status - success or failure - on completion by sending a corresponding -message back to the Orchestrator. That way the Orchestrator can keep track on an ongoing ORT run and trigger the next -steps to make progress. - -In a distributed setup, however, there is always a chance that a worker job crashes completely before it can even send -a failure message. In that scenario, without any further means, the Orchestrator would not be aware of the (abnormal) -termination of the job; thus the whole run would stall. - -The purpose of this component is to prevent this by implementing an independent mechanism to detect failed jobs and -sending corresponding notifications to the Orchestrator. 
With this in place, it is guaranteed that the Orchestrator is -always notified about the outcome of a job it has triggered. - -== Functionality -For the detection of failed jobs, the Job Monitor component actually implements multiple strategies: - -* It uses the https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes[Kubernetes Watch API] - to receive notifications about changes in the current state of jobs. Based on such change events, it can detect - failed jobs and act accordingly. -* In addition, it lists the currently active jobs periodically and inspects this list for failed jobs. This is done for - the following reasons: - ** The scanning of jobs in regular intervals is a safety net in case a relevant change event was missed by the - watching part. This could happen for instance if the monitor component was shortly down or gets restarted. It is - then still guaranteed that the Orchestrator eventually receives a notification. - ** Based on the job list, it is possible to remove completed jobs and their associated pods. This is not done - out-of-the-box by Kubernetes; so the set of completed jobs would permanently grow. Therefore, the monitor component - does an automatic cleanup of older jobs. -* Practice has shown that the strategies described so far are still not sufficient to handle all potential failure scenarios: It is possible - probably related to certain infrastructure failures - that Kubernetes jobs simply disappear without a notification being received via the watch API. This effect can also be achieved by simply killing a job via a `kubectl delete job` command. Then also the safety net with listing the existing jobs and checking for failures does not help, since the affected jobs no longer exist. The ORT run owning the job would then never be marked as completed. Therefore, there is another component referred to as _lost jobs finder_, which basically does a periodic sync between the jobs that should be active according to the ORT Server database and the actual jobs running on Kubernetes. If this component detects jobs that are expected to be active on Kubernetes, but are missing, it notifies the Orchestrator about them, which can then act accordingly. - -== Configuration -Some aspects of the component can be configured in the module's configuration file or via environment variables. The -fragment below shows the available configuration options: - -.Configuration example -[source] ----- -jobMonitor { - namespace = "ortserver" - enableWatching = true - enableReaper = true - reaperInterval = 600 - enableLostJobs = true - lostJobsInterval = 120 - lostJobsMinAge = 30 -} ----- - -The properties have the following meaning: - -.Configuration options -[cols="1,1,3",options="header"] -|=== -| Property | Variable | Description - -| namespace -| MONITOR_NAMESPACE -| Defines the namespace in which jobs are to be monitored. This is typically the same namespace this component is -deployed in. - -| enableWatching -| MONITOR_WATCHING_ENABLED -| A flag that controls whether the watching mechanism is enabled. If set to *false*, the component will not register -itself as a watcher for job changes. This can be useful for instance in a test environment where failed jobs should not -be cleaned up immediately. - -| enableReaper -| MONITOR_REAPER_ENABLED -| A flag that controls whether the part that scans for completed and failed jobs periodically (aka the _Reaper_) is -active. Again, it can be useful to disable this part to diagnose problems with failed jobs. 
- -| reaperInterval -| MONITOR_REAPER_INTERVAL -| The interval in which the periodic scans for completed and failed jobs are done (in seconds). This can be used to -fine-tune the time completed jobs are kept. - -|enableLostJobs -|MONITOR_LOST_JOBS_ENABLED -|A flag that controls whether the lost jobs finder component is enabled. If this component is active, a valid database configuration must be provided as well. - -|lostJobsInterval -|MONITOR_LOST_JOBS_INTERVAL -|The interval in which the lost jobs finder component executes its checks (in seconds). Since a check requires some database queries, a balance has to be found between the load on the system caused by this and the delay of notifications sent to the Orchestrator. As the scenario of lost jobs should be rather rare, a longer interval is probably acceptable. - -|lostJobsMinAge -|MONITOR_LOST_JOBS_MIN_AGE -|The minimum age of a job (in seconds) to be taken into account by the lost jobs finder component. This setting addresses potential race conditions that might be caused by delays between creating an entry in the database and starting the corresponding job in Kubernetes; in an extreme case, a job would be considered as lost before it even started on Kubernetes. -|=== - -In addition to these options, the configuration must contain a section defining the link:../README.adoc[transport] -for sending notifications to the Orchestrator. diff --git a/transport/kubernetes-jobmonitor/README.md b/transport/kubernetes-jobmonitor/README.md new file mode 100644 index 000000000..fb88c090d --- /dev/null +++ b/transport/kubernetes-jobmonitor/README.md @@ -0,0 +1,68 @@ +# Kubernetes Job Monitor Component + +This module is an add-on to the [Kubernetes Transport](../kubernetes/README.adoc) implementation. +It implements robust job handling and cleanup of completed jobs. + +## Synopsis + +Workers spawned by the Orchestrator report their status - success or failure - on completion by sending a corresponding message back to the Orchestrator. +That way the Orchestrator can keep track on an ongoing ORT run and trigger the next steps to make progress. + +In a distributed setup, however, there is always a chance that a worker job crashes completely before it can even send a failure message. +In that scenario, without any further means, the Orchestrator would not be aware of the (abnormal) termination of the job; thus the whole run would stall. + +The purpose of this component is to prevent this by implementing an independent mechanism to detect failed jobs and sending corresponding notifications to the Orchestrator. +With this in place, it is guaranteed that the Orchestrator is always notified about the outcome of a job it has triggered. + +## Functionality + +For the detection of failed jobs, the Job Monitor component actually implements multiple strategies: + +- It uses the [Kubernetes Watch API](https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes) to receive notifications about changes in the current state of jobs. + Based on such change events, it can detect failed jobs and act accordingly. +- In addition, it lists the currently active jobs periodically and inspects this list for failed jobs. + This is done for the following reasons: + - The scanning of jobs in regular intervals is a safety net in case a relevant change event was missed by the watching part. + This could happen for instance if the monitor component was shortly down or gets restarted. 
+ It is then still guaranteed that the Orchestrator eventually receives a notification. + - Based on the job list, it is possible to remove completed jobs and their associated pods. + This is not done out-of-the-box by Kubernetes; so the set of completed jobs would permanently grow. + Therefore, the monitor component does an automatic cleanup of older jobs. +- Practice has shown that the strategies described so far are still not sufficient to handle all potential failure scenarios: + It is possible - probably related to certain infrastructure failures - that Kubernetes jobs simply disappear without a notification being received via the watch API. + This effect can also be achieved by simply killing a job via a `kubectl delete job` command. + Then also the safety net with listing the existing jobs and checking for failures does not help, since the affected jobs no longer exist. + The ORT run owning the job would then never be marked as completed. + Therefore, there is another component referred to as *lost jobs finder*, which basically does a periodic sync between the jobs that should be active according to the ORT Server database and the actual jobs running on Kubernetes. + If this component detects jobs that are expected to be active on Kubernetes, but are missing, it notifies the Orchestrator about them, which can then act accordingly. + +## Configuration + +Some aspects of the component can be configured in the module’s configuration file or via environment variables. +The fragment below shows the available configuration options: + +``` +jobMonitor { + namespace = "ortserver" + enableWatching = true + enableReaper = true + reaperInterval = 600 + enableLostJobs = true + lostJobsInterval = 120 + lostJobsMinAge = 30 +} +``` + +The properties have the following meaning: + +| Property | Variable | Description | +|------------------|-------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| namespace | MONITOR\_NAMESPACE | Defines the namespace in which jobs are to be monitored. This is typically the same namespace this component is deployed in. | +| enableWatching | MONITOR\_WATCHING\_ENABLED | A flag that controls whether the watching mechanism is enabled. If set to **false**, the component will not register itself as a watcher for job changes. This can be useful for instance in a test environment where failed jobs should not be cleaned up immediately. | +| enableReaper | MONITOR\_REAPER\_ENABLED | A flag that controls whether the part that scans for completed and failed jobs periodically (aka the *Reaper*) is active. Again, it can be useful to disable this part to diagnose problems with failed jobs. | +| reaperInterval | MONITOR\_REAPER\_INTERVAL | The interval in which the periodic scans for completed and failed jobs are done (in seconds). This can be used to fine-tune the time completed jobs are kept. | +| enableLostJobs | MONITOR\_LOST\_JOBS\_ENABLED | A flag that controls whether the lost jobs finder component is enabled. If this component is active, a valid database configuration must be provided as well. | +| lostJobsInterval | MONITOR\_LOST\_JOBS\_INTERVAL | The interval in which the lost jobs finder component executes its checks (in seconds). 
Since a check requires some database queries, a balance has to be found between the load on the system caused by this and the delay of notifications sent to the Orchestrator. As the scenario of lost jobs should be rather rare, a longer interval is probably acceptable. | +| lostJobsMinAge | MONITOR\_LOST\_JOBS\_MIN\_AGE | The minimum age of a job (in seconds) to be taken into account by the lost jobs finder component. This setting addresses potential race conditions that might be caused by delays between creating an entry in the database and starting the corresponding job in Kubernetes; in an extreme case, a job would be considered as lost before it even started on Kubernetes. | + +In addition to these options, the configuration must contain a section defining the [transport](../README.adoc) for sending notifications to the Orchestrator. diff --git a/transport/kubernetes/README.adoc b/transport/kubernetes/README.adoc deleted file mode 100644 index bdd65bf56..000000000 --- a/transport/kubernetes/README.adoc +++ /dev/null @@ -1,195 +0,0 @@ -= Kubernetes Transport Implementation - -This module provides an implementation of the transport abstraction layer using the -https://github.com/kubernetes-client/java/[Kubernetes Java Client]. - -== Synopsis - -The Kubernetes transport abstraction layer exchanges messages via environment variables. -The sender creates a Kubernetes Job using the Kubernetes API, that runs the configured container image and sets the message content as environment variables. -The container started by the Kubernetes Job acts as the receiver and constructs the message from the environment variables. - -== Configuration - -In order to use this module, the `type` property in the transport configuration must be set to `kubernetes`. -For the sender part, a number of properties can be provided to configure the resulting pods as shown in the fragment -below: - -[source] ----- -endpoint { - sender { - type = "kubernetes" - namespace = "namespace-inside-cluster" - imageName = "image-to-run" - imagePullPolicy = "Always" - imagePullSecret = "my-secret" - restartPolicy = "" - backoffLimit = 5 - commands = "/bin/sh" - args = "-c java my.pkg.MyClass" - mountSecrets = "server-secrets->/mnt/secrets server-certificates->/mnt/certificates" - annotationVariables = "ANNOTATION_VAR1, ANNOTATION_VAR2" - serviceAccount = "my_service_account" - cpuRequest = 250m - cpuLimit = 500m - memoryRequest = 64Mi - memoryLimit = 128Mi - } -} ----- - -The properties have the following meaning: - -[#tab_kubernetes_config] -.Supported configuration properties -[cols="1,3,1",options=header] -|=== -|Property |Description |Default - -|namespace -|The namespace inside the Kubernetes cluster, in which the job will be created. -|none - -|imageName -|The full name of the container image from which the pod is created, including the tag name. The value can contain variables that are resolved based on message properties. -|none - -|imagePullPolicy -|Defines the https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy[pull policy] for the container -image, one of `IfNotPresent`, `Always`, or `Never`. -|`Never` - -|imagePullSecret -|The name of the secret to be used to connect to the container registry when pulling the container image. This is -needed when using private registries that require authentication. -|empty - -|restartPolicy -|The https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy[restart policy] for the -container, one of `Always`, `OnFailure`, or `Never`. 
-|`OnFailure` - -|backoffLimit -|Defines the https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-backoff-failure-policy[backup failure policy] -for the job to be created. This is the number of retries for failed pods attempted by Kubernetes before it considers -the job as failed. -|2 - -|commands -|The commands to be executed in the container. This can be used to overwrite the container's default command and -corresponds to the `command` property in the -https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/[Kubernetes pod configuration]. -Here a string can be specified. In order to obtain the array with commands expected by Kubernetes, the string is split -at whitespaces, unless the whitespace occurs in double quotes. -|empty - -|args -|The arguments to be passed to the container's start command. This corresponds to the `args` property in the -https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/[Kubernetes pod configuration]. -It is confusing that Kubernetes treats both properties, `command` and `args`, as arrays. In most examples, a single -string is used as `command`, while multiple arguments can be provided in the `args` array. Analogously to the -`commands` property, the string provided here is split at whitespaces to obtain the arguments. If a single argument -contains whitespaces, it needs to be surrounded by double quotes. -|empty - -|mountSecrets -|With this property, it is possible to mount the contents of Kubernetes -https://kubernetes.io/docs/concepts/configuration/secret/[secrets] as files into the resulting pod. The string is -interpreted as a sequence of mount declarations separated by whitespace. Each mount declaration has the form -_secret->mountPath_, where _secret_ is the name of a secret, and _mountPath_ is a path in the container where the -content of the secret should be made available. At this path, for each key of the secret a file is created whose -content is the value of the key. To achieve this, the Kubernetes Transport implementation generates corresponding -`volume` and `volumeMount` declarations in the pod configuration. This mechanism is useful not only for secrets but -also for other kinds of external data that should be accessible from a pod, for instance custom certificates. -|empty - -|annotationVariables -a|It is often useful or necessary to declare specific annotations for the jobs that are created by the Kubernetes Transport implementation. This can be used for instance for documentation purposes, to group certain jobs, or to request specific services from the infrastructure. Since there can be an arbitrary number of annotations that also might become complex, it is difficult to use a single configuration property to define all annotations at once. (And using a dynamic set of configuration properties does not work well either with the typical approach configuration is read in ORT Server.) - -To deal with these issues, the implementation introduces a level of indirection: The _annotationVariables_ property does not contain the definitions for annotations itself, but only lists the names of environment variables - separated by comma - that declare annotations. Each referenced environment variable must have a value of the form `key=value`. The _key_ becomes the key of the annotation, the _value_ its value. 
- -As a concrete example, if _annotationVariables_ has the value - - annotationVariables = "VAR1, VAR2" - -and there are corresponding environment variables: - - VAR1=annotation1=value1 - VAR2=annotation2=value2 - -then the Kubernetes Transport implementation will produce a job declaration containing the following fragment: - -[source,yaml] ----- -template: - metadata: - annotations: - annotation1: value1 - annotation2: value2 ----- - -If variables are referenced that do not exist or do not contain an equals ('=') character in their value to separate the key from the value, a warning is logged, and those variables are ignored. -|empty - -|serviceAccount -|Allows specifying the name of a service account that is assigned to newly created pods. Service accounts can be used to grant specific permissions to pods. -|null - -|cpuRequest -|Allows setting the request for the CPU resource. The value can contain variables that are resolved based on message properties. -|undefined - -|cpuLimit -|Allows setting the limit for the CPU resource. The value can contain variables that are resolved based on message properties. -|undefined - -|memoryRequest -|Allows setting the request for the memory resource. The value can contain variables that are resolved based on message properties. -|undefined - -|memoryLimit -|Allows setting the limit for the memory resource. The value can contain variables that are resolved based on message properties. -|undefined - -|=== - -While the configuration is static for a deployment of ORT Server, there are use cases that require changing some of the settings dynamically for a specific ORT run. For instance, if the run processes a large repository, the memory limits might need to be increased. To make this possible, the values of some properties can contain variables that are resolved from the properties of the current message. Table <> indicates, which properties support this mechanism. Variables follow the popular syntax `$+{variable}+`. - -To give an example, an excerpt from the configuration could look as follows: - -[source] ----- -endpoint { - sender { - type = "kubernetes" - memoryLimit = ${memory} - ... - } -} ----- - -If the message now has the following set in its `transportProperties`: - - kubernetes.memory = 768M - -Then the memory limit of the pod to be created will be set to 768 megabytes. - -NOTE: The receiver part does not need any specific configuration settings except for the transport type itself. - -== Inheritance of environment variables -Per default, when creating a new job, the `KubernetesMessageSender` passes all environment variables defined for the -current pod to the specification of the new job. That way common variables like service credentials can be shared -between pods. - -A problem can arise though if there are name clashes with environment variables, e.g. if the new job requires a -different value in a variable than the current pod. To address such problems, the Kubernetes transport protocol -supports a simple mapping mechanism for variable names that start with a prefix derived from the target endpoint: -When setting up the environment variables for the new job it checks for variables whose name starts with the prefix -name of the target endpoint in capital letters followed by an underscore. This prefix is then removed from the -variable in the environment of the new job. - -For instance, in order to set the `HOME` variable for the Analyzer worker to a specific value, define a variable -`ANALYZER_HOME` in the Orchestrator pod. 
When then a new Analyzer job is created, its `HOME` variable get initialized -from the value of the `ANALYZER_HOME` variable. An existing `HOME` variable in the Orchestrator pod will not conflict -with this other value. diff --git a/transport/kubernetes/README.md b/transport/kubernetes/README.md new file mode 100644 index 000000000..94cb8175f --- /dev/null +++ b/transport/kubernetes/README.md @@ -0,0 +1,186 @@ +# Kubernetes Transport Implementation + +This module provides an implementation of the transport abstraction layer using the [Kubernetes Java Client](https://github.com/kubernetes-client/java/). + +## Synopsis + +The Kubernetes transport abstraction layer exchanges messages via environment variables. +The sender creates a Kubernetes Job using the Kubernetes API, which runs the configured container image and sets the message content as environment variables. +The container started by the Kubernetes Job acts as the receiver and constructs the message from the environment variables. + +## Configuration + +In order to use this module, the `type` property in the transport configuration must be set to `kubernetes`. +For the sender part, a number of properties can be provided to configure the resulting pods as shown in the fragment below: + +``` +endpoint { + sender { + type = "kubernetes" + namespace = "namespace-inside-cluster" + imageName = "image-to-run" + imagePullPolicy = "Always" + imagePullSecret = "my-secret" + restartPolicy = "" + backoffLimit = 5 + commands = "/bin/sh" + args = "-c java my.pkg.MyClass" + mountSecrets = "server-secrets->/mnt/secrets server-certificates->/mnt/certificates" + annotationVariables = "ANNOTATION_VAR1, ANNOTATION_VAR2" + serviceAccount = "my_service_account" + cpuRequest = 250m + cpuLimit = 500m + memoryRequest = 64Mi + memoryLimit = 128Mi + } +} +``` + +The properties have the following meaning: +
+| Property | Description | Default |
+|----------|-------------|---------|
+| namespace | The namespace inside the Kubernetes cluster, in which the job will be created. | none |
+| imageName | The full name of the container image from which the pod is created, including the tag name. The value can contain variables that are resolved based on message properties. | none |
+| imagePullPolicy | Defines the [pull policy](https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy) for the container image, one of `IfNotPresent`, `Always`, or `Never`. | `Never` |
+| imagePullSecret | The name of the secret to be used to connect to the container registry when pulling the container image. This is needed when using private registries that require authentication. | empty |
+| restartPolicy | The [restart policy](https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#restart-policy) for the container, one of `Always`, `OnFailure`, or `Never`. | `OnFailure` |
+| backoffLimit | Defines the [backoff failure policy](https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-backoff-failure-policy) for the job to be created. This is the number of retries for failed pods attempted by Kubernetes before it considers the job as failed. | 2 |
+| commands | The commands to be executed in the container. This can be used to overwrite the container's default command and corresponds to the `command` property in the [Kubernetes pod configuration](https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/). Here a string can be specified. In order to obtain the array with commands expected by Kubernetes, the string is split at whitespace, unless the whitespace occurs in double quotes. | empty |
+| args | The arguments to be passed to the container's start command. This corresponds to the `args` property in the [Kubernetes pod configuration](https://kubernetes.io/docs/tasks/inject-data-application/define-command-argument-container/). Note that Kubernetes treats both properties, `command` and `args`, as arrays: in most examples, a single string is used as `command`, while multiple arguments are provided in the `args` array. Analogously to the `commands` property, the string provided here is split at whitespace to obtain the arguments. If a single argument contains whitespace, it needs to be surrounded by double quotes. | empty |
+| mountSecrets | With this property, it is possible to mount the contents of Kubernetes [secrets](https://kubernetes.io/docs/concepts/configuration/secret/) as files into the resulting pod. The string is interpreted as a sequence of mount declarations separated by whitespace. Each mount declaration has the form *secret*->*mountPath*, where *secret* is the name of a secret, and *mountPath* is a path in the container where the content of the secret should be made available. At this path, a file is created for each key of the secret, with the value of the key as its content. To achieve this, the Kubernetes Transport implementation generates corresponding `volume` and `volumeMount` declarations in the pod configuration. This mechanism is useful not only for secrets but also for other kinds of external data that should be accessible from a pod, for instance custom certificates. | empty |
+| annotationVariables | It is often useful or necessary to declare specific annotations for the jobs that are created by the Kubernetes Transport implementation, for instance for documentation purposes, to group certain jobs, or to request specific services from the infrastructure. Since there can be an arbitrary number of annotations that may also become complex, it is difficult to define all annotations in a single configuration property. (A dynamic set of configuration properties does not work well either with the way configuration is typically read in ORT Server.) To deal with these issues, the implementation introduces a level of indirection: the `annotationVariables` property does not contain the annotation definitions itself, but only lists the names of environment variables, separated by commas, that declare annotations. Each referenced environment variable must have a value of the form `key=value`; the *key* becomes the key of the annotation, the *value* its value. See the example following the table. | empty |
+| serviceAccount | Allows specifying the name of a service account that is assigned to newly created pods. Service accounts can be used to grant specific permissions to pods. | null |
+| cpuRequest | Allows setting the request for the CPU resource. The value can contain variables that are resolved based on message properties. | undefined |
+| cpuLimit | Allows setting the limit for the CPU resource. The value can contain variables that are resolved based on message properties. | undefined |
+| memoryRequest | Allows setting the request for the memory resource. The value can contain variables that are resolved based on message properties. | undefined |
+| memoryLimit | Allows setting the limit for the memory resource. The value can contain variables that are resolved based on message properties. | undefined |
+
+As a concrete example, if `annotationVariables` has the value
+
+```
+annotationVariables = "VAR1, VAR2"
+```
+
+and there are corresponding environment variables
+
+```
+VAR1=annotation1=value1
+VAR2=annotation2=value2
+```
+
+then the Kubernetes Transport implementation will produce a job declaration containing the following fragment:
+
+```yaml
+template:
+  metadata:
+    annotations:
+      annotation1: value1
+      annotation2: value2
+```
+
+If variables are referenced that do not exist or do not contain an equals ('=') character in their value to separate the key from the value, a warning is logged, and those variables are ignored.
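To make the `annotationVariables` contract above more tangible, the following Kotlin sketch shows one way such declarations could be parsed. It is purely illustrative: the function name `parseAnnotationVariables`, its signature, and the SLF4J-based logging are assumptions and do not reflect the actual ORT Server implementation.

```kotlin
import org.slf4j.LoggerFactory

private val logger = LoggerFactory.getLogger("AnnotationVariables")

/**
 * Parse annotation declarations from the environment variables whose names are listed
 * (comma-separated) in [annotationVariables]. Each referenced variable must have a value
 * of the form `key=value`; missing variables or values without '=' are logged and ignored.
 */
fun parseAnnotationVariables(
    annotationVariables: String,
    env: Map<String, String> = System.getenv()
): Map<String, String> =
    annotationVariables.split(',')
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .mapNotNull { variable ->
            val declaration = env[variable]
            val separatorIndex = declaration?.indexOf('=') ?: -1
            if (declaration == null || separatorIndex < 0) {
                // The variable does not exist or contains no '=' separator: warn and skip it.
                logger.warn("Ignoring invalid annotation variable '{}'.", variable)
                null
            } else {
                declaration.substring(0, separatorIndex) to declaration.substring(separatorIndex + 1)
            }
        }
        .toMap()
```

With the example above, `parseAnnotationVariables("VAR1, VAR2")` would yield the map `{annotation1=value1, annotation2=value2}`, which the sender could then copy into the annotations of the job's pod template.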

+ +While the configuration is static for a deployment of ORT Server, there are use cases that require changing some of the settings dynamically for a specific ORT run. +For instance, if the run processes a large repository, the memory limits might need to be increased. +To make this possible, the values of some properties can contain variables that are resolved from the properties of the current message. +The table above indicates, which properties support this mechanism. +Variables follow the popular syntax `$+{variable}+`. + +To give an example, an excerpt from the configuration could look as follows: + +``` +endpoint { + sender { + type = "kubernetes" + memoryLimit = ${memory} + ... + } +} +``` + +If the message now has the following set in its `transportProperties`: + +``` +kubernetes.memory = 768M +``` + +Then the memory limit of the pod to be created will be set to 768 megabytes. + +> [!NOTE] +> The receiver part does not need any specific configuration settings except for the transport type itself. + +## Inheritance of environment variables + +Per default, when creating a new job, the `KubernetesMessageSender` passes all environment variables defined for the current pod to the specification of the new job. +That way common variables like service credentials can be shared between pods. + +A problem can arise though if there are name clashes with environment variables, e.g. if the new job requires a different value in a variable than the current pod. +To address such problems, the Kubernetes transport protocol supports a simple mapping mechanism for variable names that start with a prefix derived from the target endpoint: +When setting up the environment variables for the new job it checks for variables whose name starts with the prefix name of the target endpoint in capital letters followed by an underscore. +This prefix is then removed from the variable in the environment of the new job. + +For instance, in order to set the `HOME` variable for the Analyzer worker to a specific value, define a variable `ANALYZER_HOME` in the Orchestrator pod. +When then a new Analyzer job is created, its `HOME` variable get initialized from the value of the `ANALYZER_HOME` variable. +An existing `HOME` variable in the Orchestrator pod will not conflict with this other value. diff --git a/transport/rabbitmq/README.adoc b/transport/rabbitmq/README.adoc deleted file mode 100644 index df34270c3..000000000 --- a/transport/rabbitmq/README.adoc +++ /dev/null @@ -1,65 +0,0 @@ -= RabbitMQ Transport implementation - -This module provides an implementation of the transport abstraction layer based on https://www.rabbitmq.com/[RabbitMQ]. - -== Synopsis - -The module allows message exchange via RabbitMQ message queues. -It assumes that the queues in use are already configured via an external mechanism. -Their names have to be provided in the configuration. - -The messages to be processed are converted to AMQP messages. -The payload is serialized to JSON and transferred in the text body of the message. -Metadata from the message header is represented by AMQP message properties. - -In order to use this module, the `type` property in the transport configuration must be set to `rabbitMQ`. - -== Configuration - -The configuration for message senders and receivers is identical. -Both require the URI and the credentials to the message broker server and the name of the involved message queue. -The credentials are obtained as secrets from the link:../../config/README.adoc[ConfigManager]. 
- -The following fragment shows the general structure: - -[source] ----- -endpoint { - sender/receiver: { - type = "rabbitMQ" - serverUri = "amqps://rabbit-mq-server.com:5671" - queueName = "my_message_queue" - rabbitMqUser = "myUsername" - rabbitMqPassword = "myPassword" - } -} ----- - -Table <> contains a description of the supported configuration properties: - -[#tab_rabbitmq_config] -.Supported configuration options -[cols="1,3,1",options=header] -|=== -|Property |Description |Secret - -|serverUri -|The URI of the RabbitMQ server. -|no - -|queueName -|The name of the message queue to send messages to or to retrieve messages from. -|no - -|rabbitMqUser -|The username to authenticate against the RabbitMQ server. -|yes - -|rabbitMqPassword -|The password to authenticate against the RabbitMQ server. -|yes -|=== - -NOTE: It is possible to set configuration properties via environment variables. Since each endpoint has its own - messaging configuration, different environment variables are used. Inspect the different - `application.conf` files to find the variables in use. diff --git a/transport/rabbitmq/README.md b/transport/rabbitmq/README.md new file mode 100644 index 000000000..10894ae14 --- /dev/null +++ b/transport/rabbitmq/README.md @@ -0,0 +1,49 @@ +# RabbitMQ Transport implementation + +This module provides an implementation of the transport abstraction layer based on [RabbitMQ](https://www.rabbitmq.com/). + +## Synopsis + +The module allows message exchange via RabbitMQ message queues. +It assumes that the queues in use are already configured via an external mechanism. +Their names have to be provided in the configuration. + +The messages to be processed are converted to AMQP messages. +The payload is serialized to JSON and transferred in the text body of the message. +Metadata from the message header is represented by AMQP message properties. + +In order to use this module, the `type` property in the transport configuration must be set to `rabbitMQ`. + +## Configuration + +The configuration for message senders and receivers is identical. +Both require the URI and the credentials to the message broker server and the name of the involved message queue. +The credentials are obtained as secrets from the [ConfigManager](../../config/README.adoc). + +The following fragment shows the general structure: + +``` +endpoint { + sender/receiver: { + type = "rabbitMQ" + serverUri = "amqps://rabbit-mq-server.com:5671" + queueName = "my_message_queue" + rabbitMqUser = "myUsername" + rabbitMqPassword = "myPassword" + } +} +``` + +This table contains a description of the supported configuration properties: + +| Property | Description | Secret | +|------------------|---------------------------------------------------------------------------------|--------| +| serverUri | The URI of the RabbitMQ server. | no | +| queueName | The name of the message queue to send messages to or to retrieve messages from. | no | +| rabbitMqUser | The username to authenticate against the RabbitMQ server. | yes | +| rabbitMqPassword | The password to authenticate against the RabbitMQ server. | yes | + +> [!NOTE] +> It is possible to set configuration properties via environment variables. +> Since each endpoint has its own messaging configuration, different environment variables are used. +> Inspect the different `application.conf` files to find the variables in use.
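To make the message mapping described above concrete, the following sketch shows how a message with header properties and a JSON payload could be published with the plain RabbitMQ Java client. This is not ORT Server code: the queue name, broker URI, and credentials are the placeholder values from the configuration fragment above, the header keys and payload are taken from the earlier sending example, and the real implementation wraps all of this behind the transport abstraction.

```kotlin
import com.rabbitmq.client.AMQP
import com.rabbitmq.client.ConnectionFactory

fun main() {
    // Connection settings corresponding to the configuration fragment above (placeholders).
    val factory = ConnectionFactory().apply {
        setUri("amqps://rabbit-mq-server.com:5671")
        username = "myUsername"
        password = "myPassword"
    }

    factory.newConnection().use { connection ->
        connection.createChannel().use { channel ->
            // Metadata from the message header becomes AMQP message properties.
            val properties = AMQP.BasicProperties.Builder()
                .headers(mapOf("token" to "1234567890", "traceId" to "dick.tracy"))
                .build()

            // The payload is serialized to JSON and transferred in the text body of the message.
            val body = """{"analyzeResult": 42}""".toByteArray()

            // Publish to the pre-configured queue via the default exchange.
            channel.basicPublish("", "my_message_queue", properties, body)
        }
    }
}
```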