Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce datasources in package to configure inputs and streams #212

Merged
merged 4 commits into from
Feb 18, 2020

Conversation

ruflin
Copy link
Member

@ruflin ruflin commented Feb 11, 2020

In elastic/beats#15940 datasources, inputs and streams are introduced into the agent config. To make it possible to configure these in the UI and through the API, some changes to the manifest definitions of a package and datasets are needed.

Package manifest

Each package must specify the datasources it supports with the supported inputs inside. So far all the packages only support one datasource but I want to keep the door open for this to potentially change in the future. It also makes it possible to have the manifest config of a datasource be identical to the config which ends up in the agent config.

The package manifest datasource definition looks as following (nginx example):

datasources:
  -
    # Do we need a name for the data source?
    name: nginx

    # List of inputs this datasource supports
    inputs:
      -
        # An id can be given, in case the type used here is not unique
        # This is for selection in the stream
        # id: nginx
        type: metrics/nginx

        # Common configuration options for this input
        vars:
          - name: hosts
            description: Nginx hosts
            default:
              ["http://127.0.0.1"]
            # All the config options that are required should be shown in the UI
            required: true
          - name: period
            description: "Collection period. Valid values: 10s, 5m, 2h"
            default: "10s"
          - name: username
            type: text
          - name: password
            # This is the html input type?
            type: password

      -
        type: logs

        # Common configuration options for this input
        vars:

      -
        type: syslog

        # Common configuration options for this input
        vars:

Inside the datasource, the supported inputs are specified with the common variables across all streams which use a certain input. In the UI I expect that we show the required configs by default and all the others are under "Advanced" or similar.

Dataset manifest

With the datasources and inputs defined on the package level, each dataset can specify which inputs it supports. Most datasets will only support one input for now. For the nginx metrics this looks as following:

inputs:
  - type: "metric/nginx"

    # Only the variables have to be repeated that are not specified as part of the input
    vars:
      # All variables are specified in the input already

As an example with supporting multiple inputs, we have the nginx error logs:

inputs:
  - type: log
    vars:
      - name: paths
        required: true
        default:
          - /var/log/nginx/error.log*
        os.darwin:
          - /usr/local/var/log/nginx/error.log*
        os.windows:
          - c:/programdata/nginx/logs/error.log*

  - type: syslog
    vars:
      # Are udp and tcp syslog input two different inputs?
      - name: protocol.udp.host
        required: true
        default:
          - "localhost:9000"

The log and syslog input are supported (not the case today, just an example). One the dataset level also all additional variables for this dataset are specified. The ones already specified on the input level in the package don't have to be specified again.

Stream definition

Now that the dataset has its supported inputs and variables defined, the stream can be defined. The stream defines which input it uses from the dataset and its configuration variables. Here an example for nginx metrics:

input: metrics/nginx
metricsets: ["stubstatus"]
period: {{period}}
enabled: true

hosts: {{hosts}}

{{#if username}}
username: "{{username}}"
{{/if}}
{{#if password}}
password: "{{password}}"
{{/if}}

During creation time of the stream config the variables from the datasource inputs and local variables from the dataset are filled in.

A stream definition could also support multiple inputs as seen in the following example:


{{#if input == log}}
input: log

{{#each paths}}
paths: "{{this}}"
{{/each}}
exclude_files: [".gz$"]

processors:
  - add_locale: ~
{{/if}}

{{#if input == syslog}}
input: syslog

{{/if}}

Further changes

  • Rename agent/input to agent/stream as a stream is configured there.

In elastic/beats#15940 datasources, inputs and streams are introduced into the agent config. To make it possible to configure these in the UI and through the API, some changes to the manifest definitions of a package and datasets are needed.

**Package manifest**

Each package must specify the datasources it supports with the supported inputs inside. So far all the packages only support one datasource but I want to keep the door open for this to potentially change in the future. It also makes it possible to have the manifest config of a datasource be identical to the config which ends up in the agent config.

The package manifest datasource definition looks as following (nginx example):

```
datasources:
  -
    # Do we need a name for the data source?
    name: nginx

    # List of inputs this datasource supports
    inputs:
      -
        # An id can be given, in case the type used here is not unique
        # This is for selection in the stream
        # id: nginx
        type: metrics/nginx

        # Common configuration options for this input
        vars:
          - name: hosts
            description: Nginx hosts
            default:
              ["http://127.0.0.1"]
            # All the config options that are required should be shown in the UI
            required: true
          - name: period
            description: "Collection period. Valid values: 10s, 5m, 2h"
            default: "10s"
          - name: username
            type: text
          - name: password
            # This is the html input type?
            type: password

      -
        type: logs

        # Common configuration options for this input
        vars:

      -
        type: syslog

        # Common configuration options for this input
        vars:

```

Inside the datasource, the supported inputs are specified with the common variables across all streams which use a certain input. In the UI I expect that we show the `required` configs by default and all the others are under "Advanced" or similar.

**Dataset manifest**

With the datasources and inputs defined on the package level, each dataset can specify which inputs it supports. Most datasets will only support one input for now. For the nginx metrics this looks as following:

```
inputs:
  - type: "metric/nginx"

    # Only the variables have to be repeated that are not specified as part of the input
    vars:
      # All variables are specified in the input already
```

As an example with supporting multiple inputs, we have the nginx error logs:

```
inputs:
  - type: log
    vars:
      - name: paths
        required: true
        default:
          - /var/log/nginx/error.log*
        os.darwin:
          - /usr/local/var/log/nginx/error.log*
        os.windows:
          - c:/programdata/nginx/logs/error.log*

  - type: syslog
    vars:
      # Are udp and tcp syslog input two different inputs?
      - name: protocol.udp.host
        required: true
        default:
          - "localhost:9000"
```

The log and syslog input are supported (not the case today, just an example). One the dataset level also all additional variables for this dataset are specified. The ones already specified on the input level in the package don't have to be specified again.

**Stream definition**

Now that the dataset has its supported inputs and variables defined, the stream can be defined. The stream defines which input it uses from the dataset and its configuration variables. Here an example for nginx metrics:

```
input: metrics/nginx
metricsets: ["stubstatus"]
period: {{period}}
enabled: true

hosts: {{hosts}}

{{#if username}}
username: "{{username}}"
{{/if}}
{{#if password}}
password: "{{password}}"
{{/if}}
```

During creation time of the stream config the variables from the datasource inputs and local variables from the dataset are filled in.

A stream definition could also support multiple inputs as seen in the following example:

```

{{#if input == log}}
input: log

{{#each paths}}
paths: "{{this}}"
{{/each}}
exclude_files: [".gz$"]

processors:
  - add_locale: ~
{{/if}}

{{#if input == syslog}}
input: syslog

{{/if}}
```

**Further changes**

* Rename `agent/input` to `agent/stream` as a stream is configured there.
@ruflin ruflin self-assigned this Feb 11, 2020
@ruflin
Copy link
Member Author

ruflin commented Feb 11, 2020

@skh Could you give some feedback if the above info the the manifests and stream is enough to build a full config?
@jen-huang Will this info be sufficient to build the UI on it?
@ph Applying what we discussed in the config.

@skh
Copy link
Contributor

skh commented Feb 11, 2020

Could you give some feedback if the above info the the manifests and stream is enough to build a full config?

Where can I find out what a full, correct agent config must look like?

@ruflin
Copy link
Member Author

ruflin commented Feb 11, 2020

@skh elastic/beats#15940

@ph
Copy link

ph commented Feb 11, 2020

@ruflin format sound good, a few things:

  • Make sure you rename the type for metrics where the service is before metrics like this: docker/metrics.
  • As we discussed over zoom for simplification, lets make sure we do not have a field defined at the level of the input that can be redefined at the stream level.

Comment on lines 17 to 20
# Should we define this as array? How will the UI best make sense of it?
type: textarea
default:
- /var/log/nginx/access.log*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we define this as array?

Does this mean that users can have multiple paths defined? If so, how about adding a boolean field called multi that tells the UI that there can be multiple of the specified type:

type: text
multi: true

For example, if we want to have multiple paths as default value, we can do like:

default:
  - /var/log/nginx/access.log*
  - /some/path/log/nginx/access.log*

or

default: ["/var/log/nginx/access.log*", "/some/path/log/nginx/access.log*"]

(not sure which yaml format we prefer but would be great to have it consistent across these examples 🙂)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there can be multiple default paths. Changed it do your suggestion.

Comment on lines 23 to 26
os.darwin:
- /usr/local/var/log/nginx/access.log*
os.windows:
- c:/programdata/nginx/logs/*access.log*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit confused about the format here, are these default values, but platform-specific? If so, should they be nested under default?

Copy link
Member Author

@ruflin ruflin Feb 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good point. They are platform specific defaults. As we should not mix array and object I wonder how the yaml should look like:

        default:
           all:
              - /var/log/nginx/access.log*
        # I suggest to use ECS fields for this config options here: https://github.com/elastic/ecs/blob/master/schemas/os.yml
        # This would need to be based on a predefined definition on what can be filtered on
          os.darwin:
            - /usr/local/var/log/nginx/access.log*
          os.windows:
            - c:/programdata/nginx/logs/*access.log*

I don't like the all. Perhaps we find a better name?

Copy link
Contributor

@jen-huang jen-huang Feb 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fond of all either, and it seems silly to do default.default 🙂 Thinking about this again, what if we just nest default under each platform? I don't know if we'll want to add other platform-specific configuration for vars, but this schema could support that too since it essentially mirrors all the top-level vars properties. As an example, maybe paths is not required for windows for whatever reason:

      - name: paths
        required: true
        default:
            - /var/log/nginx/access.log*
        os.darwin:
          default:
            - /usr/local/var/log/nginx/access.log*
        os.windows:
          required: false
          default:
            - c:/programdata/nginx/logs/*access.log*

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. So in summary: All configs from the top level could also be nested under each platform. Lets try it.

What I suggest for now, lets ignore platform at least on the UI side.

datasources:
-
# Do we need a name for the data source?
name: nginx
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think name would be useful to populate the id in agent config, but we can just use the package name too

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean prefixing the id? Because the id needs to be unique.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I meant prefixing

Comment on lines +31 to +36
- name: hosts
description: Nginx hosts
default:
["http://127.0.0.1"]
# All the config options that are required should be shown in the UI
required: true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wold like to see type defined explicitly for all vars. And if we like the multi suggestion in my previous comment, these should be added:

type: text
multi: true

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. Adding.

Comment on lines +37 to +39
- name: period
description: "Collection period. Valid values: 10s, 5m, 2h"
default: "10s"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add type: duration or type: text?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for a duration type, we used integer priviously and it was not that great. :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like duration if that works for the UI.

type: text
- name: password
# This is the html input type?
type: password
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to decide all possible values for type field. They do not need to be HTML types, though. Here are the possible types I currently see (or can foresee being useful):

text -> maps to UI text input
number -> maps to UI number input
boolean -> maps to UI checkbox
duration -> maps to UI text input
  this would let us do client-side validation but maybe overkill for now?
url -> maps to UI text input
  this would let us do client-side validation but maybe overkill for now?

In the case of password vars, I would use text for them as they'll show up as simple text inputs in the UI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the above proposed types. I was thinking of password instead of text to not have to show clear text passwords directly on the screen when users put them in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW: If we have these predefined values for type we can also validate it already on the package side to make sure nothing odd shows up in the package itself.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the URL types we have to be careful when we do this and we do it, in some modules we do use compound scheme (not sure if this is the exact term?) Like "http+npipe://./pipe/custom" see https://github.com/elastic/beats/blob/d75261469b5c2e675d7d331b86bfc8599d797ddc/metricbeat/mb/parse/url_test.go for a few examples.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see url support as a stretch goal. In case of doubt we can always fall back to just string. Lets keep it simple for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add granular text types (duration, url, etc) but it doesn't mean that we need to do validation right now. Validation can be added later on from all fronts (UI, agent, package) once we align on rules.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also ++ on keeping password type. This give the UI information to decide when to show clear text inputs and when to obfuscate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as we started the UI implementation and are happy with the format, we should open a PR here with the exact specs.

- c:/programdata/nginx/logs/*access.log*
# List of supported inputs
inputs:
- type: log
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hbharding suggested today that it would be great to have descriptions in the UI. Could we add description prop for every input, under its type? I think these descriptions should only be a sentence or two, max. Not sure how we will localize these though..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Added it. But did not add it here but on the package level where the overall inputs are defined. As a single input is shown for multiple datasets, I guess it makes more sense there.

inputs:
- type: log
vars:
- name: paths
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to my previous comment, it would be great to have description prop per var too. This would be really short descriptions that give the user info about what each var is used for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Copy link
Member Author

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jen-huang @ph I did one more push with the most recent reviews. I suggest we get this PR in and then I adjust all the other packages we have + making it fully available in the our API. Having the format should already allow the UI work to get started.

- c:/programdata/nginx/logs/*access.log*
# List of supported inputs
inputs:
- type: log
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1. Added it. But did not add it here but on the package level where the overall inputs are defined. As a single input is shown for multiple datasets, I guess it makes more sense there.

inputs:
- type: log
vars:
- name: paths
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

type: text
- name: password
# This is the html input type?
type: password
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as we started the UI implementation and are happy with the format, we should open a PR here with the exact specs.

@ph
Copy link

ph commented Feb 17, 2020

@ruflin I am ok to merge this and iterate.

@ruflin ruflin merged commit e965931 into elastic:master Feb 18, 2020
@ruflin
Copy link
Member Author

ruflin commented Feb 18, 2020

I merged this change as it seems we are mostly happy with what we have here and will follow up with more changes for further discussions.

@ruflin ruflin deleted the datasources branch February 18, 2020 11:52
ruflin added a commit to ruflin/package-registry that referenced this pull request Feb 18, 2020
In elastic#212 the datasource structure was introduced in the manifests but not exposed through the APIs yet. This PR changes this exposing all the fields. So far it only exposes the fields and no validation is done yet. This needs to be added in the future.

A test package was added to see the output of datasource configs and potential changes to it.
ruflin added a commit that referenced this pull request Feb 18, 2020
In #212 the datasource structure was introduced in the manifests but not exposed through the APIs yet. This PR changes this exposing all the fields. So far it only exposes the fields and no validation is done yet. This needs to be added in the future.

A test package was added to see the output of datasource configs and potential changes to it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants