Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logstash add ElasticsearchRefs #6662

Merged
merged 35 commits into from
Apr 26, 2023

Conversation

kaisecheng
Copy link
Contributor

@kaisecheng kaisecheng commented Apr 5, 2023

This PR allows configuration of elasticsearchRefs in CRD to associate Elasticsearch clusters to Logstash. The connectivity info/ credentials are passed via environment variable to Logstash container for referencing in Logstash plugins: logstash-output-elasticsearch, logstash-input-elasticsearch and logstash-filter-elasticsearch.

Sample config

apiVersion: logstash.k8s.elastic.co/v1alpha1
kind: Logstash
metadata:
  name: logstash-sample
spec:
  elasticsearchRefs:
    - clusterName: production
      name: elasticsearch-sample
      namespace: spaceship

    

Environment variables taking $clusterName_ES_ as prefix are available in Logstash container

PRODUCTION_ES_HOSTS
PRODUCTION_ES_USERNAME
PRODUCTION_ES_PASSWORD
PRODUCTION_ES_SSL_CERTIFICATE_AUTHORITY # path of ca.crt of production cluster

Sample Logstash pipeline with elasticsearch output using environment variables

- pipeline.id: main
  config.string: |
    output { 
     elasticsearch {
       hosts => "${PRODUCTION_ES_HOSTS}"
       user => "${PRODUCTION_ES_USERNAME}"
       password => "${PRODUCTION_ES_PASSWORD}" 
       cacert => "${PRODUCTION_ES_SSL_CERTIFICATE_AUTHORITY}"
      } 
    }

The sample logstash yml file as located in config/samples/logstash/logstash_es.yaml will
create:

NAMESPACE  NAME                                                                   READY  REASON  AGE
default    Logstash/logstash-sample                                               -              4m41s
default    ├─Secret/logstash-sample-default-elasticsearch-sample-logstash-user    -              2m54s
default    ├─Secret/logstash-sample-logstash-es-default-elasticsearch-sample-ca   -              2m54s
default    ├─Secret/logstash-sample-ls-config                                     -              2m53s
default    ├─Service/logstash-sample-ls-api                                       -              2m53s
default    │ └─EndpointSlice/logstash-sample-ls-api-vpxv5                         -              2m53s
default    └─StatefulSet/logstash-sample-ls                                       -              2m53s
default      ├─ControllerRevision/logstash-sample-ls-c6dc8b867                    -              2m53s
default      └─Pod/logstash-sample-ls-0                                           True           2m53s

e2e test

  • TestLogstashEsOutput

Still todo:

  • add rbac

Co-authored-by: Rob Bavey rob.bavey@elastic.co

@botelastic botelastic bot added the triage label Apr 5, 2023
@kaisecheng kaisecheng added >enhancement Enhancement of existing functionality and removed triage labels Apr 5, 2023
@kaisecheng kaisecheng changed the base branch from main to feature/logstash April 5, 2023 19:54
- fix e2e TestSamples
- check ES status
- check ES association secrets
- check association volume
- check association env
…h_esrefs

# Conflicts:
#	pkg/controller/logstash/driver.go
#	pkg/controller/logstash/pod.go
#	test/e2e/test/logstash/builder.go
@kaisecheng kaisecheng marked this pull request as ready for review April 11, 2023 13:27
Copy link
Member

@robbavey robbavey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic. I do have a couple of broad questions before I dive in fully

@@ -150,6 +152,22 @@ var (
},
},
},
LogstashUserRole: esclient.Role{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.

I notice that agent and enterprise search use a superuser role here, presumably to get over the same limitation. I am curious to hear from other folks here as to the best approach.

cc @pebrc @jsvd

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.

That's a good point. New roles can be added using the roles section, and even if it is not ideal this can also be used to override predefined roles.

Copy link
Collaborator

@pebrc pebrc Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the uncommented/explained use of superuser roles by Agent and Enterprise Search is actually somewhat of a bug. Even if there is a motivation for it in order to create and ingest data into custom indices, it feels like something that should be called out in a comment/doc at the very least and probably a custom role with appropriate privileges to create arbitrary indices would have still been better than superuser.

@@ -31,6 +35,10 @@ type LogstashSpec struct {
// +kubebuilder:validation:Optional
Image string `json:"image,omitempty"`

// ElasticsearchRef is a reference to an Elasticsearch cluster running in the same Kubernetes cluster.
// +kubebuilder:validation:Optional
ElasticsearchRefs []commonv1.ObjectSelector `json:"elasticsearchRefs,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking forward a little to when we will have named clusters in Logstash, we talked about introducing the notion of a clusterName into the ElasticsearchRef definition along the lines of

elasticsearchRefs:
  - clusterName: k8sData
    name: local
    namespace: production               

with the clusterName being used as the id in the cluster.yaml equivalent that we envisage for Logstash.
If we think about a cluster.yaml looking something like:

elasticsearch_clusters:
  k8sData:
    host: "https://metaldata-es-http:9200"
    username: “logstash_system
    password: REDACTED
    ca_file: /mnt/elastic-internal/es-certs/ca.crt

I think we could do this by creating the LogstashSpec to look like:

type LogstashSpec struct {
  :
  // Named list of Elasticsearch Clusters
  ElasticsearchRefs[] ElasticsearchCluster `json:"elasticsearchRefs,omitempty"` 
  :
}

// Definition of type to define an Elasticsearch Cluster to be referred to in 
// Logstash pipeline
type ElasticsearchCluster struct {
  commonv1.ObjectSelector `json:",omitempty,inline"`
  ClusterName              string `json:"clusterName,omitempty"`
}

We could potentially create the env variables with the clusterName? (And if we did would we need the namespace here?)

There is some prior art with the elastic agent using an outputName to distinguish between multiple elasticsearchRefs.

Alternatively, we could stick with what we have here, and write cluster.yaml as:

elasticsearch_clusters:
  local_production:
    host: "https://metaldata-es-http:9200"
    username: “logstash_system
    password: REDACTED
    ca_file: /mnt/elastic-internal/es-certs/ca.crt

using $namespace_$name as the id.

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are awesome suggestions. +1 to follow the prior art of agent to unify the experience. Also, +1 to bring cluster name in technical preview.

I think we can use outputName instead of clusterName to further share the experience. For the env variable, let's replace $namespace_$name with outputName. The name needs to be unique in elasticsearchRefs. I will add validation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought on using clusterName over outputName is that we also use want to use elasticsearchRef in input and filter plugins, not just for outputs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robbavey Revisiting the usage of clusterName. I have doubts about its value. One drawback is the stack monitoring should understand clusterName and send metrics to the corresponding cluster, which does not feasible now because the operator only considers namespace + name in the association.

From the design doc, things we want to achieve (now and future cluster.yml) with clusterName seem can be replaced by namespace + name.

    elasticsearch {
      cluster_id => namespace_name
    }
elasticsearch_clusters:
  namespace_name:
    host: "https://metaldata-es-http:9200"
    username: “logstash_system
    password: REDACTED
    ca_file: /mnt/elastic-internal/es-certs/ca.crt
  namespace_name2:
    cloud_id: XXXXXXX
     api_key: yyyyyy

clusterName works like a nickname that is handy when user wants to give a second name.
I wanna keep the current env variable naming convention, namespace + name, at least in the technical preview.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tend to agree with @barkbay that the lack of clusterName in elasticsearchRef in stack monitoring is probably ok - clusterName is added as an aid for the user to refer to an elasticsearch cluster pointed to by elasticsearchRef outside of the kubernetes CRD in pipeline configurations, without needing to deal with kubernetes semantics, such as namespaces. The ability to define/move namespaces outside of the CRD is an an interesting twist on this.

Copy link
Collaborator

@pebrc pebrc Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be an optional feature? If users define it it will be used to name the environment variables? Also could it then just be named alias?

elasticsearchRefs:
  - alias: o11y-cluster
    name: monitor
    namespace: production  

I think I also agree that stack monitoring is different enough to allow a special case here. Because users have to understand the relationship between the elasticsearchRef and what they define in configuration there is a motivation for such a construct. Why would one want to use an alias for stack monitoring?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Logstash context, clusterName is a mapping of name to endpoint (host, username, password, cacert). The endpoint can be an input (source) or an output (destination). It will be referenced in pipeline.

input { 
  kafka { } 
}
output {
  elasticsearch { cluster_name => "production" }
  elasticsearch { cluster_name => "uat" }
}

In Logstash CRD, we allow users to define the name mapping

  elasticsearchRefs:
    - clusterName: production
      name: es1
    - clusterName: uat
       name:es2
    - clusterName: monitor
       name:es3

However, Stack Monitoring does not understand the name production, uat and monitor. If Stack Monitoring wants to insert metrics to monitor, it needs the following

  monitoring:
    metrics:
      elasticsearchRefs:
        - name: es3

As a user, I am confused why I can't use monitor that I declared earlier. The name works in pipeline but not in Stack Monitoring.

IMO, ultimately we will need clusterName to achieve cluster.yml which contains the mapping of alias to endpoint. In technical preview, it seems not ready. I hope the example can explain the confusion.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess my confusion here, is what would be your expectation of using clusterName in the monitoring section?

Are you expecting to be able to use clusterName instead of name + namespace? Or as well as? Is your expectation that you could use

 monitoring:
    metrics:
      elasticsearchRefs:
        - clusterName: monitor

or that you would expect

 monitoring:
    metrics:
      elasticsearchRefs:
         - clusterName: monitor
            name:es3

or that clusterName is arbitrary without the specific cluster or clusterName setting in cluster.yml when that comes to be?

I agree that it feels a little half baked without the cluster.yml, and we only refer to clusterName in the env variable, but I do feel that it will have utility when we start allowing named clusters, and maybe introducing the notion earlier is of some value.

Thoughts @flexitrev, @jsvd, @roaksoax?

Copy link
Contributor Author

@kaisecheng kaisecheng Apr 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my expectation is to be able to use clusterName to output metrics to monitor

 monitoring:
    metrics:
      elasticsearchRefs:
        - clusterName: monitor

and won't stop user to use namespace + name to output

  monitoring:
    metrics:
      elasticsearchRefs:
        - name: es3

The declaration in spec.elasticsearchRefs will all convert to cluster.yml. Only the clusterName in cluster.yml can be referenced in pipeline.

  elasticsearchRefs:
    - clusterName: production
      name: es1
    - clusterName: uat
       name:es2
    - clusterName: monitor
       name:es3

cluster.yml

- production:
    host:...
    username:...
    password:...
    cacert:...
- uat:
    host:...
    username:...
    password:...
    cacert:...
- monitor:
    host:...
    username:...
    password:...
    cacert:...

@kaisecheng kaisecheng requested a review from barkbay April 13, 2023 12:18
@kaisecheng
Copy link
Contributor Author

@barkbay I think this is ready for review

There is a discussion about bringing clusterName to elasticsearchRefs. The decision is not to add clusterName in the technical preview.

{
Names: []string{"logstash-*", "ecs-logstash-*", "logs-*", "metrics-*", "synthetics-*", "traces-*"},
Privileges: []string{"manage", "read", "create_doc", "view_index_metadata", "create_index"},
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need more permissions here, in terms of both index names, and privileges.
I tried a few different simple scenarios, and there were a few issues -
Removing data_streams => false will write to logstash, and ecs_compatibility => disabled will write to ecs-logstash by default, so we will need to add those to the list of Names
Additionally, I think we need to add additional privileges - we get a

[indices:data/write/index:op_type/index] is unauthorized for user [default-logstash-sample-default-elasticsearch-logstash-user] with effective roles [eck_logstash_user_role] on indices [ecs-logstash], this action is granted by the index privileges [create,index,write,all]"

error when attempting to write to this index. I suspect we will need the write privilege here - the elasticsearch output is able to do create, index, update and delete actions
If we are going to limit the indices here, we will need to add instructions on how users can get around this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for double check it. I have updated the setting.
ecs-logstash and logstash are required by ilm_rollover_alias
The logstash doc regarding secure connection needs update

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! This now works as expected.

Copy link
Member

@robbavey robbavey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me from a logstash perspective.

My only concern is around the experience when referring to an index that is not one of the "default" ones, and how we choose to document this.

AuthSecretName: "logstash-sample-default-elasticsearch-sample-logstash-user",
AuthSecretKey: "default-logstash-sample-default-elasticsearch-sample-logstash-user",
CACertProvided: false,
CASecretName: "logstash-sample-logstash-es-default-elasticsearch-sample-ca",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would we expect this to be created if we are not using tls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@barkbay
Copy link
Contributor

barkbay commented Apr 17, 2023

@barkbay I think this is ready for review

Sure, I'll do a first review today.

Copy link
Contributor

@barkbay barkbay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a few comments, I still have to do some tests.

pkg/apis/logstash/v1alpha1/logstash_types.go Outdated Show resolved Hide resolved

func getEsRefNamespacedName(assoc commonv1.Association) string {
ref := assoc.AssociationRef()
return fmt.Sprintf("%s_%s", ref.Namespace, ref.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ref.Name may be empty if user provided connection settings through a Secret:

apiVersion: logstash.k8s.elastic.co/v1alpha1
kind: Logstash
metadata:
  name: logstash-sample
spec:
  count: 1
  version: 8.7.0
  elasticsearchRefs:
    - secretName: external-cloud-es-ref
  pipelines: {}
---
apiVersion: v1
kind: Secret
metadata:
  name: external-cloud-es-ref
stringData:
  url: https://my-deployment.es.europe-west1.gcp.cloud.es.io
  username: elastic
  password: REDACTED

This feature, introduced in ECK 2.2.0 is not documented (yet: #6449) but can be used by a user to connect a stack component to an [external/non ECK managed] Elasticsearch cluster.

I'm not sure what is the best way to handle this case if we want to inject connection settings using env. variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great that you raise the issue. We can address the external ES connection in a follow-up. I am thinking of using ref.NameOrSecretName() for env. In the future, the operator will need to write managed and unmanaged cluster info to a cluster.yml. Logstash will need to map the cluster.yml to plugin config.

In the technical preview, we aim for managed clusters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make it a public issue so our users can find it?

pkg/controller/logstash/env.go Outdated Show resolved Hide resolved
pkg/apis/logstash/v1alpha1/logstash_types.go Outdated Show resolved Hide resolved
@@ -150,6 +152,22 @@ var (
},
},
},
LogstashUserRole: esclient.Role{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The indices chosen here will restrict logstash to only being able to create the "standard" indices/templates, and not custom ones created by users.

That's a good point. New roles can be added using the roles section, and even if it is not ideal this can also be used to override predefined roles.

Co-authored-by: Michael Morello <michael.morello@gmail.com>
kaisecheng and others added 6 commits April 21, 2023 15:15
Co-authored-by: Michael Morello <michael.morello@gmail.com>
…to logstash_esrefs

# Conflicts:
#	pkg/controller/logstash/pod.go
- change env variable prefix
- add validation
Copy link
Member

@robbavey robbavey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great - really close, I have a couple of nits on clarifying the messaging on missing clusterName

pkg/apis/logstash/v1alpha1/validations.go Outdated Show resolved Hide resolved
pkg/apis/logstash/v1alpha1/validations_test.go Outdated Show resolved Hide resolved
pkg/apis/logstash/v1alpha1/validations_test.go Outdated Show resolved Hide resolved
pkg/apis/logstash/v1alpha1/validations_test.go Outdated Show resolved Hide resolved
kaisecheng and others added 4 commits April 24, 2023 18:55
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
Copy link
Member

@robbavey robbavey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from the logstash side

@kaisecheng
Copy link
Contributor Author

@pebrc @barkbay I have added clusterName and gotten thumbs up from Rob. This is ready to review.

@robbavey
Copy link
Member

@kaisecheng One thing I noticed, that I should have noticed earlier 🤦 - we use USERNAME as the suffix in the env variable, but the plugin settings use user for the configuration settings. What do you think about having USER for the env variable?

@kaisecheng
Copy link
Contributor Author

@robbavey I have changed the suffix to USER

Copy link
Member

@robbavey robbavey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@robbavey
Copy link
Member

@pebrc @barkbay This is ready for final review by the ECK team - I'm happy with this from the logstash side

@barkbay
Copy link
Contributor

barkbay commented Apr 26, 2023

👀

Copy link
Contributor

@barkbay barkbay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

pkg/apis/logstash/v1alpha1/logstash_types.go Outdated Show resolved Hide resolved
// ElasticsearchCluster definition of type to define an Elasticsearch Cluster to be referred to in Logstash pipeline
type ElasticsearchCluster struct {
commonv1.ObjectSelector `json:",omitempty,inline"`
ClusterName string `json:"clusterName,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to prevent user from using a same clusterName for different clusters with a new validation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would like to improve it in future

pkg/apis/logstash/v1alpha1/logstash_types.go Outdated Show resolved Hide resolved
pkg/apis/logstash/v1alpha1/validations.go Outdated Show resolved Hide resolved
kaisecheng and others added 4 commits April 26, 2023 15:59
Co-authored-by: Michael Morello <michael.morello@gmail.com>
Co-authored-by: Michael Morello <michael.morello@gmail.com>
Co-authored-by: Michael Morello <michael.morello@gmail.com>
@kaisecheng kaisecheng merged commit 69b5ef9 into elastic:feature/logstash Apr 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement Enhancement of existing functionality :logstash
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants