Move all modules to using filestream #41861

Open
rdner opened this issue Dec 3, 2024 · 8 comments
Assignees
Labels
Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team

Comments

@rdner
Member

rdner commented Dec 3, 2024

Describe the enhancement:

Many of our modules still use the log input type (deprecated since 7.16.0), whose file identification is unstable. With file rotation, this leads to parsing errors (e.g. in multiline, grok or JSON parsing), data duplication and even possible data loss.

Describe a specific use case for the enhancement or feature:

We should migrate all of the above-mentioned modules to type: filestream + fingerprint file identity.

For example, this Elasticsearch GC module would change from:

type: log
paths:
{{ range $i, $path := .paths }}
 - {{$path}}
{{ end }}
exclude_files: [".gz$"]
exclude_lines: ["^(OpenJDK|Java HotSpot).* Server VM ", "^CommandLine flags: ", "^Memory: ", "^{"] # exclude JVM8 banner and JSON
multiline:
  pattern: '^(\[?[0-9]{4}-[0-9]{2}-[0-9]{2}|{)'
  negate: true
  match: after
processors:
  - add_fields:
      target: ''
      fields:
        ecs.version: 1.12.0

to

type: filestream
id: <some-globally-unique-id>
take_over: true
paths:
{{ range $i, $path := .paths }}
 - {{$path}}
{{ end }}
exclude_files: [".gz$"]
exclude_lines: ["^(OpenJDK|Java HotSpot).* Server VM ", "^CommandLine flags: ", "^Memory: ", "^{"] # exclude JVM8 banner and JSON
multiline:
  pattern: '^(\[?[0-9]{4}-[0-9]{2}-[0-9]{2}|{)'
  negate: true
  match: after
processors:
  - add_fields:
      target: ''
      fields:
        ecs.version: 1.12.0
1c1,3
< type: log
---
> type: filestream
> id: <some-globally-unique-id>
> take_over: true

The main challenge here is to introduce the ability to set a unique ID for each module. We can use a default value unique to each module, but if we allow users to run multiple modules of the same type, their filestream inputs must still have unique identifiers.
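One way to picture the ID requirement is the minimal Python sketch below. It is not how Beats builds input IDs; the function names, the `filestream-<module>-<fileset>` naming scheme and the optional `instance` suffix are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Optional


def default_input_id(module: str, fileset: str, instance: Optional[str] = None) -> str:
    """Build a default filestream input ID (hypothetical naming scheme).

    Without `instance`, every copy of the same module/fileset gets the same ID,
    which is exactly the collision described above.
    """
    parts = ["filestream", module, fileset]
    if instance:
        parts.append(instance)
    return "-".join(parts)


def find_duplicate_ids(ids: Iterable[str]) -> List[str]:
    """Return the IDs that occur more than once, sorted for stable output."""
    counts = Counter(ids)
    return sorted(input_id for input_id, n in counts.items() if n > 1)


# Two Elasticsearch GC module instances with the default ID collide ...
colliding = [default_input_id("elasticsearch", "gc") for _ in range(2)]
# ... while a user-supplied instance name keeps them apart.
distinct = [default_input_id("elasticsearch", "gc", name) for name in ("cluster-a", "cluster-b")]
```

Here `find_duplicate_ids(colliding)` reports the shared default ID, while `find_duplicate_ids(distinct)` returns an empty list, so a check like this could run at configuration load time to reject colliding inputs.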

Useful links

@rdner rdner added Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team labels Dec 3, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@rdner
Member Author

rdner commented Dec 11, 2024

After #40197 is done, we don't need to put the fingerprint configuration in this task anymore; it's going to be the new default.

@rdner rdner changed the title from "Move all modules to using filestream + fingerprint file identity" to "Move all modules to using filestream" Dec 11, 2024
@jlind23
Collaborator

jlind23 commented Jan 20, 2025

Assigning this to you, @flexitrev. As per my conversation with @rdner, you'll be chasing down the module owners and asking them to migrate.

@flexitrev

We should have something that checks the options in use, to see whether they require renaming:
https://www.elastic.co/guide/en/beats/filebeat/current/_step_3_use_new_option_names.html
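Such a check could look roughly like the Python sketch below. The `RENAMED_OPTIONS` table is a partial mapping as I recall it from the linked migration guide and should be verified against that page; the function name and the `parsers[]` notation are made up for illustration.

```python
from typing import Any, Dict

# Partial log -> filestream option mapping (assumption: recalled from the
# migration guide linked above; verify against the guide before relying on it).
RENAMED_OPTIONS: Dict[str, str] = {
    "exclude_files": "prospector.scanner.exclude_files",
    "scan_frequency": "prospector.scanner.check_interval",
    "multiline": "parsers[].multiline",
    "json": "parsers[].ndjson",
    "harvester_buffer_size": "buffer_size",
    "max_bytes": "message_max_bytes",
    "close_inactive": "close.on_state_change.inactive",
}


def find_renamed_options(input_config: Dict[str, Any]) -> Dict[str, str]:
    """Return {old_name: new_name} for top-level options that filestream renames."""
    return {
        option: RENAMED_OPTIONS[option]
        for option in input_config
        if option in RENAMED_OPTIONS
    }


# The top-level keys of the Elasticsearch GC example after only the type was switched.
gc_input = {
    "type": "filestream",
    "id": "<some-globally-unique-id>",
    "take_over": True,
    "paths": [],
    "exclude_files": [".gz$"],
    "exclude_lines": ["^Memory: "],
    "multiline": {"pattern": "^x", "negate": True, "match": "after"},
    "processors": [],
}

for old, new in find_renamed_options(gc_input).items():
    print(f"{old} -> {new}")
```

If the mapping is right, the GC example still carries `exclude_files` and `multiline` under their log-input names, so switching the type alone would not be enough.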

@nimarezainia
Contributor

@flexitrev I wanted to link this older issue here: elastic/integrations#2518
The instructions on how to migrate from logfile to filestream are in that issue. It was also the basis of what is in the docs.

I think the ask (if I'm not mistaken) is for all the integration owners to modify/migrate their packages to use filestream.

@flexitrev

@rdner what will be the impact if the modules are not migrated?

@rdner
Member Author

rdner commented Feb 4, 2025

@flexitrev

The main impact is that we cannot delete the code of the log input that we deprecated in 7.16.

Additionally, filestream is way more reliable when it comes to log rotations and file identification.
