Log file collector based on v2 input API #20243

Closed
16 of 18 tasks
urso opened this issue Jul 27, 2020 · 5 comments
Labels
enhancement ext-goal External goal of an iteration Filebeat Filebeat Project:Filebeat-Input-v2 refactoring Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:Services (Deprecated) Label for the former Integrations-Services team v8.1.0

Comments

@urso

urso commented Jul 27, 2020

Implement an alternative log file collector (Name: logfile) for use with the agent based on the v2 input API.

The v2 input API allows file state to be stored in a generic key-value store, while requiring each Input type to implement its own state and ACK handling. A generic implementation for stateful inputs is available in the `filebeat/input/v2/input-cursor` package, but it currently supports only a static set of sources.
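The split of responsibilities can be made concrete with a minimal Go sketch of cursor-style state handling. It rests on assumptions: `Store` is a hypothetical in-memory key-value store (not the Beats registry API), the `logfile::` key prefix and the `fileCursor` offset type are invented for illustration, and ACKs are assumed to arrive in publishing order. What it shows is that the input persists a new offset only after the output has acknowledged the corresponding events, so a restart resumes from the last acknowledged position rather than the last read one.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Store is a hypothetical key-value store standing in for the persistent
// registry; it is NOT the Beats statestore API.
type Store struct {
	mu   sync.Mutex
	data map[string][]byte
}

func NewStore() *Store { return &Store{data: map[string][]byte{}} }

// Set serializes v as JSON and stores it under key.
func (s *Store) Set(key string, v interface{}) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = b
	return nil
}

// Get decodes the value stored under key into v and reports whether it existed.
func (s *Store) Get(key string, v interface{}) (bool, error) {
	s.mu.Lock()
	b, ok := s.data[key]
	s.mu.Unlock()
	if !ok {
		return false, nil
	}
	return true, json.Unmarshal(b, v)
}

// fileCursor is a hypothetical per-file state: the byte offset up to which
// events have been acknowledged by the output.
type fileCursor struct {
	Offset int64 `json:"offset"`
}

// pendingEvent pairs a source with the cursor that becomes valid once the
// event carrying it has been ACKed.
type pendingEvent struct {
	source string
	cursor fileCursor
}

// ackHandler persists cursor updates only for acknowledged events, in order,
// so the last ACKed offset per source wins. The key format is hypothetical.
func ackHandler(store *Store, acked []pendingEvent) error {
	for _, ev := range acked {
		if err := store.Set("logfile::"+ev.source, ev.cursor); err != nil {
			return fmt.Errorf("persist %s: %w", ev.source, err)
		}
	}
	return nil
}
```

Events that were published but never acknowledged simply never reach `ackHandler`, which is what keeps the stored offset conservative.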

Tasks

Phase 1 (feature-parity)

  • create a special stateful input for the new logfile
  • add harvesters to an input dynamically
  • watch for file events
    • create an interface that is responsible for watching changes in the input files
    • refactor the current scanner so it implements the new interface

Missing features

  • JSON reader
    • add it again
    • rename it to NDJSON
  • multiline reader
  • configurable cleaner options

Phase 2 (enhancements)

Inputs based on log

Marking it GA

  • remove log input in 8.x

Related: #15324

@urso urso added enhancement refactoring Filebeat Filebeat Project:Filebeat-Input-v2 Team:Services (Deprecated) Label for the former Integrations-Services team labels Jul 27, 2020
@elasticmachine
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@andresrc andresrc added the ext-goal External goal of an iteration label Jul 31, 2020
kvch added a commit that referenced this issue Oct 1, 2020
## What does this PR do?

This PR adds the skeleton of the new `filestream` input. The name of the input can be changed. The input was renamed from `logfile` because we are not going to provide the same options as the current `log` input. As `logfile` is already used by Agent for the `log` input, it is easier to adopt a new input with a different name.

The PR seems big, but the contents of `filebeat/input/filestream/internal/input-logfile` are basically the same as `filebeat/input/v2/input-cursor`. They are kept in a separate folder because, when the time comes, we would like to unify the two input types. The main difference between the two is that the `configure` function of `input-logfile` returns a `Prospector`, which finds inputs dynamically, whereas `input-cursor` requires a static list of paths without globbing.

The following files need review:

* filebeat/input/filestream/input.go
* filebeat/input/filestream/internal/input-logfile/fswatch.go
* filebeat/input/filestream/internal/input-logfile/harvester.go
* filebeat/input/filestream/internal/input-logfile/input.go
* filebeat/input/filestream/internal/input-logfile/prospector.go
* filebeat/input/filestream/prospector.go

The others are the same as `input-cursor`.

Also, updated tests are coming in a new PR.

## Related issues

First step #20243
kvch added a commit to kvch/beats that referenced this issue Oct 1, 2020
(cherry picked from commit cb624cf)
kvch added a commit that referenced this issue Oct 2, 2020
(cherry picked from commit cb624cf)
kvch added a commit that referenced this issue Oct 2, 2020
## What does this PR do?

This PR adds the implementation for `FSWatcher` and `FSScanner` for the `filestream` input.

The implementation of `FSScanner` is called `fileScanner`. It is responsible for
* resolving recursive globs on creation
* normalizing glob patterns on creation
* finding files that match the configured paths and returning `FileInfo` for them

This is the refactored version of the `log` input's scanner and globber functions.

The implementation of `FSWatcher` is called `fileWatcher`. It checks the file list returned by `fileScanner` and creates events based on the result.

## Why is it important?

It is required for the `filestream` input.

## Related issues

Related #20243
kvch added a commit to kvch/beats that referenced this issue Oct 2, 2020
…#21444)

(cherry picked from commit a119083)
kvch added a commit that referenced this issue Oct 5, 2020
…ner for filestream (#21468)

* Add implementation of FSWatcher and FSScanner for filestream (#21444)

(cherry picked from commit a119083)

* Do not run symlink tests on Windows (#21472)
@ph
Contributor

ph commented Apr 27, 2021

@kvch Could you update this issue with the remaining work for the filestream input?

@kvch kvch removed their assignment Oct 7, 2021
@jlind23 jlind23 added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Nov 10, 2021
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@jlind23
Collaborator

jlind23 commented Nov 10, 2021

@kvch, as discussed, could you please update this issue with the missing parts?

@kvch
Contributor

kvch commented Dec 8, 2021

I am closing this issue. Most of the work is done. The two unfinished tasks are tracked here separately because they are only enhancements:

@kvch kvch closed this as completed Dec 8, 2021
6 participants