
Leverage the Logstash 8.4.0 DLQ reader to delete consumed segments #43

Merged


@andsel commented Jun 16, 2022

Release notes

Expose the clean_consumed configuration setting to remove the consumed DLQ segments.

What does this PR do?

Exposes the clean_consumed setting for use in pipeline configuration files and restricts it to Logstash >= 8.4.0, because it relies on the API introduced by PR elastic/logstash#14188.
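As a rough illustration of the version gate, here is a minimal Ruby sketch. The constant and helper names (MIN_CLEAN_CONSUMED_VERSION, clean_consumed_supported?) and passing the version as a plain string are assumptions for illustration only; the plugin's actual check, quoted in the review thread, calls Gem::Requirement#satisfied_by? on @logstash_version.

```ruby
require 'rubygems' # provides Gem::Requirement and Gem::Version

# Hypothetical constant: minimum Logstash version exposing the DLQ reader API
# that clean_consumed relies on.
MIN_CLEAN_CONSUMED_VERSION = Gem::Requirement.new('>= 8.4.0')

# Hypothetical helper: true when the given Logstash version string satisfies the requirement.
def clean_consumed_supported?(logstash_version)
  MIN_CLEAN_CONSUMED_VERSION.satisfied_by?(Gem::Version.new(logstash_version))
end

puts clean_consumed_supported?('8.3.3')  # prints false
puts clean_consumed_supported?('8.4.0')  # prints true
puts clean_consumed_supported?('8.10.1') # prints true
```

Comparing Gem::Version objects rather than raw strings matters here: as strings, '8.10.1' sorts before '8.4.0', whereas Gem::Version compares each numeric segment.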

Why is it important/What is the impact to the user?

Lets users automatically delete DLQ segments once they have been consumed, provided the target Logstash version implements the feature.

Checklist

  • My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files (and/or docker env variables)
  • [ ] I have added tests that prove my fix is effective or that my feature works

Author's Checklist

  • Build and install the plugin into a Logstash instance and check that it removes the segments.

How to test this PR locally

Testing is very similar to PR elastic/logstash#14255: it needs an upstream pipeline that produces events stored in DLQ segments, and a downstream pipeline with a DLQ input that has the clean_consumed feature enabled.
Then verify that the tail segments are removed.

The DLQ feature must be enabled in logstash.yml.

  • make Elasticsearch reject the inserts by creating an index and then closing it:
PUT test_index/
POST test_index/_close
  • in logstash.yml, enable the DLQ feature:
dead_letter_queue.enable: true
  • run an upstream pipeline with pipeline.id dlq_upstream (the id the downstream pipeline reads from) that produces events into the DLQ:
input {
  generator {
    # this is some data
    message => '{"jvm" : {"threads" : {"count" : 49,"peak_count" : 50},"mem" : {"heap_used_percent" : 14,"heap_committed_in_bytes" : 309866496,"heap_max_in_bytes" : 1037959168,"heap_used_in_bytes" : 151686096,"non_heap_used_in_bytes" : 122486176,"non_heap_committed_in_bytes" : 133222400,"pools" : {"survivor" : {"peak_used_in_bytes" : 8912896,"used_in_bytes" : 288776,"peak_max_in_bytes" : 35782656,"max_in_bytes" : 35782656,"committed_in_bytes" : 8912896},"peak_used_in_bytes" : 148656848,"used_in_bytes" : 148656848,"peak_max_in_bytes" : 715849728,"max_in_bytes" : 715849728,"committed_in_bytes" : 229322752},"young" : {"peak_used_in_bytes" : 71630848,"used_in_bytes" : 2740472,"peak_max_in_bytes" : 286326784,"max_in_bytes" : 286326784,"committed_in_bytes" : 71630848}}},"gc" : {"collectors" : {"old" : {"collection_time_in_millis" : 607,"collection_count" : 12},"young" : {"collection_time_in_millis" : 4904,"collection_count" : 1033}}},"uptime_in_millis" : 1809643}'
    codec => json
  }
}

output {
  elasticsearch {
    index => "test_index"
    hosts => "http://localhost:9200"
    user => "elastic"
    password => "changeme"
  }
}
  • create a downstream pipeline with clean_consumed enabled:
input {
  dead_letter_queue {
    path => "/<logstash path.data>/dead_letter_queue/"
    pipeline_id => "dlq_upstream"
    # new feature
    clean_consumed => true
  }
}

output {
  stdout {
    codec => dots
  }
}
  • verify that the segments are removed once consumed.
ls -l /<logstash path.data>/dead_letter_queue/dlq_upstream
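To make the last check repeatable instead of eyeballing the listing, a small Ruby helper can count the remaining segment files and wait for them to be cleaned up. It assumes that sealed segments are named <n>.log in the per-pipeline directory and that the segment still being written carries a .log.tmp suffix; check this against your own path.data before relying on it. The helper names dlq_segments and wait_for_cleanup are hypothetical, and because the segment currently being read may legitimately remain, max_remaining lets you tolerate a few files.

```ruby
require 'tmpdir'
require 'fileutils'

# Assumption: sealed DLQ segments are "<n>.log" files in
# <path.data>/dead_letter_queue/<pipeline_id>/; the segment being written ends in
# ".log.tmp" and is therefore not matched by the glob below.
def dlq_segments(queue_dir)
  Dir.glob(File.join(queue_dir, '*.log'))
     .sort_by { |path| File.basename(path, '.log').to_i }
end

# Polls until at most max_remaining sealed segments are left.
# Returns true on success, false if the timeout (in seconds) expires first.
def wait_for_cleanup(queue_dir, max_remaining: 0, timeout: 30, interval: 1)
  deadline = Time.now + timeout
  until dlq_segments(queue_dir).size <= max_remaining
    return false if Time.now >= deadline
    sleep interval
  end
  true
end

# Usage against a throwaway directory that mimics a DLQ segment layout.
Dir.mktmpdir do |dir|
  %w[1.log 2.log 10.log 11.log.tmp].each { |name| FileUtils.touch(File.join(dir, name)) }
  p dlq_segments(dir).map { |path| File.basename(path) } # prints ["1.log", "2.log", "10.log"]
end
```

Against a real queue you would call something like wait_for_cleanup('/<logstash path.data>/dead_letter_queue/dlq_upstream', max_remaining: 1) while the downstream pipeline is running.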

Related issues

Use cases

As a user with a DLQ producer pipeline and a DLQ consumer pipeline, I want Logstash to automatically free, from the consumer side, the DLQ space used by events that have already been consumed.

@andsel force-pushed the feature/clean_consumed_segments branch from 5d420f8 to ec89202 on July 5, 2022 07:25
@andsel commented Jul 5, 2022

The 🔴 CI on ELASTIC_STACK_VERSION=8.x is expected because 8.4.0 has not been published yet; the relevant result is the 🟢 SNAPSHOT=true ELASTIC_STACK_VERSION=8.x build.
This is a chicken-and-egg problem: 8.4.0 needs to be released with a DLQ input plugin that contains this PR.

@andsel marked this pull request as ready for review July 5, 2022 15:21
@andsel requested a review from robbavey July 5, 2022 15:21
if clean_consumed && !Gem::Requirement.new('>= 8.4.0').satisfied_by?(@logstash_version)
  raise ConfigurationError.new("clean_consumed can be used only with Logstash version 8.4.0 and above")
end
if clean_consumed
@robbavey:

We should log or fail the pipeline if we are implicitly changing a configuration setting, particularly one that is explicitly set.

@andsel:

Yes, we could require the user to also set commit_offsets. I went with setting it implicitly whenever clean_consumed is enabled to smooth adoption, but in other plugins we tend to be more explicit, so I'll switch to raising a configuration error.

@andsel:

commit_offsets defaults to true, so adoption stays smooth unless the user explicitly sets it to false.
Fixed with 114ab84
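The behavior settled on in this thread (reject the configuration rather than silently forcing commit_offsets, relying on its default of true) can be sketched in Ruby as below. The ConfigurationError class, the settings hash and the validate_clean_consumed! name are hypothetical stand-ins: a real plugin reads its settings from the pipeline configuration and raises Logstash's own ConfigurationError, and the message here is not the plugin's wording.

```ruby
# Hypothetical stand-in for Logstash's ConfigurationError.
class ConfigurationError < StandardError; end

# Sketch: fail fast when clean_consumed is enabled but commit_offsets is disabled,
# instead of implicitly overriding the user's explicit commit_offsets => false.
def validate_clean_consumed!(settings)
  clean_consumed = settings.fetch('clean_consumed', false)
  commit_offsets = settings.fetch('commit_offsets', true) # commit_offsets defaults to true
  if clean_consumed && !commit_offsets
    raise ConfigurationError, 'clean_consumed requires commit_offsets to be true'
  end
  true
end

# Default commit_offsets: accepted without any extra setting.
puts validate_clean_consumed!('clean_consumed' => true) # prints true

# Explicit conflict: rejected with a configuration error.
begin
  validate_clean_consumed!('clean_consumed' => true, 'commit_offsets' => false)
rescue ConfigurationError => e
  puts "rejected: #{e.message}"
end
```

Because the default is true, existing configurations that only add clean_consumed => true keep working, and only an explicit conflict is rejected.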

@andsel requested a review from robbavey July 6, 2022 08:50
@robbavey left a comment

A few nits

- Better wording for error.
- Fixed RSpec tests to adhere to spec style guide.

Co-authored-by: Rob Bavey <rob.bavey@elastic.co>
@andsel requested a review from robbavey July 6, 2022 13:30
@robbavey left a comment

LGTM

@andsel merged commit 5d0d230 into logstash-plugins:main Jul 6, 2022
@andsel self-assigned this Jul 6, 2022