-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[exporter/splunkhec] Add the ability to specify a Failover Endpoint and Enable a Circuit Breaker #23822
Conversation
|
64d00ff
to
c80026c
Compare
Code rebased to resolve CLA issue with older commits. |
@cparkins I can say right now that while this is an awesome idea and a great contribution, I loathe turning this code which already feels like a lot of spaghetti into something even more complex. I will take a good look once tests pass but it's not something I'm looking forward to support, and I have posted comments on the issue with my thoughts. |
@atoulme Fortunately the implementation is pretty simple. I'd love to see this turn into something more like a failover connector but I don't really know how to code that yet. My first implementation was done in the core repository but I couldn't get it to work 👎
I've responded to all of your comments. Hopefully with some explanation. I think adding resiliency feature is worthwhile but there may be better ways to implement some or all of what we are trying to do. I have personally rebased this code about 4 times since 0.74 and can understand your trepidation in making this exporter even more complex than it already is. |
No worries. It will help going forward if you can open issues ahead of time, as we have changes we're also considering to help with batching that may affect the design here. |
@atoulme I thought there was one, but it was related to the failover connector. This code has been something that I've been working on for quite some time and we have tested in previous versions of the exporter. If the batching changes again it would probably be similar to the last update. Unfortunately this solution seems to require a second buffer to send the failover request without destroying the payload. |
if failoverBuf != nil { | ||
_, failoverErr := failoverBuf.Write(b) | ||
if failoverErr != nil { | ||
c.logger.Info("Error writing failover data", zap.Error(failoverErr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove these log entries once we are done with end-to-end testing.
} | ||
permanentErrors = append(permanentErrors, consumererror.NewPermanent(fmt.Errorf( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was changed to match how Traces and Logs were handling these errors.
- Allow Failover URI to be configured and used in case of a failed request. - Allow a Circuit Breaker to be configured and used for the primary and failover url.
54c8278
to
44693e8
Compare
@atoulme I think this is finally ready for review! |
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
@atoulme Tests are now passing, any chance I can get a review? |
This PR was marked stale due to lack of activity. It will be closed in 14 days. |
Closed as inactive. Feel free to reopen if this PR is still being worked on. |
I'm curious what's next here @atoulme ? I see that "discussion needed" was labeled on this PR and that the originating issue 23821 is on it's way to being closed in about 45 days due to being stale as well. How does this discussion proceed? Can I help with something here? Thanks! |
I didn't have time to review the changes and the PR lapsed. Anyone can bring this up in a SIG meeting - hence the "discussion needed" label. |
Description:
Add the ability to specify an alternative Splunk HEC Endpoint when the primary endpoint is failing. In addition, if failures are occurring on a regular basis allow each endpoint to be disabled with a Circuit Breaker pattern.
Link to tracking Issue:
23821
Testing:
Testing locally by setting a primary endpoint that would always fail and a secondary that would fail or succeed. Working on testing end-to-end in Test Environment.
Documentation:
Not completed yet.