Feature/http sink #67
Conversation
vidosits commented Sep 24, 2023 (edited)
- Implement Sink
- Document config params and update README
- Create tests
internal/eventing/sink/http/http.go (Outdated)
```go
func (h *httpSink) Emit(
	_ sink.Context, _ time.Time, topicName string, key, envelope schema.Struct,
) error {
	delete(envelope, "schema")
```
I wonder if there should be a general property to disable schema sending. I think right now it'd be the only sink not including it. WDYT?
I think this mainly depends on where you'd like to take the project. Would you like to have everything or almost everything as a configurable parameter?
If so then yeah it would make sense to be able to turn on/off the sending of the schema from the config file.
I can also see a world where the package always emits everything and it's up to the receiver to discard whatever is not needed or if one would like to avoid unnecessary traffic then one could transform the payload with a plugin before emitting.
I think you probably have better things to work on in this project than having to make schema.Struct optional throughout all the sinks just because this one sink and the stdout one don't send it.
I have to be honest and tell you that I started the http sink by copying the stdout one, and that one starts the emitting process by deleting the schema.
For now I think it'd be best if all the sinks behaved the same way (aside from maybe the stdout one, see above), not just for consistency, but because if you decide to make schema sending optional you'll have an easier time refactoring without having to wade through code just to find out whether a particular sink omits the schema by default.
I've pushed a commit that adds back in the schema for this sink as well. I'll use a plugin to transform the payload before sending to the sink anyway, so I'll just omit it there.
Yeah, I agree. I think it could be interesting, since the schema creates quite some overhead and traffic. I also think it may be better to provide the option to create a separate stream of schema data, which could be routed to a schema registry, but I've never used one before so I don't have enough insight into it.
I think for now we can just keep it in and make it up to the receiver to discard it. For traffic reasons it may be interesting to eventually provide the option. Tbh, even stdout should support it, since the idea (next to testing) was to pipe the output somewhere else. That's the reason you can send log messages to stderr 😁
> [...] since the schema creates quite some overhead and traffic. [...] For traffic reasons it may be interesting to eventually provide the option
Yeah, this is probably the biggest argument for making it optional.
> I think for now we can just keep it in and make it up to the receiver to discard it.
What are your thoughts on discarding it with a plugin before sending? We're streaming anywhere between 10k and 100k events per second with timescaledb-event-streamer. Although it's on a private network, I can see how one would like to exclude the schema from being sent to save on ingress/egress/networking costs, e.g. with a cloud provider.
internal/eventing/sink/http/http.go (Outdated)
```go
	}
}

address := config.GetOrDefault(c, config.PropertyHttpUrl, "http://localhost:80")
```
If you have a separate transport, wouldn't it make sense to remove the protocol part here? Alternatively, instead of using the TLS-enabled prop, you could just check the protocol and, if it's HTTPS, assume TLS is enabled (and just read the remaining properties).
You're absolutely right, I was just hacking it together, so the hard-coded one stayed in. I've pushed a commit where we infer the use of TLS from the prefix of the URL and, if it's present, take the TLS settings into account.
```go
})

go func() {
	if err := http.ListenAndServe(address, nil); err != nil {
```
Wouldn't it be better to use an actual instance and shut it down cleanly in the teardown? That way you also wouldn't be required to use a fixed port number (though I don't think it's a real issue).
It absolutely would; that was my first approach as well, but I've written less than 10k lines of Go in total, so I couldn't figure out why my server instance seemed to stop instantly. I could make it work this way, though.
I tried my best on the second try, let me know if it's better this way or not. Thanks! :)
I'll have a look! No worries, I'm not a Go master myself, even though I've written quite some Go code by now. I'm originally a Java guy 😊
I'll be off until the weekend (a few days' trip). Just want to make sure you know that I'm not ignoring you; I'll just be less online for a few days 😂
Thanks for letting me know! I did not think you were ignoring me, you also don't owe me anything, but I appreciate it nonetheless :)
While being out I actually thought about sending data to an HTTP endpoint immediately. That means that all queuing has to happen somewhere else, either in here or right behind the HTTP endpoint. It also means we may have to add some back-pressure to slow down handling of incoming messages (which in turn may prevent us from handling keep-alive messages in time and interfere with the reconnect functionality). I think you mentioned you're already using it. Have you load-tested it? Maybe there should be a bit of a warning that long-running operations may cause issues 🤔
Yeah, although if you're using it in a way that you need to buffer messages, maybe the HTTP sink is the wrong choice? For example, Kafka may be down or writing to it may be slow; Debezium, for example, provides a queue for that. In summary, I agree, a queue would be nice, but then maybe it wouldn't have to be HTTP-sink-specific.
No proper load testing has happened, but we are streaming 10k-100k changes per second from one of our timescale clusters and there have been no problems so far whatsoever. Although, the service we're sending these events to is written with Actix and it's just ingesting the data, so the request doesn't block for long. It was specifically set up that way, because I knew it was up to us to guarantee that we can keep up with the event streamer (because there is no queuing). We haven't seen any issues (so far).
Yeah, I guess that could be nice.
10-100k events sounds nice! I'll be ok leaving it that way then. Guess we can add a little note to the http endpoint to make sure people understand it's up to them then! 👍
Thanks a lot again :) Keep the contributions coming if you have other ideas :)
No problem, thanks for merging and thanks for the great package!