Feature Request: Azure Event Hubs Output - Support setting partition key #10762
Comments
Hi,

Per the link you provided and Azure/azure-event-hubs-go#134 it does look like setting the event's

My concern however is how the partition key is determined. Your proposal is to provide a field/tag value to read from and set the partition key from there. This could result in a huge number of partitions getting generated. My understanding is that while you can create more partitions to increase throughput, this might result in dozens, hundreds, etc. of partitions getting created if you are not careful. This does not seem ideal either.

Based on your suggestion of "SubscriberId", how many partitions would you be generating with your data?

I assume having a setting to allow a specific

Thanks
You won’t create more partitions than you configure on the Event Hubs side. That is always a hard limit, no matter the cardinality of the partition key value.
The partition key value gets hashed, and that hash determines which of the available partitions the event is sent to, since the range of possible hash values is split evenly across the partitions.
It’s loosely equivalent to how load balancers balance connections and implement stickiness.
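To illustrate the point above, here is a minimal Python sketch of hash-based partition selection. It is not how Event Hubs is implemented: the service's real hash function is internal, so SHA-256 and the `pick_partition` helper are assumptions made purely for illustration. What it shows is that the number of partitions in use is bounded by the configured partition count, however many distinct key values there are.

```python
import hashlib


def pick_partition(partition_key: str, partition_count: int) -> int:
    """Map a key onto one of `partition_count` partitions.

    Hypothetical helper: SHA-256 stands in for whatever hash the
    service really uses.
    """
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and fold it into the
    # fixed range of partition indexes.
    return int.from_bytes(digest[:8], "big") % partition_count


# 10,000 distinct key values, but only 4 configured partitions.
keys = [f"subscriber-{i}" for i in range(10_000)]
used = {pick_partition(k, 4) for k in keys}
```

Every key lands on an index between 0 and 3, and the same key always maps to the same index, which is the stickiness described above.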
Ah, thank you for the explanation; that does indeed help clarify the situation.

Next steps: update plugin to add a
I put up #10795 as a quick test of what this might look like. If you want to grab one of the artifacts and test it, that would be very helpful.
Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem; if not, please try posting this question in our Community Slack or Community Page. Thank you!
Feature Request
Opening a feature request kicks off a discussion.
Proposal:
The current Event Hubs output works great, but some users need to control which partition events are sent to, for event ordering or similar processing requirements.
The Azure Event Hubs Go module, which the output plugin uses, suggests that this could be as simple as setting the PartitionKey value on an event. Event Hubs then hashes that key and consistently sends events with the same value to the same partition. See https://github.com/Azure/azure-event-hubs-go#send-and-receive
I propose adding a configuration option that lets the user choose which tag or data field supplies this value. By default the option would be unset, which retains the current behaviour.
e.g.
partition_key_field = "SubscriberId"
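A rough sketch of the lookup this option implies, written in Python for brevity. This is not the plugin's code: the `Metric` class and `resolve_partition_key` function are hypothetical names, and checking tags before fields is an assumption about precedence that the proposal does not specify.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Metric:
    """Hypothetical stand-in for a metric with tags and fields."""
    name: str
    tags: dict = field(default_factory=dict)
    fields: dict = field(default_factory=dict)


def resolve_partition_key(metric: Metric, key_name: Optional[str]) -> Optional[str]:
    """Return the partition key for a metric, or None to leave it unset."""
    if not key_name:
        # Option not configured: keep the current behaviour (no key).
        return None
    if key_name in metric.tags:
        return str(metric.tags[key_name])
    if key_name in metric.fields:
        return str(metric.fields[key_name])
    # The configured tag/field is absent on this metric: send without a key.
    return None


m = Metric("usage", tags={"SubscriberId": "abc-123"}, fields={"bytes": 42})
key = resolve_partition_key(m, "SubscriberId")
```

Returning `None` when the option is unset, or when the metric lacks the configured tag or field, keeps such events on the existing code path with no partition key.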
Current behavior:
No partition key is set on events, so all events are randomly load balanced across the Event Hubs partitions.
Desired behavior:
The PartitionKey value is set on events so that they are distributed across the Event Hubs partitions consistently and deterministically.
Use case:
This is useful because downstream processing can rely on specific events being grouped into specific partitions and optimize accordingly. One example is Azure Stream Analytics jobs.
It is also worth noting that some events may be strongly order dependent. Controlling which partition events go to allows their order to be guaranteed in subsequent processing.