Feature Request: Azure Event Hubs Output - Support setting partition key #10762

Closed
anthonysomerset opened this issue Mar 2, 2022 · 5 comments · Fixed by #11076
Labels
area/azure: Azure plugins including eventhub_consumer, azure_storage_queue, azure_monitor
feature request: Requests for new plugin and for new features to existing plugins
help wanted: Request for community participation, code, contribution
waiting for response: waiting for response from contributor

Comments

@anthonysomerset

Feature Request

Opening a feature request kicks off a discussion.

Proposal:

The current Event Hubs output works great, but some users need to control which partitions events are sent to, for event ordering or similar processing requirements.

The Azure Event Hubs Go module that the output plugin uses suggests this could be as simple as setting the PartitionKey value on an event. Event Hubs then takes care of hashing that key and consistently sends events with the same value to the same partition. See https://github.com/Azure/azure-event-hubs-go#send-and-receive

I propose adding a configuration option that lets the user choose which tag or data field supplies this value. The default would be to omit it entirely, which retains the current behaviour.

e.g.

partition_key_field = "SubscriberId"

Current behavior:

No partition key is set on events, so all events are randomly load balanced across the Event Hubs partitions.

Desired behavior:

The PartitionKey value is set on events so that they are distributed across the Event Hubs partitions deterministically: events with the same key are consistently sent to the same partition.
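To illustrate what "deterministic" means here, a minimal Go sketch follows. It maps a key to one of a fixed number of partitions by hashing. The `partitionFor` function and the choice of FNV-1a are assumptions made purely for illustration; Event Hubs applies its own hash on the service side, and this is not that algorithm.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a key to one of n partitions.
// FNV-1a is used only for illustration; it is NOT the hash Event Hubs uses.
func partitionFor(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return h.Sum32() % n
}

func main() {
	const partitions = 4 // hypothetical partition count
	// "sub-1" appears twice and is assigned the same partition both times.
	for _, id := range []string{"sub-1", "sub-2", "sub-1"} {
		fmt.Printf("%s -> partition %d\n", id, partitionFor(id, partitions))
	}
}
```

The point is only that the same key always yields the same partition, while different keys may share one. The number of partitions is fixed by the hub, not by the number of distinct keys.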

Use case:

This is useful because downstream processing can then rely on specific events being grouped into specific partitions and optimize accordingly. Azure Stream Analytics jobs are one example.

It is also important to note that some events may be very order dependent. Being able to send related events to the same partition allows the order of those events to be guaranteed in subsequent processing.

@anthonysomerset anthonysomerset added the feature request Requests for new plugin and for new features to existing plugins label Mar 2, 2022
@telegraf-tiger telegraf-tiger bot added the area/azure Azure plugins including eventhub_consumer, azure_storage_queue, azure_monitor label Mar 2, 2022
@powersj
Contributor

powersj commented Mar 7, 2022

Hi,

Per the link you provided and Azure/azure-event-hubs-go#134 it does look like setting the event's PartitionKey will send that event to a specific partition.

My concern however is how the partition key is determined. Your proposal is to provide a field/tag value to read from and set the partition key from there. This could result in a huge number of partitions getting generated. My understanding is while you can create more partitions to increase throughput, this might result in dozens, hundreds, etc. of partitions getting created if you are not careful. This does not seem ideal either.

Based on your suggestion of "SubscriberId" how many partitions would you be generating with your data?

I assume having a setting to allow a specific event_hubs output to all go to a specific partition is not enough?

Thanks

@powersj powersj added the waiting for response waiting for response from contributor label Mar 7, 2022
@anthonysomerset
Author

anthonysomerset commented Mar 7, 2022 via email

@telegraf-tiger telegraf-tiger bot removed the waiting for response waiting for response from contributor label Mar 7, 2022
@powersj
Contributor

powersj commented Mar 8, 2022

Ah, thank you for the explanation; that does indeed help clarify the situation.

Next steps: update the plugin to add an optional partition_key_field setting that names a field to read. If the field exists, its value is set as the partition key for each event before the event is added to the batch.
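A rough sketch of that logic might look like the following. The `metric` and `event` types are hypothetical stand-ins rather than the real Telegraf or azure-event-hubs-go types, and `partitionKey` and `newEvent` are illustrative helper names. Checking tags before fields is also an assumption, based on the original proposal mentioning both.

```go
package main

import "fmt"

// metric is a hypothetical stand-in for a Telegraf metric.
type metric struct {
	tags   map[string]string
	fields map[string]interface{}
}

// event is a hypothetical stand-in for an Event Hubs event.
// A nil PartitionKey means no key was set (the current behaviour).
type event struct {
	Data         []byte
	PartitionKey *string
}

// partitionKey looks up name among the tags first, then the fields.
// It reports false when name is empty or not present on the metric.
func partitionKey(m metric, name string) (string, bool) {
	if name == "" {
		return "", false
	}
	if v, ok := m.tags[name]; ok {
		return v, true
	}
	if v, ok := m.fields[name]; ok {
		return fmt.Sprint(v), true
	}
	return "", false
}

// newEvent builds an event and sets its partition key only when the
// configured field exists on the metric.
func newEvent(m metric, payload []byte, keyField string) *event {
	e := &event{Data: payload}
	if key, ok := partitionKey(m, keyField); ok {
		e.PartitionKey = &key
	}
	return e
}

func main() {
	m := metric{
		tags:   map[string]string{"SubscriberId": "abc123"},
		fields: map[string]interface{}{"value": 42},
	}
	e := newEvent(m, []byte("payload"), "SubscriberId")
	if e.PartitionKey != nil {
		fmt.Println("partition key:", *e.PartitionKey)
	}
}
```

Leaving the key unset when the option is empty or the field is missing keeps the existing behaviour as the default, as the proposal asked.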

@powersj powersj added the help wanted Request for community participation, code, contribution label Mar 8, 2022
powersj added a commit to powersj/telegraf that referenced this issue Mar 8, 2022
@powersj
Contributor

powersj commented Mar 8, 2022

I put up #10795 as a quick test of what this might look like. If you want to grab one of the artifacts and test it, it would be very helpful.

@powersj powersj added the waiting for response waiting for response from contributor label Mar 24, 2022
@telegraf-tiger
Contributor

telegraf-tiger bot commented Apr 8, 2022

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem, if not please try posting this question in our Community Slack or Community Page. Thank you!

2 participants