EventHubConsumerClient is restarting reading of the partitions from the start of partition instead of respecting starting_position #13548

Snezhana · 2020-09-03T15:03:29Z

Package Name: azure-eventhub
Package Version:
azure-eventhub==5.1.0
azure-eventhub-checkpointstoreblob==1.1.0
Operating System: Linux
Python Version: 3.8.1

Describe the bug
We are using EventHubConsumerClient as intelligent consumer agents to coordinate several readers within one consumer group. We pass a BlobCheckpointStore as checkpoint_store when creating the consumer, but we use it only to coordinate the readers, not for checkpointing, because we need to go back 50h in the past whenever the consumer is restarted. We provide starting_position in the receive method. Our starting position is 50h in the past, and this works fine when the consumer group is started for the first time. But after any restart, or if we decide to stop and start the reader again, reading starts from the beginning of each partition and not from the 50h offset we want.

To Reproduce
Steps to reproduce the behavior:

1. Create a new consumer group for the topic.
2. Create an EventHubConsumerClient:

    from azure.eventhub import EventHubConsumerClient

    consumer_client = EventHubConsumerClient.from_connection_string(
        conn_str=connection_str,
        consumer_group=consumer_group,
        checkpoint_store=checkpoint_store,
        auth_timeout=0,
    )

3. Start receiving events:

    consumer_client.receive(on_event=on_event, starting_position=offsetdate)

   Result: The events are received starting from the offset, which is correct.

4. Stop the consumer and run it again with the same consumer group.

   Result: The events are received from the start of the partition.

Expected behavior
After restarting the consumer with the same consumer group, it should start reading from the offset which we set on receive method.
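
For reference, a start position 50 hours in the past can be expressed as a timezone-aware datetime. This is only a minimal sketch: `start_position_hours_ago` and its parameters are hypothetical names for illustration, not part of azure-eventhub, and the commented-out call simply mirrors the one in the steps above.

```python
from datetime import datetime, timedelta, timezone


def start_position_hours_ago(hours_back, now=None):
    """Return a timezone-aware UTC datetime `hours_back` hours before `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - timedelta(hours=hours_back)


# Hypothetical helper in use: a position 50 hours before the current time.
offsetdate = start_position_hours_ago(50)

# Assumed usage, mirroring the reproduction steps:
# consumer_client.receive(on_event=on_event, starting_position=offsetdate)
```

Taking `now` as an optional argument keeps the helper deterministic in tests.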


KieranBrantnerMagee · 2020-09-05T07:06:44Z

Hey Snezhana, thanks for reaching out.
From a brief read of the docstrings for starting_position and the code around the checkpoint init logic, the behavior seems to line up with the symptoms you observed: starting_position is only honored if there isn't a checkpoint in the store. That raises the question of whether you're checkpointing at all. (Given your stated intent to rely only on the offset, I'd assume not, which would connect the dots, since the checkpoint store would likely be initialized to the start of the partition, but I figured I should ask.)
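
The precedence described here can be pictured as a small lookup in which a stored checkpoint wins and starting_position is only a fallback. This is a simplified model of that description, not the library's actual code; `resolve_start` and the shape of `checkpoints` are hypothetical.

```python
def resolve_start(partition_id, checkpoints, starting_position):
    """Model where a partition would begin reading.

    checkpoints: hypothetical dict of partition id -> stored offset.
    starting_position: the value the caller passed to receive().
    """
    stored = checkpoints.get(partition_id)
    if stored is not None:
        # A checkpoint exists for this partition, so it takes precedence.
        return stored
    # No checkpoint: fall back to the caller's starting position.
    return starting_position
```

Under this model, any entry in the store for a partition, whatever it points at, would shadow the 50h starting position on restart.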

We'll need to look into this in the coming week if the above doesn't explain what's going on, but I would mention that this isn't the first request we've had for an "offset prior to now" style of read, and I'll make sure to convey this. As a potential workaround, perhaps hold onto some amount of history and not checkpoint until you know you have, in your case, 50h of new events (so that you can rely on the checkpoint store exclusively)?
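
One possible reading of that workaround is a checkpoint that deliberately lags 50h behind the newest event, so that a restart from the checkpoint store replays the last 50h. The sketch below is an assumption-laden illustration: `LaggingCheckpointer` is a hypothetical helper, it assumes events within a partition arrive in enqueued-time order, and it needs one instance per partition.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class LaggingCheckpointer:
    """Track seen events and report the newest one that is at least `lag` old."""

    def __init__(self, lag=timedelta(hours=50)):
        self.lag = lag
        self._seen = deque()  # (enqueued_time, token) in arrival order

    def observe(self, enqueued_time, token, now=None):
        """Record an event; return the token safe to checkpoint, or None."""
        now = now or datetime.now(timezone.utc)
        self._seen.append((enqueued_time, token))
        candidate = None
        # Pop every entry that has aged past the lag window; the last one
        # popped is the newest event that is at least `lag` old.
        while self._seen and now - self._seen[0][0] >= self.lag:
            candidate = self._seen.popleft()[1]
        return candidate
```

In an on_event handler, the token could be the event itself, and a non-None result could be passed to partition_context.update_checkpoint. That wiring is an assumption here, and keeping 50h of events (or their tokens) in memory has a cost that depends on throughput.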

Don't hesitate to let me know if I've misunderstood some aspect of this, or if further clarity is needed.

Snezhana · 2020-09-07T07:27:45Z

Hi, thanks for the reply.
Yes, indeed, we use the checkpoint store only to coordinate reading across multiple pods in the same consumer group, but we don't checkpoint the offset because of the 50h of history we need.
So, as I understand it, there is currently no way to use starting_position and a checkpoint store together in our situation, unless we implement a workaround?

KieranBrantnerMagee · 2020-09-15T00:12:47Z

Correct; sorry not to be able to give a neater solution to this out of the box.

That said, don't hesitate to let me know if you have any other questions or if there is other help I can provide along these lines.

ghost · 2020-09-22T02:00:22Z

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

KieranBrantnerMagee · 2020-09-25T17:47:51Z

Closing as I believe we have a mitigation/understanding of the current interaction. Don't hesitate to reopen this, and if this scenario continues to be a friction point for folks, I'm more than happy to accumulate those asks to present a case to allow this, so anyone (addressing this both to OP and anyone who comes across this in the future) can feel free to give a shout if relevant.


xiangyan99 assigned KieranBrantnerMagee Sep 3, 2020

kaerm added the Client and Event Hubs labels Sep 3, 2020

ghost removed the needs-triage label Sep 3, 2020

KieranBrantnerMagee added the needs-author-feedback label Sep 5, 2020

ghost added the needs-team-attention label and removed the needs-author-feedback label Sep 7, 2020

KieranBrantnerMagee added the needs-author-feedback label and removed the needs-team-attention label Sep 15, 2020

ghost added the no-recent-activity label Sep 22, 2020

KieranBrantnerMagee closed this as completed Sep 25, 2020

ghost removed the no-recent-activity label Sep 25, 2020

github-actions bot locked and limited conversation to collaborators Apr 12, 2023
