Worker goes idle forever #20
Comments
We are aware of this issue (leases not created due to a delay in shards appearing in the Streams metadata) and released a fix in v1.4.0. Are you initializing your worker using the recommended factory method mentioned in the Readme?
Thank you for your fast response! Great to know you already have a fix for it. We are not using the factory mentioned in the readme, but I will give it a try now. If I understand correctly, when the proxy is used, it will detect the case where some new shards are not returned and will retry a few more times before returning, so that the new shards are included. Is that more or less correct? Could you please also update the documentation? I was following the Walkthrough there, but it does not use the newly added and recommended worker factory. People may run into the same problem in the future.
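For reference, a minimal sketch of what initialization through the adapter's factory might look like. Assumptions: `MyRecordProcessorFactory` is a hypothetical placeholder for your own processor factory, the stream ARN is a dummy value, and the exact `StreamsWorkerFactory` overload may differ between adapter versions, so check the Readme:

```java
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBStreamsClientBuilder;
import com.amazonaws.services.dynamodbv2.streamsadapter.AmazonDynamoDBStreamsAdapterClient;
import com.amazonaws.services.dynamodbv2.streamsadapter.StreamsWorkerFactory;
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.IRecordProcessorFactory;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibConfiguration;
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker;

public class StreamsWorkerSetup {
    public static void main(String[] args) {
        String streamArn = "arn:aws:dynamodb:..."; // your table's stream ARN (placeholder)

        // The adapter client translates DynamoDB Streams calls into the Kinesis interface.
        AmazonDynamoDBStreamsAdapterClient adapterClient =
                new AmazonDynamoDBStreamsAdapterClient(AmazonDynamoDBStreamsClientBuilder.defaultClient());
        AmazonDynamoDB dynamoDBClient = AmazonDynamoDBClientBuilder.defaultClient();       // lease table
        AmazonCloudWatch cloudWatchClient = AmazonCloudWatchClientBuilder.defaultClient(); // metrics

        KinesisClientLibConfiguration workerConfig = new KinesisClientLibConfiguration(
                "my-streams-app", streamArn,
                DefaultAWSCredentialsProviderChain.getInstance(), "worker-1");

        IRecordProcessorFactory recordProcessorFactory = new MyRecordProcessorFactory(); // hypothetical

        // Unlike constructing a Worker directly, the factory wires in the adapter's
        // DynamoDB Streams-aware components, including the proxy discussed in this
        // thread that rechecks the shard listing for inconsistencies.
        Worker worker = StreamsWorkerFactory.createDynamoDbStreamsWorker(
                recordProcessorFactory, workerConfig, adapterClient, dynamoDBClient, cloudWatchClient);
        worker.run();
    }
}
```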
The fix has been in production for nearly a week now. Since then, the issue has not reappeared. We can see in the logs that, over the last 3 days, the added proxy spotted and resolved inconsistencies roughly once a day.
I think we can consider the problem resolved. Thank you guys for fixing it! I will leave the issue open as a reminder to update the documentation.
I have requested that the documentation/walkthrough be updated.
Documentation has been updated. Closing this issue. |
In one of our applications, we have observed that DynamoDB Streams processing sometimes stops until the application is restarted. The first time it happened, it caused quite a headache, as we discovered it more than 24 hours later (by which point some data was no longer available in the stream). Now, with monitoring in place, we can see it happens every few days (4 times so far). We have observed the following:
- Processing stops after a shard reaches `SHARD_END` (not every time, though). The RecordProcessor is shut down with status `TERMINATE`, and no new RecordProcessor is created. ShutdownTask does not report CreateLeases metrics, which it usually does.
- We can see in the lease table that there is only one shard, with its checkpoint at `SHARD_END`. When refreshing the table, we can see that `leaseCounter` gets incremented. The TakeLeases and RenewAllLeases operations keep running successfully (by "successfully" I mean they report success in metrics). LeaseTaker sees no new shards to take.
- After the application is restarted, two new shards appear in the lease table with a `TRIM_HORIZON` checkpoint; one is the child of the shard with the checkpoint at `SHARD_END` and the parent of the other shard with the `TRIM_HORIZON` checkpoint. The application resumes processing where it left off (or at the oldest available data).

Checking the KCL implementation, we noticed that LeaseTaker will only take leases that already exist in the lease table. Discovering new shards and inserting the corresponding leases into the lease table happens on only two occasions: on worker initialization and on reaching a shard end. We suspect that sometimes, when a shard end is reached and the shards are listed, information about the new shards is not yet available. Because of that, no new shards are inserted into the lease table, so LeaseTaker never sees them. And since no shard is being consumed, no shard end is ever reached, no shards are ever inserted into the lease table, and the worker stays idle forever. With more than one worker instance, the problem is probably less visible, since the shards will be synced again when another worker finishes its own shard, unlocking the idle worker. Nevertheless, there will be a period where a worker is idle because the lease table is out of sync with the stream (see the sketch below).
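To make the suspected mechanism concrete, here is a deliberately simplified sketch of that failure mode. All of the types and names (`IdleWorkerSketch`, `Shard`, `onShardEnd`, the in-memory `LEASE_TABLE`) are hypothetical stand-ins for illustration, not the KCL's actual classes:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of the lease-sync behavior described above:
// leases are created only at worker start and at shard end, so a stale shard
// listing at shard end can leave the worker idle forever.
public class IdleWorkerSketch {

    record Shard(String shardId, String parentShardId) {}

    // Stand-in for the KCL lease table: shardId -> checkpoint.
    static final Map<String, String> LEASE_TABLE = new ConcurrentHashMap<>();

    // Called when a record processor finishes its shard.
    static void onShardEnd(String shardId, List<Shard> shardsFromDescribeStream) {
        LEASE_TABLE.put(shardId, "SHARD_END");
        // Leases are created only for shards returned by the stream description.
        for (Shard shard : shardsFromDescribeStream) {
            LEASE_TABLE.putIfAbsent(shard.shardId(), "TRIM_HORIZON");
        }
        // If the child shard has not yet shown up in the Streams metadata, no new
        // lease is created here. LeaseTaker only takes leases that already exist
        // in the table, so nothing is consumed, no further shard end is reached,
        // no further sync happens, and the worker idles until a restart re-syncs.
    }

    public static void main(String[] args) {
        LEASE_TABLE.put("shard-0001", "someSequenceNumber");
        // Stale listing: the child shard "shard-0002" is missing from the response.
        onShardEnd("shard-0001", List.of(new Shard("shard-0001", null)));
        System.out.println(LEASE_TABLE); // {shard-0001=SHARD_END} -- worker is now idle
    }
}
```

If the explanation earlier in this thread is right, the v1.4.0 proxy avoids this by detecting the inconsistent shard graph and retrying the listing a few times before returning, so the child shard eventually gets a lease.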
I am not sure whether this issue belongs to the KCL library or to the DynamoDB Adapter. It seems KCL is working under the assumption that information about new shards is always available before a shard end is reached. I don't know whether this assumption is intentional and violated by the Adapter, or whether the assumption is wrong and has to be fixed in KCL. Therefore, I have created this issue in both projects. The same issue in the other project: awslabs/amazon-kinesis-client#442
Libraries used:
- `com.amazonaws:dynamodb-streams-kinesis-adapter:1.4.0`
- `com.amazonaws:amazon-kinesis-client:1.9.0`