Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #500 to avoid potential hang and event loss #501

Merged
merged 5 commits into from
Mar 6, 2024

Conversation

ruitianzhong
Copy link
Contributor

@ruitianzhong ruitianzhong commented Mar 3, 2024

  • add IfUsed() for worker to check if it is used.
  • add Get() for routine that generates event before using the worker and Put() when after using the worker. They should not be called concurently(only one producer). So it panics if unexpected behavior happen.
  • add clean up logic for ew.incoming and return only after both the ew.incoming is empty and ew is not used.

The reasoning for that is as follows.

When returned from ew.Close(), there are two possiblities:

  1. no routine can touch it.
  2. one routine can still touch ew because getWorkerByUUID()
    happen before ew.Close()

When no routine can touch it (i.e.,ew.IfUsed == false),
we just drain the ew.incoming and return.

When one routine can touch it (i.e.,ew.IfUsed == true), we ensure
that we only return after the routine can not touch it
(i.e.,ew.IfUsed == false). At this point, we can ensure that no
other routine will touch it and send events through the ew.incoming.
So, we return.

Because eworker has been deleted from workqueue after ew.Close()
(ordered by a workqueue lock), at this point, we can ensure that
no ew will not be touched even in the future. So the return is
safe.

Finally, to verify it, in ruitianzhong@e5bce57 , run go test -v ./pkg/event_processor/ -run TestHang , no hang happened.

Because the poc inject delay to reliably reproduce the hang, I do not add it as a unit test in bug-fix branch.

Signed-off-by: ruitianzhong <ruitian-zhong@outlook.com>
@cfc4n cfc4n added question Further information is requested test Tests and some Magic labels Mar 3, 2024
Signed-off-by: ruitianzhong <ruitian-zhong@outlook.com>
Signed-off-by: ruitianzhong <ruitian-zhong@outlook.com>
Signed-off-by: ruitianzhong <ruitian-zhong@outlook.com>
Signed-off-by: ruitianzhong <ruitian-zhong@outlook.com>
@ruitianzhong ruitianzhong requested a review from cfc4n March 5, 2024 16:50
Copy link
Member

@cfc4n cfc4n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

pkg/event_processor/iworker.go Show resolved Hide resolved
@cfc4n cfc4n merged commit a286703 into gojue:master Mar 6, 2024
6 checks passed
@ruitianzhong ruitianzhong deleted the bug-fix branch March 6, 2024 01:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested test Tests and some Magic
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants