Mechanism to ensure local data sync thread is always running #43

Closed
nitisht opened this issue Aug 17, 2022 · 0 comments · Fixed by #49

nitisht commented Aug 17, 2022

The local data sync thread is responsible for moving data to a local directory and then moving it to object storage as needed. If this thread crashes for any reason, we should be able to detect that and try restarting the thread.

If the thread can't be respawned for whatever reason, the server should stop taking post events at the end of the granularity period (because we can't ensure data integrity).
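
The detect, restart, and give-up behaviour requested above could be sketched roughly as follows. This is only an illustration using the standard library, not parseable's actual code: `supervise`, `max_restarts`, `accepting_events`, and the thread name `local-sync` are all hypothetical names, and the sketch blocks on `join()` rather than running alongside a server.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Runs `work` on a dedicated thread and respawns it if it panics.
///
/// Returns `true` as soon as one run finishes without panicking. After the
/// initial attempt plus `max_restarts` retries have all failed, clears
/// `accepting_events` (a hypothetical flag the request handlers would check)
/// and returns `false`.
pub fn supervise<F>(work: F, max_restarts: usize, accepting_events: &AtomicBool) -> bool
where
    F: Fn() + Send + Sync + 'static,
{
    let work = Arc::new(work);
    for _attempt in 0..=max_restarts {
        let job = Arc::clone(&work);
        let spawned = thread::Builder::new()
            .name("local-sync".into())
            .spawn(move || (*job)());
        let handle = match spawned {
            Ok(handle) => handle,
            // The OS refused to create the thread; count it as a failed attempt.
            Err(_) => continue,
        };
        // `join` returns `Err` when the thread panicked.
        if handle.join().is_ok() {
            return true;
        }
    }
    // The thread could not be kept alive: stop taking new events.
    accepting_events.store(false, Ordering::SeqCst);
    false
}
```

In a real server the flag would presumably be flipped at the end of the granularity period rather than immediately; that timing is left out of the sketch.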

@nitisht nitisht added the bug label Aug 17, 2022
nitisht pushed a commit that referenced this issue Aug 20, 2022
This PR makes a number of changes to how the local sync 
and s3 sync threads are managed inside parseable. The 
goal is to make the parseable main thread aware of unexpected 
failures inside either of these dedicated sync threads. Additionally, 
we want control over the life cycle of these threads and 
the ability to stop them on command. This can be explored 
later if we want a graceful shutdown and stronger 
data consistency guarantees in case of failure.

This solution works something like this:

Sync threads are spawned with a pair of oneshot channels 
(named with the suffixes inbox and outbox) for communication with 
the main thread.

inbox is the sender side, which can be used to stop the respective 
thread by breaking its scheduler loop. outbox is the receiver side, 
which is polled inside the main thread's looped select, so that 
whenever we receive a message from a thread we stop.

The scheduler runs inside a loop that runs pending tasks and polls 
the channel for any message from the main thread 
(if there is one, the thread stops).
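
The inbox/outbox arrangement described above can be sketched with the standard library alone. This is a simplified illustration, not the code from this commit: `std::sync::mpsc` channels stand in for the oneshot channels, a plain loop with a sleep stands in for the scheduler, and `SyncHandle`, `spawn_sync_thread`, and `tick` are hypothetical names.

```rust
use std::sync::mpsc::{self, Receiver, Sender, TryRecvError};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// What the main thread keeps for one sync thread.
pub struct SyncHandle {
    /// Send `()` here to ask the thread to leave its loop.
    pub inbox: Sender<()>,
    /// Yields `()` once the thread has left its loop.
    pub outbox: Receiver<()>,
    pub join: JoinHandle<()>,
}

/// Spawns a thread that calls `task` repeatedly until it is told to stop.
pub fn spawn_sync_thread<F>(mut task: F, tick: Duration) -> SyncHandle
where
    F: FnMut() + Send + 'static,
{
    let (inbox_tx, inbox_rx) = mpsc::channel::<()>();
    let (outbox_tx, outbox_rx) = mpsc::channel::<()>();

    let join = thread::spawn(move || {
        loop {
            // Stand-in for "run pending tasks" on the scheduler.
            task();
            match inbox_rx.try_recv() {
                // Stop on an explicit message, or if the main thread dropped its sender.
                Ok(()) | Err(TryRecvError::Disconnected) => break,
                // Nothing from the main thread yet: wait for the next tick.
                Err(TryRecvError::Empty) => thread::sleep(tick),
            }
        }
        // Tell the main thread the loop has ended; ignore the error if nobody listens.
        let _ = outbox_tx.send(());
    });

    SyncHandle { inbox: inbox_tx, outbox: outbox_rx, join }
}
```

The main thread would call `handle.inbox.send(())` to request a stop and wait on `handle.outbox` to learn that the loop has ended. Polling with `try_recv` keeps the loop from blocking between scheduler ticks.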

All of this is wrapped in a catch_unwind. So if a panic 
causes unwinding, there is a chance to let the main thread 
know and handle it accordingly. This is added as a safeguard for 
now, but later we need to verify that local_sync and s3_sync 
don't contain anything that can panic.
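
A minimal sketch of the panic safeguard, assuming `std::panic::catch_unwind` and a standard-library channel in place of the outbox used in this commit. `SyncExit` and `spawn_guarded` are hypothetical names introduced for illustration.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// What a sync thread reports back to the main thread when it ends.
#[derive(Debug, PartialEq)]
pub enum SyncExit {
    Finished,
    Panicked(String),
}

/// Runs `work` on its own thread and reports how it ended on the returned receiver.
pub fn spawn_guarded<F>(work: F) -> Receiver<SyncExit>
where
    F: FnOnce() + Send + 'static,
{
    let (outbox_tx, outbox_rx) = mpsc::channel();
    thread::spawn(move || {
        // `AssertUnwindSafe` asserts that observing state after a panic is acceptable here.
        let exit = match panic::catch_unwind(AssertUnwindSafe(work)) {
            Ok(()) => SyncExit::Finished,
            Err(payload) => {
                // Panic payloads are usually `&str` or `String`; fall back otherwise.
                let message = payload
                    .downcast_ref::<&str>()
                    .map(|s| s.to_string())
                    .or_else(|| payload.downcast_ref::<String>().cloned())
                    .unwrap_or_else(|| "unknown panic payload".to_string());
                SyncExit::Panicked(message)
            }
        };
        // Let the main thread know; ignore the error if the receiver is gone.
        let _ = outbox_tx.send(exit);
    });
    outbox_rx
}
```

Note that `catch_unwind` only catches unwinding panics. If the binary is built with `panic = "abort"`, the process terminates and no message is sent.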

Fixes #43