Add epoch ID = epoch number + checksum of projection!
Done via compare() func.
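A minimal sketch of that shape, assuming SHA-1 over term_to_binary() of the projection (the actual checksum and names may differ):

    %% Sketch only: an epoch ID pairs the epoch number with a checksum of
    %% the projection, so two projections with the same number but
    %% different contents get different IDs.  SHA-1 here is an assumption.
    -type epoch_id() :: {non_neg_integer(), binary()}.

    make_epoch_id(EpochNum, Projection) ->
        {EpochNum, crypto:hash(sha, term_to_binary(Projection))}.

    %% compare/2: -1 | 0 | 1, ordered by epoch number first, checksum second.
    compare({E, C}, {E, C})                -> 0;
    compare({E1, _}, {E2, _}) when E1 < E2 -> -1;
    compare({E, C1}, {E, C2}) when C1 < C2 -> -1;
    compare(_, _)                          -> 1.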
Change all protocol ops to add epoch ID
Add projection store to each FLU.
What should the API look like? (borrow from chain mgr PoC?)
Yeah, I think that’s pretty complete. Steal it now, worry later.
Choose protocol & TCP port. Share with get/put? Separate?
Hrm, I like the idea of having a single TCP port to talk to any single FLU.
To make the protocol “easy” to hack, how about using the same basic method as append/write, where there’s a variable-size blob? But we’ll format that blob as a term_to_binary(), then dispatch to a single func and pattern match, Erlang-style, in that func (sketch below).
Do it.
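A minimal sketch of that framing, with illustrative op names and a plain map standing in for the projection store:

    %% Sketch only, not the real module: every request is a length-prefixed
    %% blob whose payload is term_to_binary() of a tuple, so adding an op
    %% is just adding a dispatch/2 clause.  Op names and the map-based
    %% store are illustrative.
    encode_request(Req) when is_tuple(Req) ->
        Blob = term_to_binary(Req),
        [<<(byte_size(Blob)):32/big>>, Blob].

    handle_blob(Blob, Store) ->
        dispatch(binary_to_term(Blob), Store).

    dispatch({list_all_projections, _ProjType}, Store) ->
        {reply, {ok, lists:sort(maps:keys(Store))}, Store};
    dispatch({read_projection, _ProjType, Epoch}, Store) ->
        {reply, maps:find(Epoch, Store), Store};
    dispatch({write_projection, _ProjType, Epoch, Proj}, Store) ->
        {reply, ok, maps:put(Epoch, Proj, Store)};
    dispatch(Unknown, Store) ->
        {reply, {error, {bad_request, Unknown}}, Store}.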
Finish OTP’izing the Chain Manager with FLU & proj store processes
Eliminate the timeout exception for the client: just return {error,timeout} (sketch below)
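A minimal sketch, assuming the client call goes through gen_server:call/3:

    %% Sketch only: wrap gen_server:call/3 so a timeout comes back as a
    %% value instead of an exit exception.
    call(Server, Request, Timeout) ->
        try
            gen_server:call(Server, Request, Timeout)
        catch
            exit:{timeout, _} -> {error, timeout}
        end.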
Move prototype/chain-manager code to “top” of source tree
Preserve current test code (leave as-is? tiny changes?)
Make chain manager code flexible enough to run “real world” or “sim”
Add projection wedging logic to each FLU.
Implement real data repair, orchestrated by the chain manager
Change all protocol ops to enforce the epoch ID
Add no-wedging state to make testing easier?
Adapt the projection-aware, CR-implementing client from demo-day
Add major comment sections to the CR-impl client
Simple basho_bench driver, put some unscientific chalk on the benchtop
Create parallel PULSE test for basic API plus chain manager repair
Add client-side vs. server-side checksum type, expand client API?
Add gproc and get rid of registered name rendezvous
Fixes the atom table leak
Fixes the problem of having an active sequencer for the same prefix on two FLUs in the same VM
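A minimal sketch of the gproc rendezvous, with an illustrative {sequencer, FluName, Prefix} key:

    %% Sketch only: key the sequencer on {FluName, Prefix} via gproc
    %% instead of a per-prefix registered atom.  No atoms are created per
    %% prefix, and two FLUs in one VM can each run a sequencer for the
    %% same prefix.
    register_sequencer(FluName, Prefix) ->
        try gproc:reg({n, l, {sequencer, FluName, Prefix}}) of
            true -> ok
        catch
            error:badarg -> {error, already_registered}
        end.

    lookup_sequencer(FluName, Prefix) ->
        gproc:where({n, l, {sequencer, FluName, Prefix}}).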
Fix all known bugs/cruft with Chain Manager (list below)
Fix known bugs
Clean up crufty TODO comments and other obvious cruft
Re-add verification step of stable epochs, including inner projections!
Attempt to remove cruft items in flapping_i?
Move the FLU server to gen_server behavior?
Chain manager CP mode, Plan B
SKIP Maybe? Change ch_mgr to use middleworker
Is it worthwhile? Is the parallelism really that important? Probably not.
SKIP Move middleworker func to utility module?
Add new proc to psup group
Name: machi_fitness
ch_mgr keeps its current proc struct: i.e. same 1 proc as today
NO ch_mgr asks hosed mgr for hosed list @ start of react_to_env
For all hosed, do async: try to read latest proj.
NO If OK, inform hosed mgr: status change will be used by next HC iter.
NO If fail, no change, because that server is already known to be hosed
For all non-hosed, continue as the chain manager code does today
Any new errors are added to UpNodes/DownNodes tracking as used today
At end of react loop, if UpNodes list differs, inform hosed mgr.
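A minimal sketch of the psup wiring and the end-of-react notification described above; the start_link/1 args and the update_local_view/2 name are assumptions, not the real machi_fitness API:

    %% Sketch only: one extra worker in the psup group.  The child-spec
    %% shape is standard OTP; the start args and update_local_view/2 are
    %% assumptions.
    fitness_child_spec(FluName) ->
        {machi_fitness, {machi_fitness, start_link, [FluName]},
         permanent, 5000, worker, [machi_fitness]}.

    %% End of a react_to_env iteration: only tell the fitness server about
    %% our local up/down view when it actually changed.
    maybe_inform_fitness(_FitnessPid, UpNodes, UpNodes) ->
        ok;
    maybe_inform_fitness(FitnessPid, _OldUpNodes, NewUpNodes) ->
        machi_fitness:update_local_view(FitnessPid, NewUpNodes).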
If map key is not atom, then atom->string or atom->binary is fine.
For the map value, could we use a CRDT LWW type?
Investigate riak_dt data structure definition, manipulating, etc.
Add dependency on riak_dt
Update is an entire dict from Observer O
Merge my pending map + update map + my last mod time + my unfit list
if merged /= pending:
Schedule async tick (more)
Tick message contains list of servers with differing state as of this instant in time… we want to avoid triggering decisions about fitness/unfitness for other servers where we might have received less than a full time period’s worth of waiting.
Spam merged map to All_list -- [Me]
Set pending <- merged
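A minimal sketch of the merge/spam steps above, using plain Erlang maps with last-write-wins semantics as a stand-in until the riak_dt question is settled; all names, the 2-second tick delay, and the spam/2 placeholder are assumptions:

    %% Sketch only: keys are binaries (not atoms); every value is
    %% {LastModTime, UnfitList}.
    -record(state, {myself              :: binary(),
                    ch_mgr              :: pid(),
                    all_list      = []  :: [binary()],
                    my_unfit_list = []  :: [binary()],
                    pending       = #{} :: map(),
                    active        = #{} :: map(),
                    last_spam     = 0   :: integer()}).

    %% LWW merge: for each key, keep whichever {Time, UnfitList} is newer.
    merge_maps(PendingMap, UpdateMap) ->
        maps:fold(fun(Key, {TNew, _} = New, Acc) ->
                          case maps:find(Key, Acc) of
                              {ok, {TOld, _}} when TOld >= TNew -> Acc;
                              _                                 -> maps:put(Key, New, Acc)
                          end
                  end, PendingMap, UpdateMap).

    %% An update is an entire map from some observer.  Merge it with our
    %% pending map and our own freshest entry; if anything changed,
    %% schedule an async tick naming exactly the servers whose state
    %% differed at this instant, spam the merged map to All_list -- [Me],
    %% and make it pending.
    handle_update(UpdateMap, #state{pending = Pending, all_list = All, myself = Me} = S) ->
        MyEntry = {os:system_time(millisecond), S#state.my_unfit_list},
        Merged = merge_maps(merge_maps(Pending, UpdateMap), #{Me => MyEntry}),
        case Merged =:= Pending of
            true ->
                S;
            false ->
                Changed = [K || K <- maps:keys(Merged),
                                maps:find(K, Merged) =/= maps:find(K, Pending)],
                erlang:send_after(2000, self(), {tick, Changed}),
                [spam(Srv, Merged) || Srv <- All -- [Me]],
                S#state{pending = Merged}
        end.

    %% Placeholder for the real "send my merged map to server Srv" call.
    spam(_Srv, _Map) -> ok.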
When we receive an async tick
set active map <- pending map for all servers in ticks list
Send ch_mgr a react_to_env tick trigger
react_to_env tick trigger actions
Filter active map to remove stale entries (i.e. no update in 1 hour)
If time since last map spam is too long, spam our pending map
Proceed with normal react processing, using active map for AllHosed!
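A minimal sketch of the tick handling and react-side actions above, continuing the #state{} record and spam/2 placeholder from the previous sketch; the spam interval and the bare tick-trigger message to ch_mgr are assumptions:

    %% Sketch only: on an async tick, promote pending -> active, but only
    %% for the servers named in the tick, so everyone else still gets a
    %% full waiting period before we judge their fitness.
    handle_info({tick, ChangedServers}, #state{pending = Pending, active = Active0} = S) ->
        Active = maps:merge(Active0, maps:with(ChangedServers, Pending)),
        ChMgr = S#state.ch_mgr,
        ChMgr ! react_to_env_tick_trigger,
        {noreply, S#state{active = Active}}.

    %% react_to_env tick trigger actions, from ch_mgr's point of view.
    active_map_for_react(#state{active = Active, pending = Pending, last_spam = LastSpam,
                                all_list = All, myself = Me} = S) ->
        Now = os:system_time(millisecond),
        %% Drop entries with no update in the last hour.
        Fresh = maps:filter(fun(_Srv, {T, _Unfit}) -> Now - T =< 3600 * 1000 end, Active),
        %% If we have been quiet too long, re-spam our pending map.
        S2 = case Now - LastSpam > 30 * 1000 of
                 true  -> [spam(Srv, Pending) || Srv <- All -- [Me]],
                          S#state{last_spam = Now};
                 false -> S
             end,
        %% AllHosed for normal react processing = union of all unfit lists.
        AllHosed = lists:usort(lists:append([Unfit || {_T, Unfit} <- maps:values(Fresh)])),
        {AllHosed, S2#state{active = Fresh}}.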