Skip to content

Latest commit

 

History

History
114 lines (86 loc) · 4.63 KB

TODO-shortterm.org

File metadata and controls

114 lines (86 loc) · 4.63 KB

To Do list

remove the escript* stuff from machi_util.erl

Add functions to manipulate 1-chain projections

  • Add epoch ID = epoch number + checksum of projection! Done via compare() func.

Change all protocol ops to add epoch ID

Add projection store to each FLU.

What should the API look like? (borrow from chain mgr PoC?)

Yeah, I think that’s pretty complete. Steal it now, worry later.

Choose protocol & TCP port. Share with get/put? Separate?

Hrm, I like the idea of having a single TCP port to talk to any single FLU.

To make the protocol “easy” to hack, how about using the same basic method as append/write where there’s a variable size blob. But we’ll format that blob as a term_to_binary(). Then dispatch to a single func, and pattern match Erlang style in that func.

Do it.

Finish OTP’izing the Chain Manager with FLU & proj store processes

Eliminate the timeout exception for the client: just {error,timeout} ret

Move prototype/chain-manager code to “top” of source tree

Preserve current test code (leave as-is? tiny changes?)

Make chain manager code flexible enough to run “real world” or “sim”

Add projection wedging logic to each FLU.

Implement real data repair, orchestrated by the chain manager

Change all protocol ops to enforce the epoch ID

  • Add no-wedging state to make testing easier?

Adapt the projection-aware, CR-implementing client from demo-day

Add major comment sections to the CR-impl client

Simple basho_bench driver, put some unscientific chalk on the benchtop

Create parallel PULSE test for basic API plus chain manager repair

Add client-side vs. server-side checksum type, expand client API?

Add gproc and get rid of registered name rendezvous

Fixes the atom table leak

Fixes the problem of having active sequencer for the same prefix

on two FLUS in the same VM

Fix all known bugs/cruft with Chain Manager (list below)

Fix known bugs

Clean up crufty TODO comments and other obvious cruft

Re-add verification step of stable epochs, including inner projections!

Attempt to remove cruft items in flapping_i?

Move the FLU server to gen_server behavior?

Chain manager CP mode, Plan B

SKIP Maybe? Change ch_mgr to use middleworker

Is it worthwhile? Is the parallelism so important? No, probably.

SKIP Move middleworker func to utility module?

Add new proc to psup group

Name: machi_fitness

ch_mgr keeps its current proc struct: i.e. same 1 proc as today

NO chmgr asks hosed mgr for hosed list @ start of react_to_env

For all hosed, do async: try to read latest proj.

NO If OK, inform hosed mgr: status change will be used by next HC iter.

NO If fail, no change, because that server is already known to be hosed

For all non-hosed, continue as the chain manager code does today

Any new errors are added to UpNodes/DownNodes tracking as used today

At end of react loop, if UpNodes list differs, inform hosed mgr.

fitness_mon, the fitness monitor

Map key & val sketch

Logical sketch:

Map key: ObservingServerName::atom()

Map val: { ObservingServerLastModTime::now(), UnfitList::list(ServerName::atom()), AdminDownList::list(ServerName::atom()), Props::proplist() }

Implementation sketch:

  1. Use CRDT map.
  2. If map key is not atom, then atom->string or atom->binary is fine.
  3. For map value, is it possible CRDT LWW type?

Investigate riak_dt data structure definition, manipulating, etc.

Add dependency on riak_dt

Update is an entire dict from Observer O

Merge my pending map + update map + my last mod time + my unfit list

if merged /= pending:

Schedule async tick (more)

Tick message contains list of servers with differing state as of this instant in time… we want to avoid triggering decisions about fitness/unfitness for other servers where we might have received less than a full time period’s worth of waiting.

Spam merged map to All_list – [Me]

Set pending <- merged

When we receive an async tick

set active map <- pending map for all servers in ticks list

Send ch_mgr a react_to_env tick trigger

react_to_env tick trigger actions

Filter active map to remove stale entries (i.e. no update in 1 hour)

If time since last map spam is too long, spam our pending map

Proceed with normal react processing, using active map for AllHosed!