
feat(dscache): Fill dscache when saving, using event.Bus. #1133

Closed
wants to merge 4 commits

Conversation

dustmop
Contributor

@dustmop dustmop commented Feb 21, 2020

Logbook informs the event.Bus of dataset inits and saves; both init and save events are sent to the dscache. Removes the old logbook.action. When saving a dataset for the first commit, the flag --use-dscache opts in to dscache by creating a new instance.

@dustmop dustmop self-assigned this Feb 21, 2020
@dustmop dustmop added feat A code change that adds functionality logbook logbook subsystem issues labels Feb 21, 2020
@dustmop dustmop requested a review from b5 February 21, 2020 01:29
Member

@b5 b5 left a comment


Exciting stuff! I'd like to make sure we get off on the right foot with the event bus. Locking down the intended uses for events is the best way to avoid event nightmares IMHO.

"github.com/qri-io/dataset"
)

var (
Member

can these be const?


var (
// ETDatasetInit is for events that initialize datasets
ETDatasetInit = Topic("ds:init")
Member

I think we should namespace events according to the package that emits them. This would be

ETLogbookDatasetNameInitialized = Topic("logbook:DatasetNameInitialized")

Given that events are this super loose thing, I think it's appropriate to use long, verbose names describing exactly what they're doing, including a tense. A general form:
package_name ':' MethodNameWithTense

Comment on lines 177 to 206
// Subscribe watches a logbook for actions that change the state
func (d *Dscache) Subscribe(bus event.Bus) {
d.Messages = bus.Subscribe(event.ETDatasetInit, event.ETDatasetChange)
go func() {
for {
e := <-d.Messages
if dsChange, ok := e.Payload.(event.DatasetChangeEvent); ok {
if e.Topic == event.ETDatasetInit {
if err := d.updateInitDataset(dsChange); err != nil {
log.Error(err)
}
} else if e.Topic == event.ETDatasetChange {
if err := d.updateMoveCursor(dsChange); err != nil {
log.Error(err)
}
}
}
}
}()
}
Member

Ok lots to talk about here. Here's how I think we should refactor:

// We need to not leak goroutines. The context passed into the `New` constructor
// needs to be plumbed into here and cleaned up if the passed-in context is
// cancelled.
func (d *Dscache) subscribe(ctx context.Context, bus event.Bus) {
  // Subsystems that are aware of events should not have an event channel field.
  // A field allows code that breaks the encapsulation of the `subscribe`
  // method. Instead `subscribe` should use a local variable.
  eventsCh := bus.Subscribe(event.ETDatasetInit, event.ETDatasetChange)
  go func() {
    for {
      select {
      // The statement that listens for events should use the "comma ok"
      // pattern to check for a closed channel. If the event bus context is
      // cancelled before this goroutine exits, the event bus should close
      // listener channels. Returning on !ok exits gracefully (a bare `break`
      // would only exit the select, not the loop).
      case e, ok := <-eventsCh:
        if !ok {
          return
        }
        if dsChange, ok := e.Payload.(event.DatasetChangeEvent); ok {
          if e.Topic == event.ETDatasetInit {
            if err := d.updateInitDataset(dsChange); err != nil {
              log.Error(err)
            }
          } else if e.Topic == event.ETDatasetChange {
            if err := d.updateMoveCursor(dsChange); err != nil {
              log.Error(err)
            }
          }
        }
      case <-ctx.Done():
        // TODO (b5): add unsubscribe method to bus interface. The arg for
        // Unsubscribe should be the channel variable itself, which is
        // available in this scope:
        // bus.Unsubscribe(eventsCh)
        return
      }
    }
  }()
}

The general pattern I like to see emerge is a private subscribe method that encapsulates all event handling for a subsystem. We should use the same private method name across all subsystems for clarity. subscribe should always be called within the constructor and listen to the context given to the constructor. Passing a non-nil event bus to a constructor should always subscribe, cutting down on potential bugs caused by requiring package consumers to both construct & subscribe.

Finally, I think this is a good enough reason to add an Unsubscribe method to the bus interface, and call it if the context is cancelled. Currently we can get away with this because in all (production) cases the context given to Dscache will live for the same duration as the bus itself. But this might not always be the case (I'm thinking of other subsystems like cron). Having all subscribers clean up on close means we won't have to rely on assumption of aligned context duration. I'd be happy to PR in the Unsubscribe method.

Contributor Author

This all makes sense, added as described

ProfileIDToUsername map[string]string
Messages <-chan event.Event
Member

Let's drop this field, see big Subscribe note

@@ -92,6 +93,7 @@ type Book struct {
pk crypto.PrivKey
authorID string
authorName string
bus event.Bus
Member

I'm a fan of this. We may want to consider adding a Publisher interface to the event package that only has the publish method, and make this a publisher field. Setup would be the same, just would add extra assurances that people can't subscribe in weird places.

We could also define a NilPublisher implementation that is nil-callable, which would drop the need for if book.bus != nil { every time we want to publish.

Contributor Author

Added both

@dustmop
Contributor Author

dustmop commented Feb 24, 2020

Addressed your comments

@b5 b5 self-requested a review February 24, 2020 20:59
Member

@b5 b5 left a comment


Thanks so much! As soon as CI passes, let's get this merged

@dustmop
Contributor Author

dustmop commented Feb 24, 2020

Oh, just realized event.Bus is making these tests non-deterministic because the dscache may render before the event is fully handled. Also, probably need to put a mutex around the dscache update handler. I'll have to fix these tomorrow, then will merge.

Logbook informs event.Bus of dataset inits and saves. Init events are sent to ds
cache, as well as save events. Remove old logbook.action. When saving a dataset
for the first commit, the flag --use-dscache will opt-in to dscache by creating
a new instance.
Pass bus to dscache constructor. Subscribe as a private method inside the constructor. Add an unsubscribe method to the bus, call it when the subscriber is finished.
@dustmop dustmop force-pushed the dscache-init branch 2 times, most recently from 037b31a to 90e3b4c Compare February 25, 2020 18:59
@dustmop dustmop force-pushed the dscache-init branch 2 times, most recently from 09324da to 8231792 Compare February 25, 2020 23:54
Add subscribe method to filesystem watcher. Add locks to dscache. Sync tests. Sync.wait has to count number of subscribers. Wait returns any errors that occur, delivery via Acknowledge method. Remove syncs when they are finished with Wait.
@dustmop
Contributor Author

dustmop commented Feb 27, 2020

Discussed offline, we're rethinking this approach. Using an event.Bus to update dscache doesn't seem necessary, and leads to bad outcomes like non-deterministic behavior and race conditions. Instead, we're just going to use a simple callback function from logbook. Leaving this PR for posterity, but switching over to the simpler approach in #1140.

@b5
Member

b5 commented Feb 27, 2020

Here's my notes from our chat:

Ugh, just found a problem. The Synchronizer pattern assumes that multiple events of the same kind will not be emitted while holding a Sync lock. This won't be true in production. The easiest way to show this is to publish 2 events of the same kind while waiting:

func TestSyncMultipleEventPublish(t *testing.T) {
	ctx := context.Background()
	counter := syncCounter{}

	bus := NewBus(ctx)
	ch1 := bus.Subscribe(ETMainTestEvent)
	handleEvents(bus, ch1, nil, &counter)

	s := bus.Synchronizer()
	go bus.Publish(ETMainTestEvent, 1)
	go bus.Publish(ETMainTestEvent, 1)
	time.Sleep(100 * time.Millisecond)
	s.Wait()

	if counter.Count != 1 {
		t.Errorf("expected Count to be 1, got %d", counter.Count)
	}
}

We could solve this by allocating an identifier for each distinct "request" and attaching it to the synchronizer. In Go it'd be best to use context for this sort of thing. Those adjustments might look like this:

// NewRequestContext creates a request context from the event package with an 
// embedded request ID value that can be used to distinguish event subscriptions
func NewRequestContext() context.Context {
	return context.WithValue(context.Background(), EventID, newRequestID())
}

func TestSyncMultipleEventPublish(t *testing.T) {
	// the bus intentionally uses a different context than the request
	bus := NewBus(context.Background())
	counter := syncCounter{}

	reqCtx := NewRequestContext()
	ch1 := bus.Subscribe(ETMainTestEvent)
	handleEvents(bus, ch1, nil, &counter)

	s := bus.Synchronizer(reqCtx)

	// publish the event with the request context attached
	bus.Publish(reqCtx, ETMainTestEvent, 1)
	// a publish without reqCtx would not be counted:
	// bus.Publish(ETMainTestEvent, 1)

	// wait will internally use the request identifier to distinguish contexts,
	// counting only references that are published with the same request context
	err := <-s.Wait(reqCtx)
	if err != nil {
		t.Fatal(err)
	}

	if counter.Count != 1 {
		t.Errorf("expected Count to be 1, got %d", counter.Count)
	}
}

Relying on reference counting

The Synchronizer pattern is too naive about reference counting. The reference count may drop to zero before the process is truly finished:

Here's an example of an evented publish:

execution time ->

WaitGroup Reference Count:
0      1    0     1   0
•------•----• - - • - •
       * remote:LogsPublished
                  * remote:BlocksPublished
            * dscache ACK remote:LogsPublished
                      * dscache ACK remote:BlocksPublished

In the above example dscache needs to react to both remote events, but if any events are acknowledged before all events are published, the reference count will drop to zero prematurely, causing the Synchronizer to return early.

We can have two patterns. I think the problem is that we're thinking too either-or about using an event bus: we have patterns that already work, and some places where they really don't. We should only use events in the latter.

For any given set of business logic, there are functions on the critical path. Critical path functions MUST complete (either with success or error) before a request is considered finished. For this stuff, our current pattern of just calling functions in sequence, and maybe using an errorCollector is the stronger pattern, because it's properly declarative.

We want the critical path to be declarative. Events are for side effects.

It seems many of our packages have been designed with this critical path/side effect dichotomy in mind:

critical path packages

  • base (and by extension everything base imports: dsfs, stats, startf)
  • qfs
  • logbook
  • remote (client)
  • regclient
  • fsi
  • event
  • dscache

side-effect packages

  • remote
  • cron
  • watchfs
  • remote (server)
  • websockets
  • p2p

Some observations we can draw from this:

  • Critical path packages consist of discrete functions that must complete before a request is finished
  • Critical path packages should be nil callable
  • Critical path packages should not import side-effect packages
  • Critical path packages must talk to side-effect packages via events.
    Critical path constructor functions should have event.Publisher args, not event.Bus args.
  • Side-effect packages can invoke critical path packages

Other things we considered

I was thinking a MaybeSubscribeOnce pattern could return an immediately-resolving promise if given a nil value:

// MaybeSubscribeOnce
promise := bus.MaybeSubscribeOnce(reqCtx, inst.dscache, ETDscacheRefAdded)

But this doesn't solve the core problem of keeping core logic & side effects in separate paradigms.

@b5
Member

b5 commented Feb 29, 2020

Ok, we’ve pulled the best bits out of this. You’ve served us well #1133, rest now.
