Dependency Injection Revisited #7637

rkapka · 2020-10-25T19:20:02Z

Dependency Injection Revisited

Introduction

Composition Root

If you read some DI literature, you will stumble upon the concept of a Composition Root. A quick Google search reveals the following definition:

The composition root is the single place in your application where the composition of the object graphs for your application take place, using the dependency injection container (although how this is done is irrelevant, it could be using a container or could be done manually using pure DI)

Our dependency graph consists of a node, which depends on several services, which in turn may depend on other services. If we follow the Composition Root pattern, then registering dependencies through ServiceRegistry should not take place in the node, but rather in main.go or another DI-specific file, and the node should be a part of the object graph. This is a topic for another time, though. For now registration does still take place in the node itself and I don't think it's that big of a deal.

Who should manage an object's lifetime?

Quote from Dependency Injection Principles, Practices, and Patterns:

WHERE SHOULD DEPENDENCIES BE RELEASED?
(...) where should object graphs be released, and who is responsible for doing this? It's important to note that the code that has requested an object graph is also responsible for requesting its release. Because the request for an object graph is typically part of the COMPOSITION ROOT, so is the initiation of its release.

I think this definition can be extended from object graphs to single objects. Taking C# as an example, it has a predefined using keyword which can be used in this way:

using (StreamWriter writer = new StreamWriter(...)) {
    // some code here that uses writer
}

This is equivalent to calling a writer.Dispose() method after running all code inside the using statement. The Dispose() method is defined on an interface and its main purpose is to release any resources managed by the object, such as DB connections. The main thing to notice here is that it's the code creating the object which invokes Dispose(), not the object itself. I think Java has a similar mechanism.

Pull Request

`Stop()` vs `cancel()`

If we take a look at our services and the concept of releasing resources, such as goroutines, it is easy to see that such resources can be defined in multiple places: New(), Start(), and other functions (which might even be invoked from within parent services). To me it seems pretty obvious that the equivalent of C#'s Dispose() is context.CancelFunc, and thus cancelling should be used to provide resource cleanup. And because this cleanup should be called by the code which creates the object, which is node + registry in our case, then it's their responsibility. I think of Stop() as performing closing actions on the service before its job is done, which has nothing to do with releasing resources (which should happen after the service's job is complete).

Introducing `ServiceContext`

This PR introduces a ServiceContext struct which contains a context and a cancel function. ServiceContext is created in the node and registered alongside the service in the registry. The Ctx field is passed directly to New() in the node and indirectly to Start() and Stop() through the registry. This makes sure that all Service interface's functions use the same context, which can be cancelled from a single place. Cancelling takes place inside registry's StopAll() and is done after calling Stop(). It is not technically possible to cancel the context inside Stop() without passing the cancel function as a parameter (a new context acquired through WithCancel() will not cancel the parent context, which is the one passed into New() etc.), which is essentially the same as cancelling from the outside after Stop() is executed, only less readable.

Services depending on other services

Sometimes a Parent service struct has a Child service as a field. One important question arises: what context should be passed into a Child's function f() when executing Parent.Child.f()? The answer is of course the parent context, simply flowing to the child through function calls - idiomatic Go at its best. Although this answer might be trivial, it is a result of moving contexts away from struct fields into function parameters in the PR. Having a context field allows to either pass the parent context as a parameter or use the child context directly inside the function (Child.ctx). That's two ways of doing the same thing. And if the code is changed so that the child's context field no longer reuses the parent's context, then it is easy to introduce bugs because functions which used a context parameter will continue to work on the parent context.

Closing thoughts

Even though this is a pretty radical change, I think we should give this design a try. If you look at the current state of services, you will see several patterns being used for cancellation as well as for creating contexts and reusing them between services and even inside one service. I believe that the proposed solution is more maintainable, easier to reason about and will make future development and DI enhancements simpler.

# Conflicts: # beacon-chain/blockchain/service.go # beacon-chain/rpc/service.go # beacon-chain/sync/initial-sync/service.go # beacon-chain/sync/subscriber.go # shared/service_registry.go # validator/web/server.go

nisdas · 2020-10-26T05:23:37Z

Hey @rkapka , thanks for opening up this PR for discussion. I will give my initial thoughts on this:

If we take a look at our services and the concept of releasing resources, such as goroutines, it is easy to see that such resources can be defined in multiple places: New(), Start(), and other functions (which might even be invoked from within parent services). To me it seems pretty obvious that the equivalent of C#'s Dispose() is context.CancelFunc, and thus cancelling should be used to provide resource cleanup. And because this cleanup should be called by the code which creates the object, which is node + registry in our case, then it's their responsibility. I think of Stop() as performing closing actions on the service before its job is done, which has nothing to do with releasing resources (which should happen after the service's job is complete).

Not sure I agree with this distinction, all our Start methods start of the main event loop of a service as well as other child event loops. Using Stop to clean up all these resources created seems natural, rather than doing does outside Stop and adding new logic.

This PR introduces a ServiceContext struct which contains a context and a cancel function. ServiceContext is created in the node and registered alongside the service in the registry. The Ctx field is passed directly to New() in the node and indirectly to Start() and Stop() through the registry. This makes sure that all Service interface's functions use the same context, which can be cancelled from a single place. Cancelling takes place inside registry's StopAll() and is done after calling Stop(). It is not technically possible to cancel the context inside Stop() without passing the cancel function as a parameter (a new context acquired through WithCancel() will not cancel the parent context, which is the one passed into New() etc.), which is essentially the same as cancelling from the outside after Stop() is executed, only less readable.

I am not sure what difference this has functionally versus what we have now. We do also defer the cancelling inside a service's Stop method. What this change does now, is make it harder to reason about on how we manage services. Cancelling the context outside a service's definition impedes readability in my opinion.

The answer is of course the parent context, simply flowing to the child through function calls - idiomatic Go at its best. Although this answer might be trivial, it is a result of moving contexts away from struct fields into function parameters in the PR. Having a context field allows to either pass the parent context as a parameter or use the child context directly inside the function (Child.ctx). That's two ways of doing the same thing. And if the code is changed so that the child's context field no longer reuses the parent's context, then it is easy to introduce bugs because functions which used a context parameter will continue to work on the parent context.

This could be applied for any change to our services or child services. This change doesn't really prevent anyone from passing down non service based contexts to any child service routines(context.Backgorund). In fact this makes it harder to determine the source of all these contexts, previously with s.ctx , we know that this is the actual service context that we are using. While now it adds in an additional layer of indirection.

This is a pretty big change, and I have a hard time seeing the benefits of this change. The ordering of service shutdowns allows us to clean up all services properly. This PR took care of the major issues with it:
#7625

This change also is making a distinction where there doesn't need to be one necessarily. Start can be looked on as the creating on the necessary resources to run a service while Stop is the cleaning up of the service and any associated resources. Cancelling the context outside of Stop seems to be an unnecessary change with the same end result.

rkapka · 2020-10-26T11:36:32Z

Thank you for your point of view @nisdas

Not sure I agree with this distinction, all our Start methods start of the main event loop of a service as well as other child event loops. Using Stop to clean up all these resources created seems natural, rather than doing does outside Stop and adding new logic.

I am not sure what difference this has functionally versus what we have now. We do also defer the cancelling inside a service's Stop method. What this change does now, is make it harder to reason about on how we manage services. Cancelling the context outside a service's definition impedes readability in my opinion.

The main purpose of this PR is to simplify working with services. Although I agree that adding ServiceContext to the equation makes initial reasoning about services more difficult, once you understand how this works writing and reviewing internal service code will be more straightforward.

The reason for introducing this change is to shift the responsibility of resource cleanup from the service itself to the creator of the service. I believe that the code creating a service should be responsible for managing its lifetime. You don't have a cancel() function available on the context itself - you cancel the context from the outside. Having said that, I am having second thoughts about cancelling inside Stop() - I guess defer cancel() is as good as calling cancel() outside of the service. Changing this would require passing the cancel function into Stop() as a second parameter, as I would still keep ServiceContext. Unless you want to store the context as a field in the service, which I am against.

This could be applied for any change to our services or child services. This change doesn't really prevent anyone from passing down non service based contexts to any child service routines(context.Backgorund).

This argument is a bit unfair. There is no bulletproof solution - you can always pass nil somewhere or do other stupid things. My point is that passing context between functions gives less options to make a mess, which is a good thing.

In fact this makes it harder to determine the source of all these contexts, previously with s.ctx , we know that this is the actual service context that we are using. While now it adds in an additional layer of indirection.

There are several functions in our code in the form of func (s *Service) foo(ctx context.Context). Inside the function you can either use ctx or s.ctx, which is very confusing (you have no guarantee whether it's the same context or not). Would you say we shoud just use s.ctx everywhere?

One more thing that I would like to add to the overall discussion is that I tried to look at the whole thing from a perspective of someone joining the project. Having the ability to just call s.cancel() inside a non-exported function deep down the call stack is a recipe for disaster, and I highly doubt that this will get noticed by a reviewer. You mentioned #7625, which is a great example of how a seemingly correct solution - cancelling a context - introduced a bug that spanned the whole Stop() function. Although this particular issue has been fixed, I am sure a similar one is just a matter of time. The less opportunity we have for such mistakes in the future, the better.

UPDATE
I would also like to mention #7492, which shows that it was not entirely clear what Start() is for, which introduced the issue. I think a similar thing could be said about Stop() and services in general. What logic should go into New() vs Start()? I personally have no idea, although this could be just my lack of knowledge.

stale · 2020-11-14T05:09:04Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale · 2020-11-21T07:21:18Z

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

stale · 2020-11-30T13:21:30Z

This pull request has been closed due to inactivity. Please reopen this pull request if you would like to continue working on it.

rkapka added 30 commits October 21, 2020 16:46

allow service cleanup on node shutdown

42a7748

revert error to fatal

2517374

Merge branch 'master' into shutdown-cleanup

f23568f

remove unused struct field

ea80748

remove context param from powchain service

7b69421

add context to Start

06cb5e0

add context to more Start functions

1297c5b

add context map to registry

23d4ac1

replace used context inside Start

3446bc8

implement common context pattern in Stop

b7cac6c

move cancel to registry

5ae20ff

clean up blockchain service

bfb0765

remove todos

663e09f

clean up beacon chain gateway

9a4ae15

clean up interop-cold-start

bdbd1a2

clean up attestations

58595dd

clean up p2p

2512db8

clean up initial-sync and sync

b846cee

more sync cleanup

46b14a6

clean up powchain

46b7905

fix test build errors

630c75b

more test fixes

50ba8e0

another fuzz test fix

b61c660

clean up rpc

33d1b63

clean up slasher's beaconclient

6435212

clean up slasher's detectopn

da9c3ee

clean up slasher's rpc

1a529ff

clean up validator's client

4c528ac

clean up validator's rpc

c5f9a55

clean up validator's rpc gateway

9ed1266

rkapka added 10 commits October 23, 2020 15:50

clean up validator's slasher protection

fd00a0c

Merge branch 'master' into shutdown-cleanup

efe739c

# Conflicts: # beacon-chain/blockchain/service.go # beacon-chain/rpc/service.go # beacon-chain/sync/initial-sync/service.go # beacon-chain/sync/subscriber.go # shared/service_registry.go # validator/web/server.go

test fixes after merging with master

f6c2ca4

a little more cleanup

7a45c3b

Introduce ServiceContext

712c197

remove todos

c8f32ac

Remove name of unused Start context param

24903f5

removed unused dependencies

ee4181b

Merge branch 'master' into service-context-redesign

1c7f389

bring back node cancelling

e93f2d7

rkapka requested a review from a team as a code owner October 25, 2020 19:20

rkapka requested review from 0xKiwi, prestonvanloon and terencechain October 25, 2020 19:20

rkapka marked this pull request as draft October 25, 2020 19:20

rkapka mentioned this pull request Oct 25, 2020

Redesign how context is handled by the service registry #7594

Closed

terencechain requested review from rauljordan and nisdas October 25, 2020 19:31

terencechain added the Discussion Simply a thread for talking about stuff label Oct 25, 2020

remove changes to test files

709255b

stale bot added the Stale There hasn't been any activity here in some time... label Nov 21, 2020

stale bot closed this Nov 30, 2020

rauljordan deleted the service-context-redesign branch November 30, 2020 20:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dependency Injection Revisited #7637

Dependency Injection Revisited #7637

rkapka commented Oct 25, 2020 •

edited

Loading

nisdas commented Oct 26, 2020

rkapka commented Oct 26, 2020 •

edited

Loading

stale bot commented Nov 14, 2020

stale bot commented Nov 21, 2020

stale bot commented Nov 30, 2020

Dependency Injection Revisited #7637

Dependency Injection Revisited #7637

Conversation

rkapka commented Oct 25, 2020 • edited Loading

Dependency Injection Revisited

Introduction

Composition Root

Who should manage an object's lifetime?

Pull Request

Stop() vs cancel()

Introducing ServiceContext

Services depending on other services

Closing thoughts

nisdas commented Oct 26, 2020

rkapka commented Oct 26, 2020 • edited Loading

stale bot commented Nov 14, 2020

stale bot commented Nov 21, 2020

stale bot commented Nov 30, 2020

rkapka commented Oct 25, 2020 •

edited

Loading

`Stop()` vs `cancel()`

Introducing `ServiceContext`

rkapka commented Oct 26, 2020 •

edited

Loading