Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependency Injection Revisited #7637

Closed
wants to merge 41 commits into from
Closed

Conversation

rkapka
Copy link
Contributor

@rkapka rkapka commented Oct 25, 2020

Dependency Injection Revisited

Introduction

Composition Root

If you read some DI literature, you will stumble upon the concept of a Composition Root. A quick Google search reveals the following definition:

The composition root is the single place in your application where the composition of the object graphs for your application take place, using the dependency injection container (although how this is done is irrelevant, it could be using a container or could be done manually using pure DI)

Our dependency graph consists of a node, which depends on several services, which in turn may depend on other services. If we follow the Composition Root pattern, then registering dependencies through ServiceRegistry should not take place in the node, but rather in main.go or another DI-specific file, and the node should be a part of the object graph. This is a topic for another time, though. For now registration does still take place in the node itself and I don't think it's that big of a deal.

Who should manage an object's lifetime?

Quote from Dependency Injection Principles, Practices, and Patterns:

WHERE SHOULD DEPENDENCIES BE RELEASED?
(...) where should object graphs be released, and who is responsible for doing this? It's important to note that the code that has requested an object graph is also responsible for requesting its release. Because the request for an object graph is typically part of the COMPOSITION ROOT, so is the initiation of its release.

I think this definition can be extended from object graphs to single objects. Taking C# as an example, it has a predefined using keyword which can be used in this way:

using (StreamWriter writer = new StreamWriter(...)) {
    // some code here that uses writer
}

This is equivalent to calling a writer.Dispose() method after running all code inside the using statement. The Dispose() method is defined on an interface and its main purpose is to release any resources managed by the object, such as DB connections. The main thing to notice here is that it's the code creating the object which invokes Dispose(), not the object itself. I think Java has a similar mechanism.

Pull Request

Stop() vs cancel()

If we take a look at our services and the concept of releasing resources, such as goroutines, it is easy to see that such resources can be defined in multiple places: New(), Start(), and other functions (which might even be invoked from within parent services). To me it seems pretty obvious that the equivalent of C#'s Dispose() is context.CancelFunc, and thus cancelling should be used to provide resource cleanup. And because this cleanup should be called by the code which creates the object, which is node + registry in our case, then it's their responsibility. I think of Stop() as performing closing actions on the service before its job is done, which has nothing to do with releasing resources (which should happen after the service's job is complete).

Introducing ServiceContext

This PR introduces a ServiceContext struct which contains a context and a cancel function. ServiceContext is created in the node and registered alongside the service in the registry. The Ctx field is passed directly to New() in the node and indirectly to Start() and Stop() through the registry. This makes sure that all Service interface's functions use the same context, which can be cancelled from a single place. Cancelling takes place inside registry's StopAll() and is done after calling Stop(). It is not technically possible to cancel the context inside Stop() without passing the cancel function as a parameter (a new context acquired through WithCancel() will not cancel the parent context, which is the one passed into New() etc.), which is essentially the same as cancelling from the outside after Stop() is executed, only less readable.

Services depending on other services

Sometimes a Parent service struct has a Child service as a field. One important question arises: what context should be passed into a Child's function f() when executing Parent.Child.f()? The answer is of course the parent context, simply flowing to the child through function calls - idiomatic Go at its best. Although this answer might be trivial, it is a result of moving contexts away from struct fields into function parameters in the PR. Having a context field allows to either pass the parent context as a parameter or use the child context directly inside the function (Child.ctx). That's two ways of doing the same thing. And if the code is changed so that the child's context field no longer reuses the parent's context, then it is easy to introduce bugs because functions which used a context parameter will continue to work on the parent context.

Closing thoughts

Even though this is a pretty radical change, I think we should give this design a try. If you look at the current state of services, you will see several patterns being used for cancellation as well as for creating contexts and reusing them between services and even inside one service. I believe that the proposed solution is more maintainable, easier to reason about and will make future development and DI enhancements simpler.

@rkapka rkapka requested a review from a team as a code owner October 25, 2020 19:20
@rkapka rkapka marked this pull request as draft October 25, 2020 19:20
@terencechain terencechain added the Discussion Simply a thread for talking about stuff label Oct 25, 2020
@nisdas
Copy link
Member

nisdas commented Oct 26, 2020

Hey @rkapka , thanks for opening up this PR for discussion. I will give my initial thoughts on this:

If we take a look at our services and the concept of releasing resources, such as goroutines, it is easy to see that such resources can be defined in multiple places: New(), Start(), and other functions (which might even be invoked from within parent services). To me it seems pretty obvious that the equivalent of C#'s Dispose() is context.CancelFunc, and thus cancelling should be used to provide resource cleanup. And because this cleanup should be called by the code which creates the object, which is node + registry in our case, then it's their responsibility. I think of Stop() as performing closing actions on the service before its job is done, which has nothing to do with releasing resources (which should happen after the service's job is complete).

Not sure I agree with this distinction, all our Start methods start of the main event loop of a service as well as other child event loops. Using Stop to clean up all these resources created seems natural, rather than doing does outside Stop and adding new logic.

This PR introduces a ServiceContext struct which contains a context and a cancel function. ServiceContext is created in the node and registered alongside the service in the registry. The Ctx field is passed directly to New() in the node and indirectly to Start() and Stop() through the registry. This makes sure that all Service interface's functions use the same context, which can be cancelled from a single place. Cancelling takes place inside registry's StopAll() and is done after calling Stop(). It is not technically possible to cancel the context inside Stop() without passing the cancel function as a parameter (a new context acquired through WithCancel() will not cancel the parent context, which is the one passed into New() etc.), which is essentially the same as cancelling from the outside after Stop() is executed, only less readable.

I am not sure what difference this has functionally versus what we have now. We do also defer the cancelling inside a service's Stop method. What this change does now, is make it harder to reason about on how we manage services. Cancelling the context outside a service's definition impedes readability in my opinion.

The answer is of course the parent context, simply flowing to the child through function calls - idiomatic Go at its best. Although this answer might be trivial, it is a result of moving contexts away from struct fields into function parameters in the PR. Having a context field allows to either pass the parent context as a parameter or use the child context directly inside the function (Child.ctx). That's two ways of doing the same thing. And if the code is changed so that the child's context field no longer reuses the parent's context, then it is easy to introduce bugs because functions which used a context parameter will continue to work on the parent context.

This could be applied for any change to our services or child services. This change doesn't really prevent anyone from passing down non service based contexts to any child service routines(context.Backgorund). In fact this makes it harder to determine the source of all these contexts, previously with s.ctx , we know that this is the actual service context that we are using. While now it adds in an additional layer of indirection.

This is a pretty big change, and I have a hard time seeing the benefits of this change. The ordering of service shutdowns allows us to clean up all services properly. This PR took care of the major issues with it:
#7625

This change also is making a distinction where there doesn't need to be one necessarily. Start can be looked on as the creating on the necessary resources to run a service while Stop is the cleaning up of the service and any associated resources. Cancelling the context outside of Stop seems to be an unnecessary change with the same end result.

@rkapka
Copy link
Contributor Author

rkapka commented Oct 26, 2020

Thank you for your point of view @nisdas

Not sure I agree with this distinction, all our Start methods start of the main event loop of a service as well as other child event loops. Using Stop to clean up all these resources created seems natural, rather than doing does outside Stop and adding new logic.

I am not sure what difference this has functionally versus what we have now. We do also defer the cancelling inside a service's Stop method. What this change does now, is make it harder to reason about on how we manage services. Cancelling the context outside a service's definition impedes readability in my opinion.

The main purpose of this PR is to simplify working with services. Although I agree that adding ServiceContext to the equation makes initial reasoning about services more difficult, once you understand how this works writing and reviewing internal service code will be more straightforward.

The reason for introducing this change is to shift the responsibility of resource cleanup from the service itself to the creator of the service. I believe that the code creating a service should be responsible for managing its lifetime. You don't have a cancel() function available on the context itself - you cancel the context from the outside. Having said that, I am having second thoughts about cancelling inside Stop() - I guess defer cancel() is as good as calling cancel() outside of the service. Changing this would require passing the cancel function into Stop() as a second parameter, as I would still keep ServiceContext. Unless you want to store the context as a field in the service, which I am against.

This could be applied for any change to our services or child services. This change doesn't really prevent anyone from passing down non service based contexts to any child service routines(context.Backgorund).

This argument is a bit unfair. There is no bulletproof solution - you can always pass nil somewhere or do other stupid things. My point is that passing context between functions gives less options to make a mess, which is a good thing.

In fact this makes it harder to determine the source of all these contexts, previously with s.ctx , we know that this is the actual service context that we are using. While now it adds in an additional layer of indirection.

There are several functions in our code in the form of func (s *Service) foo(ctx context.Context). Inside the function you can either use ctx or s.ctx, which is very confusing (you have no guarantee whether it's the same context or not). Would you say we shoud just use s.ctx everywhere?

One more thing that I would like to add to the overall discussion is that I tried to look at the whole thing from a perspective of someone joining the project. Having the ability to just call s.cancel() inside a non-exported function deep down the call stack is a recipe for disaster, and I highly doubt that this will get noticed by a reviewer. You mentioned #7625, which is a great example of how a seemingly correct solution - cancelling a context - introduced a bug that spanned the whole Stop() function. Although this particular issue has been fixed, I am sure a similar one is just a matter of time. The less opportunity we have for such mistakes in the future, the better.

UPDATE
I would also like to mention #7492, which shows that it was not entirely clear what Start() is for, which introduced the issue. I think a similar thing could be said about Stop() and services in general. What logic should go into New() vs Start()? I personally have no idea, although this could be just my lack of knowledge.

@stale
Copy link

stale bot commented Nov 14, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

1 similar comment
@stale
Copy link

stale bot commented Nov 21, 2020

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Stale There hasn't been any activity here in some time... label Nov 21, 2020
@stale
Copy link

stale bot commented Nov 30, 2020

This pull request has been closed due to inactivity. Please reopen this pull request if you would like to continue working on it.

@stale stale bot closed this Nov 30, 2020
@rauljordan rauljordan deleted the service-context-redesign branch November 30, 2020 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Discussion Simply a thread for talking about stuff Stale There hasn't been any activity here in some time...
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants