feat(lib): Dispatch methods call, used by FSI #1653

dustmop · 2021-03-02T00:50:42Z

Dispatch is a new mechanism for handling calls invoked on the lib.Instance. Calls are sent to Dispatch as a method name plus input parameters. The actual implementation is looked up and then invoked using reflection. This allows the HTTP API to layer itself directly on top of these same methods, which also lets us replace the old RPC style with the HTTP API. It also sets us up nicely to introduce multi-tenancy and multi-processing, by having a single place that handles incoming requests.

This mechanism is introduced here and is only used for FSI for now. Dispatch can be introduced gradually, and does not require changing the whole world at once. This PR should serve as a guide for doing the same refactoring for other method groups. Note that each FSI method in FSIMethods is now a very thin call to Dispatch, followed by a type coercion to get the correct return value. The actual implementations live at the bottom of the same source file, and each takes a Scope, a new structure that controls access to the otherwise global resources in Instance.


dustmop · 2021-03-02T00:54:38Z

Reviewers: the meat of this change is in lib/dispatch.go, which implements dispatch and contains some prose explaining how it works. After reading that, lib/fsi.go is the next most interesting place, in that it uses Dispatch for each FSI method, turning each call into a very thin function, with the actual implementations at the bottom of the file. The changes in api/fsi.go are also mostly mechanical, as much of the special behavior is being removed, such that each handler closely resembles the others.

I want to do some manual testing of this feature before we move ahead. I did some testing during development, but further changes were introduced, so I want to make sure that everything is operating as expected. There's also a lack of testing on the dispatch mechanism itself (such as tests for function registration and invocation) but I want to make sure the design works okay before adding those.

Arqu

I've left some commentary, but am generally amazed and thrilled to see this land.
Scope feels very much like the right way forward, and it properly insulates lib.Instance. I also love that it's piecemeal.

Arqu · 2021-03-02T11:48:27Z

lib/dispatch.go

+ "strings"
+)
+
+// Dispatch is system for handling calls to lib. Currently only implemented for FSI methods.


The "Currently only..." part is slated to change soon; I think we should omit it, since it will go stale quickly.

Arqu · 2021-03-02T12:10:27Z

lib/dispatch.go

+
+// methodEndpoint returns a method name and returns the API endpoint for it
+func methodEndpoint(method string) APIEndpoint {
+ if method == "fsi.write" {


Why is this hardcoded here?
Actually, I don't really understand the encoding schema here.
fsi.write is a special case, as its route has the extra prefix. For the remaining routes we simply take the package.method format and return a /method/ route; however, that's not always the case. registry, remote, profile, and IPFS are all in a similar boat to fsi.write.
Additionally, a lot of routes do not support a trailing / and will fail in that case.

I know we need a translation layer for method name > endpoint but I'd rather have it just be directly assigned instead of handling all the edge cases with auto parsing. Maybe move the parsing function into api/api.go and use that to set the methodEndpoints along with the handler so that in future extensions it's hard to miss.

It's a bit of an anti-pattern, in that it surfaces some lib-level knowledge up to the API, but given that we rely on that same HTTP API for our RPC layer, we have to be strict about it regardless, and the Qri implementation of the API needs to be aware of those details. Other users can skip that part if they don't intend to use the HTTP RPC layer at all.

I probably should have left a comment, but I definitely intended this hardcoded part to be temporary until there's a better plan. You're absolutely right that the correspondence between API paths and methods needs to be better codified; for now this PR keeps it the same by calling out the special case. We should make this do something better in a follow-up change.
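One way to do the direct assignment Arqu suggests is an explicit table that fails loudly on unknown methods. This is a hypothetical sketch: the `methodEndpoints` map and the routes in it are illustrative, not the routes qri actually serves.

```go
package main

import "fmt"

// APIEndpoint mirrors the lib.APIEndpoint string type named in the PR.
type APIEndpoint string

// methodEndpoints is a hypothetical explicit table: every dispatchable method
// name is assigned its route directly, so special cases such as "fsi.write"
// need no parsing rules. Routes are illustrative and have no trailing slash.
var methodEndpoints = map[string]APIEndpoint{
	"fsi.status":      "/status",
	"fsi.whatchanged": "/whatchanged",
	"fsi.write":       "/fsi/write",
}

// methodEndpoint returns the endpoint for method, failing loudly when a
// method was never assigned one instead of guessing a route.
func methodEndpoint(method string) (APIEndpoint, error) {
	ae, ok := methodEndpoints[method]
	if !ok {
		return "", fmt.Errorf("no API endpoint registered for method %q", method)
	}
	return ae, nil
}
```

Returning an error for unmapped methods makes a forgotten route show up immediately, rather than as a 404 from an auto-generated path.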

Arqu · 2021-03-02T12:12:51Z

api/fsi.go

+
+// If the route has a dataset reference in the url, parse that ref, and
+// add it to the request object using the field "refstr".
+func addDsRefFromURL(r *http.Request, routePrefix string) error {


Not a huge fan of this, but it should go away once UnmarshalFromRequest is used across the board. Maybe just leave a TODO that says so.

Curious what you mean; which part are you not a fan of? The need for this function addDsRefFromURL? Or keeping the dsref in the URL, like /status/dustmop/my_dataset? From my understanding, UnmarshalFromRequest is basically a hook that lets parameters customize their conversion from HTTP to concrete struct types. There are a lot of methods that allow dsrefs to live in the URL, so I would prefer to avoid repeating the same conversion code over and over if we can. I left a TODO comment to roughly that effect on line 52 (the first usage of addDsRefFromURL), and plan on making a follow-up PR after this to demonstrate what I'm thinking.

Oh, my bad, should have been more clear on this.
So the reason I don't like it is that this further solidifies the pattern of prefix + hand parsing the ref at the end.
The intent of the comment was to steer this towards the recently added form of explicitly naming the dsref parts and using a common, global middleware to be super condensed on how we parse these.

In api/api.go we have

func handleRefRoute(m *mux.Router, ae lib.APIEndpoint, f http.HandlerFunc) {
	m.Handle(ae.String(), f)
	m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}"), f)
	m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}/{selector}"), f)
	m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}/at/{fs}/{hash}"), f)
	m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}/at/{fs}/{hash}/{selector}"), f)
}

And when we add a route-based ref handler, we use the above form to define the route/handler mapping,
e.g. handleRefRoute(m, lib.AEGet, s.Middleware(dsh.GetHandler))

In api/middleware we then have

func refStringMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		setRefStringFromMuxVars(r)
		next.ServeHTTP(w, r)
	})
}

This acts as the propagation engine for the refstr query param.
It will also make it easier to switch later on to attaching the ref to the context/scope instead of a query param.

Finally, addDsRefFromURL fails in this case on any URL that has content/path components after the ref itself. I was also aiming to remove DsRefFromPath and similar functions from lib/api.go entirely once the refactor is complete.

But to distill my own fluff, I would prefer we use the handleRefRoute approach.

b5 · 2021-03-02T04:00:20Z

api/fsi.go

 type FSIHandlers struct {
- lib.FSIMethods
+ inst *lib.Instance
 dsm *lib.DatasetMethods
 ReadOnly bool
 }


Big fan of how you're tackling this handler transition. Just to keep everyone oriented in the right direction, we'd like to end up with a set of API endpoints that look something like this, with the handlers themselves being generated by a dispatch factory function:

endpoint	HTTP methods
"/wd/status/{path:.*}"	GET
"/wd/whatchanged/{path:.*}"	GET
"/wd/init/{path:.*}"	POST
"/wd/checkout/{path:.*}"	POST
"/wd/restore/{path:.*}"	POST
"/wd/fsi/write/{path:.*}"	POST

Two questions:

Once we have a /wd/ prefix, do we also need /fsi/ for the last endpoint? How about just /wd/write/?

/init/ does not require a path, it often takes just the target directory and dataset name. Of course the dataset name is just the second half of the dsref, should it change to require the full dsref?

As to the question of mapping endpoints to methods: right now FSIMethods has a Name method that just returns a string. I suppose we could change that to Mapping to return a map[string]{string, string} instead: mapping the method name to an endpoint and HTTP verb. I think that's very similar to what you did in #1650.

Once we have a /wd/ prefix, do we also need /fsi/ for the last endpoint? How about just /wd/write/?

whoops that's a typo. All for /wd/write. Apologies that's a major throw-off given what we're discussing here 😄

/init/ does not require a path, it often takes just the target directory and dataset name. Of course the dataset name is just the second half of the dsref, should it change to require the full dsref?

I'm of the mind that any singular dataset reference should be captured uniformly in the request URL. The upside of allowing a full ref would be support for initializing a working directory at a commit other than HEAD, or using an initID.

I suppose we could change that to Mapping to return a map[string]{string, string} instead: mapping the method name to an endpoint and HTTP verb. I think that's very similar to what you did in #1650

I feel like this is the path of least resistance for now, and would love to revisit this after we do a bunch of this refactoring to see if we can't write a cleaner abstraction.

b5 · 2021-03-02T04:04:41Z

cmd/checkout.go

@@ -70,6 +71,9 @@ func (o *CheckoutOptions) Complete(f Factory, args []string) (err error) {

 // Run executes the `checkout` command
 func (o *CheckoutOptions) Run() (err error) {
+ ctx := context.Background()


Use context.TODO() here. We should be passing the root context from cmd/cmd.go into the command constructor.

Ah, I wasn't aware that cmd had an abandoned context, and thought that this would always be the top-level of the call stack. Will do.

b5 · 2021-03-02T04:05:59Z

cmd/fsi.go

@@ -89,12 +87,15 @@ func (o *FSIOptions) Link() (err error) {
 return err
 }

+ ctx := context.Background()


same, use context.TODO() instead

b5 · 2021-03-02T04:07:07Z

cmd/fsi.go

@@ -104,17 +105,18 @@ func (o *FSIOptions) Link() (err error) {

 // Unlink executes the fsi unlink command
 func (o *FSIOptions) Unlink() error {
- var res string
+ ctx := context.Background()


Since Unlink is a one-off method, have it accept a context argument instead, and pass in a context.TODO() defined at the call site.

b5 · 2021-03-02T04:10:00Z

cmd/fsi_integration_test.go

+ // TODO(dustmop): ipfs repo error: "this repo is currently being accessed by another process"
+ // Figure out why this is failing and restore this check


just a hunch, but it's possible this is caused by passing in context.Background instead of a context that is cancelled.

It looks like it's only happening for ExecCommandCombinedOutErr, but I don't understand how that function is different from other Exec* calls.

b5 · 2021-03-02T13:51:29Z

lib/dispatch.go

+ log.Errorf("%s: bad number of inputs: %d", funcName, f.NumIn())
+ continue


these should be panics. You've mentioned above a need for vet rules, having panics will get us half of that value without needing a fancy linter. Keep in mind it's possible to test a function that panics with a recover

Ah, right. I was thinking of how the gorpc system will silently ignore invalid method signatures, but I guess it's easy enough to make helpers that we don't need to do that.

Also, the vet rules are meant to test that the FSIMethods.Status signature matches the FSIImpl.Status signature (for the single input struct and output type). That's different from validating the signatures of only the FSIImpl methods.
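The check described here compares two signatures rather than validating one. A hypothetical reflection-based version follows (a runtime test, not a vet rule); it assumes the facade takes `(ctx, *Params)` and the implementation takes `(scope, *Params)`, and checks only that the params type and first result agree.

```go
package main

import (
	"fmt"
	"reflect"
)

// checkPair compares a public facade method with its implementation. Only
// the second input (the params struct) and the first result must agree;
// the first inputs (ctx vs. scope) are expected to differ.
func checkPair(facade, impl interface{}) error {
	ft, it := reflect.TypeOf(facade), reflect.TypeOf(impl)
	if ft.NumIn() != 2 || it.NumIn() != 2 || ft.NumOut() != 2 || it.NumOut() != 2 {
		return fmt.Errorf("unexpected arity: %s vs %s", ft, it)
	}
	if ft.In(1) != it.In(1) {
		return fmt.Errorf("input mismatch: %s vs %s", ft.In(1), it.In(1))
	}
	if ft.Out(0) != it.Out(0) {
		return fmt.Errorf("output mismatch: %s vs %s", ft.Out(0), it.Out(0))
	}
	return nil
}
```

Pass bound method values such as `m.Status` and `impl.Status`; method expressions like `(*FSIMethods).Status` include the receiver as the first input, which would shift every index by one.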

b5 · 2021-03-02T13:54:51Z

lib/dispatch.go

+ }
+ // Second input must be a scope
+ inType = f.In(1)
+ if inType.Name() != "Scope" {


Is this why the Scope struct is exported? I'd really like the scope struct to be unexported, but if it needs to be exported to satisfy the rules of reflection, then we may have to just deal with it via documentation comments for the time being

Totally unneeded; this can still be used with scope unexported.
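To illustrate why the export isn't needed: reflection can identify and construct an unexported type from inside its own package. This sketch compares `reflect.Type` values directly instead of checking `inType.Name() != "Scope"`; `scope`, `takesScope`, and `callWithScope` are hypothetical names.

```go
package main

import "reflect"

// scope is deliberately unexported; reflection can still identify it.
type scope struct{ user string }

// scopeType is computed once and compared by identity, which is stricter
// than matching on the type's name: a type from another package that
// happens to share the name will not pass.
var scopeType = reflect.TypeOf(scope{})

// takesScope reports whether fn's first parameter is exactly scope.
func takesScope(fn interface{}) bool {
	t := reflect.TypeOf(fn)
	return t != nil && t.Kind() == reflect.Func && t.NumIn() > 0 && t.In(0) == scopeType
}

// callWithScope invokes fn through reflection with a freshly built scope;
// this works for an unexported type because the value is created in-package.
func callWithScope(fn interface{}, user string) []reflect.Value {
	return reflect.ValueOf(fn).Call([]reflect.Value{reflect.ValueOf(scope{user: user})})
}
```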

b5 · 2021-03-02T13:55:21Z

lib/dispatch.go

+ InType: inType,
+ OutType: outType,
+ }
+ log.Infof("%d: registered %s(*%s) %v", k, funcName, inType, outType)


Don't use log.Infof, drop this to log.Debugf

b5 · 2021-03-02T14:00:02Z

lib/fsi.go

+ got, err := m.inst.Dispatch(ctx, m.Name()+".createlink", p)
+ if res, ok := got.(*dsref.VersionInfo); ok {
+ return res, err
+ }
+ return nil, dispatchReturnError(got, err)


the m.Name()+".createLink" feels a bit too fragile. I'd prefer a private instance method defined in dispatch.go:

func (inst *Instance) dispatchMethodName(m Methods, name string) string

This mainly helps us identify all of these method name constructions for later refactoring

I agree it's not ideal. Not sure what dispatchMethodName is doing, is the input the current short method name (like "createLink") while the output is the fully qualified name?

ah my bad I totally should have filled in the method body:

func dispatchMethodName(m Methods, name string) string {
	return fmt.Sprintf("%s.%s", m.Name(), name)
}

Filling it out here, it totally doesn't need the inst receiver. The big thing this does is accept (and use) only the Name method on the Methods interface, forcing the convention to agree.

b5 · 2021-03-02T14:01:12Z

lib/scope.go

+}
+
+// Filesystem returns a filesystem
+func (s *Scope) Filesystem() qfs.Filesystem {


have this return a *qfs.Mux.

b5

🎉

dustmop self-assigned this Mar 2, 2021

dustmop requested review from Arqu, b5 and ramfox March 2, 2021 00:54

dustmop added the "Breaking Change" (things that change existing API contracts) and "lib" labels Mar 2, 2021

Arqu reviewed Mar 2, 2021


b5 reviewed Mar 2, 2021


dustmop added 4 commits March 2, 2021 14:22

fix(lib): Improve context passing and visibility of internal structs

8f6509b

test(cmd): Fix commands that were not calling shutdown upon error

95fb6b7

fix(dispatch): Fix for fsi plumbing commands, to work over http

8070537

fix(dispatch): MethodSet interface to get name for dispatch

9b73d4b

b5 approved these changes Mar 3, 2021


b5 merged commit cc79182 into master Mar 3, 2021

b5 deleted the feat-dispatch branch March 3, 2021 00:47

b5 mentioned this pull request Mar 4, 2021

WIP: feat(lib): combination RPC, HTTP api using dispatch #1650

Closed
