Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(lib): Dispatch methods call, used by FSI #1653

Merged
merged 5 commits into from
Mar 3, 2021
Merged

feat(lib): Dispatch methods call, used by FSI #1653

merged 5 commits into from
Mar 3, 2021

Conversation

dustmop
Copy link
Contributor

@dustmop dustmop commented Mar 2, 2021

Dispatch is a new mechanism for handling calls invoked on the lib.Instance. Calls are sent to dispatch as a method name and input parameters. The actual implementation is looked up and then invokved using reflection. This allows the http api to layer itself directly on top of these same methods, which also lets us replace the old rpc style with the http api. It also nicely sets us up to introduce multi-tenancy and multi-processing, by having a single place that handles incoming requests.

This mechanmism is introduced here, and only used for FSI now. Dispatch can be introduced gradually, and does not require changing the whole world at once. This PR should serve as a guide for how to do the same refactoring for other method groups. Note that each FSI method in FSIMethods is now a very thin call to Dispatch, and then a type coercion afterwards to get the correct return value. Actual implementations live at the bottom of the same source file, and each take a Scope, which is a new structure to control access to the otherwise global resources in Instance.

Dispatch is a new mechanism for handling calls invoked on
the lib.Instance. Calls are sent to dispatch as a method name
and input parameters. The actual implementation is looked up and
then invokved using reflection. This allows the http api to
layer itself directly on top of these same methods, which also
lets us replace the old rpc style with the http api. It also
nicely sets us up to introduce multi-tenancy and multi-processing,
by having a single place that handles incoming requests.

This mechanmism is introduced here, and only used for FSI now.
Dispatch can be introduced gradually, and does not require
changing the whole world at once. This PR should serve as a guide
for how to do the same refactoring for other method groups. Note
that each FSI method in FSIMethods is now a very thin call to
Dispatch, and then a type coercion afterwards to get the correct
return value. Actual implementations live at the bottom of the
same source file, and each take a Scope, which is a new structure
to control access to the otherwise global resources in Instance.
@dustmop
Copy link
Contributor Author

dustmop commented Mar 2, 2021

Reviewers: the meat of this change is in lib/dispatch.go, which implements dispatch and contains some prose explaining how it works. After reading that, lib/fsi.go is the next most interesting place, in that it uses Dispatch for each FSI method, turning each call into a very thin function, with the actual implementations at the bottom of the file. The changes in api/fsi.go are also mostly mechanical, as much of the special behavior is being removed, such that each handler resembles the other very closely.

I want to do some manual testing of this feature before we move ahead. I did some testing during development, but further changes were introduced, so I want to make sure that everything is operating as expected. There's also a lack of testing on the dispatch mechanism itself (such as tests for function registration and invocation) but I want to make sure the design works okay before adding those.

@dustmop dustmop self-assigned this Mar 2, 2021
@dustmop dustmop requested review from Arqu, b5 and ramfox March 2, 2021 00:54
@dustmop dustmop added Breaking Change things that change existing API contracts. lib labels Mar 2, 2021
Copy link
Contributor

@Arqu Arqu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some commentary, but am generally amazed and thrilled to see this land.
Scope feels very much like the right way forward and properly insulating lib.Instance. Also love that it's piecemeal.

lib/dispatch.go Outdated
"strings"
)

// Dispatch is system for handling calls to lib. Currently only implemented for FSI methods.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Currently only... part is slated to change soon, think we should omit as it goes stale quick


// methodEndpoint returns a method name and returns the API endpoint for it
func methodEndpoint(method string) APIEndpoint {
if method == "fsi.write" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this hardcoded here?
Actually, I don't really understand the encoding schema here.
fsi.write is a special case as the route has the extra prefix. On the remaining routes we simply take the package.method fromat and return a /method/ route, however that's not always the case. registry, remote, profile, IPFS are all in a similar boat as fsi.write.
Additionally a lot of routes do not support a trailing / and will fail in that case.

I know we need a translation layer for method name > endpoint but I'd rather have it just be directly assigned instead of handling all the edge cases with auto parsing. Maybe move the parsing function into api/api.go and use that to set the methodEndpoints along with the handler so that in future extensions it's hard to miss.

It's a bit anti-pattern in that it surfaces some lib level knowledge up to API but given that we rely on that same HTTP API for our RPC layer, we have to be strict about it regardless and the Qri implementation for the API needs to be aware of those details. Other users can skip that part if they don't intend on using the HTTP RPC layer at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I probably should have left a comment, but I definitely intended this hardcoded part to be temporary in there's a better plan. You're absolutely right that the correspondence between API paths and methods needs to be better codified, for now this PR is keeping it the same by pointing out the special case. We should make this do something better in a follow-up change.


// If the route has a dataset reference in the url, parse that ref, and
// add it to the request object using the field "refstr".
func addDsRefFromURL(r *http.Request, routePrefix string) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am not a huge fan of this, but this should go away once UnmarshalFromRequest is utilized across the board. Maybe just leave a TODO that also says so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious what you mean; which part are you not a fan of? The need for this function addDsRefFromURL? Or keeping the dsref in the url like /status/dustmop/my_dataset? From my understanding, UnmarshalFromRequest is basically a hook that lets parameters customize their conversion from http to concrete struct types. There's a lot of methods that allow dsrefs to live in the url, so I would prefer to avoid needing to repeat the same conversion code over and over if we can. I left a TODO comment somewhat to what effect on line 52 (the first usage of addDsRefFromURL), and plan on making a follow-up PR after this to demonstrate what I'm thinking.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, my bad, should have been more clear on this.
So the reason I don't like it is that this further solidifies the pattern of prefix + hand parsing the ref at the end.
The intent of the comment was to steer this towards the recently added form of explicitly naming the dsref parts and using a common, global middleware to be super condensed on how we parse these.

In api/api.go we have

func handleRefRoute(m *mux.Router, ae lib.APIEndpoint, f http.HandlerFunc) {
	m.Handle(ae.String(), f)
	m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}"), f)
	m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}/{selector}"), f)
	m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}/at/{fs}/{hash}"), f)
	m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}/at/{fs}/{hash}/{selector}"), f)
}

And when we add add a route based ref handler we use the above form to define the route/handler mapping.
eg handleRefRoute(m, lib.AEGet, s.Middleware(dsh.GetHandler))

In api/middleware we then have

func refStringMiddleware(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		setRefStringFromMuxVars(r)
		next.ServeHTTP(w, r)
	})
}

Which acts as the propagation engine for the refstr query param.
It will also make it easier to switch later on to attach to context/scope instead of query param.

Finally, the addDsRefFromURL fails on any url that has content/path components after the ref itself in this case. At the same time I was aiming to remove entirely lib/api.go DsRefFromPath and similar functions once the refactor is complete.

But to distill my own fluff, I would prefer we use the handleRefRoute approach.

Comment on lines 13 to 17
type FSIHandlers struct {
lib.FSIMethods
inst *lib.Instance
dsm *lib.DatasetMethods
ReadOnly bool
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big fan of how you're tackling this handler transition. Just to keep everyone oriented in the right direction, we'd like to end up with a set of API endpoints that look something like this, with the handlers themselves being generated by a dispatch factory function:

endpoint HTTP methods
"/wd/status/{path:.*}" GET
"/wd/whatchanged/{path:.*}" GET
"/wd/init/{path:.*}" POST
"/wd/checkout/{path:.*}" POST
"/wd/restore/{path:.*}" POST
"/wd/fsi/write/{path:.*}" POST

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two questions:

  1. Once we have a /wd/ prefix, do we also need /fsi/ for the last endpoint? How about just /wd/write/?
  2. /init/ does not require a path, it often takes just the target directory and dataset name. Of course the dataset name is just the second half of the dsref, should it change to require the full dsref?

As to the question of mapping endpoints to methods: right now FSIMethods has Name that just returns a string. I suppose we could change that to Mapping to return a map[string]{string, string} instead: mapping the method name to a endpoint and http verb. I think that's very similar to what you did in #1650

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once we have a /wd/ prefix, do we also need /fsi/ for the last endpoint? How about just /wd/write/?

whoops that's a typo. All for /wd/write. Apologies that's a major throw-off given what we're discussing here 😄

/init/ does not require a path, it often takes just the target directory and dataset name. Of course the dataset name is just the second half of the dsref, should it change to require the full dsref?

I'm of the mind that any singular dataset reference should be captured uniformly in the request URL. The upside of allowing a full ref would be support for initializing a working directory at a commit other than HEAD, or using an initID

I suppose we could change that to Mapping to return a map[string]{string, string} instead: mapping the method name to a endpoint and http verb. I think that's very similar to what you did in #1650

I feel like this is the path of least resistance for now, and would love revisit this after we do a bunch of this refactoring to see if we can't write a cleaner abstraction

cmd/checkout.go Outdated
@@ -70,6 +71,9 @@ func (o *CheckoutOptions) Complete(f Factory, args []string) (err error) {

// Run executes the `checkout` command
func (o *CheckoutOptions) Run() (err error) {
ctx := context.Background()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use context.TODO() here. We should be passing the root context from cmd/cmd.go into the command constructor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wasn't aware that cmd had an abandoned context, and thought that this would always be the top-level of the call stack. Will do.

cmd/fsi.go Outdated
@@ -89,12 +87,15 @@ func (o *FSIOptions) Link() (err error) {
return err
}

ctx := context.Background()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, use context.TODO() instead

cmd/fsi.go Outdated
@@ -104,17 +105,18 @@ func (o *FSIOptions) Link() (err error) {

// Unlink executes the fsi unlink command
func (o *FSIOptions) Unlink() error {
var res string
ctx := context.Background()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since Unlink is a one-off method, have it accept a context argument instead, and pass in a context.TODO defined at the call site

Comment on lines 1732 to 1733
// TODO(dustmop): ipfs repo error: "this repo is currently being accessed by another process"
// Figure out why this is failing and restore this check
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a hunch, but it's possible this is caused by passing in context.Background instead of a context that is cancelled.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like it's only happening for ExecCommandCombinedOutErr, but I don't understand how that function is different from other Exec* calls.

lib/dispatch.go Outdated
Comment on lines 150 to 151
log.Errorf("%s: bad number of inputs: %d", funcName, f.NumIn())
continue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should be panics. You've mentioned above a need for vet rules, having panics will get us half of that value without needing a fancy linter. Keep in mind it's possible to test a function that panics with a recover

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. I was thinking of how the gorpc system will silently ignore invalid method signatures, but I guess it's easy enough to make helpers that we don't need to do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the vet rules are to test that the FSIMethods.Status signature matches the FSIImpl.Status signature (for the single input struct and output type). That's different then validating the signatures of only the FSIImpl methods.

lib/dispatch.go Outdated
}
// Second input must be a scope
inType = f.In(1)
if inType.Name() != "Scope" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this why the Scope struct is exported? I'd really like the scope struct to be unexported, but if it needs to be exported to satisfy the rules of reflection, then we may have to just deal with it via documentation comments for the time being

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally unneeded, this can be be used still with scope unexported.

lib/dispatch.go Outdated
InType: inType,
OutType: outType,
}
log.Infof("%d: registered %s(*%s) %v", k, funcName, inType, outType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use log.Infof, drop this to log.Debugf

lib/fsi.go Outdated
Comment on lines 63 to 67
got, err := m.inst.Dispatch(ctx, m.Name()+".createlink", p)
if res, ok := got.(*dsref.VersionInfo); ok {
return res, err
}
return nil, dispatchReturnError(got, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the m.Name()+".createLink" feels a bit too fragile. I'd prefer a private instance method defined in dispatch.go:

func (inst *Instance) dispatchMethodName(m Methods, name string) string 

This mainly helps us identify all of these method name constructions for later refactoring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it's not ideal. Not sure what dispatchMethodName is doing, is the input the current short method name (like "createLink") while the output is the fully qualified name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah my bad I totally should have filled in the method body:

func dispatchMethodName(m Methods, name string) string  {
  return fmt.Sprintf("%s.%s", m.Name(), name)
}

filling it out here it totally doesn't need the inst receiver. The big thing this is doing is only accepting (and utilizing) the Name method on the Methods interface to force the convention to agree

lib/scope.go Outdated
}

// Filesystem returns a filesystem
func (s *Scope) Filesystem() qfs.Filesystem {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have this return a *qfs.Mux.

Copy link
Member

@b5 b5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@b5 b5 merged commit cc79182 into master Mar 3, 2021
@b5 b5 deleted the feat-dispatch branch March 3, 2021 00:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Breaking Change things that change existing API contracts. lib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants