-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(lib): Dispatch methods call, used by FSI #1653
Conversation
Dispatch is a new mechanism for handling calls invoked on the lib.Instance. Calls are sent to dispatch as a method name and input parameters. The actual implementation is looked up and then invokved using reflection. This allows the http api to layer itself directly on top of these same methods, which also lets us replace the old rpc style with the http api. It also nicely sets us up to introduce multi-tenancy and multi-processing, by having a single place that handles incoming requests. This mechanmism is introduced here, and only used for FSI now. Dispatch can be introduced gradually, and does not require changing the whole world at once. This PR should serve as a guide for how to do the same refactoring for other method groups. Note that each FSI method in FSIMethods is now a very thin call to Dispatch, and then a type coercion afterwards to get the correct return value. Actual implementations live at the bottom of the same source file, and each take a Scope, which is a new structure to control access to the otherwise global resources in Instance.
Reviewers: the meat of this change is in I want to do some manual testing of this feature before we move ahead. I did some testing during development, but further changes were introduced, so I want to make sure that everything is operating as expected. There's also a lack of testing on the dispatch mechanism itself (such as tests for function registration and invocation) but I want to make sure the design works okay before adding those. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've left some commentary, but am generally amazed and thrilled to see this land.
Scope feels very much like the right way forward and properly insulating lib.Instance. Also love that it's piecemeal.
lib/dispatch.go
Outdated
"strings" | ||
) | ||
|
||
// Dispatch is system for handling calls to lib. Currently only implemented for FSI methods. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Currently only...
part is slated to change soon, think we should omit as it goes stale quick
|
||
// methodEndpoint returns a method name and returns the API endpoint for it | ||
func methodEndpoint(method string) APIEndpoint { | ||
if method == "fsi.write" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this hardcoded here?
Actually, I don't really understand the encoding schema here.
fsi.write
is a special case as the route has the extra prefix. On the remaining routes we simply take the package.method
fromat and return a /method/
route, however that's not always the case. registry
, remote
, profile
, IPFS
are all in a similar boat as fsi.write
.
Additionally a lot of routes do not support a trailing /
and will fail in that case.
I know we need a translation layer for method name
> endpoint
but I'd rather have it just be directly assigned instead of handling all the edge cases with auto parsing. Maybe move the parsing function into api/api.go
and use that to set the methodEndpoints
along with the handler so that in future extensions it's hard to miss.
It's a bit anti-pattern in that it surfaces some lib level knowledge up to API but given that we rely on that same HTTP API for our RPC layer, we have to be strict about it regardless and the Qri implementation for the API needs to be aware of those details. Other users can skip that part if they don't intend on using the HTTP RPC layer at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I probably should have left a comment, but I definitely intended this hardcoded part to be temporary in there's a better plan. You're absolutely right that the correspondence between API paths and methods needs to be better codified, for now this PR is keeping it the same by pointing out the special case. We should make this do something better in a follow-up change.
|
||
// If the route has a dataset reference in the url, parse that ref, and | ||
// add it to the request object using the field "refstr". | ||
func addDsRefFromURL(r *http.Request, routePrefix string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am not a huge fan of this, but this should go away once UnmarshalFromRequest is utilized across the board. Maybe just leave a TODO that also says so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious what you mean; which part are you not a fan of? The need for this function addDsRefFromURL
? Or keeping the dsref in the url like /status/dustmop/my_dataset
? From my understanding, UnmarshalFromRequest is basically a hook that lets parameters customize their conversion from http to concrete struct types. There's a lot of methods that allow dsrefs to live in the url, so I would prefer to avoid needing to repeat the same conversion code over and over if we can. I left a TODO comment somewhat to what effect on line 52 (the first usage of addDsRefFromURL
), and plan on making a follow-up PR after this to demonstrate what I'm thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, my bad, should have been more clear on this.
So the reason I don't like it is that this further solidifies the pattern of prefix + hand parsing the ref at the end.
The intent of the comment was to steer this towards the recently added form of explicitly naming the dsref parts and using a common, global middleware to be super condensed on how we parse these.
In api/api.go
we have
func handleRefRoute(m *mux.Router, ae lib.APIEndpoint, f http.HandlerFunc) {
m.Handle(ae.String(), f)
m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}"), f)
m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}/{selector}"), f)
m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}/at/{fs}/{hash}"), f)
m.Handle(fmt.Sprintf("%s/%s", ae, "{peername}/{name}/at/{fs}/{hash}/{selector}"), f)
}
And when we add add a route based ref handler we use the above form to define the route/handler mapping.
eg handleRefRoute(m, lib.AEGet, s.Middleware(dsh.GetHandler))
In api/middleware
we then have
func refStringMiddleware(next http.Handler) http.Handler {
return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
setRefStringFromMuxVars(r)
next.ServeHTTP(w, r)
})
}
Which acts as the propagation engine for the refstr query param.
It will also make it easier to switch later on to attach to context/scope instead of query param.
Finally, the addDsRefFromURL
fails on any url that has content/path components after the ref itself in this case. At the same time I was aiming to remove entirely lib/api.go DsRefFromPath
and similar functions once the refactor is complete.
But to distill my own fluff, I would prefer we use the handleRefRoute
approach.
type FSIHandlers struct { | ||
lib.FSIMethods | ||
inst *lib.Instance | ||
dsm *lib.DatasetMethods | ||
ReadOnly bool | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Big fan of how you're tackling this handler transition. Just to keep everyone oriented in the right direction, we'd like to end up with a set of API endpoints that look something like this, with the handlers themselves being generated by a dispatch factory function:
endpoint | HTTP methods |
---|---|
"/wd/status/{path:.*}" | GET |
"/wd/whatchanged/{path:.*}" | GET |
"/wd/init/{path:.*}" | POST |
"/wd/checkout/{path:.*}" | POST |
"/wd/restore/{path:.*}" | POST |
"/wd/fsi/write/{path:.*}" | POST |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two questions:
- Once we have a
/wd/
prefix, do we also need/fsi/
for the last endpoint? How about just/wd/write/
? /init/
does not require a path, it often takes just the target directory and dataset name. Of course the dataset name is just the second half of the dsref, should it change to require the full dsref?
As to the question of mapping endpoints to methods: right now FSIMethods has Name
that just returns a string. I suppose we could change that to Mapping
to return a map[string]{string, string}
instead: mapping the method name to a endpoint and http verb. I think that's very similar to what you did in #1650
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we have a /wd/ prefix, do we also need /fsi/ for the last endpoint? How about just /wd/write/?
whoops that's a typo. All for /wd/write
. Apologies that's a major throw-off given what we're discussing here 😄
/init/ does not require a path, it often takes just the target directory and dataset name. Of course the dataset name is just the second half of the dsref, should it change to require the full dsref?
I'm of the mind that any singular dataset reference should be captured uniformly in the request URL. The upside of allowing a full ref would be support for initializing a working directory at a commit other than HEAD, or using an initID
I suppose we could change that to Mapping to return a map[string]{string, string} instead: mapping the method name to a endpoint and http verb. I think that's very similar to what you did in #1650
I feel like this is the path of least resistance for now, and would love revisit this after we do a bunch of this refactoring to see if we can't write a cleaner abstraction
cmd/checkout.go
Outdated
@@ -70,6 +71,9 @@ func (o *CheckoutOptions) Complete(f Factory, args []string) (err error) { | |||
|
|||
// Run executes the `checkout` command | |||
func (o *CheckoutOptions) Run() (err error) { | |||
ctx := context.Background() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use context.TODO() here. We should be passing the root context from cmd/cmd.go
into the command constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I wasn't aware that cmd
had an abandoned context, and thought that this would always be the top-level of the call stack. Will do.
cmd/fsi.go
Outdated
@@ -89,12 +87,15 @@ func (o *FSIOptions) Link() (err error) { | |||
return err | |||
} | |||
|
|||
ctx := context.Background() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, use context.TODO()
instead
cmd/fsi.go
Outdated
@@ -104,17 +105,18 @@ func (o *FSIOptions) Link() (err error) { | |||
|
|||
// Unlink executes the fsi unlink command | |||
func (o *FSIOptions) Unlink() error { | |||
var res string | |||
ctx := context.Background() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since Unlink
is a one-off method, have it accept a context argument instead, and pass in a context.TODO
defined at the call site
cmd/fsi_integration_test.go
Outdated
// TODO(dustmop): ipfs repo error: "this repo is currently being accessed by another process" | ||
// Figure out why this is failing and restore this check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a hunch, but it's possible this is caused by passing in context.Background
instead of a context that is cancelled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it's only happening for ExecCommandCombinedOutErr
, but I don't understand how that function is different from other Exec*
calls.
lib/dispatch.go
Outdated
log.Errorf("%s: bad number of inputs: %d", funcName, f.NumIn()) | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should be panics. You've mentioned above a need for vet rules, having panics will get us half of that value without needing a fancy linter. Keep in mind it's possible to test a function that panics with a recover
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, right. I was thinking of how the gorpc system will silently ignore invalid method signatures, but I guess it's easy enough to make helpers that we don't need to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the vet rules are to test that the FSIMethods.Status
signature matches the FSIImpl.Status
signature (for the single input struct and output type). That's different then validating the signatures of only the FSIImpl methods.
lib/dispatch.go
Outdated
} | ||
// Second input must be a scope | ||
inType = f.In(1) | ||
if inType.Name() != "Scope" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this why the Scope
struct is exported? I'd really like the scope struct to be unexported, but if it needs to be exported to satisfy the rules of reflection, then we may have to just deal with it via documentation comments for the time being
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally unneeded, this can be be used still with scope unexported.
lib/dispatch.go
Outdated
InType: inType, | ||
OutType: outType, | ||
} | ||
log.Infof("%d: registered %s(*%s) %v", k, funcName, inType, outType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't use log.Infof
, drop this to log.Debugf
lib/fsi.go
Outdated
got, err := m.inst.Dispatch(ctx, m.Name()+".createlink", p) | ||
if res, ok := got.(*dsref.VersionInfo); ok { | ||
return res, err | ||
} | ||
return nil, dispatchReturnError(got, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the m.Name()+".createLink"
feels a bit too fragile. I'd prefer a private instance method defined in dispatch.go
:
func (inst *Instance) dispatchMethodName(m Methods, name string) string
This mainly helps us identify all of these method name constructions for later refactoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's not ideal. Not sure what dispatchMethodName
is doing, is the input the current short method name (like "createLink") while the output is the fully qualified name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah my bad I totally should have filled in the method body:
func dispatchMethodName(m Methods, name string) string {
return fmt.Sprintf("%s.%s", m.Name(), name)
}
filling it out here it totally doesn't need the inst
receiver. The big thing this is doing is only accepting (and utilizing) the Name
method on the Methods
interface to force the convention to agree
lib/scope.go
Outdated
} | ||
|
||
// Filesystem returns a filesystem | ||
func (s *Scope) Filesystem() qfs.Filesystem { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have this return a *qfs.Mux
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
Dispatch is a new mechanism for handling calls invoked on the lib.Instance. Calls are sent to dispatch as a method name and input parameters. The actual implementation is looked up and then invokved using reflection. This allows the http api to layer itself directly on top of these same methods, which also lets us replace the old rpc style with the http api. It also nicely sets us up to introduce multi-tenancy and multi-processing, by having a single place that handles incoming requests.
This mechanmism is introduced here, and only used for FSI now. Dispatch can be introduced gradually, and does not require changing the whole world at once. This PR should serve as a guide for how to do the same refactoring for other method groups. Note that each FSI method in FSIMethods is now a very thin call to Dispatch, and then a type coercion afterwards to get the correct return value. Actual implementations live at the bottom of the same source file, and each take a Scope, which is a new structure to control access to the otherwise global resources in Instance.