Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize path based space operations and regex based registry #2862

Closed
wants to merge 3 commits into from

Conversation

ishank011
Copy link
Contributor

@ishank011 ishank011 commented May 12, 2022

fixes #2859

@ishank011 ishank011 force-pushed the sp-mount-paths branch 2 times, most recently from af3eb6e to 8b3f759 Compare May 12, 2022 12:44
SpaceType: "+mountpoint",
},
},
}

spaces, err = r.findStorageSpaceOnProvider(ctx, p.Address, filters, unrestricted)
if err != nil {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@butonic @kobergj the checks below this must be moved to the storage drivers which must have all the logic to return only the correct spaces. I can't remove this code as the tests running on decomposed fs will fail. Can we merge this PR and you can create another one for that? It'll unblock us

@ishank011 ishank011 marked this pull request as ready for review May 12, 2022 12:48
@ishank011 ishank011 requested review from labkode, glpatcern and a team as code owners May 12, 2022 12:48
@micbar
Copy link
Member

micbar commented May 12, 2022

@ishank011 seems like the nextcloud server mock needs to adapt.


// Prepend the mount path of the storage provider if the request was for an absolute reference
if ref != nil && !utils.IsRelativeReference(ref) {
ref.Path = path.Join(mountPath, ref.Path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes storage providers aware of their mount point. which we deliberatly removed because shares inevitabley create multiple valid paths for the same resource. But this path must not be exposed to prevent share receivers from getting to know how someoneelse organized his files ...

With spaces I tried to make mountpoint and grant type spaces distinguishable. We are currently looking at the /dav/spaces endpoint to make it possible to use a full resuorceid, including storageproviderid, spaceid and nodeid to represent the root from which to further navigate down the tree with the relative path that follows the spaceid.

That will unify how shares as well as spaces are accessed using the /dav/spaces endpoint. This became clear when we were fleshing out the share jail in ocis web and the desktop client. The web client has to discover the granted resources because it needs to use the same fileid to start collaborative editing sessions. Mountpoints actually represent different resources, which is why the mountpoint spaces in the /me/drives endpoint also carry the remoteItem property which holds the fileid for the actual shared resource, aka the grant space.

I think one clarification is that grants cannot have an absolute path, all you can do is expose the root of the grant and then navigate relative to it. You must not be able to escape that jail.

For path based access we can take that relative reference to the sharesstorage provider, which will create a path for it.

But when accessing a normal storage provider, the storage space registry is the only place that should know the path prefix for that storage provider. The gateway can then wrap and unwrap paths. What the storage provider is responsible for is a path from his conceptual root to a storage space:

/users/e/einstein/relative/path consists of
{storage_path}/{space_path}/{relative_path} where

  • {storage_path} is /users, managed by the registy, wrapped an unwrapped by the gateway
  • {space_path} is e/einstein, an alias for the personal space of einstein, managed by the storage provider
  • {relative_path}, relative/path the path to a resource relative to einsteins personal space

Now this could be configured with regexes, so that /users/[a-k] is routed to a specific storage provider. the registry would still use the mount_point property for that space and cut off only /users/, the storage provider would then have to navigate to the space using e/einstein. It would alse prefix the relatuve path with the space_path and the gateway would prefix /users/ again.

Or the mount_point includes the regex, then the space_path that reaches the storage provider becomes just einstein and the gateway prefixes the request with what it cut off before forwarding the request to the storage provider.

It might just be terminology, but the storage provider must not know about the mount_point. It cannot, because different registries might have different ideas about what path leads to a certain storage provider.

Another example: you only have one storage provider, but you want to have a /users and a /projects namespace, where /users holds all personal spaces and /projects all project spaces. As long as the storage provider is oblivious of where it is mounted you can configure a providers with two types of spaces, each with their own mount_point. That is currently not implemented, but it certainly makes sense to allow more than one type of space per storage provider and also mount them at a different location.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just my current braindump, I need to think about this further ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this path must not be exposed to prevent share receivers from getting to know how someoneelse organized his files

But this is exactly what we want to do. Just leave it empty and it'll work how you want it to. I could move this to the EOS driver but this fits better in the storageprovider layer IMO

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, as a feature it makes sense to have a way to enable disclosing the paths 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ishank011 If all you want to achive is being able to shard the user directory, why can you not implement that logic in the registry, with the gateway doing the path wrapping and unwrapping? Is the problem that the storage provider will then only see a relative path inside a space, eg:

  • gateway receives /users/e/einstein/some/path
  • registry matches /users/e with the /users/[a-k] mount_point regex
  • registry cuts off the matched prefix for the provider /users/e and returns a provider with the mount_point set to /users/e
  • gateway cuts off the mount_point from the path, which becomes einstein/some/path
  • storage driver has to looks up the space for einstein, lets call it the space_path
  • the driver can navigate to some/path, relative to einsteins home

now this does not involve any id lookups ... this would be purely path based resource access ... without the storage provider having to know the mount_point?

At this point the storage provider still has to know about the space_path, maybe using a template like {.Spaceid} or {.Owner.Id}. Which is why the space registy can be configured with a path_template. It serves the same purpose, but now all path logic is in the registry.

IMO the code complexity would decrease if we use the path / storageproviderid to let the registry look up the storage space once and then always continue with relative references.

We should have a call to clarify the how to proceed ...

@sonarcloud
Copy link

sonarcloud bot commented May 12, 2022

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 7 Code Smells

No Coverage information No Coverage information
1.1% 1.1% Duplication

resp.StorageSpace.Id.OpaqueId = storagespace.FormatStorageID(s.conf.MountID, resp.StorageSpace.Id.GetOpaqueId())
resp.StorageSpace.Root.StorageId = storagespace.FormatStorageID(s.conf.MountID, resp.StorageSpace.Root.GetStorageId())
// The storage driver might set the mount ID by itself, in which case skip this step
if mountID, _ := storagespace.SplitStorageID(resp.StorageSpace.Id.GetOpaqueId()); mountID == "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no such thing as a mount id in the cs3 api. The Storage Space ID consists of {provider id}${space id}(!{opaque id}), which is split into {storage provider id} and the {space id}(!{opaque id}). We should name this providerID.

however ... why would the driver know anything about provider ids? When we update the cs3 api we will make the resource id contain three properties: providerid, spaceid any opaqueid. This will get rid of any id wrapping / unwrapping ...but why would you make the driver aware of the provider id? it is just a routing information. ... hmmm the only use case I see for that is when the driver can host multiple providers ... which we might want to support when dealing with migrating spaces between providers ... but for now this seems linke watering the boundaries to me.

@butonic
Copy link
Contributor

butonic commented May 13, 2022

one more thing: ocis web started using /dav/spaces requests ... as does the client. all requests will become relative ...
are there any CS3 clients directly interacting with the cs3 gateway that use path based requests?

@butonic butonic marked this pull request as draft May 13, 2022 13:51
@butonic
Copy link
Contributor

butonic commented May 13, 2022

@ishank011 I set this to draft. Please let us know your thoughts after evaluating current web whch uses the /dav/spaces endpoint and only makes relative requests

@ishank011
Copy link
Contributor Author

@butonic the web client behavior is based on a capability called share_jail https://github.com/cs3org/reva/blob/edge/internal/http/services/owncloud/ocs/data/capabilities.go#L68

How do you plan to support it in the future? With this set to false, path-based approaches work and these changes would again make sense

@ishank011
Copy link
Contributor Author

Incorporated in #2919

@ishank011 ishank011 closed this Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants