Proposal: Support COPY use case #19

Open
dmitrizagidulin opened this issue Jul 29, 2019 · 20 comments

@dmitrizagidulin
Member

dmitrizagidulin commented Jul 29, 2019

(This is a more detailed proposal continuation of issue solid/solid#49.)

Motivation / Problem Statement

We need a bandwidth-efficient method to copy data to and from Solid pods.

Imagine you're building a 'Save to Solid' widget / app / browser extension. The idea: the user is browsing some Web resource (a PDF file, an image, a video, etc.) and would like to save it to their pod (to be able to tag it and do other sorts of CMS stuff with it).

Currently, the only way to perform this operation would be multi-step:

  1. GET resource and somehow store it temporarily in the client (in browser memory?)
  2. PUT the temp resource on the pod (upload it)

Putting aside the implementation details, this presents two problems: temporary storage space and bandwidth. Holding the file in a JavaScript variable in a web app, or in localStorage, quickly becomes a challenge when the resource is large.

This is especially problematic on resource-constrained clients (like mobile apps or browsers) -- the user first has to use their mobile data to download a file temporarily, and then use mobile data to upload that resource to their pod.

Proposed Solution 1: COPY Method

Note: As @RubenVerborgh points out, we should separate the problem / use case from the proposed solution.

Add a new Solid-specific LDP method, COPY, inspired by the existing WebDAV COPY method.

This proposed solution lets pods play to their strength, to act as always-connected high(er) bandwidth servers. Specifically, it allows a client to issue a single COPY command, and the server would perform the necessary data transfer (using its own bandwidth, not the client's).

This would be just a single step:

  1. issue a command to COPY resource from source URL to destination URL (server would proceed with the operation).

COPY Example

To copy FROM an external URL, https://example.com/example.pdf, TO a user's POD, alice.inrupt.net:

COPY /papers/example.pdf HTTP/1.1
Host: alice.inrupt.net
Source: https://example.com/example.pdf

HTTP/1.1 201 Created
Date: Thu, 23 May 2019 22:38:34 GMT
Content-Type: application/pdf
Location: https://alice.inrupt.net/papers/example.pdf
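As a rough client-side illustration of the request above, the sketch below builds (but does not send) a COPY request using Python's standard urllib. The helper name build_copy_request is hypothetical, authentication is omitted, and the Source: header is the one proposed in this issue, not an established standard.

```python
from urllib.request import Request


def build_copy_request(destination_url: str, source_url: str) -> Request:
    """Build a COPY request in the proposed form: COPY <destination>, Source: <source>.

    Hypothetical helper for illustration; nothing is sent over the network here.
    """
    return Request(
        destination_url,
        method="COPY",                   # custom method, as proposed in this issue
        headers={"Source": source_url},  # proposed header, not part of WebDAV
    )


req = build_copy_request(
    "https://alice.inrupt.net/papers/example.pdf",
    "https://example.com/example.pdf",
)
print(req.get_method(), req.selector)
print("Host:", req.host)
print("Source:", req.get_header("Source"))
```

Sending the request would be a matter of passing req to urllib.request.urlopen against a server that implements the proposal; per the example above, a successful copy would be expected to answer 201 Created.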

COPY Method Specs

This method is idempotent, but not safe (see Section 9.1 of RFC2616). Responses to this method must not be cached.

Copying non-Container resources

(Note, this is what's currently implemented on node-solid-server; copying of containers is not implemented.)

When the source resource is not a collection, the result of the COPY method is the creation of a new resource at the destination whose state and behavior match that of the source resource as closely as possible.

To copy a resource FROM an external URL TO a Solid pod:

COPY <destination url>
Source: <source url>

(Note that this is backwards from the current WebDAV semantics, which use COPY <source url> with a Destination: <destination url> header; see below.)

Copying Container resources

(Question: Should this be supported?)

See the WebDAV COPY for Collections spec for discussion of what's involved, including the handling of recursion via the Depth: header.

Difference from WebDAV COPY

WebDAV's COPY is intended primarily for transferring data out of the WebDAV server and into an external destination. Its syntax is: COPY <source url> with the Destination: <destination url> header.

It does not, however, support the common use case where the source URL resides on an external server, and you want to copy it to yours. (In other words, it does not support the Source: header.)

Since the motivation for Solid's COPY method is the latter (transferring from an external resource to a Solid pod), the Solid COPY method should support the Source: header (in addition to, or instead of, the Destination: header).

ACL Interactions

Copying from a non-LDP public Web resource to a container:
A COPY operation requires that the authenticated user has Write access to the destination container.

Copying FROM an LDP container to an LDP container:
A COPY operation requires that the authenticated user has Read access on the source container, and Write access on the destination container.

Interfacing with actual WebDAV servers:
Out of scope for the moment. We just need this as a convenience method to move to and from Solid pods.
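The two access rules above can be sketched as a small lookup. This is only an illustration of the stated rules: the function name and the (target, mode) tuple shape are assumptions, and the actual WAC evaluation is left out.

```python
from typing import List, Tuple


def required_access(source_is_ldp_resource: bool) -> List[Tuple[str, str]]:
    """Return the (target, access mode) pairs a COPY would need, per the rules above.

    Hypothetical helper: it only lists the checks, it does not evaluate any ACL.
    """
    checks = [("destination container", "Write")]
    if source_is_ldp_resource:
        # Copying from an LDP container additionally requires Read on the source.
        checks.insert(0, ("source container", "Read"))
    return checks


print(required_access(source_is_ldp_resource=False))
print(required_access(source_is_ldp_resource=True))
```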

Design Questions

  1. This Solid/LDP COPY method uses a Source: request header, to handle the use case of transferring resources from an external source to a Solid pod. Question: Should the WebDAV-style Destination: header operation be supported as well? (For transferring resources from a pod to another external pod.)
  2. Should COPY be supported on LDP Containers? (What about recursive copy?)

COPY Implementation Notes

  • The proposed COPY method is currently implemented on node-solid-server, for experimental purposes.
  • The current implementation only supports the most common authentication use case (the external resource is public).
  • Developers using this method should be aware of .acl files when copying resources (for example, if a file has its own .acl, first copy the .acl, and then the resource itself).

Proposed Solution 2: ?

?? (Discuss whether the use case can be solved using existing LDP methods).

@RubenVerborgh
Contributor

I'd propose perhaps to keep the problem being addressed separate from the solution; then we can distinguish between a) who supports the use case (and maybe which additional requirements they have), and b) who supports the solution.

I +1 the use case, but unsure about the solution (need to think if we can have an alternative without a custom HTTP method).

@dmitrizagidulin
Member Author

@RubenVerborgh sure, good point. (I'll edit the proposal text to separate problem from solution).

dmitrizagidulin changed the title from "Proposal: Extend Solid LDP with a COPY method" to "Proposal: Support COPY use case" on Jul 29, 2019

@dmitrizagidulin
Member Author

@RubenVerborgh Are you thinking like, that a potential solution would be to use POST or PUT with the Source: header?

Minor comment on COPY being a custom method: fetch() supports the COPY method, as does Express.js, so, I figure, not so bad?

@RubenVerborgh
Contributor

RubenVerborgh commented Jul 29, 2019

@RubenVerborgh Are you thinking like, that a potential solution would be to use POST or PUT with the Source: header?

Drawback about POST is the lack of idempotence; PUT has idempotence but requests the exact representation to be saved (so the copy logic would probably not be "allowed").

So there is a case for a custom method, I think. Just would want to consider alternatives as well.

@acoburn
Member

acoburn commented Jul 29, 2019

It is worth noting that the Fedora specification created a feature called "External Content". That feature is specific to binary content but there are many similarities between that and what is proposed here. In addition, the use case described here follows the copy (rather than redirect) handling mechanism of Fedora's external content feature.

This feature is tremendously appealing, but it also opens the door to various security holes, so one would want to tread very carefully. For example, if an agent is able to construct a COPY (or similar) request that causes the server to behave as a generic HTTP client, there are several categories of exploits that suddenly become possible.

First, if the Solid pod is running inside a firewall, this feature can be used as a type of port/resource scanner and it may be possible to gain access to resources that a user shouldn't typically have access to.

Second, DoS attacks become really easy. While it is common to limit the amount of data being transferred into a server via POST or PUT, in this scenario, the COPY request would be tiny, but it could trigger the download of an arbitrarily large resource into the server.

Third, authorization becomes more complicated if the remote resource requires special access that differs from the credentials required for writing to the destination resource.

As an alternative, it may be easier and (arguably) more secure to build this feature as a stand-alone sort of application that sits along side of a resource server, rather than being a feature of the resource server itself. The distinction here is small but architecturally important because one could treat all data as ephemeral and isolated from the resource server itself, which insulates the resource server from the kinds of exploits described above.
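To illustrate the first two concerns, here is a minimal sketch of two guards a server (or stand-alone copy agent) might apply before and during a fetch: rejecting source addresses that are not publicly routable, and capping how many bytes are accepted. The function names and the limit values are assumptions for illustration. A real deployment would also need to resolve the source hostname and check the resolved address, and repeat the check on every redirect.

```python
import io
import ipaddress
from typing import BinaryIO


def is_public_ip(address: str) -> bool:
    """Return True only for addresses that are not private, loopback, link-local, etc."""
    ip = ipaddress.ip_address(address)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def read_capped(stream: BinaryIO, max_bytes: int, chunk_size: int = 64 * 1024) -> bytes:
    """Read a stream to the end, aborting as soon as more than max_bytes have arrived."""
    received = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return bytes(received)
        received.extend(chunk)
        if len(received) > max_bytes:
            raise ValueError("source exceeds the configured size limit")


# Hypothetical usage with an in-memory stream standing in for a remote response body.
print(is_public_ip("10.0.0.5"))
print(len(read_capped(io.BytesIO(b"x" * 10), max_bytes=10)))
```

The size cap streams in chunks so that a tiny COPY request cannot make the server buffer an arbitrarily large download before the limit is noticed.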

@RubenVerborgh
Contributor

As an alternative, it may be easier and (arguably) more secure to build this feature as a stand-alone sort of application that sits along side of a resource server, rather than being a feature of the resource server itself.

+1

@dmitrizagidulin
Member Author

@acoburn all excellent points! (And yeah, I was wondering if Trellis etc had a way to do something similar).

Fedora "External Content" feature

Yep, that would work too! (uses a PUT or POST with a Link: header)

And good point, we should add a section on Security Considerations.

if the Solid pod is running inside a firewall, this feature can be used as a type of port/resource scanner

Agreed. It'd be worth being able to enable/disable this feature on the config level.

Second, DoS attacks become really easy.

Not sure I fully agree there. I assume the COPY implementation would use the same quota limits as PUT or POST. (And if so, it's as resistant to DDOS as those verbs.)

Third, authorization becomes more complicated if the remote resource requires special access

Sure, yeah. The primary use case is for public web-accessible resources.

As an alternative, it may be easier and (arguably) more secure to build this feature as a stand-alone sort of application that sits along side of a resource server, rather than being a feature of the resource server itself.

Interesting!
I'm not sure it would be more secure. (It would be exactly as secure as implementing it in the server, given the caveats above (operators can disable it, and quota limits apply).)
Architecturally.. So that means pod operators need to set up yet another server, alongside?

@RubenVerborgh
Contributor

Second, DoS attacks become really easy.

Not sure I fully agree there. I assume the COPY implementation would use the same quota limits as PUT or POST. (And if so, it's as resistant to DDOS as those verbs.)

Yeah, but suddenly I can ask 100 servers to perform 10 COPY requests each, for the same resource on a server A, which finds itself now bombarded with 1000 requests coming from 100 sources.

So that means pod operators need to set up yet another server, alongside?

Generic copy server/app/agent.

@dmitrizagidulin
Member Author

@acoburn Oh, I just remembered (re 'port scanner' / firewall security consideration) -- our existing /proxy endpoint also has this issue. (And I believe the current mitigation is - you can turn it off on the config level).

@dmitrizagidulin
Member Author

@RubenVerborgh

Yeah, but suddenly I can ask 100 servers to perform 10 COPY requests each

Hmm, good point.

@dmitrizagidulin
Member Author

Generic copy server/app/agent.

Is the idea that there would be fewer instances of the copy server running than there are of pods?
Because if most pod providers also provide the copy server, the 100 * 10 COPY requests thing still applies to a separate agent as well.

@RubenVerborgh
Contributor

I'd imagine that the copy servers require auth.

@dmitrizagidulin
Member Author

I'd imagine that the copy servers require auth.

So does the COPY verb, though..

I dunno, the more I ponder it, the more it seems that the DDOS angle is not a deal-breaker.
Because the following conditions would have to apply:

  • There are 100 Solid Pod providers :) (good problem to have)
  • An attacker has the resources of having 100 accounts with high enough quotas (which means, likely paid) that are able to download large files

That second point - I think an attacker with that many resources would have no problem spinning up 100 Docker instances or droplets or whatever, outside of Solid.

@dmitrizagidulin
Member Author

Another thought, re DDOS -- is sending COPY commands to 100 Solid pod providers that much more of a risk than, say, putting a link in a post on Slashdot or HackerNews?

@elf-pavlik
Member

elf-pavlik commented Jul 30, 2019

This proposed solution lets pods play to their strength, to act as always-connected high(er) bandwidth servers. Specifically, it allows a client to issue a single COPY command, and the server would perform the necessary data transfer (using its own bandwidth, not the client's).
This is especially problematic on resource-constrained clients (like mobile apps or browsers) -- the user first has to use their mobile data to download a file temporarily, and then use mobile data to upload that resource to their pod.

As an alternative, it may be easier and (arguably) more secure to build this feature as a stand-alone sort of application that sits along side of a resource server, rather than being a feature of the resource server itself.

I also like this approach; we should not limit clients to just the subset of possible clients that run locally on the device. Any application should be able to run code locally in the browser (main thread and workers) and remotely on servers. Leaving some responsibility to the client shouldn't mean that it needs to be handled locally on the device. Such a copy-capable client running on a remote machine MAY run on the same machine as the storage server, but doesn't need to. I believe in many cases such a copy task would run 'in the background', and it might not make a big difference whether it happens on a single machine or not.

@TallTed
Contributor

TallTed commented Jul 30, 2019

It seems to me that the COPY verb should take either/both Source: and Destination: headers, and if only one is present, the other necessary value MAY be supplied by direct argument OR user prompt. In other words, these SHOULD all be equivalent --

COPY {dest-file-uri}
Source: {source-file-uri}

COPY
Source: {source-file-uri}
Destination: {dest-file-uri}

COPY {source-file-uri}
Destination: {dest-file-uri}
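One way to read the equivalence of these three forms is as a normalization step that always yields a (source, destination) pair. The sketch below assumes header names have already been normalized to the spellings Source and Destination, and models a missing request target as None; the function name is hypothetical. Where this suggestion allows a user prompt for a missing value, the sketch simply raises an error.

```python
from typing import Mapping, Optional, Tuple


def resolve_copy_endpoints(
    request_target: Optional[str], headers: Mapping[str, str]
) -> Tuple[str, str]:
    """Return (source, destination) for any of the three request forms above."""
    source = headers.get("Source")
    destination = headers.get("Destination")
    if source and destination:
        return source, destination          # both headers given
    if source and request_target:
        return source, request_target       # COPY {dest} with Source:
    if destination and request_target:
        return request_target, destination  # COPY {source} with Destination:
    raise ValueError("COPY needs both a source and a destination")


src = "https://example.com/example.pdf"
dst = "https://alice.inrupt.net/papers/example.pdf"
forms = [
    resolve_copy_endpoints(dst, {"Source": src}),
    resolve_copy_endpoints(None, {"Source": src, "Destination": dst}),
    resolve_copy_endpoints(src, {"Destination": dst}),
]
print(forms[0] == forms[1] == forms[2])
```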

It also seems that --

  • Telling server x to COPY a remote resource (i.e., a resource it doesn't control) to a local resource (i.e., a resource it fully controls), is telling server x to GET that remote resource.

  • Telling server x to COPY a local resource (i.e., a resource it fully controls) to a remote resource (i.e., a resource it doesn't control), is telling server x to PUT that local resource.

  • Telling server x to COPY a remote resource (i.e., a resource it doesn't control) to a remote resource (i.e., a resource it doesn't control), is telling server x to tell server y (which might control either source or destination) to do one of the first two.
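The three cases above can be sketched as a small decision function. Treating "controls" as "same origin as the server" is a simplifying assumption made only for this illustration, as are the function name and the returned labels. The local-to-local case is not in the list above and is included only for completeness.

```python
from urllib.parse import urlsplit


def plan_copy(server_origin: str, source_url: str, destination_url: str) -> str:
    """Map a COPY onto the operation the receiving server would perform."""

    def is_local(url: str) -> bool:
        parts = urlsplit(url)
        # Assumption: the server controls exactly the resources on its own origin.
        return f"{parts.scheme}://{parts.netloc}" == server_origin

    source_local = is_local(source_url)
    destination_local = is_local(destination_url)
    if not source_local and destination_local:
        return "GET"       # fetch the remote resource and store it locally
    if source_local and not destination_local:
        return "PUT"       # push the local resource to the remote server
    if not source_local and not destination_local:
        return "DELEGATE"  # ask another server to do one of the first two
    return "LOCAL"         # both local: not covered by the list above


origin = "https://alice.inrupt.net"
print(plan_copy(origin, "https://example.com/example.pdf", origin + "/papers/example.pdf"))
```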


Where these resources actually reside is immaterial for the Solid Spec, though it may matter to NSS, vNext, or other implementations of that spec -- similar to the way that the question of where and how metadata is stored is immaterial to the requirement that clients must be able to use specific methods to read and write that metadata.

@elf-pavlik
Member

elf-pavlik commented Jul 30, 2019

One more thought: if we have a source resource on server A and want to copy it to server B, and the source resource is protected by some ACL, the application responsible for copying it will act as a Solid client, at least with regard to server A. The application may need to authenticate with a Bearer token etc. (and also stay listed as a trustedApp?). I understand that we are discussing giving that responsibility to the Solid server itself. Do we have any other scenarios where a Solid server would also act as a Solid client?

@kjetilk
Member

kjetilk commented Nov 26, 2019

Just came to think of it: SPARQL Update has a COPY operation. Could be another alternative.

@jeff-zucker
Member

My collaborators and I have approached some of these issues from the client side with Solid-File-Client. We now offer recursive container copying/moving and handling of .acl and .meta. You may want to play with it to see how some of the issues work out in practice.

One issue is when the .acl gets copied. Yes, it is obviously safer not to leave the resource unprotected and to copy the .acl first, BUT suppose that I am copying from a server that keeps its .acl for /foo/bar.ttl in /foo/bar.ttl.acl to a server that keeps it in /foo/wac/bar.ttl.acl or some other location, which it is free to do. In other words, we can't know where the target server wants to place the .acl until AFTER the resource is there and we can read its headers. So we currently copy the resource first, read its Link header, and then place its .acl and .meta where the new location's server tells us to. I wonder if it's possible to require servers to have a single place that gives the pattern for placement of their .acls. Then, instead of having to read that location for each file with a second HEAD request, we could read a location pattern for the server and apply it to all files copied in.
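As a rough illustration of the "read its Link header" step, the sketch below extracts the rel="acl" target from a Link header value and resolves it against the resource URL. It is a simplified parser, not Solid-File-Client's actual code: it assumes rel is the first parameter after each target, and the function name and example URLs are hypothetical.

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Matches entries of the form <target>; rel="value" (simplified; not a full Link parser).
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def find_acl_url(resource_url: str, link_header: str) -> Optional[str]:
    """Return the absolute URL of the resource's ACL, or None if none is advertised."""
    for target, rel in LINK_PATTERN.findall(link_header):
        if "acl" in rel.split():
            # Link targets may be relative to the resource they describe.
            return urljoin(resource_url, target)
    return None


header = '<wac/bar.ttl.acl>; rel="acl", <bar.ttl.meta>; rel="describedBy"'
print(find_acl_url("https://bob.example/foo/bar.ttl", header))
```

With this in hand, a copier could PUT the resource, issue a HEAD on the new location, and write the ACL to whatever URL the target server advertises.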

Another issue with copying is if the user has absolute URLs in the .acl file's accessTo such that it would no longer point to the copied file, only to the original file. Currently we handle that by offering the ability to turn absolute links on the source to relative links on the target.
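A minimal sketch of the absolute-to-relative rewrite, assuming the ACL document and the accessTo target live on the same origin. This is not Solid-File-Client's implementation; the function name and URLs are made up for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def to_relative(acl_url: str, target_url: str) -> str:
    """Rewrite target_url relative to the directory containing acl_url.

    URLs on a different origin are returned unchanged.
    """
    acl, target = urlsplit(acl_url), urlsplit(target_url)
    if (acl.scheme, acl.netloc) != (target.scheme, target.netloc):
        return target_url
    base_dir = posixpath.dirname(acl.path)
    relative = posixpath.relpath(target.path, start=base_dir)
    # relpath drops a trailing slash, which matters for containers.
    if target.path.endswith("/") and not relative.endswith("/"):
        relative += "/"
    return relative


print(to_relative("https://alice.example/foo/bar.ttl.acl", "https://alice.example/foo/bar.ttl"))
```

Once the value is relative, it resolves against wherever the ACL ends up, so the copied ACL points at the copied file instead of the original.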

A related issue is this: to whom does the copied resource belong? Should its .acl point to the owner of the source or the owner of the target? Currently we default to leaving the agent alone, but a user can, with option flags, specify where it should point.

kjetilk closed this as completed on Oct 15, 2021

@kjetilk
Member

kjetilk commented Oct 15, 2021

Errr, sorry, closed by accident

8 participants