Proposal: Support COPY use case #19
Comments
I'd propose perhaps to keep the problem being addressed separate from the solution; then we can distinguish between a) who supports the use case (and maybe which additional requirements they have), and b) who supports the solution. I +1 the use case, but unsure about the solution (need to think if we can have an alternative without a custom HTTP method).
@RubenVerborgh sure, good point. (I'll edit the proposal text to separate problem from solution).
@RubenVerborgh Are you thinking like, that a potential solution would be to use Minor comment on COPY being a custom header:
Drawback about So there is a case for a custom method, I think. Just would want to consider alternatives as well.
It is worth noting that the Fedora specification created a feature called "External Content". That feature is specific to binary content but there are many similarities between that and what is proposed here. In addition, the use case described here follows the

This feature is tremendously appealing, but it also opens the door to various security holes, so one would want to tread very carefully. For example, if an agent is able to construct a

First, if the Solid pod is running inside a firewall, this feature can be used as a type of port/resource scanner and it may be possible to gain access to resources that a user shouldn't typically have access to.

Second, DoS attacks become really easy. While it is common to limit the amount of data being transferred into a server via POST or PUT, in this scenario, the COPY request would be tiny, but it could trigger the download of an arbitrarily large resource into the server.

Third, authorization becomes more complicated if the remote resource requires special access that differs from the credentials required for writing to the destination resource.

As an alternative, it may be easier and (arguably) more secure to build this feature as a stand-alone sort of application that sits alongside a resource server, rather than being a feature of the resource server itself. The distinction here is small but architecturally important because one could treat all data as ephemeral and isolated from the resource server itself, which insulates the resource server from the kinds of exploits described above.
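The first two concerns above (internal-network scanning and oversized downloads) could be mitigated with checks along the following lines. This is only a sketch under stated assumptions, not part of the proposal or of any existing server: `is_source_allowed`, `read_capped`, the `resolved_ips` parameter, and the size limit are all hypothetical names, and DNS resolution of the source host is assumed to have been done by the caller.

```python
import ipaddress
from typing import Iterable
from urllib.parse import urlsplit


def is_source_allowed(source_url: str, resolved_ips: Iterable[str]) -> bool:
    """Reject sources that are not plain http(s) or that resolve to a
    non-public address (private, loopback, link-local, and so on)."""
    parts = urlsplit(source_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    addresses = [ipaddress.ip_address(text) for text in resolved_ips]
    # Hypothetical policy: every resolved address must be globally routable.
    return bool(addresses) and all(ip.is_global for ip in addresses)


def read_capped(chunks: Iterable[bytes], max_bytes: int) -> bytes:
    """Accumulate a streamed body, aborting once it exceeds max_bytes."""
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        if len(received) > max_bytes:
            raise ValueError("source exceeds the configured size limit")
    return bytes(received)
```

A server would run the address check against the addresses it actually connects to (to avoid DNS rebinding) and apply the same byte cap it already uses for PUT or POST bodies.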
+1
@acoburn all excellent points! (And yeah, I was wondering if Trellis etc had a way to do something similar).
Yep, that would work too! (uses a PUT or POST with a Link: header) And good point, we should add a section on Security Considerations.
Agreed. It'd be worth being able to enable/disable this feature on the config level.
Not sure I fully agree there. I assume the COPY implementation would use the same quota limits as PUT or POST. (And if so, it's as resistant to DDoS as those verbs.)
Sure, yeah. The primary use case is for public web-accessible resources.
Interesting!
Yeah, but suddenly I can ask 100 servers to perform 10 COPY requests each, for the same resource on a server A, which finds itself now bombarded with 1000 requests coming from 100 sources.
Generic copy server/app/agent.
@acoburn Oh, I just remembered (re 'port scanner' / firewall security consideration) -- our existing
Hmm, good point.
Is the idea that there would be fewer instances of the copy server running than there are of pods?
I'd imagine that the copy servers require auth.
So does the COPY verb, though. I dunno, the more I ponder it, the more it seems that the DDoS angle is not a deal-breaker.
That second point - I think an attacker with that many resources would have no problem spinning up 100 Docker instances or droplets or whatever, outside of Solid.
Another thought, re DDoS -- is sending COPY commands to 100 Solid pod providers that much more of a risk than, say, putting a link in a post on Slashdot or HackerNews?
I also like this approach. We should not limit clients to the subset of possible clients that run locally on the device. Any application should be able to run code locally in the browser (main thread and workers) and remotely on servers. Leaving some responsibility to the client shouldn't mean that it needs to be handled locally on the device. Such a copy-capable client running on a remote machine MAY run on the same machine as the storage server, but doesn't need to. I believe that in many cases such a copy task would run 'in the background', and it might not make a big difference whether it happens on a single machine or not.
It seems to me that the
It also seems that --
Where these resources actually reside is immaterial for the Solid Spec, though it may matter to NSS, vNext, or other implementations of that spec -- similar to the way that the question of where and how metadata is stored is immaterial to the requirement that clients must be able to use specific methods to read and write that metadata.
One more thought: if we have a source resource on server A and want to copy it to server B, and the source resource is protected by some ACL, then the application responsible for copying it will act as a Solid client, at least with regard to server A. The application may need to authenticate with a Bearer token etc. (and also be listed as a trustedApp?). I understand that we are discussing giving that responsibility to the Solid server itself. Do we have any other scenarios where a Solid server would also act as a Solid client?
Just came to think of it: SPARQL Update has a COPY operation. That could be another alternative.
My collaborators and I have approached some of these issues from the client side with Solid-File-Client. We now offer recursive container copying/moving and handling of .acl and .meta. You may want to play with it to see how some of the issues work out in practice.

One issue is when the .acl gets copied. Yes, it is obviously safer not to leave the resource unprotected and to copy the .acl first, BUT suppose that I am copying from a server that keeps its .acl for /foo/bar.ttl in /foo/bar.ttl.acl to a server that keeps it in /foo/wac/bar.ttl.acl or some other location, which it is free to do. In other words, we can't know where the target server wants to place the .acl until AFTER the resource is there and we can read its headers. So we currently copy the resource first, read its link header, and then place its .acl and .meta where the new location's server tells us to.

I wonder if it's possible to require servers to have a single place that gives the pattern for placement of their .acls. So instead of having to read that location for each file with a second HEAD request, we could read a location pattern for the server and apply that to all files copied in, rather than reading it from each file.

Another issue with copying is if the user has absolute URLs in the .acl file's accessTo, such that it would no longer point to the copied file, only to the original file. Currently we handle that by offering the ability to turn absolute links on the source into relative links on the target.

A related issue is this: to whom does the copied resource belong? Should its .acl point to the owner of the source or the owner of the target? Currently we default to leaving the agent alone, but a user can, with option flags, specify where it should point.
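The "read its link header" step described above can be sketched as follows. This is a minimal illustration, not Solid-File-Client's actual code: `acl_url_from_link` is a hypothetical helper, the example URLs are made up, and the parsing is deliberately simplified (it assumes no `<` characters inside parameter values).

```python
import re
from typing import Optional
from urllib.parse import urljoin

# Matches each "<target>" followed by its parameters, up to the next "<".
LINK_PATTERN = re.compile(r'<([^>]*)>([^<]*)')


def acl_url_from_link(link_header: str, resource_url: str) -> Optional[str]:
    """Return the absolute URL the server advertises with rel="acl", if any."""
    for target, params in LINK_PATTERN.findall(link_header):
        rel = re.search(r'rel\s*=\s*"?([^";,]+)"?', params)
        if rel and "acl" in rel.group(1).split():
            # Relative targets are resolved against the copied resource's URL.
            return urljoin(resource_url, target)
    return None
```

For a copied resource at the hypothetical URL `https://bob.example/foo/bar.ttl`, a header of `</foo/wac/bar.ttl.acl>; rel="acl"` would resolve to `https://bob.example/foo/wac/bar.ttl.acl`, which is where the client would then write the .acl.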
Errr, sorry, closed by accident
(This is a more detailed proposal continuation of issue solid/solid#49.)
Motivation / Problem Statement
We need a bandwidth-efficient method to copy data to and from Solid pods.
Imagine you're building a 'Save to Solid' widget / app / browser extension. The idea is - the user is browsing some Web resource (a PDF file, or an image, or a video, etc), and would like to save it to their pod (to be able to tag it and do other sorts of CMS stuff on it).
Currently, the only way to perform this operation would be multi-step:
Putting aside the implementation details, this presents two problems: temporary storage space (if the file is being held in a JavaScript variable in a web app, or in LocalStorage, this quickly presents challenges when the resource is large), and bandwidth.
This is especially problematic on resource-constrained clients (like mobile apps or browsers) -- the user first has to use their mobile data to download a file temporarily, and then use mobile data to upload that resource to their pod.
Proposed Solution 1: COPY Method
Note: As @RubenVerborgh points out, we should separate the problem / use case from the proposed solution.
Add a new Solid-specific LDP method, COPY, inspired by the existing WebDAV COPY method.
This proposed solution lets pods play to their strength, to act as always-connected high(er) bandwidth servers. Specifically, it allows a client to issue a single COPY command, and the server would perform the necessary data transfer (using its own bandwidth, not the client's).
This would be just a single step:
COPY Example
To copy FROM an external URL, https://example.com/example.pdf, TO a user's POD, alice.inrupt.net:
COPY Method Specs
This method is idempotent, but not safe (see Section 9.1 of RFC2616). Responses to this method must not be cached.
Copying non-Container resources
(Note, this is what's currently implemented on node-solid-server; copying of containers is not implemented.)
When the source resource is not a collection, the result of the COPY method is the creation of a new resource at the destination whose state and behavior match that of the source resource as closely as possible.
To copy a resource FROM an external URL TO a Solid pod:
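A minimal sketch of what such a request might look like on the wire, assuming the Source: request header discussed in this proposal. The helper name `build_copy_request` and the destination path `/docs/example.pdf` are hypothetical, chosen only for illustration; the exact request format is not confirmed by this text.

```python
from urllib.parse import urlsplit


def build_copy_request(source_url: str, destination_url: str) -> str:
    """Serialize a COPY request addressed to the destination (the pod),
    naming the external resource in a Source: header."""
    dest = urlsplit(destination_url)
    path = dest.path or "/"
    lines = [
        f"COPY {path} HTTP/1.1",
        f"Host: {dest.netloc}",
        f"Source: {source_url}",  # assumed header name, per this proposal
        "",
        "",
    ]
    return "\r\n".join(lines)


request_text = build_copy_request(
    "https://example.com/example.pdf",
    "https://alice.inrupt.net/docs/example.pdf",  # hypothetical destination
)
```

The request line targets the destination on the pod, and the external resource appears only in the header.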
(Note that this is backwards from the current WebDAV semantics, which uses COPY <source url>, Destination: <destination url>, see below.)
Copying Container resources
(Question: Should this be supported?)
See the WebDAV COPY for Collections spec for discussion of what's involved, including the handling of recursion via the Depth: header.
Difference from WebDAV COPY
WebDAV's COPY is intended primarily for transferring data out of the WebDAV server and into an external destination. Its syntax is: COPY <source url> with the Destination: <destination url> header.
It does not, however, support the common use case where the source URL resides on an external server, and you want to copy it to yours. (In other words, it does not support the Source: header.)
Since the motivation for Solid's COPY method is the latter (transferring from an external resource to a Solid pod), the Solid COPY method should support the Source: header (in addition to or instead of the Destination: header).
ACL Interactions
Copying from a non-LDP public Web resource to a container:
A COPY operation requires that the authenticated user has Write access to the destination container.
Copying FROM an LDP container to an LDP container:
A COPY operation requires that the authenticated user has Read access on the source container, and Write access on the destination container.
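The two access rules above can be expressed as a small check. This is an illustrative sketch only: `CopyAccess` and `copy_allowed` are hypothetical names, and how a server actually evaluates Read and Write against its ACLs is outside what this proposal specifies.

```python
from dataclasses import dataclass


@dataclass
class CopyAccess:
    source_is_ldp: bool          # False for a public, non-LDP Web resource
    can_read_source: bool        # Read access on the source container
    can_write_destination: bool  # Write access on the destination container


def copy_allowed(access: CopyAccess) -> bool:
    """Write on the destination is always required; Read on the source is
    required only when the source is itself an LDP container."""
    if not access.can_write_destination:
        return False
    if access.source_is_ldp and not access.can_read_source:
        return False
    return True
```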
Interfacing with actual WebDAV servers:
Out of scope for the moment. We just need this as a convenience method to move to and from Solid pods.
Design Questions
Source: request header, to handle the use case of transferring resources from an external source to a Solid pod. Question: Should the WebDAV style Destination: header operation be supported as well? (For transferring resources from a pod to another external pod).
COPY Implementation Notes
node-solid-server, for experimental purposes.
.acl files when copying resources (for example, if a file has its own .acl, first copy the .acl, and then the resource itself).
Proposed Solution 2: ?
?? (Discuss whether the use case can be solved using existing LDP methods).