Requiring support for Content-Type: message/external-body #37

Closed
emetsger opened this issue Feb 2, 2017 · 62 comments

Comments

@emetsger

emetsger commented Feb 2, 2017

In section 1.3, HTTP POST:

Implementations MUST support Content-Type: message/external-body extensions for request bodies for HTTP POST that would create LDP-NRs. This content-type requires a complete Content-Type header that includes the location of the external body, e.g. Content-Type: message/external-body; access-type=URL; URL="http://www.example.com/file", as defined in [rfc2017].
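
For readers unfamiliar with the header quoted above, here is a minimal sketch of building and parsing it with Python's standard library. The function names are hypothetical and chosen for illustration only; nothing here is taken from a Fedora implementation, and only the URL access type is covered.

```python
from email.message import Message


def external_body_content_type(url: str) -> str:
    """Build the Content-Type value quoted in the spec text (URL access type only)."""
    # Assumption: reject characters that would need escaping inside a quoted-string.
    if '"' in url or "\\" in url:
        raise ValueError("URL must not contain double quotes or backslashes")
    return f'message/external-body; access-type=URL; URL="{url}"'


def parse_external_body(header_value: str) -> dict:
    """Extract the access type and URL from a message/external-body Content-Type."""
    msg = Message()
    msg["Content-Type"] = header_value
    if msg.get_content_type() != "message/external-body":
        raise ValueError("not a message/external-body Content-Type")
    # get_param matches parameter names case-insensitively and strips the quotes.
    return {
        "access_type": msg.get_param("access-type"),
        "url": msg.get_param("url"),
    }
```

A client would send the built value as the Content-Type of a POST with an empty body; a server could use the parser to recover the external location.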

I'm curious as to the motivation for calling out this specific content type? I initially read it as: a Fedora implementation, upon receiving a POST with Content-Type: message/external-body would dereference the URL and ingest the content of that URL (as an LDP-NR?). Later I understood that the use case was to support the idea of Fedora 3 external data streams.

I wonder if it would be better for this to be a MAY, instead of a MUST, or simply remove this paragraph altogether? It seems odd to me that the specification requires all Fedora implementations to carry over a use case from Fedora 3. Or, I may just not fully understand the motivation.

@awoods
Collaborator

awoods commented Feb 2, 2017

The "external datastream" functionality is something that many Fedora 4 installations depend on. I see the specification of this interaction playing the beneficial role of defining a common approach for accomplishing redirection to external content.
It would be interesting to discuss the possibility of downgrading this requirement from a MUST to a MAY ...but since this represents a fundamental use of Fedora, my inclination is to keep the wording as-is.

@ajs6f
Contributor

ajs6f commented Feb 2, 2017

"external datastream" is the wrong term. It is confusing the issue. An external datastream, in Fedora 3, was streamed through the repo when retrieved, allowing authZ to be employed (or disseminators to bind). There is nothing like this in F4 (the "federation" functionality was as close a match as there was).

@ajs6f
Contributor

ajs6f commented Feb 2, 2017

We're talking about https://wiki.duraspace.org/display/FEDORA4x/External+Content -- if that resembles anything in Fedora 3, it would be Redirect Datastreams. Does that make the purpose more clear? No bits flow through the repo itself on retrieval of such a bitstream. It supports the extremely common use case of content outside the repo, the location of which is managed by the repo.

@ajs6f
Contributor

ajs6f commented Feb 2, 2017

If we do that, people will complain that they have to follow an extra hop. The question arises: why is no one complaining about the issue you raise? Perhaps it's not hard enough to manage redirect behavior in a client to annoy people? Don't know.

@ajs6f
Contributor

ajs6f commented Feb 2, 2017

That's only our own code. That doesn't mean anything. If we held ourselves to the standards to which we hold others, we'd never get anywhere.

I don't have really strong opinions about this. I leave it to @awoods to comment on how serious an annoyance the lack of a facility like this would be perceived as by users.

@birkland
Contributor

birkland commented Feb 2, 2017

@acoburn Ha, I was just about to link to that issue. I'm beginning to question how well such redirects interoperate with the LDP spec at all. For example, the resource at the other end of the redirect may or may not be an LDPR, which section 4.4.1.1 says an LDP-NR MUST be. Could that confuse LDP clients?

@emetsger
Author

emetsger commented Feb 2, 2017

@acoburn yes, that was more what I was thinking: is there a "LDP-way" of doing things?
@ajs6f @awoods thanks for the clarification!

It seems like there are at least two different patterns for obtaining externally-hosted LDP-NRs, and it just seemed odd to me that the spec used MUST in this section.

@ajs6f
Contributor

ajs6f commented Feb 2, 2017

I think it's more fine-grained than that. There's managing the URI, there's managing the dereferencing, and there's managing the bits. We're trying to let people decide to give some but not all of those things to the repo.

@ajs6f
Contributor

ajs6f commented Feb 2, 2017

@birkland We can make that more clear, if we want to keep this. We can say that the external-body thing doesn't have to be an LDPR at all, if we want to. It's not governed by LDP, it's our extension.

@birkland
Contributor

birkland commented Feb 2, 2017

@ajs6f I think the spec would need to do something like that. The worrisome part to me is that the LDP spec defines the notion of containment as fundamentally a relationship between LDPRs:

Containment
The relationship binding an LDPC to LDPRs whose lifecycle it controls and is aware of.

So if these external body resources are not LDPRs, then as-written the LDP spec would suggest that containment is not an appropriate concept to apply to them (i.e. a container can't contain them via ldp:contains). We'd need to relax that part, I think?

@ajs6f
Contributor

ajs6f commented Feb 2, 2017

I think the relationship is between resources, but it is accessed via URIs (and because this is HTTP, only via URIs). So the fact that we are dealing with two different URIs (the one requested and the one to which we redirect) is what matters for your objection.

In any event, I really don't care that much. I think that if we remove this feature, we will get a slew of different and in some cases incompatible patterns, and I would rather not see that, but maybe we won't.

@ajs6f
Contributor

ajs6f commented Feb 2, 2017

To be clear, I think that the fact that we have two different URIs here means that from the HTTP POV, we have two different resources; the one in the repo (the requested URI) is LDP contained, the one outside (the one to which we redirect) is not.

@birkland
Contributor

birkland commented Feb 2, 2017

we have two different resources; the one in the repo (the requested URI) is LDP contained, the one outside (the one to which we redirect) is not.

Exploring this perspective a little bit, would it be appropriate for the spec to say that the repository MUST return a 303 response to all requests to the resource? I think that can help address my concern with reconciling with the LDP spec, and highlight the difference between the two.

Still, the issue remains that clients that transparently follow redirects are likely to be confused, (as in the recent UI issue), at least if they're looking to ascertain the nature of the resource, discover its describedby description, etc. via inspecting headers. If external content redirects remain in the spec, I think there ought to be at least some non-normative language that explains this.

@ajs6f
Contributor

ajs6f commented Feb 2, 2017

MUST is fine by me-- isn't that what it says now? @barmintor moved that behavior under GET and somewhere else, can't remember where.

"clients that transparently follow redirects" are intentionally hiding the nature of the web they are traversing. That's something they can do, but it's not unreasonable that in that case, they're going to miss some details. Like description. 303 is not 200. I don't think we should conflate them.

@birkland
Contributor

birkland commented Feb 2, 2017

@ajs6f It says 3xx now; I'm wondering if it should specifically say 303.

As far as transparently following redirects, I just think the semantics and behavior spelled out in the spec related to external content are unusual enough that it deserves some prose.

@ajs6f
Contributor

ajs6f commented Feb 2, 2017

@birkland No prob with extra non-normative language to clarify things. Why 303 in particular? That doesn't square for me. What about impls that might want to fully proxy POST (for authZ)?

@awoods
Collaborator

awoods commented Feb 2, 2017

@birkland : The current implementation responds to GET requests with 307. Do you feel strongly that 303 is more suitable?

And I agree with adding "extra non-normative language to clarify things".

Whichever status code is most appropriate on GET (303 or 307), further specifying it makes sense.
@ajs6f : do you have concerns on GET or just POST?

@ajs6f
Contributor

ajs6f commented Feb 2, 2017

It appears to me to be overspecified behavior. We don't have to have a reason to leave it as 3xx; we need a reason to be any more specific than that.

@birkland
Contributor

birkland commented Feb 2, 2017

@ajs6f @awoods I think a 303 is the only response that makes sense if the following is true:

we have two different resources; the one in the repo (the requested URI) is LDP contained, the one outside (the one to which we redirect) is not.

So if a client does a GET (or POST, or whatever) on one of these external-body repository URIs, the repository sends them elsewhere:

303 response to a GET request indicates that the origin server does
not have a representation of the target resource that can be
transferred by the server over HTTP. However, the Location field
value refers to a resource that is descriptive of the target
resource, such that making a retrieval request on that other resource
might result in a representation that is useful to recipients without
implying that it represents the original target resource

This seems to describe what this redirect is trying to achieve, right? What's being redirected to is not an LDPR managed by the repository, it's something else. The client needs to interact separately with that resource in order to discover what it is, what can be done with it, etc.

A 307 would imply:

target resource resides temporarily under a different URI and the user agent
MUST NOT change the request method if it performs an automatic
redirection to that URI

That seems strictly contrary to this external-content-as-distinct-from-the-ldpr scenario, right?
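
The practical difference between the two status codes shows up in which method a client uses on the follow-up request. The sketch below encodes the HTTP rules as I understand them (303 turns the follow-up into a retrieval, 307 and 308 preserve the method, and 301/302 historically allow POST to become GET). The function name is hypothetical, and real clients may differ in their handling of 301/302.

```python
def method_after_redirect(status: int, method: str) -> str:
    """Return the method a client would use when following a redirect."""
    method = method.upper()
    if status == 303:
        # "See Other": the follow-up is a retrieval request (GET, or HEAD for HEAD).
        return "HEAD" if method == "HEAD" else "GET"
    if status in (307, 308):
        # The request method must not change.
        return method
    if status in (301, 302):
        # Assumption: model the common historical behavior of rewriting POST to GET.
        return "GET" if method == "POST" else method
    raise ValueError(f"{status} is not a redirect status handled by this sketch")
```

Under these rules a POST answered with 303 is retried as a GET against the external URI, while a POST answered with 307 would be replayed as a POST, which is the behavior questioned above.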

@birkland
Contributor

birkland commented Feb 2, 2017

Looking back at the spec, it only says that a 3xx must be returned by a GET. Does that mean that all other interactions (e.g. PUT, POST, HEAD) would not result in a redirect at all, and would instead interact with the repository LDPR (e.g. changing the external URI, deleting the LDPR, etc)?

@awoods
Collaborator

awoods commented Feb 2, 2017

@birkland , that is my understanding.
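
The reading confirmed above can be sketched as a small in-memory model: retrieval requests redirect to the external URI, while other methods act on the repository resource itself. Everything here is hypothetical. The class name, the choice of 307 (taken from the comment about the current implementation), the treatment of HEAD like GET, and the 204/400/404/405 responses are assumptions for illustration, not spec text.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ExternalBodyResource:
    """Hypothetical stand-in for an LDP-NR created via message/external-body."""

    external_url: str
    deleted: bool = False

    def handle(self, method: str, new_url: str = "") -> Tuple[int, Dict[str, str]]:
        method = method.upper()
        if self.deleted:
            return 404, {}
        if method in ("GET", "HEAD"):
            # Retrieval is redirected; no bits flow through the repository.
            return 307, {"Location": self.external_url}
        if method == "PUT":
            # Acts on the repository resource: change where it points.
            if not new_url:
                return 400, {}
            self.external_url = new_url
            return 204, {}
        if method == "DELETE":
            # Removes the repository resource, not the external content.
            self.deleted = True
            return 204, {}
        return 405, {"Allow": "GET, HEAD, PUT, DELETE"}
```

Note that DELETE in this model never touches the external content, which matches the "two different resources" view discussed earlier in the thread.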

@barmintor
Contributor

@birkland I would still expect HEAD to return the same status as GET.

As to the original issue: I'm lukewarm on it, to be honest. There are persistent requests for external content references, but this is not that. If we don't have this, but instead a link, I'd argue that we no longer have a LDP-NR at all, just a LDP-RS that has an additional rdf:type (PCDM folks' antennae go up). That might be fine.

@awoods
Collaborator

awoods commented Feb 3, 2017

@barmintor , I share a similar sentiment in that my primary concern is supporting the requirement for external content. I would be interested to hear why you believe the specified approach is not that.

I believe the use case we are trying to address with the message/external-body Content-Type is the ability to create an LDP-NR in the repository, the body of which is a resource found outside of the repository. Additionally, requests for that LDP-NR are expected to redirect to the external content.

The current approach does this in a standardized way with (RFC2017); but if there is another suggestion that satisfies the requirement, let's discuss.

@barmintor
Contributor

@awoods I'm happy to discuss in another issue?

@awoods
Collaborator

awoods commented Feb 3, 2017

@barmintor , ok. Does that imply that we close this one as: "leave as-is for now"?

@birkland
Contributor

birkland commented Feb 3, 2017

@awoods I don't think the spec is acceptable as-is, but if the intent of the new issue is to fix it, that would be fine.

@birkland
Contributor

birkland commented Feb 3, 2017

@barmintor @awoods I don't believe the spec currently spells out what happens on HEAD requests to resources with external bodies, so as-written it would suggest it's up to the implementation to do something reasonable.

@ajs6f
Contributor

ajs6f commented Feb 3, 2017

@birkland I think there is an obvious choice (return a 307). I would be happy to see that written down.

@awoods
Collaborator

awoods commented Feb 3, 2017

@birkland : To close this issue, may I suggest:

  1. We decide on whether to specify 303? 307? or 3xx?
  2. We add text that "spells out what happens on HEAD", i.e. also redirect with 3?? (as decided above)
  3. We add non-normative clarifying text per #37 (comment)

@birkland
Contributor

birkland commented Feb 3, 2017

@awoods Sure, though I think feedback from @barmintor and @acoburn would suggest that use cases around external content may be addressable just by creating an appropriate LDP-RS that links to the content, rather than creating a redirecting LDP-NR. So, add a step 0 to your list?

  0. Decide if creating LDP-NRs that redirect via external-body shall remain in the spec

.. if the answer is 'yes', then we go on to 1, 2, and 3 to clarify the existing spec?

@birkland
Contributor

birkland commented Feb 3, 2017

New issue #40 created to fix existing spec.

Scoping this issue to "Decide if creating LDP-NRs that redirect via external-body shall remain in the spec as a MUST", my instinct would be to make it a MAY. To be honest, I'd use API-X to implement redirection or proxying of content as appropriate.

@awoods
Collaborator

awoods commented Feb 3, 2017

@acoburn : that is a fair point. Currently on the table are both ends of the spectrum:

  • Create iana:describes links from repository LDP-RSs to external resources
  • Use external-body with redirects

In the new issue #40 (thanks @birkland !) we can collect the requirements, including the ones you just noted to help drive towards a more comfortable resolution.

@barmintor
Contributor

@awoods I think iana:describes must be on the LDP-RS that describes some other binary resource. If we don't redirect, we can still use external-body. Using other links really means the resource in question is not a LDP-NR.

@awoods
Collaborator

awoods commented Feb 3, 2017

@barmintor : thanks, I updated my comment.

@ajs6f
Contributor

ajs6f commented Feb 3, 2017

Using other links really means the resource in question is not a LDP-NR.

@barmintor, this confuses me: I see that it would mean that we can't infer from LDP alone that it is an LDP-NR, but how can we infer that it definitely is not?

@barmintor
Contributor

@ajs6f if we have an existing HTTP mechanism for documenting a reference in Content-Type: external-body, and another one for redirecting in 3xx with Location: ..., and we're not a RDF source, and we're not binary content, I don't know what we're talking about anymore.

@ajs6f
Contributor

ajs6f commented Feb 3, 2017

@barmintor That's not my point. My question is: if I have an RDF source (in an LDP server) and it has a link (of some non-iana:describes variety) to something elsewhere in the world, what on earth lets you infer that that something else isn't an LDP-NR? It doesn't seem to me that you know anything about that other resource, on an LDP basis. It could be an LDP-NR or a fried egg, for all we know.

@barmintor
Contributor

@ajs6f you have me backwards. I'm saying that if a resource itself doesn't have those things, that resource is not a LDP-NR.

@ajs6f
Contributor

ajs6f commented Feb 3, 2017

You're saying that if a resource doesn't have something that describes it, it cannot be an LDP-NR?

@barmintor
Contributor

No, I'm saying if GET X is not local binary content, and GET X doesn't have Content-Type: external-body, and GET X doesn't have Location: ..., but GET X has linked data that points to other resources, X is not a LDP-NR.
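
The rule stated in this comment can be written as a predicate over what a GET on X returns. This is only a sketch of the commenter's claim as I read it, not an LDP definition. The function and parameter names are hypothetical, and the shorthand "Content-Type: external-body" is assumed to mean the media type message/external-body.

```python
from typing import Optional


def ruled_out_as_ldp_nr(
    has_local_binary: bool,
    content_type: Optional[str],
    location: Optional[str],
) -> bool:
    """True when, under the rule above, X cannot be an LDP-NR."""
    # Compare only the media type, ignoring parameters such as access-type.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    is_external_body = media_type == "message/external-body"
    has_redirect = bool(location)
    # X is ruled out only if none of the three signals is present.
    return not (has_local_binary or is_external_body or has_redirect)
```

A False result does not establish that X is an LDP-NR; it only means this rule does not exclude it.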

@ajs6f
Contributor

ajs6f commented Feb 3, 2017

Okay, this side issue between @barmintor and myself has been discussed in IRC, and it turns out to be a disagreement about what kinds of resources can use what kinds of links to set up a relationship of description/described, which is not at all an important part of this ticket. Sorry for sidetracking things!

@ruebot
Contributor

ruebot commented Feb 10, 2017

Reading through this issue, and trying to figure out if this one should be closed since it is superseded by #40 and #41. If that's the case, shall we close this issue?

@birkland
Contributor

@ruebot Looking through the comments, it looks like there isn't consensus on keeping MUST in the language:

Quoth @acoburn

it seems to me that this feature fits rather uncomfortably with the other components of the specification.

Quoth @barmintor

If we don't redirect, we can still use external-body
(I think that's suggesting that returning the originally supplied message/external-body rather than a redirect would be a reasonable thing to do)

Quoth me

my instinct would be to make it a MAY. To be honest, I'd use API-X to implement redirection or proxying of content as appropriate

So how about this: change the language to MAY or SHOULD (then close this ticket), clarify the specification with #40, and explain this all in non-normative text in #41?

@ruebot
Contributor

ruebot commented Feb 10, 2017

@birkland #54

@kefo

kefo commented Mar 30, 2017

Three things about this.

  1. While my intent is not, yet, to re-open this, I'd be curious if @barmintor or @escowles, after considerable discussion at LDCX, believe external content MUST be supported versus SHOULD. Many current Fedora installations rely on this feature. At AIC we too are evaluating not storing some content in Fedora.

  2. MUST was changed to SHOULD in 3.3.1, which is about POST, but not in 3.4.1, which is about PUT.

  3. I think the issues @acoburn raises about supporting external-body resources and the impact that support will have on other elements of the specification deserve wider discussion. Such a discussion may surface strategies to help address the issues or at least inform any future discussion about what it means to be "in conformance" with the specification.

@escowles
Contributor

@acoburn @kefo I agree that we could probably dispense with the redirect functionality and replace it with a triple. On the other hand, there was some discussion here at LDCX yesterday about proxying instead of redirecting (using message/external-body; access-type=local-file), so I think it's worth considering that angle too.

@ajs6f
Contributor

ajs6f commented Mar 31, 2017

I have no idea what local-file means in the context of a distributed impl.

@escowles
Contributor

@acoburn @ajs6f I agree, it's unclear to me what some of the access-type values would mean in different contexts (and even whether proxying or redirecting is a better fit, or should be indicated separately). There's currently no mention of access-type in the spec, and maybe there's nothing more we can add. I've been trying to figure out if there's anyone with hard requirements for proxying vs. redirecting, but so far all the use cases I've heard sound like either approach could work.

The two other things I would add related to that are:

  • Proxying seems like it would potentially address situations where Fedora could access the content but the client could not (either because of auth, private network, etc.).
  • Proxying would also make access-type=local-file make sense for some implementations, but not all (as you note).
  • Redirecting might avoid network contention where the client had good access to the content, but Fedora was on a slower network, or had to retrieve the content over the same connection the client was using.
  • Redirecting would also make access-type=local-file work where the client had access to a local file, but Fedora did not (necessarily). This is also the scenario where the client would not have to have knowledge of Fedora's access to files, but was just storing its own reference to the content.
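
The four bullets above reduce to a question of who can reach the content. A minimal sketch, assuming reachability is known up front (in practice it depends on auth, network topology, and the access-type); the function name and the strategy labels are hypothetical.

```python
from typing import Set


def viable_strategies(client_can_reach: bool, repo_can_reach: bool) -> Set[str]:
    """Return which of 'proxy' and 'redirect' could serve the external content."""
    strategies: Set[str] = set()
    if repo_can_reach:
        # The repository fetches the bytes and streams them to the client.
        strategies.add("proxy")
    if client_can_reach:
        # The repository only hands out the location; the client fetches the bytes.
        strategies.add("redirect")
    return strategies
```

When both parties can reach the content, either approach works and the choice comes down to concerns such as network contention or authorization.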

In short, I think proxying vs. redirecting boils down to: do we mean that Fedora is managing content in an external system, or do we mean that the client has a reference to content it wants to manage outside of Fedora?

@ajs6f
Contributor

ajs6f commented Mar 31, 2017

In short, I think proxying vs. redirecting boils down to: do we mean that Fedora is managing content in an external system, or do we mean that the client has a reference to content it wants to manage outside of Fedora?

Yes. And while the former is more powerful, it is also harder and makes much stronger demands on impls.

@barmintor
Contributor

Links require the client to be able to resolve them, which is not the case in the scenarios in which installations use external file data. In this scenario Fedora doesn't manage the contents of the file path, it points to them.

@ajs6f
Contributor

ajs6f commented Mar 31, 2017

@barmintor Isn't that what @escowles said above? Or are you making a claim about all "scenarios in which installations use external file data"?

@barmintor
Contributor

I'm only reiterating that proxied content can't be replaced with Location headers.

@birkland
Contributor

birkland commented Mar 31, 2017 via email

@ajs6f
Contributor

ajs6f commented Mar 31, 2017

@barmintor That is my understanding of what @escowles said. I don't think anyone is arguing that proxy and reference are interchangeable...

@barmintor
Contributor

@ajs6f I'm sorry I fell off this, and also sorry for misunderstanding what @acoburn was saying. I'm fielding specific requests for proxied content, and that's also the migration impediment Columbia has, so my hackles got up at links addressing it. We use file: URIs to point to preservation content and specifically expect that the client cannot resolve them.

I'm trying to put together a document with some explanation of the proxy, redirect, and reference use cases that might be communicated with external-body that should close #41 and result in a PR for LDP-NR.

8 participants