i18n and n11n of resource identifiers #575
Signed-off-by: Wouter Termont <woutermont@gmail.com>
Co-authored-by: Matthieu Bosquet <matthieubosquet@gmail.com>
<p><span about="" id="server-iris-to-http" rel="spec:requirement" resource="#server-iris-to-http"><span property="spec:statement">When using an IRI in an <a href=#http>HTTP</a> message, except in the content, a Solid <span rel="spec:requirementSubject" resource="#Server">server</span> <span rel="spec:requirementLevel" resource="spec:MUST">MUST</span> map the IRI to a URI according to the algorithm provided by [<cite><a class="bibref" href="#bib-rfc3987">RFC3987</a></cite>] (<a href="https://datatracker.ietf.org/doc/html/rfc3987#section-3.1">section 3.1</a>).</span></span></p>
<p><span about="" id="server-iris-from-http" rel="spec:requirement" resource="#server-iris-from-http"><span property="spec:statement">When interpreting a URI in an <a href=#http>HTTP</a> message, except in the content, as a resource identifier, a Solid <span rel="spec:requirementSubject" resource="#Server">server</span> <span rel="spec:requirementLevel" resource="spec:MUST">MUST</span> map the URI to an IRI according to the algorithm provided by [<cite><a class="bibref" href="#bib-rfc3987">RFC3987</a></cite>] (<a href="https://datatracker.ietf.org/doc/html/rfc3987#section-3.2">section 3.2</a>), and normalize the resulting IRI to the <a href="#iris-norm">normal form</a> provided in this section.</span></span></p>
What do you think about reusing the language from 3987 (section 3.2's "Converting"):
<p><span about="" id="server-iris-from-http" rel="spec:requirement" resource="#server-iris-from-http"><span property="spec:statement">When interpreting a URI in an <a href=#http>HTTP</a> message, except in the content, as a resource identifier, a Solid <span rel="spec:requirementSubject" resource="#Server">server</span> <span rel="spec:requirementLevel" resource="spec:MUST">MUST</span> map the URI to an IRI according to the algorithm provided by [<cite><a class="bibref" href="#bib-rfc3987">RFC3987</a></cite>] (<a href="https://datatracker.ietf.org/doc/html/rfc3987#section-3.2">section 3.2</a>), and normalize the resulting IRI to the <a href="#iris-norm">normal form</a> provided in this section.</span></span></p>
<p><span about="" id="server-iris-from-http" rel="spec:requirement" resource="#server-iris-from-http"><span property="spec:statement">When interpreting a URI in an <a href=#http>HTTP</a> message, except in the content, as a resource identifier, a Solid <span rel="spec:requirementSubject" resource="#Server">server</span> <span rel="spec:requirementLevel" resource="spec:MUST">MUST</span> convert the URI to an IRI according to the algorithm provided by [<cite><a class="bibref" href="#bib-rfc3987">RFC3987</a></cite>] (<a href="https://datatracker.ietf.org/doc/html/rfc3987#section-3.2">section 3.2</a>), and normalize the resulting IRI to the <a href="#iris-norm">normal form</a> provided in this section.</span></span></p>
Or do you think that reusing the wording "map" from #server-iris-to-http would be simpler and could be used more liberally?
<ul id="iris-norm">
<li id="iris-nfc">The IRI is a Unicode string in Normalization Form C (NFC) [<cite><a class="bibref" href="#bib-uax15">UAX15</a></cite>].</li>
<li id="#iris-unreserved">The IRI does not contain percent-encoding triplets corresponding to <em>unreserved</em> characters.</li>
<li id="iris-hex">Hexadecimal digits within percent-encoding triplets corresponding to <em>reserved</em> characters are represented using <em>uppercase</em> letters.</li>
<li id="iris-scheme">The <code>http</code> or <code>https</code> scheme of the IRI is represented using <em>lowercase</em> characters.</li>
<li id="#iris-host">The host of the IRI is represented using <em>lowercase</em> characters.</li>
<li id="iris-port">If the port of the IRI is the default port for its scheme, the port subcomponent is left out.</li>
<li id="iris-path">The path of the IRI does not contain dot-segments.</li>
</ul>
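For illustration, the seven conditions of the normal form listed above can be approximated in a few lines of Python. This is a sketch under stated assumptions, not part of the proposed spec text: `normalize_iri`, `_fix_triplet`, and `_remove_dot_segments` are hypothetical names; all remaining percent-encoding triplets (not only those for reserved characters) are uppercased; and dot-segment removal follows RFC 3986 section 5.2.4.

```python
import re
import unicodedata
from urllib.parse import urlsplit, urlunsplit

# RFC 3986 "unreserved" characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
DEFAULT_PORTS = {"http": 80, "https": 443}
_TRIPLET = re.compile(r"%([0-9A-Fa-f]{2})")


def _fix_triplet(match):
    """Decode triplets of unreserved characters; uppercase all others."""
    char = chr(int(match.group(1), 16))
    return char if char in UNRESERVED else "%" + match.group(1).upper()


def _remove_dot_segments(path):
    """Remove "." and ".." segments (RFC 3986, section 5.2.4)."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./") or path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/") to the output.
            end = path.find("/", 1)
            end = len(path) if end == -1 else end
            output.append(path[:end])
            path = path[end:]
    return "".join(output)


def normalize_iri(iri):
    """Hypothetical helper: bring an http(s) IRI into the normal form above."""
    iri = unicodedata.normalize("NFC", iri)      # iris-nfc
    iri = _TRIPLET.sub(_fix_triplet, iri)        # iris-unreserved, iris-hex
    parts = urlsplit(iri)
    scheme = parts.scheme.lower()                # iris-scheme
    host = (parts.hostname or "").lower()        # iris-host
    if ":" in host:                              # restore IPv6 brackets
        host = "[" + host + "]"
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = userinfo + at + host
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        netloc += ":" + str(parts.port)          # iris-port
    path = _remove_dot_segments(parts.path)      # iris-path
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


print(normalize_iri("HTTP://Example.ORG:80/%7Ealice/./a/../caf%c3%a9"))
```

Because each step only rewrites equivalent spellings of the same identifier, applying the function twice should give the same result as applying it once.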
What do you think about leaving out the section #iris-norm? Would conforming to 3987's sections 3.1-3.2 (#server-iris-to-http, #server-iris-from-http) suffice?
Not a good idea. I wrote this out explicitly because [A] the conversion in RFC3987 is not deterministic (it leaves certain choices open, which is one of the big issues opponents like WHATWG have with it), and [B] these steps should be compatible with both RFC3987 and WHATWG URL. Of course, ideally, a single deterministic URI/IRI spec would exist to which we could simply refer, but it doesn't.
<p><span about="" id="server-iris-norm" rel="spec:requirement" resource="#server-iris-norm"><span property="spec:statement">A Solid <span rel="spec:requirementSubject" resource="#Server">server</span> <span rel="spec:requirementLevel" resource="spec:MUSTNOT">MUST NOT</span> create IRIs that do not conform to this normal form.</span></span></p>
When a server conforms to 3987's sections 3.1-3.2 (#server-iris-to-http, #server-iris-from-http), does it satisfy the #server-iris-norm requirement? If so, can #server-iris-norm be removed?
As clarified in reply to your other comment, it does not in all cases.
Thank you, good stuff! I think this PR captures the essence and needs of the referenced issues. As I understand it, this PR (or in this direction) would satisfy https://www.w3.org/TR/international-specs/#resid_what_to_spec .
It would be good to document some implementation experience. What currently conforms to this? How complex is it to implement?
Is there a reason to introduce additional restrictions to WHATWG URL and UTF-8 (as https://www.w3.org/TR/international-specs/#resid_what_to_spec_protocol mentions)?
Yes and no. It is compliant with the specs preferred therein, but our text does not mention WHATWG URL, which the iSpecs document demands. I have not done so because it really does not help clarity at all.
There are two orthogonal issues here. One is about the normalization of identifiers, and the other is about allowing IRIs as identifiers for Solid resources. Though it is important to address identifier normalization, it is not going to be correct, unambiguous, or practical to allow IRIs as identifiers for Solid resources. There are many issues; I will specify a few.

1. Incompatible with HTTP and web tools

The Solid protocol is a profile of HTTP. HTTP allows only URIs for resources. Thus, as all Solid resources are HTTP resources, their identifiers must also be URIs. Otherwise Solid will not be compatible with much of the web and most of its tools. For example, a generic link traverser (like rdflib, TPF, Communica, ...) doesn't know these random mappings, and directly uses the URI as the identifier of the resource it fetched. Many tools, like RDF parsers and JSON-LD processors, use the HTTP resource URI as the base URI for processing representations. Thus it is going to be a nightmare to deal with all existing generic tools. Basically, if we cross the semantics of HTTP, Solid is no longer going to be a profile of it, and none of the HTTP tools will work seamlessly.

2. Impossible to specify

Essentially, it is not possible to specify these mappings with sufficient rigour. I will point out a few fundamental issues in the proposed spec as examples. Proposed:
First of all, resource servers don't even receive a URI for the target resource in an HTTP request message. What the server receives as part of the message also varies between HTTP versions. A request message essentially contains only a request-target in one of the allowed forms (asterisk-, origin-, absolute-, or authority-form). It is the server's job to reconstruct the final resource URI from the request-target hint and its own custom routing config, at its discretion. As RFC 9110 says:
Thus the resolved resource URI will never be part of the message, and thus the proposed statement has no effect. Along with that, there can be relative URI references, which are not URIs. And the most serious issue is, given a URI in a message (e.g in

These issues are just on the surface, and already break a lot. Thus it may be good if we address only the URI-normalization restriction part.
A somewhat related issue. The WebID 1.0 spec says this:
Would that need to be modified to accommodate IRIs?
Thanks @damooo, for the elaborate comment, which raises some good questions. I will try to address them all below.
True, there are two issues here; but they are not orthogonal. The choice for URI, IRI and/or WHATWG URL as identifiers is tightly bound with the normalization procedures, since it is the specifications of these identifiers that specify the possible procedures.

Compatibility with HTTP
To be entirely correct, there is no such thing as an "HTTP resource" (apart from the trivial fact that HTTP URIs are URIs, which refer to resources). But to address your concerns about compatibility: any resource can be identified by more than one identifier. This is how the IRI specification maintains compatibility with URI: it provides a unique mapping from IRIs to URIs. Every HTTP IRI therefore has an HTTP URI that refers to the same resource.
Given the existence of the mapping, as explained above, this danger is avoided: HTTP programs can simply continue to use the corresponding (mapped) URI. They do not need to know anything about IRI or the IRI<->URI mapping. It is the IRI-capable server that does the mapping upon entry and exit of messages.
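As a rough illustration of the "exit" direction, the sketch below maps an IRI to a URI in the spirit of RFC 3987 section 3.1: non-ASCII characters outside the host are UTF-8 encoded and percent-encoded, while existing triplets are left alone. The function name `iri_to_uri` is hypothetical, and two choices here are assumptions rather than anything the PR prescribes: the host is converted with Python's built-in `idna` codec (which implements the older IDNA 2003 rules), and any userinfo is percent-encoded like the path.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched: RFC 3986 reserved characters, plus "%" so that
# existing percent-encoded triplets are not encoded a second time.
_SAFE = ":/?#[]@!$&'()*+,;=%"


def iri_to_uri(iri):
    """Hypothetical helper: map an http(s) IRI to an ASCII-only URI."""
    parts = urlsplit(iri)
    host = parts.hostname or ""
    if ":" in host:
        # IPv6 literal: already ASCII, only restore the brackets.
        ascii_host = "[" + host + "]"
    else:
        # Assumption: internationalized host names go through ToASCII.
        ascii_host = host.encode("idna").decode("ascii")
    userinfo, at, _ = parts.netloc.rpartition("@")
    netloc = quote(userinfo, safe=_SAFE) + at + ascii_host
    if parts.port is not None:
        netloc += ":" + str(parts.port)
    return urlunsplit((
        parts.scheme,
        netloc,
        quote(parts.path, safe=_SAFE),
        quote(parts.query, safe=_SAFE),
        quote(parts.fragment, safe=_SAFE),
    ))


print(iri_to_uri("http://example.org/caf\u00e9?q=\u00e9#\u00e9"))
```

An HTTP client that only ever sees the output of such a function never has to know that the server stores the resource under an IRI.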
As concluded above, this is not true. Moreover, I would contest the idea that "Solid is a profile of HTTP." A specification profile is an extension point defined by the specification itself (see the W3C Note on Variability in Specifications). HTTP, however, does not define such an extension point; "HTTP profiles" do not exist. Solid is a separate, standalone specification that uses HTTP as a communication protocol. Given the compatibility of HTTP and IRI, as sketched above, it is thus perfectly possible to combine that with using IRI as an identification scheme.

URI<->IRI mapping
What you describe is correct: an HTTP server reconstructs the URI of the target resource based on some algorithm (that we can treat as a blackbox here). However, this does not contradict my proposal (although it points out that it should probably be rephrased): when I say "a URI in an HTTP message", I do mean "the URI of the target resource, as reconstructed by the server." According to my proposal, given that URI, a Solid server should then immediately map it to its corresponding IRI, thereby forming the boundary between the URI-based communication channel (i.e. HTTP) and the IRI-based resource storage (i.e. Solid).
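To make the "entry" direction concrete, here is a minimal sketch of converting a reconstructed target URI to an IRI, loosely following RFC 3987 section 3.2: runs of percent-encoded non-ASCII bytes are decoded when they form valid UTF-8, and everything else is left untouched. The name `uri_to_iri` is hypothetical. The sketch deliberately simplifies: it does not filter out characters that RFC 3987 considers inappropriate in IRIs (such as bidi controls), it does not convert punycode hosts back to Unicode, and it does not apply the normal form afterwards.

```python
import re

# One or more consecutive triplets whose byte value is >= 0x80,
# i.e. bytes that can only be part of a multi-byte UTF-8 sequence.
_HIGH_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def uri_to_iri(uri):
    """Hypothetical helper: decode percent-encoded UTF-8 sequences in a URI."""

    def decode_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid UTF-8: keep the whole run percent-encoded.
            return match.group(0)

    return _HIGH_RUN.sub(decode_run, uri)


print(uri_to_iri("http://example.org/caf%C3%A9"))
```

Percent-encoded ASCII characters such as `%2F` are never decoded here, so the component structure of the identifier is preserved.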
The 'blackbox' algorithm the server applies to identify the target resource will always result in a URI; i.e. the algorithm includes the logic for going from a URI reference to a URI (typically by prepending a base URI). This is described in RFC 9110 § 7.1 Determining the target resource (¶ 2).
This is true, but the proposal only applies to the target URI. Any other URI in the request (Link headers, redirects etc.) is left alone, since they will typically be used in subsequent HTTP requests. It is not the job of Solid to interfere in that.

Compatibility with WebID

I will also briefly address @jeff-zucker's question:
No, that is not necessary. Since any URI is also an IRI, WebIDs can still be Solid resources. For some time, WebIDs will then not benefit from the extension of Solid to IRI, but I suspect that eventually WebID will also make the move, so that WebIDs can also include international characters. In any case, using IRIs in Solid will not break existing usage of WebIDs.

Hope to have addressed your issues/questions adequately. They shed some missing clarity that I will try to add to the phrasing of the proposal. Thanks for that.
But since the Solid WebID Profile spec does not define a WebID, rather counts on WebID 1.0 to define it, we would not be able to remain spec compliant if we use IRIs in WebIDs.
No. We would only break compliance if we use non-URI IRIs as a WebID. So we simply continue to mint only URIs for WebIDs, until WebID also supports IRIs. This is not an issue.
That is an issue. All the HTTP tools determine the URI as the identifier, and they will use it as the base URI to resolve the resource representations. Thus any relative identifiers in an RDF resource will be resolved against the URI by these clients, while the server uses the IRI for the resolution. This will soon lead to chaos as clients also start writing. For a single resource, an IRI will be used in some places and a URI in others; absolute in some places, relative in others. We cannot deterministically do anything with that data. This is what I meant by incompatible.
That is an issue. Solid uses Link headers to link to other resources (ACLs, descriptions, storage root, ...). If their identifiers are IRIs, there is no way to send them in the links, as Link headers accept only URIs. If we encode them as URIs and send them, then the client will deduce the URI as the identifier for ACLs etc. and use it when writing statements, while the server uses IRIs. This repeats the above problem, with severe security consequences.
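The resolution concern can be made tangible with a small, hedged sketch: the same relative reference resolved against a URI base and against the corresponding IRI base yields two different strings, which only coincide again once the IRI result is mapped to a URI. The example base, the relative reference `card#me`, and the helper name `to_uri` are all made up for illustration; `to_uri` is a simplified percent-encoding step that ignores internationalized host names.

```python
from urllib.parse import quote, urljoin


def to_uri(iri):
    """Hypothetical helper: percent-encode non-ASCII characters as UTF-8."""
    return quote(iri, safe=":/?#[]@!$&'()*+,;=%")


# A client that only knows the URI resolves against the URI base ...
uri_based = urljoin("http://example.org/caf%C3%A9/", "card#me")
# ... while a party that knows the IRI resolves against the IRI base.
iri_based = urljoin("http://example.org/caf\u00e9/", "card#me")

print(uri_based)
print(iri_based)
print(uri_based == iri_based)           # compared as plain strings
print(uri_based == to_uri(iri_based))   # compared after mapping to URI
```

Whether this difference is a problem in practice depends on whether every party compares identifiers only after applying the same mapping, which is exactly the point under discussion.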
@damooo, the concerns you raise are interesting questions, but should not pose an issue for the adoption of IRIs.
This is not a problem. I should correct my own clarification though: what I meant with "left alone" was that the Solid spec should not do unnecessary mappings. However, I seem to have implied that thus no mappings should take place except for the target URI. That is not the case.
Resolution is a more complex concern, since i.m.o. IRI resolution in the RDF spec is underdefined. However, the way you frame it misses the point. When I say "HTTP program" I mean the code that performs HTTP requests. Any client that can follow the RDF spec to parse a document knows that RDF contains IRIs rather than URIs, and thus that for resolution and dereferencing it should perform the necessary mappings. In practice, this is easy, since almost all those clients rely on HTTP libraries following the WHATWG Fetch spec, which can handle requests for any URL (whether it is a URI or an IRI). For any direct request, this suffices: the Solid app knows the base IRI for resolution within RDF documents, and can map to URI for dereferencing.

However, I am very glad you brought up this concern around resolution, because there seems to be something missing with regard to redirected requests: when an app is redirected from somewhere, it does not know whether the resource it ends up with originates from a Solid server or not (at least not without additional checks), and thus does not know whether to interpret the
My preference goes out to the last option. What do you think, @damooo?

An example

The following example describes a scenario using IRIs in Solid, using the base directive as a solution to the resolution issue. To make it somewhat realistic, I assume WebIDs can also be IRIs (which is still an ongoing effort); but one could rewrite the example using any other RDF resource instead. The example: I use SolidRock, a music app built on Solid, and want to share a song with my Ukrainian friend Ivan. He is not yet in my contacts, so I copy his WebID IDUA ( Сховище.ua, Ivan's Solid storage provider, then receives the request and maps it back to the corresponding IRI
Furthermore, the Solid server also includes a
Before sending Ivan a notification, the SolidRock app needs to check some capabilities of his Solid server. It extracts the URI from the After learning whatever it needed to know about the server's capabilities, the SolidRock app parses Ivan's WebID document using any Turtle parser. It then queries the parsed RDF data for an Having found the inbox IRI, the SolidRock app directs its fetch library to |
The proposed text looks equivalent to https://www.w3.org/TR/rdf12-concepts/#section-IRIs
Since Solid heavily builds on RDF, we should closely follow the latest specifications.
This PR updates the Solid protocol with a normative section on the use of Internationalized Resource Identifiers (IRIs).
It attempts to address the following issues (and possibly other related ones):
The gist of it is as follows:
What I did not (yet) do:
Changes
HTML Preview | HTML Diff