[SPIKE] irIIIFService: Fix bug with encoding encoded strings #150
Comments
There is explicit encoding happening here:
Introduced here: And here: This appears to address some sort of DSpace situation (based on the branch name containing "dspace"). See also:
@kaladay and @qtamu Adding this here just for science. One thing I thought about last week and wanted to try, but couldn't because I don't have access to FCREPO on dev and pre, was to see whether Fedora can even hold a URI that is not URL encoded. My interpretation of RDF Concepts is that this shouldn't be allowed, but who knows whether Fedora and DSPACE follow the specification closely enough to say for sure.
To test the idea, I was thinking we should directly target the problematic URI in question, delete it, and then reinsert an unescaped URI. The request will likely fail altogether, as the SPARQL update wouldn't even be valid with the URI unescaped, but it would be interesting to see if it does in fact pass.
How to Test on Dev with an Existing Resource
Create a file called
Do a curl request via shell on localhost like so:
You'll also need to pass the username and auth; both can be found in the environment variables in Rancher dev.
What I think will happen when we run this?
I would note that a Cantaloupe issue mentions percent-encoding problems here:
Other links regarding URIs and percent encoding in Cantaloupe:
The example script as described above fails like this:
# bash curl-update_ru.curl
Encountered " "<" "< "" at line 7, column 23.
Was expecting one of:
<IRIref> ...
<PNAME_NS> ...
<PNAME_LN> ...
<BLANK_NODE_LABEL> ...
<VAR1> ...
<VAR2> ...
"true" ...
"false" ...
<INTEGER> ...
<DECIMAL> ...
<DOUBLE> ...
<INTEGER_POSITIVE> ...
<DECIMAL_POSITIVE> ...
<DOUBLE_POSITIVE> ...
<INTEGER_NEGATIVE> ...
<DECIMAL_NEGATIVE> ...
<DOUBLE_NEGATIVE> ...
<STRING_LITERAL1> ...
<STRING_LITERAL2> ...
<STRING_LITERAL_LONG1> ...
<STRING_LITERAL_LONG2> ...
"(" ...
<NIL> ...
"[" ...
<ANON> ...
In my SPARQL update, we are attempting to insert a URL as a URI rather than a Literal. By enclosing the value in
In RDF, a binary file, like an RDF resource, must always be identified by a URI (not a literal). This is because the URI identifies a specific resource (the binary file), and in RDF, URIs are used to uniquely identify resources.
What this shows is that, at least with update / patch, fcrepo is saying you can't have unescaped spaces in a file's URI. I think we should take this one step further and also show on dev that we can't do this with a POST. If we can, whatever writes to fcrepo is potentially a problem that must be accounted for, but I'm confident that it will be (famous last words).
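The point about unescaped spaces can also be checked outside of fcrepo. The following is a minimal sketch and not code from this project: the class name UriSpaceCheck and the example.org host are made up for illustration. It relies only on java.net.URI, which rejects a raw space when parsing a string, while its multi-argument constructor quotes illegal characters and always quotes a percent sign.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not part of irIIIFService.
final class UriSpaceCheck {

    // Returns true when the string parses as a URI without any escaping.
    static boolean parses(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Builds a URI from a raw (not yet encoded) path. Illegal characters
    // such as spaces are quoted, and '%' is always quoted as well.
    static String fromRawPath(String host, String rawPath) {
        try {
            return new URI("https", host, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("https://example.org/files/holiday card_1.jpg"));
        System.out.println(parses("https://example.org/files/holiday%20card_1.jpg"));
        System.out.println(fromRawPath("example.org", "/files/holiday card_1.jpg"));
        System.out.println(fromRawPath("example.org", "/files/holiday%20card_1.jpg"));
    }
}
```

The string with a raw space does not parse, while the %20 form does. Passing an already encoded path to fromRawPath yields %2520, which is the same kind of double encoding this issue describes.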
However, in the Java, using:
String rdf = restTemplate.getForObject(url, String.class);
It instead fails with:
The code is throwing:
throw new NotFoundException("RDF not found! " + url);
rather than providing the actual error message from the response. A curl to that URL returns data:
@prefix premis: <http://www.loc.gov/premis/rdf/v1#> .
@prefix ns022: <http://avalonmediasystem.org/rdf/vocab/common#> .
@prefix ns021: <http://avalonmediasystem.org/rdf/vocab/derivative#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ns020: <http://avalonmediasystem.org/rdf/vocab/encoding#> .
@prefix ns004: <http://purl.org/dc/elements/1.1/relation:> .
@prefix ns003: <http://digital.library.tamu.edu/schemas/> .
@prefix local: <http://digital.library.tamu.edu/schemas/local/> .
@prefix ns024: <http://purl.org/dc/elements/1.1/title:> .
@prefix ns002: <http://www.openarchives.org/ore/terms#> .
@prefix ns023: <http://schema.org/> .
@prefix xsi: <http://www.w3.org/2001/XMLSchema-instance> .
@prefix ns001: <http://pcdm.org/models#> .
@prefix ns008: <http://purl.org/dc/elements/1.1/subject:> .
@prefix ns007: <http://purl.org/dc/elements/1.1/identifer:> .
@prefix xmlns: <http://www.w3.org/2000/xmlns/> .
@prefix ns006: <http://purl.org/dc/elements/1.1/identifier:> .
@prefix ns005: <http://purl.org/dc/elements/1.1/rights:> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix ns009: <info:fedora/fedora-system:def/model#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix fedoraconfig: <http://fedora.info/definitions/v4/config#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix authz: <http://fedora.info/definitions/v4/authorization#> .
@prefix test: <info:fedora/test/> .
@prefix ns011: <http://avalonmediasystem.org/rdf/vocab/collection#> .
@prefix ns010: <http://bibframe.org/vocab/> .
@prefix ns015: <http://projecthydra.org/ns/relations#> .
@prefix ns014: <info:fedora/fedora-system:def/relations-external#> .
@prefix ns013: <http://projecthydra.org/ns/auth/acl#> .
@prefix ns012: <http://www.w3.org/ns/auth/acl#> .
@prefix ns019: <http://www.openarchives.org/ore/terms/> .
@prefix ns018: <http://avalonmediasystem.org/rdf/vocab/master_file#> .
@prefix ns017: <http://avalonmediasystem.org/rdf/vocab/transcoding#> .
@prefix ns016: <http://avalonmediasystem.org/rdf/vocab/media_object#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix fedora: <http://fedora.info/definitions/v4/repository#> .
@prefix ebucore: <http://www.ebu.ch/metadata/ontologies/ebucore/ebucore#> .
@prefix ldp: <http://www.w3.org/ns/ldp#> .
@prefix iana: <http://www.iana.org/assignments/relation/> .
@prefix xs: <http://www.w3.org/2001/XMLSchema> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<https://api-dev.library.tamu.edu/fcrepo/rest/3b/6f/c3/25/3b6fc325-f6ca-41d8-b91e-8c5db3be8c13/basbanes-exhibit-texts-todd-magpietest_objects/17/pages/page_0/files/blumberg-holiday%20card_1.jpg>
rdf:type ns001:File ;
rdf:type fedora:Binary ;
rdf:type fedora:Resource ;
dc:filename "blumberg-holiday card_1.jpg"^^<http://www.w3.org/2001/XMLSchema#string> ;
fedora:lastModifiedBy "fedoraAdmin"^^<http://www.w3.org/2001/XMLSchema#string> ;
premis:hasSize "537844"^^<http://www.w3.org/2001/XMLSchema#long> ;
ebucore:hasMimeType "image/jpeg"^^<http://www.w3.org/2001/XMLSchema#string> ;
fedora:createdBy "fedoraAdmin"^^<http://www.w3.org/2001/XMLSchema#string> ;
fedora:created "2024-10-02T17:01:15.365Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
premis:hasMessageDigest <urn:sha1:5be16973c518fb173ee604d64616ac1f082dfb36> ;
fedora:lastModified "2024-10-02T17:01:15.365Z"^^<http://www.w3.org/2001/XMLSchema#dateTime> ;
ebucore:filename "blumberg-holiday card_1.jpg"^^<http://www.w3.org/2001/XMLSchema#string> ;
rdf:type ldp:NonRDFSource ;
fedora:writable "false"^^<http://www.w3.org/2001/XMLSchema#boolean> ;
iana:describedby <https://api-dev.library.tamu.edu/fcrepo/rest/3b/6f/c3/25/3b6fc325-f6ca-41d8-b91e-8c5db3be8c13/basbanes-exhibit-texts-todd-magpietest_objects/17/pages/page_0/files/blumberg-holiday%20card_1.jpg/fcr:metadata> ;
fedora:hasParent <https://api-dev.library.tamu.edu/fcrepo/rest/3b/6f/c3/25/3b6fc325-f6ca-41d8-b91e-8c5db3be8c13/basbanes-exhibit-texts-todd-magpietest_objects/17/pages/page_0/files> ;
fedora:hasFixityService <https://api-dev.library.tamu.edu/fcrepo/rest/3b/6f/c3/25/3b6fc325-f6ca-41d8-b91e-8c5db3be8c13/basbanes-exhibit-texts-todd-magpietest_objects/17/pages/page_0/files/blumberg-holiday%20card_1.jpg/fcr:fixity> .
Note how this has a space:
dc:filename "blumberg-holiday card_1.jpg"^^<http://www.w3.org/2001/XMLSchema#string> ;
This URL fails:
But the error message printed uses this URL:
This is misleading and a regression in the error message.
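One way to avoid a misleading message is to carry the URL that was actually requested, together with the upstream status and body, into the exception. The following is only a sketch under assumptions: RdfFetchException and its accessors are hypothetical names, it is not the project's NotFoundException, and how the status and body are obtained from the failed request is left out.

```java
// Hypothetical exception type for illustration only.
final class RdfFetchException extends RuntimeException {

    private final String requestedUrl;
    private final int status;

    RdfFetchException(String requestedUrl, int status, String responseBody) {
        super(buildMessage(requestedUrl, status, responseBody));
        this.requestedUrl = requestedUrl;
        this.status = status;
    }

    // Includes the exact URL that was sent, so a doubly encoded value
    // (for example one containing %2520) is visible in the log.
    static String buildMessage(String requestedUrl, int status, String responseBody) {
        String detail = (responseBody == null || responseBody.isBlank())
                ? ""
                : ": " + responseBody.strip();
        return "RDF request failed with status " + status + " for " + requestedUrl + detail;
    }

    String getRequestedUrl() {
        return requestedUrl;
    }

    int getStatus() {
        return status;
    }
}
```

With something along these lines, the logged URL would be the one that was really requested, and a 404 could be told apart from other failures.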
irIIIFService encodes a URI string even if it's already encoded. This results in a 404 for manifest generation if the value contains a percent sign from an already encoded character (the percent sign is itself encoded again as %25).
#149
#139
Acceptance Criteria
Manifests for files with spaces are generated as expected.
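The double encoding described above, and one possible way to make the encoding step idempotent, can be sketched as follows. This is an illustration under assumptions, not the actual irIIIFService code: the class and method names are hypothetical, and it assumes the value is a single path segment such as a file name. Decoding before encoding cannot distinguish a raw name that legitimately contains a literal percent sign from an encoded one, so the approach only fits if the inputs are known to be either raw or fully encoded.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of irIIIFService.
final class SegmentEncoding {

    // Naive encoding: applied to an already encoded value, it turns
    // "%20" into "%2520". URLEncoder emits "+" for a space, so that is
    // mapped to "%20" for use in a URL path.
    static String encodeSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Decode first, then encode, so raw and already encoded inputs
    // produce the same result. A literal "+" is protected because
    // URLDecoder would otherwise turn it into a space.
    static String encodeSegmentOnce(String segment) {
        String decoded = URLDecoder.decode(segment.replace("+", "%2B"), StandardCharsets.UTF_8);
        return encodeSegment(decoded);
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("blumberg-holiday%20card_1.jpg"));
        System.out.println(encodeSegmentOnce("blumberg-holiday%20card_1.jpg"));
        System.out.println(encodeSegmentOnce("blumberg-holiday card_1.jpg"));
    }
}
```

The first call reproduces the problem (the result contains %2520), while the last two calls both return blumberg-holiday%20card_1.jpg.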