mmo opened this issue on Jul 5, 2022 · 1 comment · Fixed by #908
Labels
bug (Breaks something but is not blocking)
f: data (About data model, importation, transformation, exportation of data, specific for bibliographic data)
p-High (To set a high priority!)
When a document record contains character encoding problems, for instance because the cataloguer copied and pasted abstracts or other metadata from PDF files, OAI-PMH behaviour is affected. Every OAI-PMH request that includes that record fails with a server error:
All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters
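The wording of this error matches the `ValueError` that lxml raises when a string contains characters outside the XML 1.0 character range, so the check below assumes that is the kind of validation involved. It is a minimal standard-library sketch for locating the offending characters in a metadata string. The function name and the sample abstract are hypothetical and not taken from the SONAR code base.

```python
import re

# Everything outside the XML 1.0 "Char" production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are allowed.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [
        (match.start(), f"U+{ord(match.group()):04X}")
        for match in _INVALID_XML_CHARS.finditer(text)
    ]


# Hypothetical abstract pasted from a PDF: it carries a form feed
# and a NULL byte that are invisible in most editors.
abstract = "Results on e\x0ccient\x00 sampling"
for offset, code_point in find_invalid_xml_chars(abstract):
    print(f"offset {offset}: {code_point}")
```

Legitimate non-ASCII text (accented letters, CJK, emoji) passes this check; only control characters, NULL bytes and the few other excluded code points are reported.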
Improvement suggestion
The editor should prevent the submission of unauthorised characters or, ideally, correct them automatically.
Alternative:
OAI-PMH requests should not fail because of character encoding problems in a single record. Records should be checked for character encoding problems. Possible approaches are (1=worst ... 4=best):
1. During the OAI-PMH response: check each record for encoding problems and exclude it from the response, if needed
2. During the OAI-PMH response: check each record for encoding problems and automatically sanitize it, if needed, before including it in the response
3. During record creation: automatically sanitize the record before saving
4. During record creation: issue an error and prevent the record from being created (check server-side/client-side implications)
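The sanitizing and rejecting approaches above can be sketched as follows. This is an illustration under assumptions, not the SONAR implementation: `sanitize`, `validate`, `InvalidCharacterError` and the record field names are hypothetical, and the record is assumed to be a JSON-like structure of dicts, lists and strings.

```python
import re

# Characters outside the XML 1.0 "Char" production.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


class InvalidCharacterError(ValueError):
    """Hypothetical error raised when a record must be rejected."""


def sanitize(value):
    """Sanitizing approaches: return a copy with forbidden characters removed.

    Only string values are cleaned; dictionary keys are left untouched.
    """
    if isinstance(value, str):
        return _INVALID_XML_CHARS.sub("", value)
    if isinstance(value, dict):
        return {key: sanitize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    return value


def validate(value, path="$"):
    """Rejecting approach: raise on the first forbidden character found."""
    if isinstance(value, str):
        match = _INVALID_XML_CHARS.search(value)
        if match:
            raise InvalidCharacterError(
                f"{path}: forbidden character U+{ord(match.group()):04X} "
                f"at offset {match.start()}"
            )
    elif isinstance(value, dict):
        for key, item in value.items():
            validate(item, f"{path}.{key}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            validate(item, f"{path}[{index}]")


# Hypothetical record with an abstract pasted from a PDF.
record = {
    "title": "Sampling",
    "abstracts": [{"language": "eng", "value": "E\x0ccient\x00 sampling"}],
}
clean = sanitize(record)  # leaves the original record unchanged
validate(clean)           # passes silently once the record is clean
```

Silent sanitizing keeps harvesting working but can hide data loss (here the ligature that the control character replaced is gone), which is one reason rejecting the record at creation time, with a message that points at the field, may be preferable.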
pronguen added the bug, f: data and p-High labels and removed the enhancement label on Jul 5, 2022
pronguen changed the title from "Make OAI-PMH responses more robust against bad character encodings" to "The editor should prevent bad character encodings" on Aug 8, 2022
* Adds new `safety` exceptions.
* Removes control chars when the Dublin Core XML file is produced.
* Closes rero#867.
Co-Authored-by: Johnny Mariéthoz <Johnny.Mariethoz@rero.ch>
jma added a commit to jma/sonar that referenced this issue on Nov 16, 2022