General: Fix loading of unused chars in xml format #2729
@iLLiCiTiT does this resolve itself when parsing from a |
Encoding is not an issue in this case. The issue is that one attribute has a value with an escaped XML value, but
Example
The |
The only predefined entities in XML are:
Everything else is illegal except character and entity references:
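To make the rule concrete, here is a small sketch using Python's standard xml.etree.ElementTree (which is backed by expat). The five predefined entities and character references parse fine, while an undefined named entity or a reference to an illegal character is rejected. The helper name try_parse and the attribute values are made-up examples, not taken from the oiiotool output.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text):
    """Return the 'v' attribute of the root element, or None if parsing fails."""
    try:
        return ET.fromstring(xml_text).get("v")
    except ET.ParseError:
        return None


# The five predefined entities plus a decimal and a hex character reference.
print(try_parse('<a v="&lt;&gt;&amp;&apos;&quot;&#65;&#x42;"/>'))  # <>&'"AB
# Undefined named entity: the document is not well-formed.
print(try_parse('<a v="&nbsp;"/>'))  # None
# Character reference to U+0000, which is not a legal XML character.
print(try_parse('<a v="&#0;"/>'))  # None
```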
But there is the well-formedness constraint "Legal Character": characters referred to using character references must match the production for Char. And that is defined as:
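For reference, the XML 1.0 specification defines Char as #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. A minimal Python check for that production could look like this; the function name is_xml_char is hypothetical, and the sketch assumes XML 1.0 rules (XML 1.1 differs).

```python
def is_xml_char(codepoint: int) -> bool:
    """Return True if codepoint matches the XML 1.0 Char production."""
    return (
        codepoint in (0x9, 0xA, 0xD)          # tab, line feed, carriage return
        or 0x20 <= codepoint <= 0xD7FF        # excludes other C0 controls
        or 0xE000 <= codepoint <= 0xFFFD      # skips surrogates, U+FFFE/U+FFFF
        or 0x10000 <= codepoint <= 0x10FFFF   # supplementary planes
    )


print(is_xml_char(0x41))    # True  ('A')
print(is_xml_char(0x0))     # False (NUL)
print(is_xml_char(0xD800))  # False (surrogate)
```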
So my point is that replacing ampersands is not trivial: you would need to validate the value.
I am afraid this is not enough; see my comment. This must be solved with character ranges.
To be honest, I don't know how. Right now this breaks extract review because the XML export from oiiotool put
I'm trying to find values from |
Maybe it makes more sense to rely on |
Or just find all character references with a regex, parse them, escape each one only if it doesn't fit into the range for Char, and then feed the result to the XML parser?
Are only these ranges valid?
This is in the XML spec.
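The regex idea can be sketched as follows, assuming XML 1.0 rules. Every decimal (&#65;) or hexadecimal (&#x41;) character reference is decoded; if its code point is a legal Char it is left alone, otherwise its "&" is turned into "&amp;" so the parser keeps it as literal text. The names CHAR_REF_RE, is_xml_char and escape_invalid_char_refs, and the sample element, are hypothetical and not the code merged in this PR.

```python
import re
import xml.etree.ElementTree as ET

# Matches hexadecimal (&#x41;) and decimal (&#65;) character references.
CHAR_REF_RE = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def is_xml_char(codepoint):
    """XML 1.0 Char production."""
    return (
        codepoint in (0x9, 0xA, 0xD)
        or 0x20 <= codepoint <= 0xD7FF
        or 0xE000 <= codepoint <= 0xFFFD
        or 0x10000 <= codepoint <= 0x10FFFF
    )


def escape_invalid_char_refs(xml_text):
    """Escape '&' only in character references that are outside Char.

    Valid references stay untouched, so the parser still decodes them;
    invalid ones survive parsing as literal text such as '&#0;'.
    """
    def replace(match):
        hex_digits, dec_digits = match.groups()
        codepoint = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if is_xml_char(codepoint):
            return match.group(0)
        return "&amp;" + match.group(0)[1:]

    return CHAR_REF_RE.sub(replace, xml_text)


raw = '<attrib name="example" value="A&#65;&#0;&#x1F;"/>'
root = ET.fromstring(escape_invalid_char_refs(raw))
print(root.get("value"))  # AA&#0;&#x1F;
```

With this approach the parsed value of an invalid reference equals its string in the XML text, while valid references keep their decoded meaning.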
Modified to use a regex that checks whether the XML from oiiotool contains valid values and replaces the ampersand of invalid values. The loaded value matches the string value from the XML text. These values are metadata we don't need, so I would not care about their real value until they're needed.
Is loaded as node with |
I would just add a comment there describing the result of this: it will affect all character entities, even the valid ones. This fix is quick and dirty, and I can imagine that someone will try to get something from this information in the future and hit it.
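The concern can be illustrated with a hypothetical blanket variant that escapes the "&" of every "&#" sequence (escape_all_char_refs is an illustrative name, not the PR's code). Parsing then succeeds, but even a valid reference such as &#65; is loaded as the raw text "&#65;" instead of "A", which is what a future reader of this metadata would run into.

```python
import xml.etree.ElementTree as ET


def escape_all_char_refs(xml_text):
    """Hypothetical blanket fix: escape '&' in every '&#' reference."""
    return xml_text.replace("&#", "&amp;#")


raw = '<a v="&#65;&#0;"/>'
value = ET.fromstring(escape_all_char_refs(raw)).get("v")
# The valid &#65; is no longer decoded to "A".
print(value)  # &#65;&#0;
```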
Co-authored-by: Ondřej Samohel <33513211+antirotor@users.noreply.github.com>
Brief description
The ElementTree class in the XML parser doesn't know how to handle all escaped values, which causes a parse error.
Description
I'm not sure what the proper fix is. Probably it would be to modify the XML parser, which is more complicated, or to define all possible escape values (e.g. from this source). It currently breaks loading of data for some EXR files.
Changes
Replace & in some unused XML ampersand characters with &amp; so ElementTree can parse it