Whitespace collapse on datatypes in XSD foils bi-directional conversion (Metaschema M3) #67

Closed
wendellpiez opened this issue Sep 2, 2020 · 1 comment
Labels: bug (Something isn't working), wontfix (This will not be worked on)

Comments

@wendellpiez
Collaborator

Describe the bug

Probably related to the bug described in #66.

In the generated XSDs, the whiteSpace facet is set to collapse on datatyped values. This normalizes whitespace before validation, which (among other things) makes values match patterns that they would otherwise fail because of whitespace. This was detected on a UUID value that passes XSD validation even when it has a trailing space.

Since the conversion utility does not collapse whitespace, a value (with whitespace) that is not actually valid to the datatype's lexical pattern appears in the result. This causes problems downstream, for example when the data is cast into JSON and then fails validation against its JSON Schema.
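To make the mismatch concrete, here is a minimal Python sketch of the two behaviors. It is illustrative only: the UUID pattern is a generic one assumed for the example (not copied from the generated schemas), the function names are hypothetical, and the JSON Schema side is assumed to apply an anchored pattern to the raw string with no normalization.

```python
import re

# Hypothetical UUID pattern, assumed for illustration only;
# it is not taken from the generated XSD or JSON Schema.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def xsd_collapse(value):
    """Mimic XSD whiteSpace='collapse': turn tab, LF and CR into spaces,
    squeeze runs of spaces to one, and drop leading/trailing spaces."""
    replaced = re.sub(r"[\t\n\r]", " ", value)
    return re.sub(r" +", " ", replaced).strip(" ")


def valid_in_xsd(value):
    # The pattern is checked after whitespace collapsing.
    return UUID_PATTERN.fullmatch(xsd_collapse(value)) is not None


def valid_in_json_schema(value):
    # Assumption: an anchored pattern applied to the raw, uncollapsed string.
    return UUID_PATTERN.fullmatch(value) is not None


raw = "f81d4fae-7dec-11d0-a765-00a0c91e6bf6 "  # note the trailing space
print(valid_in_xsd(raw), valid_in_json_schema(raw))
```

A converter that copies `raw` through unchanged carries the trailing space into the JSON, which is where the two validators disagree.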

Who is the bug affecting?

Anyone whose datatyped values include sloppy data (edge cases), which makes it especially annoying since it only happens sometimes.

What is affected by this bug?

Potentially anyone, especially anyone relying on bidirectional conversion.

When does this occur?

In any datatype now marked with <whiteSpace value='collapse'/> in its definition.

Note, however, that for the markup-line datatype, which converts to Markdown, we may wish to provide whitespace normalization, although the problem there might be the reverse if whitespace that persists in XML mixed content is stripped in Markdown. More testing is in order.

How do we replicate the issue?

Try to validate a document in which a datatyped value (such as a UUID or boolean) has extra whitespace. It should be valid in XSD, but when converted to JSON, it won't be valid against the corresponding JSON Schema (which does not provide for collapsing).

Expected behavior (i.e. solution)

In XSD, data with extraneous whitespace should not validate against datatypes whose patterns do not permit it.

Bidirectional conversion of all data, including pathological cases, should work.

Other Comments

This needs to be addressed in both Metaschema M3 and M4.

Also, unit tests for schema validation and data conversion should provide some edge cases of data invalid only due to whitespace anomalies (extra whitespace not permitted by the given pattern).
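As a rough sketch of what such edge cases could look like, the Python below generates values that are invalid only because of added whitespace and checks them against a strict lexical pattern. The boolean pattern and the helper names are assumptions made for this example; they are not part of the Metaschema test suite.

```python
import re

# Hypothetical strict lexical pattern for a boolean, assumed for illustration.
BOOLEAN_PATTERN = re.compile(r"true|false|1|0")


def whitespace_variants(value):
    """Return copies of value that differ from it only by added whitespace."""
    return [
        " " + value,        # leading space
        value + " ",        # trailing space
        "\t" + value,       # leading tab
        value + "\n",       # trailing newline
        " " + value + " ",  # padded on both sides
    ]


def strictly_valid(value):
    # No collapsing: the whole raw string must match the pattern.
    return BOOLEAN_PATTERN.fullmatch(value) is not None


def wrongly_accepted(value):
    """List the whitespace variants that a strict validator still accepts.
    An empty list means every variant was rejected, as expected."""
    return [v for v in whitespace_variants(value) if strictly_valid(v)]


for clean in ("true", "false", "1", "0"):
    assert strictly_valid(clean)
    assert wrongly_accepted(clean) == []
```

The same variants could be fed to both the XSD and JSON Schema validators, and through the converter in each direction, to confirm that all of them treat the data the same way.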

@wendellpiez wendellpiez added the bug Something isn't working label Sep 2, 2020
wendellpiez added a commit to wendellpiez/metaschema that referenced this issue May 21, 2021
david-waltermire pushed a commit that referenced this issue May 21, 2021
* Addressing datatype validation issues: whitespace collapsing; non-empty values; ncname-workalike in JSON Schema - see usnistgov/OSCAL#911  usnistgov/OSCAL#805 also #33 #67 #68
* Improvements to XSD production; fully aligning 'token' datatype across XSD and JSON Schema implementations.
david-waltermire added a commit that referenced this issue Jun 6, 2021
* Rework of docs focusing on JSON docs and model pipeline
* Improvements to composition toolchain
* Fixed a few small bugs in the metaschema-check. Improved performance of the compose pruning using an accumulator.
* Moved edge-case samples into testing directory
* Made shadowing warning a warning
* Initial commit of an Oxygen Metaschema framework.
* Creation of new compose schematron unit tests.
* Cross-linking XML and JSON syntax pages and other improvements to links
* Now building XML and JSON indexes to reference pages, with links to steps
* Reconfigured docs pipeline (XSLT entry points); adding new files including pipeline steps
* Migrating schema generation tools to new/improved composition pipeline
* Addressing usnistgov/OSCAL#902 thanks for finding this bug
* Enhancements to JSON Schema definition (with better performance too)
* Adding support for json-base-uri as a metaschema property
* Updated JSON schema $id; factoring out common docs XSLT
* Fixing IDs in JSON schema per issue usnistgov/OSCAL#933.
* Addressing datatype validation issues: whitespace collapsing; non-empty values; ncname-workalike in JSON Schema - see usnistgov/OSCAL#911  usnistgov/OSCAL#805 also #33 #67 #68
* Improvements to XSD production; fully aligning 'token' datatype across XSD and JSON Schema implementations.
* Updating bidirectional XML/JSON converter generators (#143)
* Committing a version that handles test data correctly (so far) from rebuilt metaschema composition addressing #51 #53 #76
* Now displaying constraints in documentation at point of definition;
* Docs generation revamp Reworked reference and other pages to sketch - #128 and others

Co-authored-by: Wendell Piez <wendell.piez@nist.gov>
@david-waltermire david-waltermire added the wontfix This will not be worked on label Mar 10, 2022
@david-waltermire
Collaborator

See #68. This will be fixed against the M4 implementation.
