Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emit only valid N-Quads from toRdf. #284

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Emit only valid N-Quads from toRdf. #284

wants to merge 1 commit into from

Conversation

davidlehn
Copy link
Member

  • Check for valid language format.
  • Check for valid subject, predicate, object, and datatype IRIs.
  • Drop invalid N-Quads.

Unsure if this should be merged as is. I think there are two issues with this:

  • performance: In performance critical code where the input data is already known to be valid this is wasted work. One solution is to add a skipValidation option to toRdf that would just omit all validation checks.
  • error handling: I think silently dropping invalid data is a poor idea. There should be a callback or similar when bad data is found that allows the user to choose to drop it or report errors. This would certainly be useful for debugging, but I imagine would be desired in production too.

@gkellogg
Copy link
Collaborator

@davidlehn I wonder if my update in 804c15b (PR #354) to isAbsoluteRegex relates to this? Can that be used instead of IRIREF_RE?

@davidlehn
Copy link
Member Author

@gkellogg I think it depends how strict the checks should be. I think those regexes I added were derived from a spec. Hard to check how correct they even are! Probably covers way more than is normally used. I'd rather see something that's easier to understand. I'm guessing test data could be constructed to fail that other regex. It's just checking for scheme and non-whitespace. Advice on what is appropriate here is welcome.

@gkellogg
Copy link
Collaborator

It is certainly a balance, but maybe we can consolidate to just one expression.

- Check for valid language format.
- Check for valid subject, predicate, object, and datatype IRIs.
- Drop invalid N-Quads.
Copy link
Member

@dlongley dlongley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need an alternative to this silent "drop" invalid data approach. Digital signature applications need a way to catch that quads have been omitted so that they can throw exceptions. We're working in other areas to ensure we close all gaps where code drops invalid/undefined data without allowing the caller to be made aware of it. so we don't want to introduce any new "silent drop code" that will also need changes.

@gkellogg
Copy link
Collaborator

I agree that the "silently drop ..." wording in the spec is unfortunate. Recall that the spec doesn't have a normative way to show warnings, but I think we use such language elsewhere.

From an implementation perspective, it would be reasonable to have some option that does cause an error if an attempt to emit an invalid triple (typically IRI) is made, and maybe you can suggest some wording for a future version of the spec.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants