New work item: crate `r2c2_statement` #6

pchampin · 2025-03-19T08:32:10Z

The idea of this crate is to be the first component of the "common API".

It would focus on RDF terms, triples and quads, and would provide

- lightweight wrapper types (either defined or imported from utility crates) to guarantee the syntactic validity of some building blocks (IRI, language tags...)
- traits for different term types (Subject, Predicate, Object, GraphName)
- possibly other smaller traits that would be shared by those above (something like MaybeIri, MaybeLiteral...)

Also, since triple terms will force us to define a notion of Triple, it might make sense to also define Quad in this crate, although this stretches the scope of the crate a little bit. Should we instead name it r2c2_term_statement, which is more accurate but a little verbose?

Tpt

Thank you! It's definitely the most important goal of our CG but sadly likely one of the trickiest to get right. We need to find a compromise between ease of use and versatility and I fear it won't be easy.

Tpt · 2025-03-19T08:36:46Z

term/src/lib.rs

+//! 1. define or import simple wrapper types for building blocks
+//!    (IRIs, language tags...)
+//! 2. define traits for different kinds of terms
+//!    (Subject, Predicate, Object, GraphName)


This is imho going a bit too much into the "how" direction. It does not sound obvious that these should be traits and not enums.

I'll argue in favor of traits here:

What I aim for is to avoid as much data transformation as possible when communicating between two implementations. That's why I try to favor lightweight wrapper types, and traits.

Imagine I want to consume some triples produced by oxttl to canonicalize them with sophia_c14n. (I'll focus on subjects but of course the same would apply to predicates and objects). If Subject was an enum, I would have to transform the subjects produced by oxttl into that enum. And then sophia_c14n would have to transform this enum again into its own internal representation.

If OTOH Subject is a trait, which the types of oxttl implement, and which sophia_c14n accepts as input, then the data produced by oxttl can be passed directly to sophia_c14n, which then will transform it directly into its own internal representation. That's one transformation less.
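To make the argument concrete, here is a minimal sketch of the trait-based path. Every name in it (`Subject`, `ProducerSubject`, `Interner`) is hypothetical and stands in for a producer such as oxttl and a consumer such as sophia_c14n; it is not the API of either crate, nor the design proposed in this PR.

```rust
use std::collections::HashMap;

/// Hypothetical common trait, implemented by producers on their own types.
pub trait Subject {
    /// `true` for an IRI, `false` for a blank node.
    fn is_iri(&self) -> bool;
    /// The IRI text, or the blank node label.
    fn text(&self) -> &str;
}

/// Stand-in for a parser's native subject type.
pub enum ProducerSubject {
    Iri(String),
    Blank(String),
}

impl Subject for ProducerSubject {
    fn is_iri(&self) -> bool {
        matches!(self, ProducerSubject::Iri(_))
    }

    fn text(&self) -> &str {
        match self {
            ProducerSubject::Iri(s) | ProducerSubject::Blank(s) => s,
        }
    }
}

/// Stand-in for a consumer with its own internal representation (integer ids).
#[derive(Default)]
pub struct Interner {
    ids: HashMap<(bool, String), usize>,
}

impl Interner {
    /// Accepts any `Subject` and converts it once, directly into the internal
    /// representation; no intermediate common enum value is built.
    pub fn intern<S: Subject>(&mut self, s: &S) -> usize {
        let key = (s.is_iri(), s.text().to_owned());
        let next = self.ids.len();
        *self.ids.entry(key).or_insert(next)
    }
}
```

With an enum-based common type, the producer would first have to build the enum value, and the consumer would then convert it again, which is the extra transformation described above.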

On the other hand, having an enum makes manipulation easier. I tend to think this is a trade-off to be made when we know more about how we represent IRIs/blank nodes/... and should not be set in stone at the beginning of this work item.

I'm happy to defer this discussion, the goal was not to set anything in stone. I've just pushed a commit to clarify that the proposed design was just an example.

Thank you! Perfect!

pchampin · 2025-03-19T08:45:57Z

> Thank you! It's definitely the most important goal of our CG but sadly likely one of the trickiest to get right. We need to find a compromise between ease of use and versatility and I fear it won't be easy.

Agreed. I tried not to be too specific in the PR, but on the other hand, keeping things too abstract leaves them without substance. I don't think it would make sense to agree on a very abstract work-item if we don't have some agreement on what it will contain.

But of course, we don't need to figure out all the details up-front.

Tpt · 2025-03-19T08:49:14Z

> I don't think it would make sense to agree on a very abstract work-item if we don't have some agreement on what it will contain.

Yes! What about something along the lines of "It would provide types to encode and manipulate RDF concepts like IRI, blank node, literal, term and triple", making the scope clear while leaving the struct vs trait question undecided?

> Should we instead name it r2c2_term_statement, which is more accurate but a little verbose?

I would tend to prefer r2c2_model, along the lines of RDF/JS DataModel, or r2c2_concepts, along the lines of RDF concepts & abstract syntax. I agree that Quad is likely in scope.

pchampin · 2025-03-19T09:06:18Z

Re. terminology:

- I consider, maybe wrongly, that "type" encompasses "struct" and "enum" (as well as atomic types), but not "trait". I believe this is consistent with the use of the keyword type in Rust, but I can see how traits are a kind of (higher level) types as well.
- I would expect a crate named r2c2_model or r2c2_concepts to also include the notion of Graph and Datatype, which is not the goal here. That's why I didn't go for that. r2c2_foundation?

term/src/lib.rs

Add comment to clarify that the proposed design can be challenged.

pchampin · 2025-03-24T14:53:55Z

> I would expect a crate named r2c2_model or r2c2_concepts to also include the notion of Graph and Datatype, which is not the goal here. That's why I didn't go for that. r2c2_foundation?

Thinking a little more about this... r2c2_statement would also work, IMO. I would understand that a crate named "statement" also includes the building blocks of statements (i.e. terms), while the opposite sounds like scope creep.

pchampin · 2025-03-24T15:39:38Z

@Tpt

> On the other hand, having an enum makes manipulation easier.

I've been giving this more thought, and I believe that there is a way to have the best of both worlds (traits and enums). More specifically, a Subject trait would provide one main method (let's call it subject_info() as a working title), whose result would be a lightweight enum similar to oxrdf::SubjectRef (and similarly, of course, for the other traits Predicate and Object).

Since that enum provides everything there is to know about the subject (resp. predicate, object), any other method that the traits may provide could have a default implementation based on the result of subject_info(). So implementers would generally only need to implement that one method to implement the trait.
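A minimal sketch of that idea, under stated assumptions: `SubjectInfo`, the convenience methods and `MyIri` are illustrative names, only IRIs and blank nodes are modeled, and none of this is the actual code of the PR.

```rust
/// Lightweight borrowed view of a subject (hypothetical; the comment above
/// only says it would be similar in spirit to `oxrdf::SubjectRef`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SubjectInfo<'a> {
    Iri(&'a str),
    BlankNode(&'a str),
}

pub trait Subject {
    /// The single method implementers must provide.
    fn subject_info(&self) -> SubjectInfo<'_>;

    /// Convenience methods with default implementations, all derived
    /// from `subject_info`.
    fn is_iri(&self) -> bool {
        matches!(self.subject_info(), SubjectInfo::Iri(_))
    }

    fn is_blank_node(&self) -> bool {
        matches!(self.subject_info(), SubjectInfo::BlankNode(_))
    }

    fn as_iri(&self) -> Option<&str> {
        match self.subject_info() {
            SubjectInfo::Iri(iri) => Some(iri),
            SubjectInfo::BlankNode(_) => None,
        }
    }
}

/// Stand-in for an implementation's own IRI type.
pub struct MyIri(pub String);

impl Subject for MyIri {
    // Implementing this one method is enough to get the whole trait.
    fn subject_info(&self) -> SubjectInfo<'_> {
        SubjectInfo::Iri(&self.0)
    }
}
```

Consumers that prefer pattern matching can work on the enum returned by `subject_info()`, while generic code can stay on the trait, which is the "best of both worlds" being argued for.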

see #6 (comment)

In addition to defining the core traits and types, it proposes 2 proof-of-concept implementations, for oxrdf and rdf_types (behind the feature gate 'poc_impl'), and demonstrates interoperability between the two by testing the roundtripping of each implementation via the other.

Significant changes

Tpt · 2025-04-21T15:47:38Z

Thank you so much for pushing this.

Some major pain points I have with it as a starting point:

- It contains what I consider to be details like the ground method. Imho this should be the topic of a v2 after we get a first design ready and is definitely not something we should have in a starting point that enables fast iteration.
- It enforces data structures for Literal/IRI/LangTag with validation. Imho this should not be part of a basic interoperability crate, implementations might want to be more or less lenient. The starting point should only be a set of traits and enums without significant algorithms in it. This is an interop crate, not an implementation. But happy to be challenged on it.

pchampin · 2025-04-23T12:41:17Z

> It contains what I consider to be details like the ground method. Imho this should be the topic of a v2 after we get a first design ready and is definitely not something we should have in a starting point that enables fast iteration.

Absolutely agreed. The ground methods were mostly here as an example of additional methods that could be provided (as a convenience for users) with default implementation (as a convenience for implementers). Which methods are or are not included there is indeed to be discussed later.

I'm happy to comment out the ground method for the moment.

> It enforces data structures for Literal/IRI/LangTag with validation. Imho this should not be part of a basic interoperability crate, implementations might want to be more or less lenient. The starting point should only be a set of traits and enums without significant algorithms in it. This is an interop crate, not an implementation.

I hear your point, but I still have mixed feelings about this...

> But happy to be challenged on it.

Here we go :)

If IRIs in R2C2 (the same reasoning applies to language tags) did not provide any guarantee of validity, it would mean that

- there would never be any cost for producers: just ship whatever you have as an R2C2 IRI;
- there would always be a cost for consumers: always check the IRIs that you get, you never know.

This is not ideal, in particular because many producers will actually produce valid IRIs (hopefully!), but consumers will still need to check them every time.

With the proposed design:

- lenient producers must pay the cost of checking their data beforehand (Iri::new)
- conservative producers can still ship whatever they have without any additional cost (Iri::new_unchecked)
- lenient consumers can accept whatever they get without checking it, confident that it satisfies RFC 3987
- conservative consumers must pay the cost of checking their additional constraints

As you can see, the sweet spot is for implementations following Postel's law: conservative producers and lenient consumers. The burden of additional checks is taken only by the implementations who depart from Postel's law.

For this to work, we need to ensure that the guarantees provided by R2C2 correspond to the MUSTs in the spec, nothing more, nothing less. That's why, for example, blank node labels are not constrained (while several implementations, including mine, constrain them to be valid SPARQL bnode labels).
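The two constructors could look roughly like the sketch below. The names `Iri::new` and `Iri::new_unchecked` come from the list above; everything else is an assumption, and in particular `looks_like_iri` is a deliberately crude placeholder (a scheme followed by `:` and no obviously forbidden characters), not an RFC 3987 validator.

```rust
/// Error returned when the text is rejected by the check.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidIri;

/// Lightweight wrapper: holding an `Iri` is meant to imply that the text is
/// valid, either because it was checked or because the producer vouched for it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Iri<'a>(&'a str);

impl<'a> Iri<'a> {
    /// For lenient producers: pay for the check up front.
    pub fn new(txt: &'a str) -> Result<Self, InvalidIri> {
        if looks_like_iri(txt) {
            Ok(Iri(txt))
        } else {
            Err(InvalidIri)
        }
    }

    /// For conservative producers: no check, no cost. The caller is
    /// responsible for only passing valid IRIs.
    pub fn new_unchecked(txt: &'a str) -> Self {
        Iri(txt)
    }

    pub fn as_str(&self) -> &'a str {
        self.0
    }
}

/// PLACEHOLDER check, not RFC 3987: requires `scheme ":"` and rejects
/// whitespace and a few characters that can never appear unescaped.
fn looks_like_iri(s: &str) -> bool {
    let Some((scheme, rest)) = s.split_once(':') else {
        return false;
    };
    let mut chars = scheme.chars();
    let scheme_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic())
        && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'));
    let rest_ok = !rest.chars().any(|c| {
        c.is_whitespace() || matches!(c, '<' | '>' | '"' | '{' | '}' | '|' | '\\' | '^' | '`')
    });
    scheme_ok && rest_ok
}
```

A lenient consumer then takes `Iri` values at face value, while a conservative consumer runs its own stricter checks on `as_str()`.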

pchampin · 2025-06-11T18:19:53Z

I just had a long discussion about this with @labra and an idea came up to hide all validation code behind a feature gate:

- without the feature, the wrapper types provided by the API would only have a new_unchecked method; the crate would therefore remain very lean, but the responsibility of producing valid data would be entirely left to the user
- with the feature, the wrapper types would also provide a new method that would perform some validation.

In my story above, strict producers would probably use the crate without the feature (they only need the new_unchecked method), while lenient producers would enable it in order to use the new method.
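As a rough illustration of the feature-gate idea, the sketch below uses a hypothetical feature called `validation` and a hypothetical `LangTag` wrapper; neither name is taken from the PR. The check inside `new` is a placeholder, not a BCP 47 validator.

```rust
// In Cargo.toml, the (hypothetical) feature would be declared as:
//   [features]
//   validation = []

/// Error type, only needed when validation is compiled in.
#[cfg(feature = "validation")]
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidLangTag;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LangTag<'a>(&'a str);

impl<'a> LangTag<'a> {
    /// Always available: the caller vouches for the validity of `tag`.
    pub fn new_unchecked(tag: &'a str) -> Self {
        LangTag(tag)
    }

    pub fn as_str(&self) -> &'a str {
        self.0
    }

    /// Only compiled when the `validation` feature is enabled.
    #[cfg(feature = "validation")]
    pub fn new(tag: &'a str) -> Result<Self, InvalidLangTag> {
        // PLACEHOLDER check: non-empty ASCII-alphanumeric subtags of at
        // most 8 characters, separated by '-'.
        let ok = tag.split('-').all(|sub| {
            (1..=8).contains(&sub.len()) && sub.chars().all(|c| c.is_ascii_alphanumeric())
        });
        if ok {
            Ok(LangTag(tag))
        } else {
            Err(InvalidLangTag)
        }
    }
}
```

Built without the feature, the crate contains no validation code at all, and calling `LangTag::new` is a compile error rather than a runtime surprise.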

@Tpt would that be an acceptable middle-ground for you?

Tpt · 2025-06-17T16:35:22Z

@pchampin Sorry for the delayed answer. This sounds much better. However, I am still a bit scared that mixing the two features in the same crate will make versioning more painful: it is likely we will want the trait crate to be as stable as possible, whereas it is more acceptable for implementations to see breaking changes.

pchampin · 2025-06-18T12:55:18Z

> @pchampin Sorry for the delayed answer. This sounds much better. However, I am still a bit scared that mixing the two features in the same crate will make versioning more painful: it is likely we will want the trait crate to be as stable as possible, whereas it is more acceptable for implementations to see breaking changes.

That's a very valid point, thanks.
Another option, then, would be to have two crates:

- r2c2_statement contains traits and wrapper types but no validation code (only new_unchecked constructors for wrapper types)
- r2c2_statement_validation defines extension traits for the wrapper types, providing the "validating" constructors (new)

This way, we could keep the versioning of the API independent from the versioning of the validation code.
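The split could be sketched as follows, with two modules standing in for the two crates so that the example fits in one file. The trait name `IriValidation`, the helper `parse_iri` and the placeholder check are assumptions made for illustration only.

```rust
/// Stand-in for the stable crate `r2c2_statement`: wrapper types only.
pub mod r2c2_statement_sketch {
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub struct Iri<'a>(&'a str);

    impl<'a> Iri<'a> {
        /// The only constructor offered by the stable crate.
        pub fn new_unchecked(txt: &'a str) -> Self {
            Iri(txt)
        }

        pub fn as_str(&self) -> &'a str {
            self.0
        }
    }
}

/// Stand-in for `r2c2_statement_validation`, versioned independently.
pub mod r2c2_statement_validation_sketch {
    use super::r2c2_statement_sketch::Iri;

    #[derive(Debug, PartialEq, Eq)]
    pub struct InvalidIri;

    /// Extension trait adding the validating constructor to the wrapper type.
    pub trait IriValidation<'a>: Sized {
        fn new(txt: &'a str) -> Result<Self, InvalidIri>;
    }

    impl<'a> IriValidation<'a> for Iri<'a> {
        fn new(txt: &'a str) -> Result<Self, InvalidIri> {
            // PLACEHOLDER check, far weaker than RFC 3987.
            if txt.contains(':') && !txt.contains(char::is_whitespace) {
                Ok(Iri::new_unchecked(txt))
            } else {
                Err(InvalidIri)
            }
        }
    }
}

/// Usage: a lenient producer opts in by importing the extension trait.
pub fn parse_iri(txt: &str) -> Option<&str> {
    use r2c2_statement_validation_sketch::IriValidation;
    r2c2_statement_sketch::Iri::new(txt).ok().map(|iri| iri.as_str())
}
```

Because `Iri::new` only exists once the extension trait is in scope, breaking changes to the validation rules would require a new version of the validation crate only.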

layout for a new work-item 'term'

a343dbc

pchampin added the new-work-item label Mar 19, 2025

Tpt reviewed Mar 19, 2025

Tpt previously approved these changes Mar 19, 2025

pchampin commented Mar 19, 2025

pchampin changed the title ~~New work item: crate r2c2_term~~ New work item: crate r2c2_statement Apr 21, 2025

pchampin added 2 commits April 21, 2025 16:32

rename 'r2c2_term' to 'r2c2_statement'

cac0c53

see #6 (comment)
