This repository has been archived by the owner on Jun 29, 2022. It is now read-only.

CBOR: Data model serialization in Rust #323

Open
prataprc opened this issue Oct 20, 2020 · 22 comments

Comments

@prataprc
Contributor

    .-----------------.    .------------.    .--------.
    | Serialized data |<-->| Data-model |<-->| Schema |
    '-----------------'    '------------'    '--------'

This writeup focuses on the data-model definition in the Rust language
and its (de)serialization in CBOR format. A draft implementation can be
found here,

Data model

enum Kind {
    Null,
    Bool(bool),
    Integer(i128),
    Float(f64),
    Text(String),
    Bytes(Vec<u8>),
    Link(Cid),
    List(Vec<Kind>),
    Dict(BTreeMap<String, Kind>),
}
  • Null, Bool, Integer, Float, Bytes, Text, Link are the scalar
    kinds.
  • List and Dict are recursive kinds.
  • Integer is represented as a signed 128-bit number; overflow/underflow are
    treated as errors.
  • Float is always represented as 64-bit at the data-model layer.
  • Text is a utf8-encoded string.
  • Link is a Content Identifier. Here the Cid type is defined as an enumeration
    over cid-versions, hence it can be considered future proof, at least
    from the point of serialization and de-serialization.
  • List is a heterogeneous collection of kinds.
  • Dict supports only utf8-encoded String as key.

Additionally,

  • Kind is a sized definition whose memory footprint is known at compile time.
  • Kind does not implement the equality operation, due to the presence of
    floating-point values.

CBOR serialization

This section essentially carves out a subset of the CBOR specification
that is required to achieve as much completeness and as much fittedness
as possible, to put it in IPLD parlance. (A small encoding sketch follows
the list below.)

  • Major-type-0, full compatibility.
  • Major-type-1, full compatibility; since we are using i128 we can still
    represent the -(unsigned-64-bit-number + 1) values that come on the wire.
  • Major-type-2, only length-prefixed encoding is supported.
  • Major-type-3, only length-prefixed encoding is supported.
  • Major-type-4, only length-prefixed lists.
  • Major-type-5, only length-prefixed maps; supports only string keys, with
    a strict sorting order for map entries.
  • Major-type-6, only tag 42 is used, for IPLD links (cid).
  • Major-type-7, only the following simple values are supported:
    • Null, denotes that the expected value is nullable.
    • True, False, for representing boolean kinds.
    • F32, deserialization is supported, internally converted to double-precision.
    • F64, both serialization and deserialization are supported.
    • All other simple values are left unsupported.
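
As a rough illustration of the wire format this subset targets, here is a
minimal sketch (not the draft implementation) of emitting major-type-0/1
integer heads and the always-64-bit float encoding, following the
additional-info rules of the CBOR spec:

fn encode_int(n: i128, out: &mut Vec<u8>) -> Result<(), String> {
    // Major-type-0 for non-negative, major-type-1 for negative, where the
    // wire value of a negative n is (-1 - n).
    let (major, v) = if n >= 0 { (0u8, n as u128) } else { (1u8, (-1 - n) as u128) };
    if v > u64::MAX as u128 {
        return Err("integer does not fit in 64 bits on the wire".to_string());
    }
    let v = v as u64;
    let mt = major << 5; // the major type lives in the top 3 bits of the head byte
    match v {
        0..=23 => out.push(mt | v as u8),
        24..=0xff => { out.push(mt | 24); out.push(v as u8) }
        0x100..=0xffff => { out.push(mt | 25); out.extend_from_slice(&(v as u16).to_be_bytes()) }
        0x1_0000..=0xffff_ffff => { out.push(mt | 26); out.extend_from_slice(&(v as u32).to_be_bytes()) }
        _ => { out.push(mt | 27); out.extend_from_slice(&v.to_be_bytes()) }
    }
    Ok(())
}

fn encode_f64(value: f64, out: &mut Vec<u8>) {
    // Major-type-7, additional-info 27 (0xFB), followed by 8 big-endian bytes.
    out.push(0xFB);
    out.extend_from_slice(&value.to_be_bytes());
}

For example, -500 encodes to the head byte 0x39 followed by 0x01 0xf3, which
matches the spec text quoted later in this thread.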

A note on recursion: recursive types naturally fit a recursive encoding and
decoding implementation. But this can lead to stack-overflow issues if
data-model values are recursively composed of a large number of list and dict
values. A similar problem exists when deserializing incoming wire-data.
To avoid this, either we have to convert the recursive encoding and
decoding implementation into a loop, or we may have to add a cap on recursion
depth.
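
A minimal sketch of the depth-cap idea, reusing the Kind enum defined above
(the limit and the error type are assumptions, not part of the draft):

const MAX_DEPTH: usize = 128; // hypothetical limit

fn check_depth(value: &Kind, depth: usize) -> Result<(), String> {
    if depth > MAX_DEPTH {
        return Err(format!("value nested deeper than {} levels", MAX_DEPTH));
    }
    match value {
        // only the recursive kinds spend depth; scalar kinds always pass
        Kind::List(items) => items.iter().try_for_each(|v| check_depth(v, depth + 1)),
        Kind::Dict(map) => map.values().try_for_each(|v| check_depth(v, depth + 1)),
        _ => Ok(()),
    }
}

The decoder can thread the same counter through its recursive calls to guard
against hostile wire-data.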

Open points:

  • Since strings are utf8-encoded, and there is a collation spec for unicode,
    we have the option of doing a byte-sort or conforming to the collation spec.
  • Does it make sense to have a separate f32 and f64 for float-kind?
  • Looks like serialization and deserialization are not isomorphic across codecs.
    Even if the same codec is used but implemented in different languages, can
    it be isomorphic?
@prataprc prataprc changed the title Cbor: Data model serialization in Rust CBOR: Data model serialization in Rust Oct 20, 2020
@rvagg
Member

rvagg commented Oct 20, 2020

F32, deserialization is supported, internally converted to double-precision.
&
Does it make sense to have a separate f32 and f64 for float-kind ?

You could consider dropping everything but double precision, even on decode. Our ideal is to have just one way, although our current decoders are quite sloppy in what they expect. The JavaScript encoder currently does a smallest-possible float encode while Go always does 64-bit. We're moving to the latter (as per the recent revision to the DAG-CBOR spec) and we don't anticipate there's a whole lot of DAG-CBOR in the wild with floats anyway, so hopefully this isn't a pain for anyone.

My advice on building a new DAG-CBOR implementation is to make it minimal and strict and see how far it gets you before it breaks on real-world data. I would hope that you wouldn't find any breakage and as we move our other encoders to a stricter form the odds will just improve over time.

The i128 might be overkill too? You only get 64-bits in CBOR anyway so maybe it's best to have the API make that clear?

@vmx
Member

vmx commented Oct 20, 2020

Your Data Model aligns with the one outlined here: https://github.com/ipfs-rust/rust-ipld/blob/a2b6f30631bfd45a0ffdc78a9aa7e12c2b809f1e/core/src/ipld.rs#L8. The only difference is the naming of one thing: String vs Text. Though I kind of like the name Text, as this aligns well with how I see that type.

@prataprc
Contributor Author

https://github.com/ipfs-rust/rust-ipld/blob/a2b6f30631bfd45a0ffdc78a9aa7e12c2b809f1e/core/src/ipld.rs#L8.

@volker thanks for this reference. Actually thanks for the many references ;)

I am kind of making this quick draft implementation to understand the specs in finer details. Other than that nothing much.

@prataprc
Contributor Author

You could consider dropping everything but double precision, even on decode.

In situations similar to IoT, where small embedded devices are expected to communicate using DAG-CBOR, do we assume double-precision floats on those devices? It might be okay to expect low-end devices to fall short on double-precision support, but for an adequately powered computing node, ignoring single-precision might be a bad excuse. Other than this, I prefer the float-64-only design.

before it breaks on real-world data.

Do we have real-world data, say from golang, that we can work with?

The i128 might be overkill too?

Yes, I definitely feel it is overkill. Actually, thanks for bringing it up. I found this in the CBOR spec,

Major types 0 and 1 are designed in such a way that they can be
encoded in C from a signed integer without actually doing an if-then-
else for positive/negative (Figure 2). This uses the fact that
(-1-n), the transformation for major type 1, is the same as ~n
(bitwise complement) in C unsigned arithmetic; ~n can then be
expressed as (-1)^n for the negative case, while 0^n leaves n
unchanged for non-negative. The sign of a number can be converted to
-1 for negative and 0 for non-negative (0 or positive) by arithmetic-
shifting the number by one bit less than the bit length of the number
(for example, by 63 for 64-bit numbers).

From the above description we understand that, if both Major-type-0 and Major-type-1 are to be used, we are encouraged to treat them as signed 64-bit. And maybe, if only Major-type-0 is to be used, it is left to the application's choice.
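
For illustration, the branch-free trick the spec describes could look like
this in Rust (a sketch, assuming 64-bit integers):

fn major_and_value(n: i64) -> (u8, u64) {
    let mask = (n >> 63) as u64;   // 0x0000.. for n >= 0, 0xFFFF.. for n < 0
    let major = (mask & 1) as u8;  // 0 => major-type-0, 1 => major-type-1
    let value = mask ^ (n as u64); // n for non-negative, -1 - n (i.e. ~n) for negative
    (major, value)
}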

Major type 1: a negative integer. The encoding follows the rules
for unsigned integers (major type 0), except that the value is
then -1 minus the encoded unsigned integer. For example, the
integer -500 would be 0b001_11001 (major type 1, additional
information 25) followed by the two bytes 0x01f3, which is 499 in
decimal.

But the above description, which comes in the main part of the spec, is not clear about this. Maybe it is the bane of designing a serialization spec that we have to keep it a little open-ended.

I will book this as a TODO for now.

@rvagg
Member

rvagg commented Oct 21, 2020

Yeah, some of these details get a bit too in-the-weeds for the data model and we have to be careful not to drive the data model according to one codec's quirks (or one language's quirks!). It's a juggling act of compromise everywhere. If you read https://github.com/ipld/specs/blob/master/block-layer/codecs/codecs-and-completeness.md carefully you'll see that basically nothing we have now is up to standard, and in some places it's not even clear what the standard is (let's not even get into data model map "ordering" questions ...).

@warpfork
Contributor

warpfork commented Oct 21, 2020

As a tree of bullet points, forgive me if it's terse:

  • I'd suggest making Kind an enum (an actual enum, not a sum type), and making a Node "interface" ("trait", in Rust, I guess? I'm not sufficiently oxidized to know exactly how this should fly) that is the container for actual values.

    • The reason this is important is that you might want to have more than one memory layout that implements the Node methods.
      • E.g., if you do some stuff with IPLD Schemas later, you'll maybe have some types where you do custom memory layouts, but that have macros attached to them that spell out all the Node methods so other parts of the IPLD library can work on them through that interface. (Or maybe the macro generates the whole type specification, too. Same idea.)
      • If you want to have Advanced Data Layouts, those should also be accessible in a uniform way through this interface. E.g. you might have a HAMT ADL, and it should be interrogatable like any other Data Model map.
    • Maybe there's also another type that's an "enum" (not an enum, a sum type -- rust, sigh) that looks like this one you've got here -- but I'd make it also "just" an implementation of Node (even if it's used as a default) , not make it the centerpiece itself. So it would be something like the basicnode implementation in golang -- it's just one (common, but not particularly special) Node implementation.
    • This probably implies you don't get compile time fixed sizes out of Node. I'm afraid this is probably the trade you want to make, though. As much as I like compile time performance, transparent operation over ADLs and macro-produced Schema types is cooler.
      • (Maybe there's still a way to have both the interface for things that want to use it, and also compile-time fast options for things that know the concrete types? We've been able to do a fair amount of this dualism in golang. No idea how it might translate to Rust though, so I wish you good luck :))
  • I feel like the BTreeMap showing up in your enum should also probably not be quite that concrete, for similar reasoning as the whole above point about making Node implementable by many things. Probably some sort of interface/trait that just says Map in the abstract?

    • I'm not familiar enough with Rust stdlib, but I feel like a similar comment probably goes for that Vec. (Some people have implemented chunkable "big" arrays to be used as ADLs -- you'll wanna be able to read those through some consistent interface, and they're definitely not something you'll want to eagerly load into one in-memory thing.)
  • I'd suggest you'll have a library that has better compatibility with data out in the wild if you use a string type more like ffi.String or something that also supports non-utf-8 bytes.

    • You may have seen us going back and forth about this a bit on irc lately :)
    • If you also take the Node suggestion above: that might give you a nice way to handle this: Node could have an AsString "unboxing" method that returns a (utf-8 strictly) String (but also has to return an error, instead, potentially, if the string contained other non-utf-8 bytes). It can simultaneously have an AsFFIString method (which doesn't need an error return option, since it's always going to work).
  • Speaking of returning errors on some "unboxing" methods: you might find this design issue from the golang project interesting: API design question: Should Node.AsFoo methods return error? go-ipld-prime#43

    • tl;dr: previously we had a lot of error returns on our node unboxing methods. We're going to move to panics for most of them, though: in particular, the "unboxing method called on wrong Kind" errors will become panics. No idea how this might conceptually translate to Rust error handling sensibilities! Just food for thought :)
      • EDIT to note: we actually ended up closing that issue without carrying it into action. We turned out to want those errors as returns more often than we thought; see final comment. We're still working on adding ease-of-use layers around this interface, instead (because the error returns do definitely get high-friction).
  • Sorting in the dagcbor codec:

    • definitely do not use unicode collation.
      • CBOR has their own (somewhat weird) little sorting spec, and it sorts based on bytes. We said (long ago) that dag-cbor does the same. (We're not even necessarily happy with this choice, but that ship has sailed way, way, WAY over the horizon.)
      • More generally (but with "IMO" disclaimer attached): collation has been a bugs-and-surprises farm for decades, most visibly in the land of databases, despite the best efforts of uncountable very bright people: I think it may actually just be a bad idea that should be left alone.
    • I'd recommend making a conditional switch for sorting strictness at all. Other people will probably argue with me on this too, but data exists in the wild that isn't sorted, and if you write an implementation that's strict about this, it won't be able to handle that data.
    • So the sweet spot I'd aim for is: if you want to make your library sort newly created data by default, go for it. But it's probably best to make sure it can both parse and re-emit data in other orderings too.
  • Recursive types and decoding: yeah, resource budgets are good :) I've just been doing a bunch of work on that stuff.

    • I'd suggest building a budget system that starts with some fixed amount of budget, and just decrements that as it goes. This defends you from messages that ask for a single absurdly-sized byte field, and messages that ask you for absurd recursion, and messages that do a million-laughs style "i want list of a thousand elements of lists of a thousand elements of lists of a [...]" resource attack -- all three at the same time. (A sketch of this budget idea follows after this comment.)
  • CBOR and ints: gosh, I didn't even realize that was ambiguous. I've gone with positive ints are always encoded as Major-0 and negative ints are always encoded as Major-1. Negative ints should never be found in Major-0.

    • Sounds like we may need to add a clarification line about that to the dag-cbor spec! Thanks for pointing this out :D
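
Taking up the budget suggestion above, here is a minimal sketch of a single
shared resource budget (the struct, numbers and call sites are assumptions,
not go-ipld-prime's API):

struct Budget {
    remaining: u64,
}

impl Budget {
    // Every byte read, every element decoded and every level of recursion
    // spends from the same pool, so one limit covers all three attacks.
    fn spend(&mut self, cost: u64) -> Result<(), String> {
        if self.remaining < cost {
            return Err("resource budget exhausted".to_string());
        }
        self.remaining -= cost;
        Ok(())
    }
}

// Inside a decoder loop this might be used as (hypothetical call sites):
//   budget.spend(byte_len)?;   // for a bytes/text field
//   budget.spend(1)?;          // per list/map element
//   budget.spend(8)?;          // per level of nesting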

@prataprc
Contributor Author

making a Node "interface" that is the container for actual values.

A trait in Rust is a type-system-only concept. But there is something called a trait-object that "kind of matches" interface{}. Trait-objects have some limitations, though.

you might want to have more than one memory layout that implements the Node methods.

I am yet to study the higher-level abstractions in IPLD. But I think I get it.

Or maybe the macro generates the whole type specification.

Sounds like a good idea.

AsString "unboxing" method that returns a (utf-8 strictly) String (but also has to return an error, instead, potentially, if the string contained other non-utf-8 bytes). It can simultaneously have an AsFFIString

+1

previously we had a lot of error returns on our node unboxing methods. We're going to move to panics for most of them, though:

Rust error handling is cooler than in most other languages. With some glue logic we can chain errors across components, and the only noise we might see in our functional logic is the ? operator. Having said this, Rust's stdlib follows the advice of "use panic wherever it makes sense". Though I had the habit of using Result and err_at!() all the time, I removed them later for better-looking API/code. The code change can be done safely.

definitely do not use unicode collation.

+1


Overall I agree with the idea of keeping Kind as an abstract type. Now the implementation has two choices,

a. use type-parameters and define traits that match the operational behaviour of Kind.

enum Kind<V, M>
where 
    V: Node,
    M: Node,
{
    Null,
    Bool(bool),
    Integer(i128),
    Float(f64),
    Text(String),
    Bytes(Vec<u8>),
    Link(Cid),
    List(V),
    Dict(M),
}

b. using trait-objects.

Leaning towards (a) now. Also keeping Text as String; the String's underlying value can be either checked utf8 or unchecked binary. APIs can be added on top of Kind for utf8 checks and other type conversions.

I think I might have to re-visit all this after studying more on IPLD. Thanks for the comments.

@prataprc
Contributor Author

prataprc commented Oct 22, 2020

My bad, I didn't realize that I would end up with an HKT issue with (a), that is, parameterizing Kind<V,M> over V and M. Will have to try (b). But that might mean we have to forgo type-parameters on methods in Node.

@prataprc
Contributor Author

After looking at the Node interface{}, I am able to understand why Kind was suggested to be a simple enumeration and not a sum-type.

/// Kind of data in data-model.
pub enum Kind {
    Null,
    Bool,
    Integer,
    Float,
    Text,
    Bytes,
    Link,
    List,
    Map,
}

IPLD basic data-model can be implemented as,

/// Basic defines IPLD data-model.
pub enum Basic {
    Null,
    Bool(bool),
    Integer(i128), // TODO: i128 might be overkill, 8 more bytes than 64-bit !!
    Float(f64),
    Text(String),
    Bytes(Vec<u8>),
    Link(Cid),
    List(Box<dyn Node + 'static>),
    Map(Box<dyn Node + 'static>),
}

which is now truly a recursive type, where sub-trees are implemented as Node.

The list/map lookup type (key) is implemented as a sum-type; this way we can keep it simple and at the same time give scope for future additions.

/// A subset of Basic, that can be used to index into recursive type, like
/// list and map. Can be seen as the path-segment.
#[derive(Clone)]
pub enum Key {
    Null,
    Bool(bool),
    Offset(usize),
    Text(String),
    Bytes(Vec<u8>),
}

The Key type implements the Eq and Ord traits to address IPLD's operational behaviour.
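
One way to satisfy that requirement is simply deriving the comparison traits;
the derived ordering sorts first by variant order and then by the contained
value (a sketch, not necessarily the draft's actual choice):

#[derive(Clone, PartialEq, Eq, PartialOrd, Ord)]
pub enum Key {
    Null,
    Bool(bool),
    Offset(usize),
    Text(String),
    Bytes(Vec<u8>),
}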

Nevertheless, the Cbor type still holds on to Vec<Cbor> for list and BTreeMap<String, Cbor> for map. Hope that is okay.

Finally Node interface{} is defined in Rust as:

/// Everything is a Node, almost.
pub trait Node {
    /// return the kind.
    fn to_kind(&self) -> Kind;

    /// if kind is a recursive type, lookup by key.
    fn get(&self, key: &Key) -> Result<&dyn Node>;

    /// iterate over values.
    fn iter<'a>(&'a self) -> Box<dyn Iterator<Item = &dyn Node> + 'a>;

    /// iterate over (key, value) entries; in case of a list, the key is the
    /// index offset of the value within the list.
    fn iter_entries<'a>(&'a self) -> Box<dyn Iterator<Item = (Key, &dyn Node)> + 'a>;

    /// if kind is a container type, return the length.
    fn len(&self) -> Option<usize>;

    fn is_null(&self) -> bool;

    fn to_bool(&self) -> Option<bool>;

    fn to_int(&self) -> Option<i128>;

    fn to_float(&self) -> Option<f64>;

    fn as_string(&self) -> Option<Result<&str>>;

    fn as_ffi_string(&self) -> Option<&str>;

    fn as_bytes(&self) -> Option<&[u8]>;

    fn as_link(&self) -> Option<&Cid>;
}
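
To see why such a unifying trait helps, here is a minimal sketch (not part of
the draft) of path resolution written purely against Node, assuming Result is
the crate's usual alias carrying its error type:

fn resolve<'a>(root: &'a dyn Node, path: &[Key]) -> Result<&'a dyn Node> {
    let mut cur = root;
    for key in path {
        // works the same for a plain map, a schema type, or an ADL such as a
        // HAMT, as long as it implements Node::get()
        cur = cur.get(key)?;
    }
    Ok(cur)
}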

@warpfork
Contributor

I'm probably going to have to look at that repeatedly, but at first glance, I think that looks like it'll fly!

@warpfork
Contributor

The Key type idea is really interesting. I'm not sure how it'll go. It's definitely not the same as we've done in golang, but it might work well.

Key probably shouldn't have Null in the sum... I think we've declared those outlawed in the data model. (Because it would be "mixed kind keys" to have that -- e.g. string-or-null -- and we didn't want to deal with it.) (Possible that the data model spec needs clarification on this, though.)

One more thought I hadn't mentioned earlier in this issue, but realized during other discussions today might be interesting/important... in IPLD Schemas, there is the ability to specify maps with typed keys and typed values. So, in golang, this ended up with us having the MapIterator types having a return signature of (key Node, value Node, maybeIOerror Error) -- the interesting bit being that keys are actually Node cropping up yet again. Having Node here lets us keep a hold of any schema type information when we pass that value around!

The features IPLD Schemas introduced for typed keys go even further: schemas can have maps with complex keys, as long as they fit certain rules. Namely: you can use even structs and unions as map keys on the condition that they have representation strategies which can reduce them to strings (because that means we can still serialize them, when we get that data all the way to the bottom of the stack). So again, having map iterators use Node for keys turns out pretty consequential for this -- it lets us keep even those complex keys intact and typed as we pass them around. (One could, I think, maybe also make this work by just to-string'ing complex keys, and telling the user to "deal with it" if they wanted to use that information later again in a typed context as their complex form (the struct or union or whatnot). But it probably wouldn't be pretty and it probably wouldn't be fast.)

@prataprc
Contributor Author

Having Node here lets us keep a hold of any schema type information when we pass that value around!

Any pointers, say in golang, on where to find the schema type information? Maybe with some real examples I can understand this better.

@prataprc
Contributor Author

prataprc commented Oct 23, 2020

Keeping the Kind and Basic types as they are from the earlier comment, the Key type and Node trait are updated as follows.

pub enum Key {
    Bool(bool),
    Offset(usize),
    Text(String),
    Bytes(Vec<u8>),
    Keyable(Box<dyn ToString>),
}

Note that we are making ToString a hard constraint for Keyable types. Concrete types (struct, union) can implement the to_string() API and can choose to memoize it for repeated calls (a sketch of this follows below).

pub trait Node {
    fn as_key(&self) -> Option<Key>;
    ...
}
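
A sketch of the memoization idea mentioned above (all names here are
hypothetical; std::cell::OnceCell is just one way to cache the string form):

use std::cell::OnceCell;

struct PersonKey {
    surname: String,
    given: String,
    cached: OnceCell<String>, // filled on the first to_string() call
}

// Display gives ToString for free, so a PersonKey can sit inside Key::Keyable.
impl std::fmt::Display for PersonKey {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        let s = self
            .cached
            .get_or_init(|| format!("{}:{}", self.surname, self.given));
        f.write_str(s)
    }
}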

@vmx
Member

vmx commented Oct 23, 2020

Please note that the "strings" @warpfork mentions can contain arbitrary bytes. So in Rust it's a Vec<u8> and not a String. Also the current Data Model spec allows for arbitrary bytes in what you call Text.

@prataprc
Contributor Author

@vmx Thanks for bringing it up.

If Text(String) needs to be Text(Vec<u8>), can we use Bytes(Vec<u8>) ?

The way I understood it:

  • While reading from the codec, if the codec calls it a string, read it as a string but don't check for utf8 compliance or do any other checks.
  • Provide two APIs: one to check and return the previously-assumed-string as a utf8-validated-string; the second to return the ffi.string as-is and let the application (higher layers) validate it as per need.
  • While reading map keys, read them as utf8-validated-strings, if the codec calls them strings. In future we can add support for other kinds as well.

@vmx
Member

vmx commented Oct 23, 2020

If Text(String) needs to be Text(Vec<u8>), can we use Bytes(Vec<u8>) ?

I don't understand that question.

  • Provide two APIs: one to check and return the previously-assumed-string as a utf8-validated-string; the second to return the ffi.string as-is and let the application (higher layers) validate it as per need.

Yes. I think this is how it would need to be done. I had hoped the Data Model could just define Strings being valid Unicode.

  • While reading map keys, read them as utf8-validated-strings, if the codec calls them strings. In future we can add support for other kinds as well.

No, those always need to be bytes. There is currently an ongoing discussion on how to exactly specify it in a clean way, but those should really be bytes.

@alexjg
Contributor

alexjg commented Oct 23, 2020

@warpfork Correct me if I'm wrong here, my understanding is that the primary reason for the Node interface in Go is to allow minimal allocations whilst still allowing deserialization to many implementing types (the 'different memory layouts' you mention)?

I'm not certain that this is necessary in Rust, we could take some inspiration from serde's zero copy deserialization, which achieves the same goal (I think) via lifetimes on the Deserialize trait. I personally feel like it would be worth investigating as I think the Serde style "implement Deserialize<MyObject>" idiom would be much more familiar to Rust programmers and more amenable to automation via custom derive macros.

@vmx
Member

vmx commented Oct 23, 2020

I'm not certain that this is necessary in Rust, we could take some inspiration from serde's zero copy deserialization, which achieves the same goal (I think) via lifetimes on the Deserialize trait

I think it's similar to what rust-ipld is doing here with the Encode and Decode traits:
https://github.com/ipfs-rust/rust-ipld/blob/a2b6f30631bfd45a0ffdc78a9aa7e12c2b809f1e/core/src/codec.rs

The DAG-CBOR implementation using those traits is here: https://github.com/ipfs-rust/rust-ipld/tree/a2b6f30631bfd45a0ffdc78a9aa7e12c2b809f1e/dag-cbor/src

You go straight from codec to native types and back.

@prataprc
Contributor Author

Yes. I think this is how it would need to be done. I had hoped the Data Model could just define Strings being valid Unicode.

I am using std::str::from_utf8_unchecked() so I guess it is similar to Text(Vec<u8>). On second thought, Text(Vec<u8>) feels better because Basic is a pub enum and users might directly match on Text(String) which is bad.

If Text(String) needs to be Text(Vec<u8>), can we use Bytes(Vec<u8>) ?

Right now as_string() and as_ffi_string() do a positive match for the Text variant. Should we also check for the Bytes variant?

match self {
    Basic::Text(val) => Some(err_at!(FailConvert, from_utf8(val.as_bytes()))),
    Basic::Bytes(val) => Some(err_at!(FailConvert, from_utf8(val.as_slice()))),
    _ => None,
}

While reading map keys, read them as utf8-validated-strings, if the codec calls them strings. In future we can add support for other kinds as well

No, those always need to be bytes. There is currently an ongoing discussion on how to exactly specify it in a clean way, but those should really be bytes.

I am guessing the key is going to come from upper layers, say via a path-segment. I might visualize that as,

path-segment-key    map-key
----------------    ----------
Arbitrary           Arbitrary
utf8-bytes          Arbitrary
Arbitrary           utf8-bytes
utf8-bytes          utf8-bytes

Note that in the above table we can replace utf8-bytes with 64-bit-uint-big-endian-bytes, but the situation is the same, which is: can we guarantee a consistent key index into the same map-value for all 4 combinations?

We need a precise definition for the key, so that the incoming key and the map-keys fall within the same value-domain, aka type, aka kind, like utf8-bytes or u64-big-endian-bytes. And when we are dealing with mixed value-domains, we may have to define a sort order across value-domains (if ordering/iteration is to be implemented on the map).

Now, it is also possible to have something called collation-encoding, which is a form of encoding multiple types into a seamless collection of bytes. Please note that I am not assuming the map is ordered here, just that there is a collation algorithm across multiple kinds (especially those defined by the data-model) that not only gives point-lookup into the map, but can also maintain the map as an ordered set if that is required, and it is byte-represented, memcmp-compatible and isomorphic.

@vmx
Member

vmx commented Oct 23, 2020

BTW: I just re-read the CBOR spec about major types. The CBOR type for strings (major type 3, text strings) must be encoded as UTF-8. So even if the IPLD Data Model currently supports arbitrary bytes, CBOR does not. So in case you only care about CBOR encoding it might change things.

@warpfork
Contributor

@warpfork Correct me if I'm wrong here, my understanding is that the primary reason for the Node interface in Go is to allow minimal allocations whilst still allowing deserialization to many implementing types (the 'different memory layouts' you mention)?

I'm not certain that this is necessary in Rust, we could take some inspiration from serde's zero copy deserialization, which achieves the same goal (I think) via lifetimes on the Deserialize trait. I personally feel like it would be worth investigating as I think the Serde style "implement Deserialize<MyObject>" idiom would be much more familiar to Rust programmers and more amenable to automation via custom derive macros.

So, part of the reason for having a Node interface is for those various "go fast" specializations to be possible, yes, but it's not the only one. We also use that interface for generalized traversals (and thus for higher level features like pathing, and more powerfully, Selectors), and we also expect it to be implemented by ADLs (thus getting pathing to work transparently over them "for free", etc), and even the Schema stuff implements it twice -- once for the type-level semantics and separately for the representation-level view of the data.

I don't know exactly how some of this will translate to Rust, because I hear that traits are not quite the same as interfaces in other languages. But it's hard for me to imagine getting good foundations to build towards these higher level features without having some kind of a unifying interface or contract like this, in some way or another.
