Additional value types? #8

Simran-B · 2016-02-17T18:52:03Z

A rather random list of things that came to my mind:

sets (ordered/unordered), unique values only
bags (unordered), as opposed to arrays, which are ordered bags (non-unique and ordered)
sorted variants of sets and bags (?)
enums (symbols, bitfields, ...)
union structs (?)
NaN (invalid number) / N/A (known to be missing its value) / undefined
UTF-16 encoded text, which may use up less memory/space/bandwidth for certain content (Chinese text for instance), but add a cost for (de-)serialization, because JSON requires text encoding to be UTF-8

neunhoef · 2016-02-17T22:40:33Z

First comment: There are not many type bytes left for extensions. Furthermore, the specification is already complex enough for my taste. Therefore I would only be in favour of additions if they bring a lot of additional value and cannot easily be emulated with the available types (or custom types).
I think that arrays can be used for sets (ordered/unordered) and bags, so IMHO this does not give enough reasons to add further complications.
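To illustrate the point about emulation, here is a minimal sketch in plain Python lists (not the VelocyPack API) of how an application could treat an ordinary array as a set or a bag. The helper names `as_set_array` and `bags_equal` are made up for this example.

```python
from collections import Counter


def as_set_array(values):
    """Represent a set as a sorted array without duplicates (hypothetical helper)."""
    return sorted(set(values))


def bags_equal(a, b):
    """Compare two arrays as bags: order is ignored, multiplicity is not."""
    return Counter(a) == Counter(b)


print(as_set_array([3, 1, 3, 2]))       # [1, 2, 3]
print(bags_equal([1, 2, 2], [2, 1, 2]))  # True
print(bags_equal([1, 2], [1, 2, 2]))     # False
```

The set or bag semantics live entirely in the application here; the stored value is still just an array.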
Enums would have to be declared somewhere, VelocyPack is intentionally schema-free, so there is no good place to declare them. One can use integers in most applications.
union structs: I do not understand, since an object is something like a struct and unions do not seem to make sense here.
NaN: we do have IEEE double, which has cases for NaN and infinity and the like. I do not think we should have a separate type for this.
UTF-16 does save some space for certain textual content. I would think that this is an edge case (I know, there are many Chinese people!) and UTF-8 isn't too bad for this. In case of need one can always stick the text into a binary blob.

Sorry for being against these suggestions, but I think we always have to keep in mind that every single additional type makes the implementation for another language more complicated.

Simran-B · 2016-02-17T23:05:39Z

Thanks for the detailed reasoning! More types would mean more work indeed and I think pretty much everything can be modeled with already specified types (fuzzy dates being my personal favorite).

Sorted sets seemed interesting to me, in particular in imports, because the type could signal to the DBMS that the data doesn't need to be sorted after import (because it already is) and that no duplicate values are to be expected. But that's not very useful I guess, and it even moves structural concerns to the data type level, which belong on the document level to stay schema-free at the DB level (like with enums).

I did not know IEEE double could handle NaN and Infinity. Doesn't it also support +0 and -0?
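A quick way to check this is to look at the raw 8-byte IEEE 754 binary64 encoding. The sketch below uses Python's `struct` module rather than VelocyPack itself, so it only shows what the double format can represent; the helper name `double_bytes` is made up for this example.

```python
import math
import struct


def double_bytes(x):
    """Return the 8-byte big-endian IEEE 754 binary64 encoding of x."""
    return struct.pack(">d", x)


# +0 and -0 compare equal but differ in the sign bit of the encoding.
print(double_bytes(0.0).hex())
print(double_bytes(-0.0).hex())
print(0.0 == -0.0, math.copysign(1.0, -0.0))

# Infinity and NaN survive a round trip through the 8-byte encoding.
inf_back = struct.unpack(">d", double_bytes(float("inf")))[0]
nan_back = struct.unpack(">d", double_bytes(float("nan")))[0]
print(math.isinf(inf_back), math.isnan(nan_back))
```

So signed zeros, infinities and NaN all fit into a plain double without a dedicated type.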

About unions: it's more about how you access the data, not how it's stored actually... I realize that now. I think you would use a binary block of data on the DB level and interpret it in different ways on the application level if necessary.

Regarding UTF-16: non-ANSI people probably form the majority, but yeah, blob is always an option and the space savings might not be that large after all, because characters from the 1-byte range (whitespace, punctuation, digits, ...) are frequently used in texts with mostly 2-4 byte characters and they may cancel out the difference.
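The size trade-off can be checked with a few lines of Python. The sample strings below are arbitrary and only meant as a sketch; real savings depend on the actual content. The helper name `encoded_sizes` is made up for this example.

```python
def encoded_sizes(text):
    """Return (UTF-8 byte length, UTF-16 byte length without BOM) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


samples = {
    "pure Chinese": "你好",
    "pure ASCII": "hello",
    "mixed": "价格: 100 元",
}

for label, text in samples.items():
    utf8_len, utf16_len = encoded_sizes(text)
    print(f"{label}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

In this sample, UTF-16 is smaller only for the purely Chinese string; as soon as digits, spaces and ASCII punctuation are mixed in, UTF-8 comes out ahead.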

That said, there's probably no type really missing in the VPack specs!

Simran-B · 2016-03-11T00:23:00Z

Some additional data types available in CBOR (all UTF-8 strings):

URI
base64url
base64
RegExp
MIME message

Three more random thoughts:

VPack data split across multiple files - convention to name continuous files? (similar to split rar archives, .rar, .r00, .r01, ...)
Embedding of meta-data (dropped on conversion to JSON?), such as VPack version, date and time of creation, license / restrictions, integrity hash
Subtree hashes for fast deep-equality checks?
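To make the subtree hash idea a bit more concrete, here is a rough sketch over JSON-like Python values (dicts, lists, scalars). It is not based on the VPack format; the function name `subtree_hash`, the choice of SHA-256 and the type prefixes are all assumptions for illustration.

```python
import hashlib
import json


def subtree_hash(value):
    """Hash a JSON-like value so that each subtree gets its own digest."""
    h = hashlib.sha256()
    if isinstance(value, dict):
        h.update(b"o")
        # Sort keys so that attribute order does not affect the hash.
        for key in sorted(value):
            h.update(subtree_hash(key))
            h.update(subtree_hash(value[key]))
    elif isinstance(value, list):
        h.update(b"a")
        # Arrays are ordered, so element order does affect the hash.
        for item in value:
            h.update(subtree_hash(item))
    else:
        h.update(b"s")
        h.update(json.dumps(value).encode("utf-8"))
    return h.digest()


a = {"name": "x", "tags": [1, 2, {"deep": True}]}
b = {"tags": [1, 2, {"deep": True}], "name": "x"}
c = {"name": "x", "tags": [1, 2, {"deep": False}]}

print(subtree_hash(a) == subtree_hash(b))
print(subtree_hash(a) == subtree_hash(c))
```

If such digests were stored alongside each subtree, two values with different digests could be declared unequal without descending into them. Equal digests only indicate equality with very high probability, so a strict check would still need a full comparison.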

Simran-B mentioned this issue May 27, 2016

JavaScript 'Infinity' as a document / edge field value arangodb/arangodb#1839

Open
