Fixed length encoding of integer numbers #106

Open
Pique7 opened this issue Jan 5, 2025 · 4 comments

Comments


Pique7 commented Jan 5, 2025

I've already asked a similar question (#101 (comment)).

Is there a concept in CBOR to encode integers in a fixed byte length (e.g. BYTE, INT16, INT32, ...)?

I haven't found anything helpful yet. Perhaps it would make sense to define a CBOR tag for this purpose, but I am not a low-level programming expert, so I can't say what needs to be considered when defining this kind of (primitive) data type.

The background of this issue is the need for a CBOR data structure whose length does not change when an integer value is altered.

Contributor

cabo commented Jan 6, 2025

> Is there a concept in CBOR to encode integers in a fixed byte length (e.g. BYTE, INT16, INT32, ...)?

CBOR is a data representation format.
The answer to that literal question is that you can just encode things in one of the many ways you want.
But you bring up changing a data item, so you seem to have a process in mind where an implementation receives data, processes it in some way, and emits it, all without "changing the encoding" (which may not be as well defined as one may think).
The implementation would need to know about the forms the modified data item can take (e.g., you mention int32, whose values are represented using two different major types).
It could simply assume that the encoding received needs to be exactly the encoding to be emitted again, but this doesn't tell the implementation what value range is actually acceptable (e.g., is the value allowed to become negative? Can we use all the 32 bits plus sign, which is more than int32 provides?).
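
As a minimal sketch of the point about major types, assuming Python and a hypothetical helper name `encode_int32_fixed`: a signed 32-bit value can always be written as five bytes, but non-negative values land in major type 0 (initial byte `0x1A`) and negative values in major type 1 (initial byte `0x3A`, argument `-1 - value`). The range check is an assumption about what "int32" should mean here; the four argument bytes alone would also admit values up to 2^32 - 1 and down to -2^32.

```python
import struct

def encode_int32_fixed(value: int) -> bytes:
    """Encode a signed 32-bit integer as a 5-byte CBOR item (hypothetical helper).

    Non-negative values use major type 0 (initial byte 0x1A);
    negative values use major type 1 (initial byte 0x3A) with argument -1 - value.
    """
    if not -2**31 <= value <= 2**31 - 1:
        raise ValueError("value outside the int32 range")
    if value >= 0:
        return b"\x1a" + struct.pack(">I", value)
    return b"\x3a" + struct.pack(">I", -1 - value)

print(encode_int32_fixed(120).hex())   # 1a00000078
print(encode_int32_fixed(-121).hex())  # 3a00000078
```

Both results are five bytes long, but the initial byte differs, so a process that only overwrites the four argument bytes cannot flip the sign.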

Usually, information of this kind is represented in a data definition (or schema).
Note that if the implicit encoding constraint given in the received data item is not sufficient, some encoding information may need to be in the data definition.
CDDL is not designed to address representation issues, but could be extended to do so.

Author

Pique7 commented Jan 6, 2025

> The answer to that literal question is that you can just encode things in one of the many ways you want.

I just wanted to stick to a standard or predefined convention, if one exists.

> But you bring up changing a data item, so you seem to have a process in mind where an implementation receives data, processes it in some way, and emits it, all without "changing the encoding" (which may not be as well defined as one may think).

In my current case I just want to keep an (external) index of pointers to items within a CBOR structure. It would be easier if changing a number value somewhere in the CBOR data did not change the byte length.
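
To illustrate why such an index breaks, here is a small sketch, assuming Python and a hypothetical helper `encode_uint_preferred` that always picks the shortest CBOR form for an unsigned integer (as many generic encoders do). The two-element array is only an example layout, not taken from a real data set.

```python
import struct

def encode_uint_preferred(value: int) -> bytes:
    """Shortest-form CBOR encoding of an unsigned integer (major type 0)."""
    if value < 0:
        raise ValueError("unsigned values only")
    if value < 24:
        return bytes([value])                      # value fits in the initial byte
    if value < 2**8:
        return b"\x18" + struct.pack(">B", value)  # 1-byte argument
    if value < 2**16:
        return b"\x19" + struct.pack(">H", value)  # 2-byte argument
    if value < 2**32:
        return b"\x1a" + struct.pack(">I", value)  # 4-byte argument
    if value < 2**64:
        return b"\x1b" + struct.pack(">Q", value)  # 8-byte argument
    raise ValueError("value does not fit in 64 bits")

# Array [value, 1]: the byte offset of the second element depends on the first.
for value in (23, 24, 70000):
    item = b"\x82" + encode_uint_preferred(value) + b"\x01"
    print(value, item.hex(), "offset of second element:", len(item) - 1)
```

Changing the first element from 23 to 24 already moves the second element by one byte, which is what invalidates an external offset index.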

Actually, I have found a workaround in the meantime, but accidentally posted it in the wrong thread:
The idea is to utilize a Typed Array (https://www.rfc-editor.org/rfc/rfc8746.html#name-typed-arrays) containing exactly one element, for example:

# EDITED: incorrect example
D8 46            # tag(70)
   81            # array(1)
      00000078   # uint32, little endian

What I've tried here is to encode a single unsigned 32-bit integer number.

# correct example:
D8 46            # tag(70)
   44            # bytes(4)
      78000000   # uint32, little endian
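
A minimal sketch of how the corrected example could be produced and read back, assuming Python; the helper names `encode_uint32_tag70` and `decode_uint32_tag70` are hypothetical and only handle this exact single-element layout.

```python
import struct

TAG_70_HEAD = bytes.fromhex("d846")  # tag(70): uint32 typed array, little endian (RFC 8746)
BYTES_4_HEAD = b"\x44"               # byte string of length 4

def encode_uint32_tag70(value: int) -> bytes:
    """Wrap one uint32 in a tag 70 typed array; the result is always 7 bytes."""
    # struct.pack raises struct.error if value is outside 0..2**32 - 1
    return TAG_70_HEAD + BYTES_4_HEAD + struct.pack("<I", value)

def decode_uint32_tag70(item: bytes) -> int:
    """Inverse of encode_uint32_tag70; accepts only the exact 7-byte layout."""
    if len(item) != 7 or item[:3] != TAG_70_HEAD + BYTES_4_HEAD:
        raise ValueError("not a single-element tag 70 item")
    return struct.unpack("<I", item[3:])[0]

encoded = encode_uint32_tag70(120)
print(encoded.hex())                 # d8464478000000
print(decode_uint32_tag70(encoded))  # 120
```

Because the payload is a fixed-size byte string, any uint32 value yields the same total length.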

Contributor

cabo commented Jan 6, 2025

> D8 46            # tag(70)
>    81            # array(1)
>       00000078   # uint32, little endian

This doesn't parse:

D8 46    # tag(70)
   81    # array(1)
      00 # unsigned(0)


##### 3 unused bytes after the end of the data item:

00 00 78
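
The same result can be reproduced with a tiny head decoder. This is only a sketch, assuming Python; `read_head` is a hypothetical helper that handles definite-length heads and nothing else (no indefinite lengths, no validation of the content that follows a head).

```python
def read_head(data: bytes, pos: int):
    """Decode one CBOR head; return (major type, argument, next position)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    if info < 24:
        return major, info, pos + 1       # argument is in the initial byte
    if info > 27:
        raise ValueError("additional info 28..31 is not handled in this sketch")
    size = 1 << (info - 24)               # 1, 2, 4 or 8 argument bytes
    end = pos + 1 + size
    if end > len(data):
        raise ValueError("truncated head")
    return major, int.from_bytes(data[pos + 1:end], "big"), end

data = bytes.fromhex("d846 81 00000078")  # the example that does not parse

pos = 0
tag_major, tag_number, pos = read_head(data, pos)    # tag(70)
array_major, array_len, pos = read_head(data, pos)   # array(1)
item_major, item_value, pos = read_head(data, pos)   # unsigned(0)
leftover = data[pos:]                                # bytes after the data item
print(tag_number, array_len, item_value, leftover.hex())
```

The first `00` is consumed as a complete one-byte unsigned integer, so the array's single element ends there and the remaining bytes are left over.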

Tag 70 is also defined to contain a byte string (and that in little-endian), so you probably want to use:

D8 46          # tag(70)
   44          # bytes(4)
      78000000 # "x\u0000\u0000\u0000"

Generally, the idea of using tagged arrays in place of specific data items is useful; the only limitation is that you can no longer see whether an array was actually intended.

In this case,

1A 00000078 # unsigned(120)

does the same job in five bytes (here the number is in big-endian byte order).
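
A sketch of how a value in this form could be changed without moving anything else, assuming Python; `patch_uint32` is a hypothetical helper and the offset is assumed to come from the external index mentioned earlier in the thread.

```python
import struct

def patch_uint32(buf: bytearray, offset: int, value: int) -> None:
    """Overwrite the argument of a 1A-headed unsigned integer at `offset`, in place."""
    if buf[offset] != 0x1A:
        raise ValueError("expected initial byte 0x1A (unsigned, 4-byte argument)")
    # struct.pack_into raises struct.error if value is outside 0..2**32 - 1
    struct.pack_into(">I", buf, offset + 1, value)

doc = bytearray.fromhex("82 1a00000078 01")  # array [120, 1]
patch_uint32(doc, 1, 5)
print(doc.hex())                             # 821a0000000501
```

Note that `1A 00000005` is not the shortest encoding of 5, so this only works where receivers accept non-preferred encodings and the data does not need to be deterministically encoded.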

Author

Pique7 commented Jan 6, 2025

Yes, sorry, my example was not correct.
