Better support for lists #224

Open
1 task done
theeldermillenial opened this issue Mar 13, 2024 · 2 comments

Comments

@theeldermillenial

Things to check first

  • I have searched the existing issues and didn't find my feature already requested there

Feature description

The decoder has no way to distinguish an indefinite-length array from a fixed-length array, so a decode followed by an encode cannot round-trip without losing information.

Ideally, fixed-length arrays would be decoded to a different type than indefinite-length arrays. For example, a fixed-length array could be decoded as a tuple or deque, while an indefinite-length array could be decoded as a list.

Use case

In pycardano, CBOR is hashed. If a two-element array is encoded as an indefinite-length array (initial byte 0x9f) rather than as a fixed-length array of length 2, decoding and then re-encoding yields different CBOR bytes, and therefore a different hash. This is a problem when verifying CBOR contents.
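To make the problem concrete, here is a minimal sketch that uses only the standard library. The two byte strings are hand-written CBOR encodings of the array [1, 2]. The choice of BLAKE2b with a 32-byte digest is an assumption for illustration; any hash function shows the same effect.

```python
import hashlib

# Two valid CBOR encodings of the same logical array [1, 2], written by hand.
definite = bytes.fromhex("820102")      # 0x82: array of length 2, then 1, 2
indefinite = bytes.fromhex("9f0102ff")  # 0x9f: indefinite array, 1, 2, 0xff break

# Hash function chosen for illustration only (assumption).
definite_hash = hashlib.blake2b(definite, digest_size=32).hexdigest()
indefinite_hash = hashlib.blake2b(indefinite, digest_size=32).hexdigest()

# Same logical content, different bytes, so the hashes do not match.
print(definite_hash == indefinite_hash)  # False
```

If a decoder maps both inputs to a plain list and the encoder always emits the fixed-length form, the second input cannot be reproduced byte for byte.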

Since cbor2 does not distinguish between fixed-length and indefinite-length arrays, pycardano uses a custom encoder that emits an indefinite-length array when requested (even if the array has fewer than 30 values). However, there is no analogous functionality for decoding, and subclassing CBORDecoder to add it is not straightforward.

Thus, the ideal implementation would differentiate between these two encodings by using different classes. One mechanism is the one described above; alternatively, dummy list classes could be created to distinguish between the two (e.g. FixedArray and IndefiniteArray, both of which are just lists).
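A rough sketch of the dummy-class idea, written as a toy codec rather than against cbor2 itself. FixedArray and IndefiniteArray are the hypothetical names from the paragraph above; `decode` and `encode` are illustrative helpers that only handle integers 0 to 23 and arrays of fewer than 24 items, and they are not part of any library.

```python
class FixedArray(list):
    """Hypothetical marker: decoded from a fixed-length array."""


class IndefiniteArray(list):
    """Hypothetical marker: decoded from an indefinite-length array."""


BREAK = 0xFF  # CBOR "break" byte that terminates an indefinite-length item


def decode(data: bytes, pos: int = 0):
    """Decode one item starting at pos; return (value, next position)."""
    head = data[pos]
    major, info = head >> 5, head & 0x1F
    if major == 0 and info < 24:  # small unsigned integer
        return info, pos + 1
    if major == 4 and info < 24:  # fixed-length array, length stored in info
        items = FixedArray()
        pos += 1
        for _ in range(info):
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos
    if head == 0x9F:  # indefinite-length array, read until the break byte
        items = IndefiniteArray()
        pos += 1
        while data[pos] != BREAK:
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos + 1
    raise ValueError(f"unsupported initial byte 0x{head:02x}")


def encode(obj) -> bytes:
    """Encode using the array form recorded by the marker class."""
    if isinstance(obj, int) and not isinstance(obj, bool) and 0 <= obj < 24:
        return bytes([obj])
    if isinstance(obj, IndefiniteArray):
        return b"\x9f" + b"".join(encode(x) for x in obj) + b"\xff"
    if isinstance(obj, list) and len(obj) < 24:  # FixedArray or plain list
        return bytes([0x80 | len(obj)]) + b"".join(encode(x) for x in obj)
    raise ValueError(f"unsupported value {obj!r}")


# Round trip: each input is reproduced byte for byte.
for raw in ("820102", "9f0102ff", "9f820102ff"):
    value, _ = decode(bytes.fromhex(raw))
    assert encode(value) == bytes.fromhex(raw)
```

Because both markers subclass list, existing code that treats decoded arrays as lists would keep working; only the encoder needs to inspect the type.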

I am happy to implement this in any way @agronholm or any other maintainer would like. Just point me in the right direction. The goal is round trip reproduction of cbor regardless of how the array is encoded.

@T-recks

T-recks commented Aug 16, 2024

Are there any updates on this issue? I need a way to encode indefinite length arrays using cbor2.dumps() and can't tell for sure if there's currently any way to do this.

@theeldermillenial
Author

@T-recks I do have an open PR for this, but it's currently held up because it causes a divergence from the C extension.

#225

I have not had a chance to come back and finish it. Basically, the request was that I modify the C extension so that it and the Python implementation are at parity.

@agronholm Now that I'm thinking about it, there might be a hacky way to make sure the C-extension and the Python implementation are identical. It would involve overloading the C-extension class to contain the dictionary mapper. Would that be acceptable? Or do you only want a modification to the C-extension?
