orjson disregards numpy byte order #472

Closed
cjtitus opened this issue Apr 10, 2024 · 3 comments

Comments

@cjtitus

cjtitus commented Apr 10, 2024

When trying to serialize/deserialize numpy arrays, information about endianness is lost, causing data to fail to round-trip gracefully when a big-endian numpy array is passed in.

In [5]: test = np.array([0, 1, 0.4, 5.7], dtype='>f8')

In [6]: orjson.dumps(test, option=orjson.OPT_SERIALIZE_NUMPY)
Out[6]: b'[0.0,3.03865e-319,-1.5423487136693333e-180,-6.065988000073924e66]'

In [7]: s = orjson.dumps(test, option=orjson.OPT_SERIALIZE_NUMPY)

In [8]: test2 = orjson.loads(s)

In [9]: test2
Out[9]: [0.0, 3.03865e-319, -1.5423487136693333e-180, -6.065988000073924e+66]

It would be helpful to check test.dtype.byteorder == ">" and byteswap big-endian numpy arrays so that serialization happens consistently. Or make it explicitly clear in the docs that orjson will not handle this.
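Until something like that exists in the library, a caller-side workaround is possible. The following is only a minimal sketch, assuming the conversion is done before the array is handed to orjson; `to_native_byteorder` is a hypothetical helper name, not part of orjson or NumPy.

```python
import numpy as np


def to_native_byteorder(arr):
    """Return an array with the same values stored in native byte order.

    Hypothetical helper: arrays that are already native are returned
    unchanged; others are copied into a native-order dtype.
    """
    if arr.dtype.isnative:
        return arr
    # astype() converts the values, so 5.7 stays 5.7; only the
    # in-memory byte layout changes.
    return arr.astype(arr.dtype.newbyteorder("="))


test = np.array([0, 1, 0.4, 5.7], dtype=">f8")
native = to_native_byteorder(test)
print(native.dtype.isnative, native.tolist())
```

The result can then be passed to `orjson.dumps(native, option=orjson.OPT_SERIALIZE_NUMPY)` as in the session above.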

@ijl
Owner

ijl commented Apr 10, 2024

It should error rather than produce bad output. Is working with an array of big-endian values on a little-endian system a thing people do?

@cjtitus
Author

cjtitus commented Apr 10, 2024

This case came up for me when reading data in over a network, so it wasn't related to the endianness of the actual system I was working on, but rather the incoming data, which was then being re-serialized with orjson. As I understand it, some libraries/systems/protocols transport data over the network in a big-endian format.

Mostly, I want to clarify the desired behavior -- in this case, a well-written error would indeed have been clearer than silently outputting wrong values that then caused more problems later on. I'm planning to do my own endianness check when reading data in, but it would certainly be helpful if orjson did a check as well (of course, serializing it properly is also an option).
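An endianness check at the point where data is read in might look roughly like this. It is a sketch under assumptions: `require_native_byteorder` is a hypothetical name, and the payload is simulated with `struct.pack` to stand in for big-endian bytes arriving over a network.

```python
import struct

import numpy as np


def require_native_byteorder(arr):
    """Raise a descriptive error if arr is not in native byte order.

    Hypothetical guard, meant to be called before serialization.
    """
    if not arr.dtype.isnative:
        raise ValueError(
            f"array has non-native byte order (dtype {arr.dtype.str!r}); "
            "convert it before serializing"
        )
    return arr


# Simulated network payload: four doubles packed big-endian.
payload = struct.pack(">4d", 0.0, 1.0, 0.4, 5.7)
incoming = np.frombuffer(payload, dtype=">f8")

try:
    require_native_byteorder(incoming)
except ValueError as exc:
    # Reached on little-endian hosts, where ">f8" is non-native.
    print(exc)
```

Failing early here keeps the problem next to the code that read the data, rather than surfacing later as implausible numbers in the JSON output.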

@ijl
Owner

ijl commented Apr 15, 2024

Ok, I've just had it raise an error in 3.10.1. Thanks for the report.
