Pickle INT/LONG base discrepancy
#126992
Comments
I think that using base 10 would be reasonable here.
I'll make a pull request then
I think it's an incompatible change. Why not adjust pickletools instead? BTW, I see no failing tests in #127042. Probably we should add some regression tests.
No pickles produced by
Yes, this compatibility break doesn't look too severe. After all, I'm not sure that many projects use the version 0 protocol. On the other hand, it's really easy to not introduce backward-incompatible changes.
Yes, and IMO it's a good change. In practice, no one should be impacted since everybody should use the Python module to serialize to pickle, and in this case, you cannot get a base different than 10 in the serialized files.
I thought about using hexadecimals for LONG. This would fix the issue of quadratic complexity for converting long to/from string. But it was not supported in all implementations, so it would be a breaking change, while the purpose of the LONG opcode is compatibility with old Python versions and non-Python implementations.
The CPython implementation has always accepted non-decimal input for INT and LONG (except for a short period from 12 April to 15 May 1996). It can be considered a hidden CPython feature. I do not think it is an issue if the CPython implementation is more lenient than the specification (which was written after the implementation and is still not complete).
@tim-one, this may be interesting to you, as it relates to both Using hexadecimal representation for long integers would be more efficient in terms of time and size. But it is less efficient than using pickle protocol 1 or higher. And not all third-party implementations (for different programming languages) support hexadecimals.
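To put rough numbers on the size argument in the comments above, here is a small sketch that compares the decimal and hexadecimal text forms of one large integer with its protocol 0 and protocol 2 pickles. The helper name `sizes` and the choice of `1 << 2048` are assumptions made for illustration; the sketch measures size only and says nothing about conversion time.

```python
import pickle


def sizes(n):
    """Return the length of several encodings of the integer n."""
    return {
        "decimal text": len(str(n)),
        "hex text": len(hex(n)),  # includes the "0x" prefix
        "protocol 0": len(pickle.dumps(n, protocol=0)),  # text opcode
        "protocol 2": len(pickle.dumps(n, protocol=2)),  # binary opcode
    }


if __name__ == "__main__":
    # An arbitrary 2049-bit value, large enough that protocol 0 uses LONG.
    for name, length in sizes(1 << 2048).items():
        print(f"{name:>12}: {length}")
```

For this value the hexadecimal text is shorter than the decimal text, and the binary protocol is shorter than both, which matches the ordering described in the comment.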
Bug report
Bug description:
The `INT` opcode in pickle is the `I` character followed by an ASCII number and a newline. There are multiple comments asking if the base should be explicitly set to 10, or kept as 0. However, a discrepancy exists between pickle implementations:

- `_pickle.c` uses `strtol(s, &endptr, 0);` with a base of 0, meaning `0xf` would succeed
- `pickle.py` uses `int(data, 0)` with a base of 0, meaning `0xf` would succeed
- `pickletools.py` uses `read_decimalnl_short()`, which calls `int(s)`, meaning any non-decimal base would fail

This same inconsistency exists with the `LONG` opcode:

- `_pickle.c`
- `pickle.py`
- `pickletools.py`
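The two parsing modes listed above can be reproduced with plain `int()` calls. The following is a minimal sketch, not the actual unpickler code: `parse_base0`, `parse_base10`, `probe`, and `disassemble` are hypothetical helper names, and `pickle._loads` is the private pure-Python loader in `pickle.py`. What the three real implementations report for `b"I0xf\n."` depends on the CPython version in use, so the script prints the outcome rather than assuming one.

```python
import io
import pickle
import pickletools


def parse_base0(text):
    """Mimic int(data, 0): the base is inferred from a 0x/0o/0b prefix."""
    return int(text, 0)


def parse_base10(text):
    """Mimic int(s): only decimal digits are accepted."""
    return int(text)


def probe(func, payload):
    """Run func on payload and describe the outcome instead of raising."""
    try:
        return f"ok: {func(payload)!r}"
    except Exception as exc:  # unpicklers raise several exception types
        return f"error: {type(exc).__name__}"


def disassemble(payload):
    """Disassemble into a throwaway buffer so nothing is printed."""
    pickletools.dis(payload, out=io.StringIO())
    return "disassembled"


if __name__ == "__main__":
    print(parse_base0("0xf"))          # base 0 accepts the hex prefix
    print(probe(parse_base10, "0xf"))  # base 10 rejects it

    payload = b"I0xf\n."  # INT opcode with a hexadecimal argument
    # pickle.loads is the C unpickler when the accelerator is available.
    print("pickle.loads :", probe(pickle.loads, payload))
    print("pickle._loads:", probe(pickle._loads, payload))
    print("pickletools  :", probe(disassemble, payload))
```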
This means an attempt to disassemble a pickle bytestream using `pickletools` would fail here, while the actual unpickling process would proceed unimpeded.

Personally, I don't really care whether all implementations are changed to base 10 or base 0 (`save_long()` only puts it in decimal form), but I think it should be consistent across all implementations. I'd submit a pull request for one way or the other, but I'm not sure which way you'd prefer it.

Also as a note, the pickle bytestream `b'I0001\n.'` (`INT` with the argument `0001`) fails in `pickle.py` because having leading 0s in a number with base 0 causes an error. Note that no errors are thrown in `_pickle.c` (because it uses `strtol`) or in `pickletools.py` (because it doesn't have base 0 specified). If we keep the implementation as base 0, that discrepancy between `pickle.py` and other pickle implementations would stay, whereas if we change it to base 10 (aka remove base 0), that inconsistency would also go away. For `LONG`, both `pickle.py` and `_pickle.c` fail with `b'L0001L\n.'`, but `pickletools.py` has no problem displaying that number (since it has no base specified).

CPython versions tested on:
3.11
Operating systems tested on:
Linux
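The leading-zero case from the bug description can be checked with built-in `int()` alone. This is a minimal sketch: `try_parse` is a hypothetical helper, and the C `strtol` path used by `_pickle.c` is not exercised here.

```python
def try_parse(text, base):
    """Return the parsed integer, or None if int() rejects the text."""
    try:
        return int(text, base)
    except ValueError:
        return None


if __name__ == "__main__":
    # Base 0 follows Python literal rules, so a nonzero number written
    # with leading zeros is rejected; base 10 simply ignores the zeros.
    print(try_parse("0001", 0))   # None
    print(try_parse("0001", 10))  # 1
    print(try_parse("0o1", 0))    # 1, via the explicit octal prefix
    print(try_parse("0xf", 10))   # None
```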