- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.2k
gh-99540: Constant hash for _PyNone_Type to aid reproducibility #99541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. | 
f216cd7    to
    0c08cac      
    Compare
  
    | 
 | 
| A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase  | 
| There is some value in making it dependent on the hash secret (PYTHONHASHSEED): 
 The cost of doing that is an extra global variable and a single calculation done once upon setting the hash secret. I think that is a reasonable cost - I'll try writing it that way. EDIT: now that I did, having second thoughts - it's a larger change to CPython, and perhaps people who run their code only on platforms with ASLR disabled today will be surprised by the "non-determinism by default" we'll be introducing here. The hash secret was introduced to negate chosen input complexity attacks on builtin hash sets/maps, which is not relevant here, as None is a singleton. The "fuzzing" use case is secondary. I'm willing to go with either solution. Personally, I will benefit more from the PYTHONHASHSEED dependent hash, but I don't know that it's the right thing to do for general use. | 
c12fc81    to
    961b41f      
    Compare
  
    | Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. | 
| ✅ Deploy Preview for python-cpython-preview canceled.
 | 
961b41f    to
    f154d65      
    Compare
  
    | Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. | 
| I'm waiting with the blurb update until we make a final decision whether to go with a hash-secret-dependent calculation or not. | 
| I have made the requested changes; please review again | 
| Thanks for making the requested changes! @ethanfurman: please review the changes made to this pull request. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do a simple constant hash for None.
| A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How 0xFCA86420 is chosen?
Is it better than random value?
| The constant was chosen basically at whim (that particular value amused me);  it's easier to implement as a constant because if we went with a random number we would have to make it contingent on the hash seed used to randomize  If it was a truly random value (which it currently is if Python was compiled with ASLR), then program results can vary from one run to the next with no way to force them to be the same (since  | 
| 
 I just meant constant random number, like  Since hashtable (dict and set) uses lower bits first, filling some lower bits can reduce hash collision against small constant ints. But this is not a big issue because None is just one singleton. | 
| 
 Just noting that if a 64-bit literal is used, we'd presumably need to add a static check for 32-bit builds, and add a 32-bit "fallback" value for that case. | 
| Good point. Let's just use a 32-bit number instead. | 
| Actually, unless I'm missing something, that is a 32-bit number. | 
| I think that was in response to Inada-san's suggestion of  | 
| Ah, that's what I was missing. I wish we had a reply directly to a comment ability. :-( | 
* main: pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541) pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168) pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302) "Compound statement" docs: Fix with-statement step indexing (python#100286) pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278)
* origin/main: (1306 commits) Correct CVE-2020-10735 documentation (python#100306) pythongh-100272: Fix JSON serialization of OrderedDict (pythonGH-100273) pythongh-93649: Split tracemalloc tests from _testcapimodule.c (python#99551) Docs: Use `PY_VERSION_HEX` for version comparison (python#100179) pythongh-97909: Fix markup for `PyMethodDef` members (python#100089) pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890) pythongh-99240: Reset pointer to NULL when the pointed memory is freed in argument parsing (python#99890) pythonGH-98831: Add DECREF_INPUTS(), expanding to DECREF() each stack input (python#100205) pythongh-78707: deprecate passing >1 argument to `PurePath.[is_]relative_to()` (pythonGH-94469) pythongh-99540: Constant hash for _PyNone_Type to aid reproducibility (pythonGH-99541) pythongh-100039: enhance __signature__ to work with str and callables (pythonGH-100168) pythongh-99830: asyncio: Document returns of remove_{reader,writer} (python#100302) "Compound statement" docs: Fix with-statement step indexing (python#100286) pythonGH-90043: Handle NaNs in COMPARE_OP_FLOAT_JUMP (pythonGH-100278) Improve stats presentation for calls. (pythonGH-100274) Better stats for `LOAD_ATTR` and `STORE_ATTR` (pythonGH-100295) pythongh-81057: Move the Cached Parser Dummy Name to _PyRuntimeState (python#100277) Document that zipfile's pwd parameter is a `bytes` object (python#100209) pythongh-99767: mark `PyTypeObject.tp_watched` as internal use only in table (python#100271) Fix typo in introduction.rst (python#100266) ...
| Thank you for this, it allowed to make my project deterministic! I dropped a comment about it there: https://stackoverflow.com/questions/7681786/how-is-hashnone-calculated/75296606#75296606 | 
This way, we don't need the "getPythonNone" function, which was only called once, to create the object. Beginning with Python 3.12, None will have a constant hash anyway. We use this new value. See python/cpython#99541.
This way, we don't need the "getPythonNone" function, which was only called once, to create the object. Beginning with Python 3.12, None will have a constant hash anyway. We use this new value. See python/cpython#99541.
This way, we don't need the "getPythonNone" function, which was only called once, to create the object. Beginning with Python 3.12, None will have a constant hash anyway. We use this new value. See python/cpython#99541.
This way, we don't need the "getPythonNone" function, which was only called once, to create the object. Beginning with Python 3.12, None will have a constant hash anyway. We use this new value. See python/cpython#99541.
| Is this merged into 3.12? There's no mention in the changelog. | 
| 
 The PR did update the blurb, and it appears in  What makes a given PR worthy of appearing in the changelog? | 
Prefix caching makes use of Python's built-in `hash()` function. As of Python 3.12, the behavior of `hash(None)` has changed to be a predictable constant value. This makes it more feasible that someone could try exploit hash collisions. The impact of a collision would be using cache that was generated using different content. Given knowledge of prompts in use and predictable hashing behavior, someone could intentionally populate the cache using a prompt known to collide with another prompt in use. There doesn't seem to be much value to an attacker in doing this, but it's certainly not ideal, and could interfere with the accuracy of results for another user. The invasiveness of this fix should be weighed against the severity of the issue to determine whether this is worth fixing. Using a hashing algorithm that is less prone to collision (like sha256, for example) would be the best way to avoid the possibility of a collision. However, it would have an impact to both performance and memory footprint. An alternative is to continue to use `hash()`, but make it much more difficult to predict the hash value. What we want is that the starting hash value is randomized, which is the behavior we got here prior to Python 3.12. An easy fix is to use a string. Here we use `'None'` to still make it clear we're starting from nothing, but with a string we'll get a different hash value each time vllm runs. Note that within a given run, the value will remain the same. This restores the safer hashing behavior from before. Thank you very much to @kexinoh for reporting this concern privately so that it could be evaluated for its severity prior to our decision to fix this as a security enhancement. The commit that changed this behavior for Python 3.12 is here: - python/cpython@432117c - python/cpython#99541 Signed-off-by: Russell Bryant <rbryant@redhat.com>
Uh oh!
There was an error while loading. Please reload this page.