Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Py3.13 #1212

Closed
wants to merge 14 commits into from
Closed

Py3.13 #1212

wants to merge 14 commits into from

Conversation

Thrameos
Copy link
Contributor

This is a horrible set of hacks required to keep JPype running after the recent changes in the Python type system. They have brutally made it impossible to replicate their allocators by changed all to static or inline with no way to access the memory model. Finding a pattern that worked for the latest version took me several weekends (which is unfortunately time I don't have.) Python 3.13 introduces inline dictionaries to objects which meet certain criteria which basically blows away the last good place to allocate the memory that we require. Worse there is no API to tell how much memory had been tacked on so it was nearly hopeless. Fortunately I found a way to disable the misbehavior by disabling the flag after the type was allocated.

This version will not run with the debug version of Python. They have placed assertions that guard against this type of manipulation. But I don't see any other alternatives.

Fixes #1204

@Thrameos Thrameos requested a review from marscher August 26, 2024 06:37
Copy link

codecov bot commented Aug 26, 2024

Codecov Report

Attention: Patch coverage is 70.83333% with 7 lines in your changes missing coverage. Please review.

Project coverage is 87.17%. Comparing base (66f8c6c) to head (edeafa9).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
native/python/pyjp_value.cpp 69.56% 5 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1212      +/-   ##
==========================================
- Coverage   87.21%   87.17%   -0.05%     
==========================================
  Files         113      113              
  Lines       10281    10283       +2     
  Branches     4065     4060       -5     
==========================================
- Hits         8967     8964       -3     
- Misses        718      723       +5     
  Partials      596      596              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@marscher
Copy link
Member

Thank you so much for finding this hack @Thrameos. We should really talk to the Python guys to not further change/break the type system for us. You said in another thread, that you suggest to propose a PEP for this - which sounds like a huge effort.

@vstinner Dear Victor could you please comment on that?

@marscher
Copy link
Member

@Thrameos could you please allow me to push to your fork so I can add Python 3.13 to the build matrix?

@vstinner
Copy link
Contributor

PEP 697 – Limited C API for Extending Opaque Types was implemented in Python 3.12. Can't you use this API for your needs?

@Thrameos
Copy link
Contributor Author

Thrameos commented Aug 27, 2024

No, that api misses because it adds to basicsize and thus causes an object conflict when used with long, float, exception and string objects. I made a replacement API in the PR on Thrameos/CPython that meets all the requirements. But due to work life issues I can't write a PEP (ended up overseeing a project with visibility all the way up the management chain accidentally.)

It is a trivial matter to change PEP 697 to allocate in the preheader and it would be a great stepping stone to the unified system as you can immediately merge it with managed dict and inline dict they have been adding. I only wish I had time to contribute.

PS this is very low hanging fruit. I can send you a one page writeup on how it fits into the object layout system in Python and how you can even use this to increase the speed of instances hookups to order 1 for all Python objects (not just the bit flagged ones). One just has to have the time to write some docs and get it through acceptance phase and you can be a significant contributor to Python.

Thrameos/cpython#1

@vstinner
Copy link
Contributor

cc @encukou

@encukou
Copy link

encukou commented Aug 28, 2024

@vstinner

They have brutally made it impossible to replicate their allocators by changed all to static or inline with no way to access the memory model.

What do you expect me to say here?
Please stop breaking backwards compatibility. It hurts projects like this.
If these changes were necessary they should at least be more widely discussed.

@vstinner
Copy link
Contributor

@encukou:

What do you expect me to say here?

Is there any API or can we design an API fitting JPypy design/use case? I don't know well PEP 697, since I cannot well it doesn't fit here. You might want to have a look.

It would be nice to avoid the "Horrible kludge because python lacks an API for allocating a GC type with extra memory" :-)

@encukou
Copy link

encukou commented Aug 28, 2024

Is there any API or can we design an API fitting JPypy design/use case?

Well, yes -- see @Thrameos's patches.
PEP 697 does target a more limited use case.
But sadly I don't know much about the design and plans for object preheaders and allocators; as far as I know, those aren't discussed/documented publicly. @markshannon or @mdboom should know more about those.

@Thrameos
Copy link
Contributor Author

Thrameos commented Aug 28, 2024

@vstinner to be fair, jpype has always worked by making a copy of the PyType_GenericAllocator (or its earlier equivalent that called _PyObect_GC_Alloc) which was the only reference I could find on how to use the allocator/finalizer slot. As the only know reference called private fuctions to operate that meant Jpype was misbehaving by duplicating it with the same private calls.

Suddenly someone started making all those functions nonexported or making the structure needed to replicate them obfuscated without replacing it with an API, meaning they didn't consider what those functions were doing and effectively made __alloc__ into a private slot as the only way to use it was to call malloc which means no GC and as they are now adding both preheader data (gc header, weakref, managed dict) and post data (inline dict) all through private calls for which the size is defined entirely through private structures, the entire use of the allocate slot has now been deprecated silently. (At the same time PyObject_Size no longer worked on all varobjects so other parts of the contract broke as well.)

@encukou has been so kind as to reach out and try to resolve the issue but ended up missing as he attempted to place it in the basicsize which causes conflicts (which is the reason jpype use post and prespace in different versions in the past.) It should have been a requirement for the API that it was tested against every concrete type in Python but I missed it when reviewing. Further, it was very unfortunate timing that when we looked to address the defect, the only time I had was my last vacation in December and enucukou wasn't available at that time so it pushed me into a 10 month window of nonavailabilty. I can try again next November but until then I will be a pumpkin.

Solutions at this point:

  • Fix 697 so that there is an API in which the space will be allocated somewhere managed and can pre retrieved (and it can be used for both managed dict and online concepts so less total complexity and ensure it doesn't get silently broken) . Test by applying to every concrete type. (Don't just make those it doesn't work on final as that will break APIs)
  • Expose the private methods so that someone can exactly call the same process as PyType_GenericAlloc. Which means a different version of code for every version of Python. Horrid but a solution until an actual API appears.
  • Expose usable public methods the alloc gc with user specified pre and size (currently gc_alloc in gc.c), methods to compute the offset pre and post (so someone can look up how much space to call the allocator with and retrieve the ends so they can access the address (must be O(1) and not some huge routine best if cache slot for presize and postsize) Postsize is a pain because ob_size has been abused/reused by PyLong such that PyObject_Size no longer works.) THere just needs to be PYtype_Presize, PyType_Sive and PyType_Alloc. Note the alloc slot is doing way more than just alloc at this point as it is not just giving blank memory but instead calling init and filling out the inline memory section. So the alloc method has to replicate all that or the PyObject_Init method needs to with a contract alloc must call init. (Currently true but with private versions)

I would recommend doing BOTH 1 and 3. If something is a public slot like alloc/finalize there should be some reference on how to use it somewhere. If it is impossible to use without calling private functions, setting header mode to full private access, and copying 3 static Python functions into the module with 8 hours of code auditing the guts of the undocumented innards Python (which is how much time it took me to locate, read, copy to a notes page, document, and construct an interaction tree to understand what 3.13 does) then those slots are useless. 697 by itself wouldn't fully address jpype needs even if it hit as jpype requires a matching finalized to clear the private memory, thus for my 7 special uninheritable types I would likely have used the allocator directly anyway for speed knowing that I can't create conflicts.

Jpype is a language binding meaning it had far more constraints than other modules having to play nice with multiple gigantic APIs and often completely contradictory language models. It is thus only natural that it needs to implement concrete primitive objects at the same level as Python primitives. Python developers can make my life a whole lot easier if they realized that the same tools (functions and size of layouts) that they use to make their primitives have to be exposed. The only things that they can hide without implications are the organization of memory layouts and those functions directly accessing the layouts. Everything else like allocators needs public APIs that are fast enough to be used internally so we are playing in the same field. As it stands Python guts abuse the hell out of private parts with no modularity such that it has a whole header system only for things inside the tent. If it is really to be hidden safely, it should have been hidden entirely in the C file that defines it (Yes I know speed demand otherwise.) Concrete primitive heap types like JPype creates, currently requires throwing rocks through the pretty stain glass windows of the cathedral as the doors have all been barred.

PS my Samsung phone autocorrect seems love randomly correcting programing words which given my dyslexia I can't see. I apologize if the resulting text is less than coherent.

@Thrameos
Copy link
Contributor Author

Thrameos commented Aug 28, 2024

Here is the needed API...

/* Get the total memory in bytes required to be allocated before the object head 
 * including gc structure, managed dictionaries, inline dict, user defined managed 
 * memory, etc
 */
Py_ssize_t PyType_Presize(PyType* type);

/* Get the required memory size in bytes for an object of a given type containing 
 * header, basicsize,  and items.  For historical reasons objects always add one to the 
 * nitems when allocating if the types itemsize is greater than 0.
 */
Py_ssize_t PyType_Size(PyType* type, int nitems);

/* Allocate and init a type with specified memory, pure and size must be sufficient to 
 * hold the specified object layout.  User is responsible for setting ob_size if variable items.
 * If bytes request for the presize does not match PyType_Presize then finalizer must 
 * properly specify specify the presize to PyType_Finalize.
 */
PyObject* PyType_Alloc(PyType* type, Py_ssize_t pre, Py_ssize_t size);

/** Deallocates memory created with PyType_Alloc.  Requires the offset to the head of the 
 * memory as a header.  0 will automatically compute the offset.  
 */ 
PyObject* PyType_Finalize(PyObject* obj, Py_ssize_t presize);  // PyObject_Finalize??

/* (optional) Get the end of the objects layout in bytes including basicsize and items.
 */
Py_ssizet PyObject_Sizeof(PyObject* obj);

PyType_GenericAlloc should call these 3 methods. The Python developers can change their layout however they see fit as much as they want. Users implementing their own allocators/finalizer can add their memory in front or behind the object as they need. Sizeof is hard to support as ob_size is often abused thus adding to the fixed preobject area is easier.

Note the finalizer would have to tell where the head of the object is for free. This has been the big headache for JPype trying to add to the front forcing us to use the back as adding arbitrarily to the front cases it to free the wrong address. Thus this may end up API breaking if Python doesn't know the starting address. I need to review the finalizer contract.

@Thrameos
Copy link
Contributor Author

Thrameos commented Aug 28, 2024

Here is the object layout from my best guess. (The conflict happened because something which is an unknown size was automatically added if basicsize=sizeof(PyObject) and itemsize=0 triggering the flag Py_TPFLAGS_INLINE_VALUES.)

memory

@Thrameos
Copy link
Contributor Author

@vstinner I posted my thoughts if it helps. I can't find anything on the Python 3.13 changes. Google search on Py_TPFLAGS_INLINE_VALUES shows nothing. So everything was a surprise to me.

@Thrameos
Copy link
Contributor Author

@marscher Is there still some action required on my part? I see pushes and I certainly didn't intend for you not to be able to update it.

@vstinner
Copy link
Contributor

@Thrameos: It would be great if you could copy/summarize what you wrote (with the nice schema) in a message to https://discuss.python.org/c/core-dev/23

@Thrameos
Copy link
Contributor Author

@vstinner I will try to put something together on Friday.

@Thrameos
Copy link
Contributor Author

I am working on a revision. It appear that there is an access method in PyUnstable_Object_GC_NewWithExtraData(). It still has interference with the INLINE_VALUES which is automatically added, but this will be much less of a hack. Assuming it works then crisis is averted. We can try to push them to move the inline dict some place safer in version 3.14.

@Thrameos
Copy link
Contributor Author

@encukou I tried again using the call from Unstable, but there were many issues with this path. First if I used the extra memory that I requested I would get random crashes in Python 3.12. Doubling the memory request solved that, but I can come up with no reasonable explanation for why my attempt would fail. Second there is no Unstable version call for extra memory that works for PyVarObject which is all of my type classes and the int class. I managed to dodge it by creating objects with extra items then removing the items after init. This is much safer than my original hack, but still far from an ideal.

I put in 6 hours trying to make a more sensible API for the Python release, unfortunately, I ran up on the rocks as one symbol needed to be exposed to pass the size of the inline dict where it was needed. Every path lead to linker errors. I am sure someone more experienced could fix the issue but I don't understand why Python exports and links certain symbols and not others. I am guessing there is an export filter some place but I couldn't locate it.

My review of the Py3.13 code is that it does a lot of needless calculations that would be much better solved with a tp_headersize. If that slot existed then it would only need to be computed once and a large chunk of mostly replicated flag checking would go away.

@Thrameos
Copy link
Contributor Author

@marscher Life just keeps getting F***ING worse.... They backported changes to Python 3.11.10 without the corresponding changes with which I could possibly extra memory allocations work. Okay I am calling this, I have burned every available hour for this project that I had just trying to deal with the removal of a symbol without an appropriate replacement. It would be one thing if I had free time and could actually participate, but time demands from work are too high try to build houses on sand. Python 3.11 is now officially dead to me. I can't possibly make things work when there are no hooks, examples, or no thought out API of how something is supposed to be used. If they chose to backport PyUnstable_Object_GC_NewWithExtraData with a corresponding PyUnstable_VarObject_GC_NewWithExtraData (which actually work without random segmentation faults) I can try again. Sadly, I can't spend my life coming up with hacks for every minor release.

The Python core code is a giant pile of spaghetti without a clear master plan filled with speed hacks for which much of it doesn't appear to have meaningful unit tests. Features that should help us are not for our use case, or turn out to not be functional. I will try once more to deal with the changes to type and type with meta, but if there is no way to perform type with metastable and bases then I am afraid all reasonable paths for this project may be dead.

@Thrameos
Copy link
Contributor Author

@marscher I am less that confident that this PR actually works. Repeated runs on Python 3.12 for the tp_new fix show while it will pass the test bench once, the code is unstable and will occasionally produce random errors. It may be an issue with the new work or a problem with the old. I currently don't have any way of verifying when an error happens infrequently. I recommend that we consider placing our test bench on a for loop and execute it 5 times in a row. If it passes after a large stress test then it may be good.

In addition, if they are back porting changes that affect us into minor releases it is going to get hard to have confidence in our azure results. I only found the latest round of problem when I decided to run an apt upgrade that upped all my python versions on a development machine. That is when 3.11 suddenly went south.

@Thrameos
Copy link
Contributor Author

@marscher Now that I know that there is something wrong with the allocator/GC in the versions of Python 3.11 onward and that trying to use the unstable add it extra memory doesn't work properly (it required extra bytes even when I stayed in my space) I am going to use a "kinder" version of the dirty hack that started this thread. Instead of mutilating the type that I will be allocating, I created a dedicated alloc object, mutilate it, allocate an object, then immediately cast polymorph on it to make it into a sheep... I mean the desired object.

Unfortunately this doesn't fix the instability problem entirely. You will notice that in one of the test runs before I applied the tables 3.12 failed on a random and inexplicable error. I am 99% sure that we never wrote out of our memory space. We always had a valid type on the object. I suppose we can try to add a critical section to make sure we never get interrupted during the allocation routine.

native/python/pyjp_value.cpp Dismissed Show dismissed Hide dismissed
@marscher
Copy link
Member

marscher commented Oct 1, 2024

@Thrameos so we close this one because #1216 is merged already, right?

@Thrameos Thrameos closed this Oct 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python 3.13 problems
4 participants