
ENH: Python 3.13 free-threading support #59057

Open
7 tasks done
lysnikolaou opened this issue Jun 20, 2024 · 18 comments
Labels
Build Library building on various platforms Enhancement Python 3.13

Comments

@lysnikolaou (Contributor) commented Jun 20, 2024

This is intended as a tracking issue for all the work necessary to support the free-threaded build of Python 3.13. A very high-level list of steps (more details to follow as I investigate further):

@xbit18 commented Jun 20, 2024

Hi! I'm currently writing a thesis on free-threaded Python and wonder if there's any estimate of how much time this will take.

@lysnikolaou (Contributor, Author)

Hello! An initial investigation shows that it should be relatively straightforward to get a passing test suite under the free-threaded build. However, if you're looking for a full release with support for it, that's probably still a few weeks or months away, since Python 3.13 is not final (not even at rc) yet.

A maintainer may be able to answer this better than I can, though.

@xbit18 commented Jun 21, 2024

Yeah, I only need a "preliminary" implementation to run some really simple tests involving a DataFrame on Python 3.13.0b2 with the free-threaded build. Right now it's not even installing correctly.
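(Aside, not from the thread: once an install does succeed, a quick sanity check with the standard `sysconfig`/`sys` APIs confirms whether the interpreter is a free-threaded build and whether the GIL is actually off at runtime. `sys._is_gil_enabled()` only exists on CPython 3.13+.)

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 for free-threaded ("t") builds of CPython 3.13+.
is_ft_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
print("free-threaded build:", is_ft_build)

# Even on a free-threaded build, the GIL can be re-enabled at runtime
# (e.g. by an incompatible extension module), so check the live state too.
if hasattr(sys, "_is_gil_enabled"):
    print("GIL currently enabled:", sys._is_gil_enabled())
```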

@lesteve (Contributor) commented Aug 1, 2024

> Yeah I'm only in need of a "preliminary" implementation to run some really simple tests involving a dataframe with python 3.13.0b2 with the free threading build. Right now it's not even installing correctly.

@xbit18 you've probably managed to do it since your last comment, but the https://py-free-threading.github.io docs have a lot of useful info, in particular:

@lysnikolaou (Contributor, Author) commented Oct 16, 2024

I'm trying to get Windows wheels to build successfully as well. Building them seems easy; testing them is not, because Windows wheels are tested on windowsservercore instead of the GHA runner.

@mroeschke You implemented this in #53087. Is there any way around it? There's this comment in the workflow file. Could you elaborate on it?

# Testing on windowsservercore instead of GHA runner to fail on missing DLLs

The Docker images do not contain a free-threaded build, so it's currently impossible to do this for the free-threaded wheels. Should we maybe enable the cibuildwheel tests and disable the Docker ones for free-threaded Windows wheels? Would that be okay?

@mroeschke (Member)

Maybe it's no longer necessary to test these wheels on a separate Docker image; we could just use the Windows images from the GHA runner if that makes this easier.

@ngoldbaum (Contributor) commented Oct 31, 2024

FWIW, @lesteve just let me know over in a scikit-learn issue that scikit-learn is going to need the Docker images as well.

@Alec1198055421

Hi, how long until Python 3.13 free-threading support is complete?

@HackStrix commented Nov 13, 2024

I tried pandas with the GIL disabled, wrapped in a tqdm progress bar, and got segfaults in the tqdm thread. Let me know if you need the code to replicate this issue.

```
Fatal Python error: Segmentation fault

Thread 0x00007f642dffd640 (most recent call first):
  File "/u3/u_name/python/lib/python3.13t/threading.py", line 363 in wait
  File "/u3/u_name/python/lib/python3.13t/threading.py", line 659 in wait
  File "/u3/u_name/python/lib/python3.13t/site-packages/tqdm/_monitor.py", line 60 in run
  File "/u3/u_name/python/lib/python3.13t/threading.py", line 1041 in _bootstrap_inner
  File "/u3/u_name/python/lib/python3.13t/threading.py", line 1012 in _bootstrap

Thread 0x00007f642effe640 (most recent call first):
  File "/u3/u_name/python/lib/python3.13t/site-packages/pandas/core/generic.py", line 6312 in __init__
  File "/u3/u_name/python/lib/python3.13t/site-packages/pandas/core/series.py", line 6255 in __init__
  File "/u3/u_name/python/lib/python3.13t/site-packages/pandas/core/series.py", line 6232 in _construct_result
  File "/u3/u_name/python/lib/python3.13t/site-packages/pandas/core/series.py", line 6121 in _cmp_method
  File "/u3/u_name/python/lib/python3.13t/site-packages/pandas/core/arraylike.py", line ??? in __gt__
```

@ngoldbaum (Contributor)

> Let me know if you need the code to replicate this issue.

Yes, please.

@HackStrix commented Nov 13, 2024

You might need to run this two or three times to observe the issue. This is not my exact code; I had to simplify it a lot. My machine is "Linux ubuntu2204-002 5.15.0-86-generic".

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import numpy as np
import pandas as pd
from tqdm import tqdm

data = pd.DataFrame({
    'A': np.random.randn(100),
    'B': np.random.randn(100),
    'C': np.random.randn(100),
    'D': np.random.randn(100),
    'E': np.random.randn(100),
    'F': np.random.randn(100),
    'G': np.random.randn(100),
    'H': np.random.randn(100)
})
grouped_data = data.groupby(['A'])


def getData():
    for name, group in grouped_data:
        yield group
    yield None


def run(index):
    print(index)
    for grp in getData():
        if grp is None:
            continue
        filtered_grp = grp.copy(deep=True)
        filtered_grp = filtered_grp[filtered_grp['B'] > 0.1]
        filtered_grp = filtered_grp[filtered_grp['C'] < 0.1]
        # Perform a complex operation on the filtered DataFrame
        filtered_grp['X'] = filtered_grp['B'] * filtered_grp['C'] * filtered_grp['D']
        filtered_grp['Y'] = filtered_grp['B'] * filtered_grp['C'] * filtered_grp['D']
        filtered_grp['Z'] = filtered_grp['B'] * filtered_grp['C'] * filtered_grp['D']
        filtered_grp['L'] = filtered_grp['B'] * filtered_grp['C'] * filtered_grp['D']
    return True


max_workers = 40
with ThreadPoolExecutor(max_workers=max_workers) as executor:
    futures = []
    for i in range(50):
        futures.append(executor.submit(run, i))

    # Wait for all futures to complete
    with tqdm(total=len(futures)) as pbar:
        for future in as_completed(futures):
            try:
                result = future.result()
            except Exception as e:
                print(f"Simulation failed with exception: {e}")
            finally:
                pbar.update(1)
```

This is the gist of what I was trying to do.

```shell
python3.13t -Xgil=0 -X dev segfault_replication.py
```

The output:

```
Fatal Python error: Segmentation fault

Thread 0x00007fd59effe640 (most recent call first):
  File "/u3/s24narula/python/lib/python3.13t/threading.py", line 363 in wait
  File "/u3/s24narula/python/lib/python3.13t/threading.py", line 659 in wait
  File "/u3/s24narula/python/lib/python3.13t/site-packages/tqdm/_monitor.py", line 60 in run
  File "/u3/s24narula/python/lib/python3.13t/threading.py", line 1041 in _bootstrap_inner
  File "/u3/s24narula/python/lib/python3.13t/threading.py", line 1012 in _bootstrap

Thread 0x00007fd59ffff640 (most recent call first):
  File "/u3/s24narula/python/lib/python3.13t/site-packages/pandas/core/generic.py", line 357 in _from_mgr
  File "/u3/s24narula/python/lib/python3.13t/site-packages/pandas/core/frame.py", line 6255 in _constructor_from_mgr
  File "/u3/s24narula/python/lib/python3.13t/site-packages/pandas/core/generic.py", line 6813 in copy
  File "/u3/s24narula/python/lib/python3.13t/site-packages/pandas/core/indexing.py", line 370 in check_dict_or_set_indexers
  File Segmentation fault
```

pip freeze:

```
numpy==2.1.3
pandas==2.2.3
psycopg2-binary==2.9.10
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
pytz==2024.2
six==1.16.0
tqdm==4.67.0
tzdata==2024.2
```

@ngoldbaum (Contributor)

@colesbury could this be a CPython bug? I can reproduce the crash, here's the traceback:

```
* thread #7, stop reason = EXC_BAD_ACCESS (code=1, address=0xd410a0b8939d9809)
    frame #0: 0x0000000100a949a8 libpython3.13t.dylib`_mi_free_delayed_block + 164
    frame #1: 0x0000000100a93fbc libpython3.13t.dylib`_mi_malloc_generic + 252
    frame #2: 0x0000000100ba1bb4 libpython3.13t.dylib`gc_alloc + 284
    frame #3: 0x0000000100ba1c10 libpython3.13t.dylib`_PyObject_GC_NewVar + 64
    frame #4: 0x0000000100a5028c libpython3.13t.dylib`PyFrame_New + 208
    frame #5: 0x00000001018888ec hashtable.cpython-313t-darwin.so`__Pyx_AddTraceback + 812
  * frame #6: 0x00000001018978c8 hashtable.cpython-313t-darwin.so`__pyx_f_6pandas_5_libs_9hashtable_17PyObjectHashTable_get_item + 580
    frame #7: 0x00000001019199dc hashtable.cpython-313t-darwin.so`__pyx_pw_6pandas_5_libs_9hashtable_17PyObjectHashTable_13get_item + 132
    frame #8: 0x000000010251cb64 index.cpython-313t-darwin.so`__pyx_f_6pandas_5_libs_5index_11IndexEngine_get_loc + 2548
    frame #9: 0x000000010253e7bc index.cpython-313t-darwin.so`__pyx_pw_6pandas_5_libs_5index_11IndexEngine_3__contains__ + 268
    frame #10: 0x0000000100b69b70 libpython3.13t.dylib`_PyEval_EvalFrameDefault + 23000
    frame #11: 0x0000000100ad27cc libpython3.13t.dylib`slot_sq_contains + 388
    frame #12: 0x0000000100b69b70 libpython3.13t.dylib`_PyEval_EvalFrameDefault + 23000
    frame #13: 0x0000000100ac9a8c libpython3.13t.dylib`vectorcall_method + 144
    frame #14: 0x0000000100ad1d8c libpython3.13t.dylib`slot_mp_ass_subscript + 68
    frame #15: 0x0000000100b70d3c libpython3.13t.dylib`_PyEval_EvalFrameDefault + 52132
    frame #16: 0x0000000100a2b404 libpython3.13t.dylib`method_vectorcall + 328
    frame #17: 0x0000000100c58f60 libpython3.13t.dylib`thread_run + 128
    frame #18: 0x0000000100beec9c libpython3.13t.dylib`pythread_wrapper + 28
    frame #19: 0x0000000183b65f94 libsystem_pthread.dylib`_pthread_start + 136
```

I think we're hitting the `raise KeyError` branch here:

This is in a Cython `cdef` class that implements `__dealloc__`, so I guess this could be a Cython bug too?

Extra bit of weirdness: if I remove the use of tqdm from the script, the crash doesn't happen.

@ngoldbaum (Contributor) commented Nov 13, 2024

> Extra bit of weirdness: if I remove the use of tqdm from the script, the crash doesn't happen.

That's not true, I can trigger crashes without the context manager.

And I initially thought the workers weren't operating on shared data, but that's not true either; there's definitely some mutating of shared arrays happening.

In that case, @HackStrix, you might be hitting the fact that ndarray itself is not thread-safe, even in the GIL-enabled build: https://numpy.org/doc/stable/reference/thread_safety.html#thread-safety
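One coarse way to test that hypothesis (a debugging sketch, not a recommended pandas or NumPy pattern; the names here are illustrative) is to serialize every pandas call on the shared groups behind a single lock. If the crashes disappear under the lock, unsynchronized access to shared objects is the likely culprit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

data = pd.DataFrame({"A": np.random.randn(100), "B": np.random.randn(100)})
groups = list(data.groupby("A"))  # materialize once, up front

# One lock serializing all pandas work on the shared groups. Coarse,
# but it isolates unsynchronized access as the variable under test.
pandas_lock = threading.Lock()

def run(index):
    total = 0
    for _, grp in groups:
        with pandas_lock:
            filtered = grp.copy(deep=True)
            filtered = filtered[filtered["B"] > 0.1]
            total += len(filtered)
    return index

with ThreadPoolExecutor(max_workers=8) as executor:
    # executor.map preserves submission order in its results
    results = list(executor.map(run, range(20)))
```

If this locked version is stable while the unlocked original crashes, that points at the shared pandas internals rather than the thread pool itself.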

@colesbury

Maybe. Should I build pandas from source to repro this? It might be helpful to look at the Cython-generated code.

@ngoldbaum (Contributor)

I can repro it using the wheel but building pandas from source with debug symbols will likely help:

```shell
python -m pip install -v . --no-build-isolation -Cbuilddir=build -C'compile-args=-v' -C'setup-args=-Dbuildtype=debug'
```

That'll leave the build directory in `build` afterward, if you want to look at the generated Cython code. Make sure you have a nightly Cython installed first. You'll also need NumPy; installing it from the wheel on PyPI should work.
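(Not from the thread, but a quick way to confirm the locally built extension modules are the ones actually being imported, rather than a wheel from PyPI: check the file path of a compiled module such as `pandas._libs.hashtable`.)

```python
import pandas as pd
import pandas._libs.hashtable as hashtable

print(pd.__version__)
# For a from-source debug build, this should point into your local
# site-packages/build tree, not a cached wheel install.
print(hashtable.__file__)
```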

@colesbury

I saw a crash that involved `BlockValuesRefs`. Are `Block` and `BlockValuesRefs` thread-safe?

> ... there's definitely some mutating of shared arrays happening

The mutation looks like it's on a deep copy, not shared data, but I don't know pandas well enough to be sure: `filtered_grp = grp.copy(deep=True)`

@ngoldbaum (Contributor)

Ah, you're right: no mutation, but the grp DataFrames are shared.

> I saw a crash that involved BlockValuesRefs. Are Blocks and BlockValuesRefs thread-safe?

I don't see any kind of locking in the definition in `pandas/_libs/_internals.pyx`, so I doubt it.

@HackStrix

My understanding was that creating a deep copy should avoid mutation of the original ndarray.

Is it possible that, due to an optimisation, the deep copy only copies on write?
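(A small sketch of the observable behaviour, under the versions pinned above. At the Python level a deep copy behaves as an independent object either way; pandas 2.x additionally has an opt-in Copy-on-Write mode in which even a deep copy can defer the physical copy until the first write, with the bookkeeping done through reference-tracking machinery such as `BlockValuesRefs`. That deferred-copy bookkeeping is shared mutable state, which is why it matters for free-threading.)

```python
import pandas as pd

df = pd.DataFrame({"B": [0.5, -0.3, 0.2]})

df_copy = df.copy(deep=True)
df_copy["B"] = 0.0  # writing to the copy must not affect the original

assert df["B"].tolist() == [0.5, -0.3, 0.2]
assert df_copy["B"].tolist() == [0.0, 0.0, 0.0]
# Under Copy-on-Write, the physical copy above may only happen at this
# write; the reference tracking that makes that safe single-threaded
# is exactly the state that would need locking under free-threading.
```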
