[Python] Windows fatal exception: access violation #40100

Open
powersj opened this issue Feb 16, 2024 · 27 comments

@powersj

powersj commented Feb 16, 2024

Describe the bug, including details regarding any error messages, version, and platform.

Hi,

When using the pyarrow Flight client, I have a user who occasionally sees a Windows fatal exception error. This involves a query with multiple subqueries across many fields. I have access to the environment and can reproduce the crash. We have found a correlation between the number of fields and the exception: as we decrease the number of fields, the crash occurs less and less consistently.

I realize that getting an issue without exact steps to reproduce is unhelpful. However, I am more than willing to try out test builds or build a custom version to gather more details if I can get some guidance.

I was able to easily build a custom version on Linux per the dev docs, but I tried building a custom pyarrow on Windows and ran into issues right away with detection of the compiler. I have my steps and logs below.

Observations

  1. This only occurs on Windows 10 or 11; the same query runs fine on Linux/macOS
  2. This only occurs when running as a Python notebook; running as a script works
  3. It reproduces with both Python 3.11 and 3.12
  4. The issue occurs in both pip-only and conda environments
  5. Disabling all virus or Windows security detection does not help
  6. A Windows event is logged that names arrow_flight.dll

Windows Event Log Message

Faulting application name: python3.12.exe, version: 3.12.1150.1013, time stamp: 0x6572422a
Faulting module name: arrow_flight.dll, version: 0.0.0.0, time stamp: 0x65a69ccb
Exception code: 0xc0000005
Fault offset: 0x00000000002dc6b0
Faulting process id: 0x0x4F8
Faulting application start time: 0x0x1DA55FAF308D836
Faulting application path: C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.496.0_x64__qbz5n2kfra8p0\python3.12.exe
Faulting module path: C:\Users\powersj\v3-ear\.venv\Lib\site-packages\pyarrow\arrow_flight.dll
Report Id: f8313105-2c59-4f1a-a8a6-a4227a8ae7d9
Faulting package full name: PythonSoftwareFoundation.Python.3.12_3.12.496.0_x64__qbz5n2kfra8p0
Faulting package-relative application ID: Python

Code

import json
import certifi

from pyarrow.flight import FlightClient, Ticket, FlightCallOptions

import faulthandler
faulthandler.enable()

host = "host"
token = "token"
database = "db"

with open(certifi.where(), "r", encoding="utf-8") as f_cert:
    cert = f_cert.read()

with open("kernel-crash.sql", "r", encoding="utf-8") as f_sql:
    query = f_sql.read()

options = FlightCallOptions(**{
    "headers": [(b"authorization", f"Bearer {token}".encode('utf-8'))],
    "timeout": 300
})
ticket_data = {
    "database": database,
    "sql_query": query,
    "query_type": "sql",
}
ticket = Ticket(json.dumps(ticket_data).encode('utf-8'))
with FlightClient(f"grpc+tls://{host}:443", tls_root_certs=cert) as client:
    reader = client.do_get(ticket, options)
    print(reader.read_all())

Traceback

Windows fatal exception: access violation

Thread 0x000026a8 (most recent call first):
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\parentpoller.py", line 93 in run
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap

Thread 0x00002700 (most recent call first):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 355 in wait
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 655 in wait
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\history.py", line 894 in run
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\history.py", line 60 in only_when_enabled
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\decorator.py", line 232 in fun
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap

Thread 0x00002620 (most recent call first):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 314 in _select
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 323 in select
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 1947 in _run_once
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 639 in run_forever
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\tornado\platform\asyncio.py", line 205 in start
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\control.py", line 23 in run
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap

Thread 0x00001ba8 (most recent call first):
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\heartbeat.py", line 106 in run
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap

Thread 0x00001d80 (most recent call first):
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 314 in _select
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\selectors.py", line 323 in select
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 1947 in _run_once
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 639 in run_forever
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\tornado\platform\asyncio.py", line 205 in start
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\iostream.py", line 92 in _thread_main
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1010 in run
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1073 in _bootstrap_inner
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap

Current thread 0x000025e0 (most recent call first):
  File "C:\Users\powersj\AppData\Local\Temp\ipykernel_9720\769077188.py", line 26 in <module>
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3553 in run_code
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3493 in run_ast_nodes
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3311 in run_cell_async
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\async_helpers.py", line 129 in _pseudo_sync_runner
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3106 in _run_cell
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\IPython\core\interactiveshell.py", line 3051 in run_cell
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\zmqshell.py", line 549 in run_cell
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\ipkernel.py", line 446 in do_execute
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 775 in execute_request
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\ipkernel.py", line 359 in execute_request
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 437 in dispatch_shell
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 531 in process_one
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelbase.py", line 542 in dispatch_queue
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\events.py", line 88 in _run
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 1985 in _run_once
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\asyncio\base_events.py", line 639 in run_forever
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\tornado\platform\asyncio.py", line 205 in start
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel\kernelapp.py", line 739 in start
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\traitlets\config\application.py", line 1075 in launch_instance
  File "C:\Users\powersj\v3-ear\.venv\Lib\site-packages\ipykernel_launcher.py", line 17 in <module>
  File "<frozen runpy>", line 88 in _run_code
  File "<frozen runpy>", line 198 in _run_module_as_main

System Information

$ python --version
Python 3.11.8
(venv)
$ pip list
Package           Version
----------------- --------
asttokens         2.4.1
certifi           2024.2.2
colorama          0.4.6
comm              0.2.1
debugpy           1.8.1
decorator         5.1.1
executing         2.0.1
ipdb              0.13.13
ipykernel         6.29.2
ipython           8.21.0
jedi              0.19.1
jupyter_client    8.6.0
jupyter_core      5.7.1
matplotlib-inline 0.1.6
nest-asyncio      1.6.0
numpy             1.26.4
packaging         23.2
parso             0.8.3
pip               23.3.1
platformdirs      4.2.0
prompt-toolkit    3.0.43
psutil            5.9.8
pure-eval         0.2.2
pyarrow           15.0.0
Pygments          2.17.2
python-dateutil   2.8.2
pywin32           306
pyzmq             25.1.2
setuptools        69.0.2
six               1.16.0
stack-data        0.6.3
tornado           6.4
traitlets         5.14.1
wcwidth           0.2.13
wheel             0.42.0

When using conda:

C:\Users\powersj>conda info

     active environment : None
       user config file : C:\Users\powersj\.condarc
 populated config files :
          conda version : 23.11.0
    conda-build version : not installed
         python version : 3.11.5.final.0
                 solver : libmamba (default)
       virtual packages : __archspec=1=x86_64
                          __conda=23.11.0=0
                          __win=0=0
       base environment : C:\ProgramData\miniconda3  (read only)
      conda av data dir : C:\ProgramData\miniconda3\etc\conda
  conda av metadata url : None
           channel URLs : https://repo.anaconda.com/pkgs/main/win-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/win-64
                          https://repo.anaconda.com/pkgs/r/noarch
                          https://repo.anaconda.com/pkgs/msys2/win-64
                          https://repo.anaconda.com/pkgs/msys2/noarch
          package cache : C:\ProgramData\miniconda3\pkgs
                          C:\Users\powersj\.conda\pkgs
                          C:\Users\powersj\AppData\Local\conda\conda\pkgs
       envs directories : C:\Users\powersj\.conda\envs
                          C:\ProgramData\miniconda3\envs
                          C:\Users\powersj\AppData\Local\conda\conda\envs
               platform : win-64
             user-agent : conda/23.11.0 requests/2.31.0 CPython/3.11.5 Windows/10 Windows/10.0.22621 solver/libmamba conda-libmamba-solver/23.12.0 libmambapy/1.5.3
          administrator : False
             netrc file : None
           offline mode : False

Build Attempt

C:\Users\powersj>conda create -y -n pyarrow-dev -c conda-forge ^
More?       --file arrow\ci\conda_env_cpp.txt ^
More?       --file arrow\ci\conda_env_python.txt ^
More?       --file arrow\ci\conda_env_gandiva.txt ^
More?       python=3.11

<snip>

C:\Users\powersj>conda activate pyarrow-dev

(pyarrow-dev) C:\Users\powersj>set ARROW_HOME=%CONDA_PREFIX%\Library

(pyarrow-dev) C:\Users\powersj>mkdir arrow\cpp\build

(pyarrow-dev) C:\Users\powersj>pushd arrow\cpp\build

(pyarrow-dev) C:\Users\powersj\arrow\cpp\build>cmake -G "Ninja" ^
More?       -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
More?       -DCMAKE_UNITY_BUILD=ON ^
More?       -DARROW_COMPUTE=ON ^
More?       -DARROW_CSV=ON ^
More?       -DARROW_CXXFLAGS="/WX /MP" ^
More?       -DARROW_DATASET=ON ^
More?       -DARROW_FILESYSTEM=ON ^
More?       -DARROW_HDFS=ON ^
More?       -DARROW_JSON=ON ^
More?       -DARROW_PARQUET=ON ^
More?       -DARROW_WITH_LZ4=ON ^
More?       -DARROW_WITH_SNAPPY=ON ^
More?       -DARROW_WITH_ZLIB=ON ^
More?       -DARROW_WITH_ZSTD=ON ^
More?       -DARROW_FLIGHT=ON ^
More?       ..
-- Building using CMake version: 3.28.3
-- The C compiler identification is Clang 17.0.6 with GNU-like command-line
-- The CXX compiler identification is unknown
CMake Error at C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/Platform/Windows-Clang.cmake:170 (message):
  The current configuration mixes Clang and MSVC or some other CL compatible
  compiler tool.  This is not supported.  Use either clang or MSVC as both C,
  C++ and/or HIP compilers.
Call Stack (most recent call first):
  C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/Platform/Windows-Clang.cmake:180 (__verify_same_language_values)
  C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/Platform/Windows-Clang-C.cmake:1 (include)
  C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/CMakeCInformation.cmake:48 (include)
  CMakeLists.txt:95 (project)


CMake Error at CMakeLists.txt:95 (project):
  No CMAKE_CXX_COMPILER could be found.

  Tell CMake where to find the compiler by setting either the environment
  variable "CXX" or the CMake cache entry CMAKE_CXX_COMPILER to the full path
  to the compiler, or to the compiler name if it is in the PATH.


-- Configuring incomplete, errors occurred!

(pyarrow-dev) C:\Users\powersj\arrow\cpp\build>

It is not clear to me which compiler I am supposed to use: something from the conda environment, or the locally installed one?

If I try setting via the CC and CXX env variables I get:

set CC=C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64\cl.exe
set CXX=C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64\cl.exe

<snip>
-- Building using CMake version: 3.28.3
-- The C compiler identification is MSVC 19.39.33519.0
-- The CXX compiler identification is MSVC 19.39.33519.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - failed
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe - broken
CMake Error at C:/Users/powersj/.conda/envs/pyarrow-dev/Library/share/cmake-3.28/Modules/CMakeTestCCompiler.cmake:67 (message):
  The C compiler

    "C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.39.33519/bin/Hostx64/x64/cl.exe"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: 'C:/Users/powersj/arrow/cpp/build/CMakeFiles/CMakeScratch/TryCompile-j51cjy'

    Run Build Command(s): C:/Users/powersj/.conda/envs/pyarrow-dev/Library/bin/ninja.exe -v cmTC_f4d4d
    [1/2] C:\PROGRA~1\MICROS~2\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\Hostx64\x64\cl.exe  /nologo   /DWIN32 /D_WINDOWS  /Zi /Ob0 /Od /RTC1 -MDd /showIncludes /FoCMakeFiles\cmTC_f4d4d.dir\testCCompiler.c.obj /FdCMakeFiles\cmTC_f4d4d.dir\ /FS -c C:\Users\powersj\arrow\cpp\build\CMakeFiles\CMakeScratch\TryCompile-j51cjy\testCCompiler.c
    [2/2] C:\WINDOWS\system32\cmd.exe /C "cd . && C:\Users\powersj\.conda\envs\pyarrow-dev\Library\bin\cmake.exe -E vs_link_exe --intdir=CMakeFiles\cmTC_f4d4d.dir --rc=rc --mt=CMAKE_MT-NOTFOUND --manifests  -- C:\PROGRA~1\MICROS~2\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\Hostx64\x64\link.exe /nologo CMakeFiles\cmTC_f4d4d.dir\testCCompiler.c.obj  /out:cmTC_f4d4d.exe /implib:cmTC_f4d4d.lib /pdb:cmTC_f4d4d.pdb /version:0.0 /machine:x64  /debug /INCREMENTAL /subsystem:console  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
    FAILED: cmTC_f4d4d.exe
    C:\WINDOWS\system32\cmd.exe /C "cd . && C:\Users\powersj\.conda\envs\pyarrow-dev\Library\bin\cmake.exe -E vs_link_exe --intdir=CMakeFiles\cmTC_f4d4d.dir --rc=rc --mt=CMAKE_MT-NOTFOUND --manifests  -- C:\PROGRA~1\MICROS~2\2022\COMMUN~1\VC\Tools\MSVC\1439~1.335\bin\Hostx64\x64\link.exe /nologo CMakeFiles\cmTC_f4d4d.dir\testCCompiler.c.obj  /out:cmTC_f4d4d.exe /implib:cmTC_f4d4d.lib /pdb:cmTC_f4d4d.pdb /version:0.0 /machine:x64  /debug /INCREMENTAL /subsystem:console  kernel32.lib user32.lib gdi32.lib winspool.lib shell32.lib ole32.lib oleaut32.lib uuid.lib comdlg32.lib advapi32.lib && cd ."
    RC Pass 1: command "rc /fo CMakeFiles\cmTC_f4d4d.dir/manifest.res CMakeFiles\cmTC_f4d4d.dir/manifest.rc" failed (exit code 0) with the following output:
    The system cannot find the file specified
    ninja: build stopped: subcommand failed.





  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:95 (project)


-- Configuring incomplete, errors occurred!

(pyarrow-dev) C:\Users\powersj\arrow\cpp\build>

Component(s)

Python

@amoeba
Member

amoeba commented Feb 16, 2024

Hi @powersj. This looks similar to #37852 though we weren't able to reproduce in that issue. I couldn't reproduce, though I had to modify your script to run on my system and create a simple server implementation to test. Would it be possible to share a self-contained example of both the client and server code? cc @lidavidm

One thing that jumped out at me in your logs are the lines in your traceback like this:

  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.12_3.12.752.0_x64__qbz5n2kfra8p0\Lib\threading.py", line 1030 in _bootstrap

That looks like references to the Windows Store version of Python but I'd expect all the paths to lead to your conda environment so I wonder if the crash is due to mixing two Python environments.
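
One way to check which installation is actually in play is to print the interpreter and package paths from inside the notebook. This is only a sketch; `describe_environment` is a hypothetical helper, not something from pyarrow or ipykernel. Note that a venv created from the Store Python reuses that installation's standard library, so `WindowsApps` paths for `threading.py` would be expected in that case and would not by themselves prove a mix-up.

```python
import importlib.util
import sys


def describe_environment(package="pyarrow"):
    """Collect the paths that show which Python installation is in use."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,            # the active (virtual) environment
        "base_prefix": sys.base_prefix,  # the installation it was created from
        "in_venv": sys.prefix != sys.base_prefix,
        "package_origin": spec.origin if spec else None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `package_origin` points outside `prefix`, the notebook kernel is probably importing pyarrow from a different environment than intended.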

Have you tried capturing your crash with WinDbg Preview?

@assignUser @kou do either of you have any idea about the build issue at the bottom of the OP? I also get that when I try to do a Windows+conda+clang build.

@lidavidm
Member

For Windows, you'll want to use vcvarsall.bat or whatever the modern equivalent is, don't muck with the env vars yourself. Also, possibly try the VS generator for CMake instead of Ninja.

I don't have any clue about the crash itself. We would need a way to reproduce it.

You could also try downloading "Windbg Preview" from the Windows Store and running your script as windbgx -g python myscript.py to get a traceback.

@kou
Member

kou commented Feb 17, 2024

For Windows, you'll want to use vcvarsall.bat or whatever the modern equivalent is, don't muck with the env vars yourself.

I think so too.

Also, possibly try the VS generator for CMake instead of Ninja.

If you use one of Visual Studio Generators https://cmake.org/cmake/help/latest/manual/cmake-generators.7.html#visual-studio-generators , you don't need to use vcvarsall.bat. CMake will find suitable Visual C++.

@amoeba
Member

amoeba commented Feb 17, 2024

Thanks @kou. I'll send a PR to make this clearer in the Python docs. It wasn't clear to me which toolchain we were supporting there. I think it's fairly clear in the C++ docs. I'll try that tomorrow.

@powersj
Author

powersj commented Feb 20, 2024

Thanks for all the responses, especially around building the Python libraries on Windows. Changing the CMake generator has allowed me to get further along, via cmake -G "Visual Studio 17 2022" -A x64 ...

I did find that the Python build requires the older 2017 libraries to be installed, as already noted in the Python docs. I had some success with the debugger below though.

You could also try downloading "Windbg Preview" from the Windows Store and running your script as windbgx -g python myscript.py to get a traceback.

I launched the notebook and attached to the python process with the time travel option and caught it. How can I better share this with you? Does this collect anything helpful? Would it help to share the time travel capture? fwiw it is 620MB.

image

image

@powersj
Author

powersj commented Feb 20, 2024

Full stack:

[0x0]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0x14d0e0   0x4e3b1eb7f0   0x7ffc3c81e7ca   
[0x1]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0x14f1fa   0x4e3b1eb850   0x7ffc3cc191da   
[0x2]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0x549c0a   0x4e3b1eb890   0x7ffc3c82110e   
[0x3]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0x151b3e   0x4e3b1eb8f0   0x7ffc3c91f46f   
[0x4]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0x24fe9f   0x4e3b1eb9a0   0x7ffc3c8370ab   
[0x5]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0x167adb   0x4e3b1eb9d0   0x7ffc3c83e219   
[0x6]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0x16ec49   0x4e3b1ebac0   0x7ffc3c81b6a5   
[0x7]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0x14c0d5   0x4e3b1ebc50   0x7ffc3c7aca49   
[0x8]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0xdd479   0x4e3b1ebcc0   0x7ffc3c869402   
[0x9]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0x199e32   0x4e3b1ebd20   0x7ffc3c81bbd5   
[0xa]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0x14c605   0x4e3b1ebdf0   0x7ffc3c7b8c57   
[0xb]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0xe9687   0x4e3b1ebe30   0x7ffc3c7ba351   
[0xc]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0xead81   0x4e3b1ec030   0x7ffc3c6d9f51   
[0xd]   arrow_flight!arrow::flight::FlightWriteSizeStatusDetail::type_id+0xa981   0x4e3b1ec090   0x7ffc3c6b04f7   
[0xe]   arrow_flight!arrow::flight::MakeTracingServerMiddlewareFactory+0x1427   0x4e3b1ec0f0   0x7ffc3c6b30d9   
[0xf]   arrow_flight!arrow::flight::FlightClient::PollFlightInfo+0x2a99   0x4e3b1ec180   0x7ffc3c6c8c93   
[0x10]   arrow_flight!arrow::flight::MakeTracingClientMiddlewareFactory+0xc3   0x4e3b1ec380   0x7ffc3c6ccdcd   
[0x11]   arrow_flight!arrow::flight::FlightServerBase::Shutdown+0x119d   0x4e3b1ec480   0x7ffc3c6cd57d   
[0x12]   arrow_flight!arrow::flight::FlightStreamReader::ToTable+0x3d   0x4e3b1ec590   0x7ffc7329b43f   
[0x13]   _flight_cp311_win_amd64 + 0x5b43f!_flight_cp311_win_amd64+0x5b43f   0x4e3b1ec680   0x7ffc3e845f04   
[0x14]   python311!PyObject_VectorcallMethod+0x1b0   0x4e3b1ec780   0x7ffc3e784c2c   
[0x15]   python311!PyObject_Vectorcall+0x5dc   0x4e3b1ec7d0   0x7ffc3e7860b2   
[0x16]   python311!PyEval_EvalFrameDefault+0x7a2   0x4e3b1ec8e0   0x7ffc3e7e56bf   
[0x17]   python311!PyType_CalculateMetaclass+0xfb   0x4e3b1ecaf0   0x7ffc3e7e70bf   
[0x18]   python311!PyEval_EvalCode+0x97   0x4e3b1ecb30   0x7ffc3e870e50   
[0x19]   python311!Py_GetRecursionLimit+0x53c   0x4e3b1ecbb0   0x7ffc3e870d20   
[0x1a]   python311!Py_GetRecursionLimit+0x40c   0x4e3b1ecc30   0x7ffc3e789c44   
[0x1b]   python311!PyEval_EvalFrameDefault+0x4334   0x4e3b1ecce0   0x7ffc3e7a370f   
[0x1c]   python311!PyDict_MergeFromSeq2+0x3cf   0x4e3b1ecef0   0x7ffc3e8d015d   
[0x1d]   python311!PyLong_AsUnsignedLongMask+0xc1   0x4e3b1ecf40   0x7ffc3e789667   
[0x1e]   python311!PyEval_EvalFrameDefault+0x3d57   0x4e3b1ecf80   0x7ffc3e7a370f   
[0x1f]   python311!PyDict_MergeFromSeq2+0x3cf   0x4e3b1ed190   0x7ffc3e8d015d   
[0x20]   python311!PyLong_AsUnsignedLongMask+0xc1   0x4e3b1ed1e0   0x7ffc3e789667   
[0x21]   python311!PyEval_EvalFrameDefault+0x3d57   0x4e3b1ed220   0x7ffc3e7a370f   
[0x22]   python311!PyDict_MergeFromSeq2+0x3cf   0x4e3b1ed430   0x7ffc3e84709b   
[0x23]   python311!PyGen_Finalize+0x263   0x4e3b1ed480   0x7ffc3e7ac0fb   
[0x24]   python311!PySequence_Tuple+0x537   0x4e3b1ed4d0   0x7ffc3e784c2c   
[0x25]   python311!PyObject_Vectorcall+0x5dc   0x4e3b1ed510   0x7ffc3e7860b2   
[0x26]   python311!PyEval_EvalFrameDefault+0x7a2   0x4e3b1ed620   0x7ffc3e7b6f94   
[0x27]   python311!PyFunction_Vectorcall+0x1a4   0x4e3b1ed830   0x7ffc3e7b84cd   
[0x28]   python311!PyFunction_Vectorcall+0x16dd   0x4e3b1ed8c0   0x7ffc3e80036a   
[0x29]   python311!PyObject_CallObject+0x37e   0x4e3b1ed9c0   0x7ffc3e78ac94   
[0x2a]   python311!PyEval_EvalFrameDefault+0x5384   0x4e3b1eda20   0x7ffc3e7a370f   
[0x2b]   python311!PyDict_MergeFromSeq2+0x3cf   0x4e3b1edc30   0x7ffc3e8d015d   
[0x2c]   python311!PyLong_AsUnsignedLongMask+0xc1   0x4e3b1edc80   0x7ffc3e789667   
[0x2d]   python311!PyEval_EvalFrameDefault+0x3d57   0x4e3b1edcc0   0x7ffc3e7a370f   
[0x2e]   python311!PyDict_MergeFromSeq2+0x3cf   0x4e3b1eded0   0x7ffc3e8d015d   
[0x2f]   python311!PyLong_AsUnsignedLongMask+0xc1   0x4e3b1edf20   0x7ffc3e789667   
[0x30]   python311!PyEval_EvalFrameDefault+0x3d57   0x4e3b1edf60   0x7ffc3e7a370f   
[0x31]   python311!PyDict_MergeFromSeq2+0x3cf   0x4e3b1ee170   0x7ffc3e8d015d   
[0x32]   python311!PyLong_AsUnsignedLongMask+0xc1   0x4e3b1ee1c0   0x7ffc3e789667   
[0x33]   python311!PyEval_EvalFrameDefault+0x3d57   0x4e3b1ee200   0x7ffc3e7a370f   
[0x34]   python311!PyDict_MergeFromSeq2+0x3cf   0x4e3b1ee410   0x7ffc3e8d015d   
[0x35]   python311!PyLong_AsUnsignedLongMask+0xc1   0x4e3b1ee460   0x7ffc3e789667   
[0x36]   python311!PyEval_EvalFrameDefault+0x3d57   0x4e3b1ee4a0   0x7ffc3e7a370f   
[0x37]   python311!PyDict_MergeFromSeq2+0x3cf   0x4e3b1ee6b0   0x7ffc3e8d015d   
[0x38]   python311!PyLong_AsUnsignedLongMask+0xc1   0x4e3b1ee700   0x7ffc3e789667   
[0x39]   python311!PyEval_EvalFrameDefault+0x3d57   0x4e3b1ee740   0x7ffc3e7a370f   
[0x3a]   python311!PyDict_MergeFromSeq2+0x3cf   0x4e3b1ee950   0x7ffc3e8d015d   
[0x3b]   python311!PyLong_AsUnsignedLongMask+0xc1   0x4e3b1ee9a0   0x7ffc6ad558df   
[0x3c]   _asyncio!PyInit__asyncio+0x48df   0x4e3b1ee9e0   0x7ffc6ad55753   
[0x3d]   _asyncio!PyInit__asyncio+0x4753   0x4e3b1eea80   0x7ffc6ad5602f   
[0x3e]   _asyncio!PyInit__asyncio+0x502f   0x4e3b1eeab0   0x7ffc3e7b9c0c   
[0x3f]   python311!PyIter_Send+0x13ec   0x4e3b1eeaf0   0x7ffc3e9b8b31   
[0x40]   python311!PyContext_NewHamtForTests+0x51   0x4e3b1eeb50   0x7ffc3e9b8e11   
[0x41]   python311!PyContext_NewHamtForTests+0x331   0x4e3b1eeb90   0x7ffc3e7ba86c   
[0x42]   python311!PyArg_CheckPositional+0x12c   0x4e3b1eebe0   0x7ffc3e800773   
[0x43]   python311!PyObject_Call+0x5b   0x4e3b1eec20   0x7ffc3e800440   
[0x44]   python311!PyObject_CallObject+0x454   0x4e3b1eec80   0x7ffc3e78ac94   
[0x45]   python311!PyEval_EvalFrameDefault+0x5384   0x4e3b1eece0   0x7ffc3e7e56bf   
[0x46]   python311!PyType_CalculateMetaclass+0xfb   0x4e3b1eeef0   0x7ffc3e7e70bf   
[0x47]   python311!PyEval_EvalCode+0x97   0x4e3b1eef30   0x7ffc3e870e50   
[0x48]   python311!Py_GetRecursionLimit+0x53c   0x4e3b1eefb0   0x7ffc3e870d20   
[0x49]   python311!Py_GetRecursionLimit+0x40c   0x4e3b1ef030   0x7ffc3e7ba86c   
[0x4a]   python311!PyArg_CheckPositional+0x12c   0x4e3b1ef0e0   0x7ffc3e784c2c   
[0x4b]   python311!PyObject_Vectorcall+0x5dc   0x4e3b1ef120   0x7ffc3e7860b2   
[0x4c]   python311!PyEval_EvalFrameDefault+0x7a2   0x4e3b1ef230   0x7ffc3e7b6f94   
[0x4d]   python311!PyFunction_Vectorcall+0x1a4   0x4e3b1ef440   0x7ffc3e800773   
[0x4e]   python311!PyObject_Call+0x5b   0x4e3b1ef4d0   0x7ffc3e835788   
[0x4f]   python311!PyRun_SimpleStringFlags+0x230   0x4e3b1ef530   0x7ffc3e83594b   
[0x50]   python311!Py_RunMain+0x137   0x4e3b1ef580   0x7ffc3e835829   
[0x51]   python311!Py_RunMain+0x15   0x4e3b1ef5f0   0x7ff614fa42ef   
[0x52]   python3_11 + 0x42ef!python3_11+0x42ef   0x4e3b1ef620   0x7ff614fa58b4   
[0x53]   python3_11 + 0x58b4!python3_11+0x58b4   0x4e3b1ef8c0   0x7ffc84a2257d   
[0x54]   KERNEL32!BaseThreadInitThunk+0x1d   0x4e3b1ef900   0x7ffc856caa58   
[0x55]   ntdll!RtlUserThreadStart+0x28   0x4e3b1ef930   0x0   

@lidavidm
Member

Shoot. I think I've seen this once or twice but was never able to figure it out.

Right here you basically make an impossible/nonsensical jump:

[0x11]   arrow_flight!arrow::flight::FlightServerBase::Shutdown+0x119d   0x4e3b1ec480   0x7ffc3c6cd57d   
[0x12]   arrow_flight!arrow::flight::FlightStreamReader::ToTable+0x3d   0x4e3b1ec590   0x7ffc7329b43f   

That is, ToTable should never call that function. So something is seriously borked. I don't really want to blame a "compiler bug" but...

Well. When you generated this stack, which PyArrow package were you using? (If a wheel, what version exactly?) We could disassemble ToTable at that offset and see if there's any explanation for how it managed to pull off that jump.

@powersj
Author

powersj commented Feb 20, 2024

which PyArrow package were you using?

I am going to assume pyarrow-15.0.0-cp311-cp311-win_amd64.whl based on the following:

$ pip show pyarrow
Name: pyarrow
Version: 15.0.0
Summary: Python library for Apache Arrow
Home-page: https://arrow.apache.org/
Author:
Author-email:
License: Apache License, Version 2.0
Location: C:\Users\powersj\v3-ear\venv\Lib\site-packages
Requires: numpy
Required-by:

Digging through site-packages, in pyarrow-15.0.0.dist-info/WHEEL I see:

Wheel-Version: 1.0
Generator: bdist_wheel (0.41.1)
Root-Is-Purelib: false
Tag: cp311-cp311-win_amd64
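
The same information can be read from inside the affected kernel, which avoids guessing which environment `pip` is reporting on. A minimal sketch using the standard library's `importlib.metadata`; `wheel_info` is a hypothetical helper name:

```python
from importlib import metadata


def wheel_info(dist_name="pyarrow"):
    """Return (version, WHEEL file text) for an installed distribution, or None."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # read_text returns None if the distribution has no WHEEL file.
    return dist.version, dist.read_text("WHEEL")


if __name__ == "__main__":
    print(wheel_info())
```

The `Tag:` line in the returned text identifies the wheel's Python and platform tags, matching what is shown above.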

@lidavidm
Member

Ok. I think it's a virtual call:

                             **************************************************************
                             *                          FUNCTION                          *
                             **************************************************************
                             undefined ?ToTable@FlightStreamReader@flight@arrow@@QEAA
                               assume GS_OFFSET = 0xff00000000
             undefined         AL:1           <RETURN>
                             0x18d540  548  ?ToTable@FlightStreamReader@flight@arrow@@QEAA
                             Ordinal_548                                     XREF[4]:     Entry Point(*), 
                             ?ToTable@FlightStreamReader@flight@arrow@@QEAA               FUN_18018d520:18018d52d(c), 
                                                                                          18096cad4(*), 1809c53d0(*)  
       18018d540 40 55           PUSH       RBP
       18018d542 53              PUSH       RBX
       18018d543 56              PUSH       RSI
       18018d544 57              PUSH       RDI
       18018d545 41 56           PUSH       R14
       18018d547 48 8d 6c        LEA        RBP,[RSP + -0x37]
                 24 c9
       18018d54c 48 81 ec        SUB        RSP,0xc0
                 c0 00 00 00
       18018d553 48 c7 45        MOV        qword ptr [RBP + -0x29],-0x2
                 d7 fe ff 
                 ff ff
       18018d55b 48 8b 05        MOV        RAX,qword ptr [DAT_18098d388]                    = 00002B992DDFA232h
                 26 fe 7f 00
       18018d562 48 33 c4        XOR        RAX,RSP
       18018d565 48 89 45 27     MOV        qword ptr [RBP + 0x27],RAX
       18018d569 48 8b f2        MOV        RSI,RDX
       18018d56c 48 8b d9        MOV        RBX,RCX
       18018d56f 48 89 55 b7     MOV        qword ptr [RBP + -0x49],RDX
       18018d573 48 8b 01        MOV        RAX,qword ptr [RCX]
       18018d576 48 8d 55 df     LEA        RDX,[RBP + -0x21]
                             LAB_18018d57a                                   XREF[1]:     1808cc9e8(*)  
       18018d57a ff 50 30        CALL       qword ptr [RAX + 0x30]
       18018d57d 90              NOP

That'd make sense given the implementation:

arrow::Result<std::shared_ptr<Table>> FlightStreamReader::ToTable(
    const StopToken& stop_token) {
  ARROW_ASSIGN_OR_RAISE(auto batches, ToRecordBatches(stop_token));
  ARROW_ASSIGN_OR_RAISE(auto schema, GetSchema());
  return Table::FromRecordBatches(schema, std::move(batches));
}

So I'd hazard that we have a nullptr or otherwise invalid reader here, and instead of crashing we're just jumping to oblivion. That doesn't explain how we got said reader...
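
The addresses in the listing and the stack trace can be cross-checked with a little arithmetic. The sketch below assumes that 0x7ffc3c6cd57d, shown on the frame 0x11 line, is the return address into ToTable, and that the disassembler used an image base of 0x180000000 (which matches the listing, where RVA 0x18d540 appears at 18018d540). All constants are copied from this thread; nothing here talks to a debugger.

```python
# Static addresses from the disassembly listing.
IMAGE_BASE = 0x180000000        # base implied by RVA 0x18d540 -> 18018d540
TOTABLE_START = 0x18018D540     # first instruction of ToTable
CALL_ADDR = 0x18018D57A         # CALL qword ptr [RAX + 0x30]
CALL_LEN = 3                    # encoded as the three bytes ff 50 30

# Values from the stack trace.
FRAME_OFFSET = 0x3D             # "ToTable+0x3d"
RUNTIME_RETURN_ADDR = 0x7FFC3C6CD57D

# ToTable+0x3d is the instruction right after the indirect call,
# i.e. exactly where that call would return to.
return_addr_static = TOTABLE_START + FRAME_OFFSET
assert return_addr_static == CALL_ADDR + CALL_LEN
print(hex(return_addr_static))  # 0x18018d57d

# Subtracting the RVA from the runtime address gives the load address
# of arrow_flight.dll in the crashing process.
rva = return_addr_static - IMAGE_BASE
load_base = RUNTIME_RETURN_ADDR - rva
print(hex(load_base))           # 0x7ffc3c540000

# [RAX + 0x30] with 8-byte pointers is slot 6 of whatever table RAX points to.
vtable_slot = 0x30 // 8
print(vtable_slot)              # 6
```

The computed load base comes out 64 KiB aligned, which is what a DLL load address on Windows should look like, so the frame is consistent with a return from the indirect call at 18018d57a. That supports the virtual-call reading but says nothing about why the object pointer was bad.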

@lidavidm
Member

Here's another curious thing.

[0x3c]   _asyncio!PyInit__asyncio+0x48df   0x4e3b1ee9e0   0x7ffc6ad55753   
[0x3d]   _asyncio!PyInit__asyncio+0x4753   0x4e3b1eea80   0x7ffc6ad5602f   
[0x3e]   _asyncio!PyInit__asyncio+0x502f   0x4e3b1eeab0   0x7ffc3e7b9c0c   

...that's supposed to initialize the asyncio native library. How is that in the stack trace?

@lidavidm
Member

Hmm, actually, you mention this only happens in a notebook? Does IPython fork the Python kernel process or something?

@powersj
Author

powersj commented Feb 21, 2024

you mention this only happens in a notebook? Does IPython fork the Python kernel process or something?

Correct, I seem to be able to run the same code as a Python script (e.g. python script.py) all day long, but once it is tossed into a notebook it crashes. I have struggled to find any information on debugging the IPython kernel or on breaking this down any further in the event that it is not actually an issue with pyarrow.

What else could I provide to help dig into this further?
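
One cheap thing to compare between the script run and the notebook run is the process and thread context the query executes in, since the native stack shows `_asyncio` frames. The following is only a sketch using standard-library calls; `runtime_context` is a hypothetical helper name, and the `"ipykernel" in sys.modules` check is a heuristic rather than an official way to detect a notebook.

```python
import asyncio
import os
import sys
import threading


def runtime_context():
    """Collect basic facts about where the current code is running."""
    try:
        asyncio.get_running_loop()
        loop_running = True
    except RuntimeError:
        # Raised when no event loop is running in this thread.
        loop_running = False
    return {
        "pid": os.getpid(),
        "parent_pid": os.getppid(),
        "in_ipykernel": "ipykernel" in sys.modules,  # heuristic only
        "on_main_thread": threading.current_thread() is threading.main_thread(),
        "asyncio_loop_running": loop_running,
        "thread_count": threading.active_count(),
    }


if __name__ == "__main__":
    for key, value in runtime_context().items():
        print(f"{key}: {value}")
```

Calling `runtime_context()` immediately before the Flight query in both environments would show whether the notebook runs it off the main thread or alongside extra threads.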

@lidavidm
Member

I think we're going to have to replicate it, and then try to track down a debug build, unfortunately. Or if you know the Python stack trace of the crash we could start investigating from that side.

@powersj
Author

powersj commented Feb 26, 2024

Thanks for looking into this so far.

Or if you know the Python stack trace of the crash we could start investigating from that side.

In my original comment I used faulthandler to grab a traceback, does that provide any pointers?
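
For reference, a minimal sketch of capturing such a faulthandler traceback to a file, so it does not depend on where the notebook kernel sends stderr. `faulthandler.enable` and its `file` and `all_threads` parameters are standard library; the helper name `enable_crash_log` and the log file name are placeholders.

```python
import faulthandler


def enable_crash_log(path):
    """Send fatal-error tracebacks (all threads) to a file at `path`.

    The returned file object must stay open for as long as faulthandler
    is enabled, because faulthandler writes to its file descriptor.
    """
    log = open(path, "w")
    faulthandler.enable(file=log, all_threads=True)
    return log


if __name__ == "__main__":
    # Placeholder file name; in a notebook, call this in the first cell.
    crash_log = enable_crash_log("pyarrow_crash_traceback.log")
```

With `all_threads=True` the dump lists every Python thread, which helps when the faulting thread is not the one executing the notebook cell.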

@lidavidm
Member

If I'm not mistaken, I don't see any Flight RPC frames in that traceback.

@lidavidm
Member

Hmm, or well possibly it's

Current thread 0x000025e0 (most recent call first):
  File "C:\Users\powersj\AppData\Local\Temp\ipykernel_9720\769077188.py", line 26 in <module>

but line 26 there is just in the middle of making a dictionary...

@powersj
Author

powersj commented Mar 5, 2024

@lidavidm is there anything else I could try or provide?

@lidavidm
Member

lidavidm commented Mar 5, 2024

I think either we need a reproducer to look at, or we need to figure out how to produce a debug build and get a backtrace that way. But I've never figured out exactly how to get a debug build working on Windows.

@lidavidm
Member

lidavidm commented Mar 5, 2024

It's also possible that it's something like grpc/grpc#29185 which I never managed to track down.

@powersj
Author

powersj commented Mar 5, 2024

Would getting the debug gRPC logs help confirm whether it is related?

@lidavidm
Member

lidavidm commented Mar 6, 2024

We can look, but for that issue I had to attach a debugger; the debug gRPC logs don't really tell you much in the case of a crash.
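
If the logs are collected anyway, a minimal sketch of turning them on from Python follows. `GRPC_VERBOSITY` and `GRPC_TRACE` are gRPC's documented environment variables; the assumption here is that the gRPC build bundled in the wheel honors them and reads them at initialization, so they need to be set before `pyarrow.flight` is first imported. The helper name is made up for illustration.

```python
import os


def enable_grpc_debug_logging(tracers="all"):
    """Set gRPC's logging environment variables for this process.

    Assumption: this is called before pyarrow.flight is imported, since
    gRPC is expected to read these variables when it initializes.
    """
    os.environ["GRPC_VERBOSITY"] = "DEBUG"
    os.environ["GRPC_TRACE"] = tracers  # comma-separated tracer names, or "all"


if __name__ == "__main__":
    enable_grpc_debug_logging()
    # import pyarrow.flight  # import only after the variables are set
```

In a notebook this would go in the very first cell; the output is verbose, so a narrower tracer list than `"all"` may be easier to read.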

@lidavidm
Member

lidavidm commented Mar 6, 2024

Sorry, I haven't gotten any time to actually fire up a Windows VM and try to attempt anything - I'm heavily timeboxed these days and anything Windows automatically eats up a good portion of the day ☹️

@powersj
Author

powersj commented Mar 6, 2024

anything Windows automatically eats up a good portion of the day

Completely understand, especially since I am unable to provide a direct reproducer. Please do let me know if there is anything else I can provide or help with. Very happy to run some sort of debug build as well.

@amoeba
Member

amoeba commented Mar 6, 2024

I've already spent a bit of time on a reproduction (no luck so far) and can also see about a debug build while I'm there. I'll update here with what I find.

@dburton-influxdata

@amoeba Bryce, have you been able to identify any next steps?

@amoeba
Member

amoeba commented May 1, 2024

Hi @dburton-influxdata, I think the next step here is still to get a debug build in your hands. I can take another shot at it in the next two weeks here and let you know how that goes.

@amoeba
Member

amoeba commented Jun 20, 2024

Just as an update: I didn't end up having the time I had hoped for, so I haven't looked into this further, but producing a debug build would still be the best next step, I think.
