Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No mangling occurs #58

Open
pitrou opened this issue Dec 19, 2024 · 11 comments
Open

No mangling occurs #58

pitrou opened this issue Dec 19, 2024 · 11 comments

Comments

@pitrou
Copy link

pitrou commented Dec 19, 2024

Hello,

I'm probably doing something wrong, but I'm trying to integrate delvewheel in the Apache Arrow build procedure for Windows wheels, and delvewheel doesn't seem to mangle msvcp140.dll.

Here are the commands that we're running on CI:

delvewheel show -vv --ignore-existing --include msvcp140.dll %WHEEL_NAME% || exit /B 1
delvewheel repair -vv --ignore-existing --include msvcp140.dll %WHEEL_NAME% -w repaired_wheels || exit /B 1

and here is the output from those two commands:

extracting pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl to C:\Users\ContainerAdministrator\AppData\Local\Temp\tmpd1uih2k1
Analyzing pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl
The following DLLs will be copied into the wheel.
    msvcp140.dll (C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\bin\HostX64\x64\msvcp140.dll)
The following DLLs are already in the wheel and will not be copied.
    arrow.dll (pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl\pyarrow\arrow.dll)
    arrow_acero.dll (pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl\pyarrow\arrow_acero.dll)
    arrow_dataset.dll (pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl\pyarrow\arrow_dataset.dll)
    arrow_flight.dll (pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl\pyarrow\arrow_flight.dll)
    arrow_python.dll (pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl\pyarrow\arrow_python.dll)
    arrow_python_flight.dll (pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl\pyarrow\arrow_python_flight.dll)
    arrow_python_parquet_encryption.dll (pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl\pyarrow\arrow_python_parquet_encryption.dll)
    arrow_substrait.dll (pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl\pyarrow\arrow_substrait.dll)
    parquet.dll (pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl\pyarrow\parquet.dll)
The following DLLs are assumed to be present in the end user's environment and will not be copied into the wheel.
    advapi32.dll
    api-ms-win-crt-convert-l1-1-0.dll
    api-ms-win-crt-environment-l1-1-0.dll
    api-ms-win-crt-filesystem-l1-1-0.dll
    api-ms-win-crt-heap-l1-1-0.dll
    api-ms-win-crt-locale-l1-1-0.dll
    api-ms-win-crt-math-l1-1-0.dll
    api-ms-win-crt-runtime-l1-1-0.dll
    api-ms-win-crt-stdio-l1-1-0.dll
    api-ms-win-crt-string-l1-1-0.dll
    api-ms-win-crt-time-l1-1-0.dll
    api-ms-win-crt-utility-l1-1-0.dll
    bcrypt.dll
    concrt140.dll
    crypt32.dll
    iphlpapi.dll
    kernel32.dll
    ncrypt.dll
    ole32.dll
    python39.dll
    secur32.dll
    shell32.dll
    user32.dll
    userenv.dll
    vcruntime140.dll
    vcruntime140_1.dll
    version.dll
    winhttp.dll
    wininet.dll
    ws2_32.dll
    wsock32.dll
extracting pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl to C:\Users\ContainerAdministrator\AppData\Local\Temp\tmpkpwp_c2i
repairing dist\pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl
finding DLL dependencies
analyzing package-level extension module pyarrow\lib.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_acero.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_compute.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_csv.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_dataset.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_dataset_orc.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_dataset_parquet.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_dataset_parquet_encryption.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_feather.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_flight.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_fs.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_gcsfs.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_hdfs.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_json.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_orc.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_parquet.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_parquet_encryption.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_pyarrow_cpp_tests.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_s3fs.cp39-win_amd64.pyd
analyzing package-level extension module pyarrow\_substrait.cp39-win_amd64.pyd
External dependencies to copy into the wheel are
{'msvcp140.dll'}
External dependencies not to copy into the wheel are
{   'advapi32.dll',
    'api-ms-win-crt-convert-l1-1-0.dll',
    'api-ms-win-crt-environment-l1-1-0.dll',
    'api-ms-win-crt-filesystem-l1-1-0.dll',
    'api-ms-win-crt-heap-l1-1-0.dll',
    'api-ms-win-crt-locale-l1-1-0.dll',
    'api-ms-win-crt-math-l1-1-0.dll',
    'api-ms-win-crt-runtime-l1-1-0.dll',
    'api-ms-win-crt-stdio-l1-1-0.dll',
    'api-ms-win-crt-string-l1-1-0.dll',
    'api-ms-win-crt-time-l1-1-0.dll',
    'api-ms-win-crt-utility-l1-1-0.dll',
    'arrow.dll',
    'arrow_acero.dll',
    'arrow_dataset.dll',
    'arrow_flight.dll',
    'arrow_python.dll',
    'arrow_python_flight.dll',
    'arrow_python_parquet_encryption.dll',
    'arrow_substrait.dll',
    'bcrypt.dll',
    'concrt140.dll',
    'crypt32.dll',
    'iphlpapi.dll',
    'kernel32.dll',
    'ncrypt.dll',
    'ole32.dll',
    'parquet.dll',
    'python39.dll',
    'secur32.dll',
    'shell32.dll',
    'user32.dll',
    'userenv.dll',
    'vcruntime140.dll',
    'vcruntime140_1.dll',
    'version.dll',
    'winhttp.dll',
    'wininet.dll',
    'ws2_32.dll',
    'wsock32.dll'}
copying DLLs into pyarrow.libs
copying C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29333\bin\HostX64\x64\msvcp140.dll -> C:\Users\ContainerAdministrator\AppData\Local\Temp\tmpkpwp_c2i\pyarrow.libs\msvcp140.dll
mangling DLL names
repairing lib.cp39-win_amd64.pyd -> lib.cp39-win_amd64.pyd
repairing _acero.cp39-win_amd64.pyd -> _acero.cp39-win_amd64.pyd
repairing _compute.cp39-win_amd64.pyd -> _compute.cp39-win_amd64.pyd
repairing _csv.cp39-win_amd64.pyd -> _csv.cp39-win_amd64.pyd
repairing _dataset.cp39-win_amd64.pyd -> _dataset.cp39-win_amd64.pyd
repairing _dataset_orc.cp39-win_amd64.pyd -> _dataset_orc.cp39-win_amd64.pyd
repairing _dataset_parquet.cp39-win_amd64.pyd -> _dataset_parquet.cp39-win_amd64.pyd
repairing _dataset_parquet_encryption.cp39-win_amd64.pyd -> _dataset_parquet_encryption.cp39-win_amd64.pyd
repairing _feather.cp39-win_amd64.pyd -> _feather.cp39-win_amd64.pyd
repairing _flight.cp39-win_amd64.pyd -> _flight.cp39-win_amd64.pyd
repairing _fs.cp39-win_amd64.pyd -> _fs.cp39-win_amd64.pyd
repairing _gcsfs.cp39-win_amd64.pyd -> _gcsfs.cp39-win_amd64.pyd
repairing _hdfs.cp39-win_amd64.pyd -> _hdfs.cp39-win_amd64.pyd
repairing _json.cp39-win_amd64.pyd -> _json.cp39-win_amd64.pyd
repairing _orc.cp39-win_amd64.pyd -> _orc.cp39-win_amd64.pyd
repairing _parquet.cp39-win_amd64.pyd -> _parquet.cp39-win_amd64.pyd
repairing _parquet_encryption.cp39-win_amd64.pyd -> _parquet_encryption.cp39-win_amd64.pyd
repairing _pyarrow_cpp_tests.cp39-win_amd64.pyd -> _pyarrow_cpp_tests.cp39-win_amd64.pyd
repairing _s3fs.cp39-win_amd64.pyd -> _s3fs.cp39-win_amd64.pyd
repairing _substrait.cp39-win_amd64.pyd -> _substrait.cp39-win_amd64.pyd
repairing msvcp140.dll -> msvcp140.dll
patching pyarrow\__init__.py
updating pyarrow-19.0.0.dev217.dist-info\RECORD
repackaging wheel
[ ... ]
adding pyarrow.libs\.load-order-pyarrow-19.0.0.dev217
adding pyarrow.libs\msvcp140.dll
fixed wheel written to C:\arrow\python\repaired_wheels\pyarrow-19.0.0.dev217-cp39-cp39-win_amd64.whl

The full output can be seen at https://github.com/ursacomputing/crossbow/actions/runs/12412729686/job/34653120828#step:7:7026

@adang1345
Copy link
Owner

This is due to the --ignore-existing flag. The documentation states --ignore-existing: don't search for or vendor in DLLs that are already in the wheel. Don't mangle the names of these DLLs or their direct dependencies. The reason for this design is that if a DLL is already in the wheel, then delvewheel avoids modifying it because the assumption is that the DLL was placed there through some other bundling technique, and delvewheel does not want to interfere with that bundling technique. In your case, mangling msvcp140.dll would require modifying a DLL that's already in the wheel.

delvewheel works best when it handles the bundling of all the DLL dependencies. Is there a reason that you put arrow.dll, arrow_acero.dll, etc. in the wheel prior to running delvewheel?

@pitrou
Copy link
Author

pitrou commented Dec 19, 2024

delvewheel works best when it handles the bundling of all the DLL dependencies. Is there a reason that you put arrow.dll, arrow_acero.dll, etc. in the wheel prior to running delvewheel?

From what I remember, some users of PyArrow wheels actually link directly to our C++ DLLs (arrow.dll etc.), so I think we'd rather not mangle those names. cc @raulcd

@raulcd
Copy link

raulcd commented Dec 19, 2024

This predates me and I am unsure on the reasoning behind it, maybe @kszucs or @jorisvandenbossche know

@pitrou
Copy link
Author

pitrou commented Dec 19, 2024

Also cc @xhochy in that case. But that might be an obsolete requirement, especially now the C Data Interface exists.

@xhochy
Copy link

xhochy commented Dec 20, 2024

Turbodbc was sadly never migrated to the C data interface and still requires an arrow.dll named as such if you install it using pip directly.

@pitrou
Copy link
Author

pitrou commented Dec 20, 2024

The reason for this design is that if a DLL is already in the wheel, then delvewheel avoids modifying it because the assumption is that the DLL was placed there through some other bundling technique

TBH it would be much easier if this behaviour was controllable.

Or perhaps we simply need a much lower-level facility, because the one thing we want to do is to name-mangle msvcp140.dll and patch the existing wheel DLLs to point to the mangled version.

@pitrou
Copy link
Author

pitrou commented Dec 20, 2024

So, right now I get:

adding pyarrow.libs\.load-order-pyarrow-19.0.0.dev228
adding pyarrow.libs\arrow-154a1467bc8e3c6140e95fdb75b956e3.dll
adding pyarrow.libs\arrow_acero-b03d7bb1e46e0610601803851df65483.dll
adding pyarrow.libs\arrow_dataset-5bf3f306a495eaa54419ebcf2e5c26d7.dll
adding pyarrow.libs\arrow_flight-27294d6a4ce5960c937039a2bce38350.dll
adding pyarrow.libs\arrow_python-d43270ed39dc7e15ab47da3a7c5c9325.dll
adding pyarrow.libs\arrow_python_flight-4a486a585d8dfdfe3067a19474fbe189.dll
adding pyarrow.libs\arrow_python_parquet_encryption-ca05a0e13acf1e4ddab8dcf2cc10553e.dll
adding pyarrow.libs\arrow_substrait-c561dcc8a8204addd6a6d51b90bdaa14.dll
adding pyarrow.libs\msvcp140-c691275f5538a516494cdd39ad3f5ca6.dll
adding pyarrow.libs\parquet-b1edd5facac8a93a0f910b1669f46c54.dll
fixed wheel written to C:\arrow\python\repaired_wheels\pyarrow-19.0.0.dev228-cp39-cp39-win_amd64.whl

The problem is that we really need arrow.dll and friends to remain in the pyarrow package directory. This is because we also ship the corresponding arrow.lib (etc.) files for third-party linking to the Arrow DLLs.

@pitrou
Copy link
Author

pitrou commented Dec 20, 2024

Looks like we could perhaps call delvewheels._dll_utils.replace_needed directly. Of course, that is a private API...

@adang1345
Copy link
Owner

TBH it would be much easier if this behaviour was controllable.

I'll add a command-line switch to control this behavior.

@pitrou
Copy link
Author

pitrou commented Dec 20, 2024

Thanks a lot! We also need the existing DLLs to remain in the same place, I don't know if that's what you have in mind.

@adang1345
Copy link
Owner

Yes, the existing DLLs will remain in the same place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants