Help with running pyarrow on arm64 in Docker #10929

Closed
luther7 opened this issue Aug 13, 2021 · 10 comments

Comments

@luther7

luther7 commented Aug 13, 2021

I have an issue running Arrow with pyarrow on arm64 - in Docker - and hope to get some advice.

I get the following error:

<jemalloc>: Unsupported system page size

The Docker image I’m building is Debian and based on python:3.8.9-slim-buster. It's used for Apache Airflow.

I’ve tried a nightly pyarrow wheel but it doesn't fix the issue. Neither does installing jemalloc from source.

Installing Arrow and pyarrow from source does fix the issue. Any advice for fixing this issue without installing Arrow and pyarrow from source?

I contacted the mailing list on the 27th of July for this. Thanks!

@westonpace
Member

If you don't want to install Arrow from source you can try using the environment variable ARROW_DEFAULT_MEMORY_POOL to switch to a different allocator:

(arrow-release-5) pace@pace-desktop:~$ ARROW_DEFAULT_MEMORY_POOL=jemalloc python -c "import pyarrow; print(pyarrow.default_memory_pool().backend_name)"
jemalloc
(arrow-release-5) pace@pace-desktop:~$ ARROW_DEFAULT_MEMORY_POOL=system python -c "import pyarrow; print(pyarrow.default_memory_pool().backend_name)"
system
(arrow-release-5) pace@pace-desktop:~$ ARROW_DEFAULT_MEMORY_POOL=mimalloc python -c "import pyarrow; print(pyarrow.default_memory_pool().backend_name)"
mimalloc

I'm not sure off the top of my head what options are available in the pyarrow wheel.
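
One way to see which allocators a given pyarrow build offers is to try each pool factory in turn. The sketch below assumes the installed pyarrow exposes `system_memory_pool()`, `jemalloc_memory_pool()` and `mimalloc_memory_pool()`, and that a backend that was not compiled in raises `NotImplementedError`. `probe_memory_pools` is a hypothetical helper name, not part of pyarrow.

```python
def probe_memory_pools(pa):
    """Return {backend name: bool} for the allocators the module `pa` can create.

    `pa` is expected to be the imported pyarrow module (assumption: the three
    factory functions below exist in the installed version).
    """
    factories = {
        "system": "system_memory_pool",
        "jemalloc": "jemalloc_memory_pool",
        "mimalloc": "mimalloc_memory_pool",
    }
    available = {}
    for name, attr in factories.items():
        factory = getattr(pa, attr, None)
        if factory is None:
            # This build does not expose the factory at all.
            available[name] = False
            continue
        try:
            factory()
            available[name] = True
        except NotImplementedError:
            # The backend was not compiled into this build.
            available[name] = False
    return available


# Only probes when pyarrow is actually installed.
try:
    import pyarrow as pa
except ImportError:
    pa = None

if pa is not None:
    print(probe_memory_pools(pa))
```

Any backend reported as available here should be a valid value for ARROW_DEFAULT_MEMORY_POOL.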

Also, some research suggests that there might be certain settings that can be set to get jemalloc working. Can you share your Dockerfile so I can try and reproduce?

> I contacted the mailing list on the 27th of July for this. Thanks!

Yikes, unfortunately, I don't see any message on that date on either user@ or dev@.

Did you make sure to subscribe to the mailing list before you sent your message? The mailing list will ignore emails from unsubscribed users. If you're pretty confident you subscribed then email me at weston dot pace at gmail dot com (remove spaces and replace dot with . and at with @) so I can get your email.

@xhochy
Member

xhochy commented Aug 13, 2021

This error is known to me and can occur if you cross-compile pyarrow. @kszucs do we do that for ARM wheels? If so, we need to specify the page size in the jemalloc configure explicitly.

@kszucs
Member

kszucs commented Aug 13, 2021

We build the arm64 wheels on graviton2 travis instances in docker, so we do not cross-compile. @xhochy do you have a reference to the issue?

Sadly I don't have an arm64 machine at hand, so I'm unable to reproduce the issue.
A wheel can be produced using the following command:

pip install -e arrow/dev/archery[docker]
ARCH=arm64v8 PYTHON=3.8 archery docker run python-wheel-manylinux-2014
# wheel is going to be available under arrow/python/repaired_wheels/

@xhochy
Member

xhochy commented Aug 13, 2021

> We build the arm64 wheels on graviton2 travis instances in docker, so we do not cross-compile. @xhochy do you have a reference to the issue?

I came across this while building the conda packages for osx-arm64. No real documentation except what is in the recipe.

> Sadly I don't have an arm64 machine at hand, so I'm unable to reproduce the issue.
> A wheel can be produced using the following command:
>
> pip install -e arrow/dev/archery[docker]
> ARCH=arm64v8 PYTHON=3.8 archery docker run python-wheel-manylinux-2014
> # wheel is going to be available under arrow/python/repaired_wheels/

I would expect the error to also happen when run using Qemu. It could also be that the kernel of the graviton instances has a larger than normal page size and thus only breaks on smaller machines.

@kszucs
Member

kszucs commented Aug 13, 2021

> I would expect the error to also happen when run using Qemu. It could also be that the kernel of the graviton instances has a larger than normal page size and thus only breaks on smaller machines.

Perhaps we can overcome that issue using jemalloc/jemalloc#467 (comment) ?

@xhochy
Member

xhochy commented Aug 13, 2021

Yes, I'm doing that with --with-lg-page=14 in the OSX case.

@luther7
Author

luther7 commented Aug 16, 2021

Thanks everyone for your help!

> Can you share your Dockerfile so I can try and reproduce?

Unfortunately I can't as it's used at work.

> If you don't want to install Arrow from source you can try using the environment variable ARROW_DEFAULT_MEMORY_POOL to switch to a different allocator:

This doesn't seem to help either. Here's a log snippet where I've echoed and grepped the container's environment to check that it's set properly:

[2021-08-16T03:16:17Z] ARROW_DEFAULT_MEMORY_POOL=system
SNIP
[2021-08-16T03:16:18Z] <jemalloc>: Unsupported system page size

Now I'm wondering if something other than Arrow is using jemalloc.
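
A rough way to check this on Linux is to list the shared libraries mapped into the Python process after importing the suspect packages. The sketch below parses the text of `/proc/self/maps`; `find_mapped_libraries` is a hypothetical helper, and the approach only finds jemalloc loaded as a separate shared object. A copy linked statically into another library will not show up by name.

```python
def find_mapped_libraries(maps_text, needle="jemalloc"):
    """Return sorted, de-duplicated mapped file paths that contain `needle`.

    `maps_text` is the content of /proc/<pid>/maps; each line has the form
    "address perms offset dev inode [pathname]". `needle` should be lowercase.
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        # Anonymous mappings have no pathname field and are skipped.
        if len(fields) == 6 and needle in fields[5].lower():
            paths.add(fields[5].strip())
    return sorted(paths)


try:
    with open("/proc/self/maps") as fh:
        print(find_mapped_libraries(fh.read()))
except OSError:
    print("/proc/self/maps is not available on this platform")
```

To make the result meaningful, import the packages under suspicion before reading the maps file, since libraries are only mapped once they are loaded.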

> Did you make sure to subscribe to the mailing list before you sent your message? The mailing list will ignore emails from unsubscribed users. If you're pretty confident you subscribed then email me at weston dot pace at gmail dot com (remove spaces and replace dot with . and at with @) so I can get your email.

Yes, I overlooked subscribing to the mailing list. I have now. Thanks!

> Yes, I'm doing that with --with-lg-page=14 in the OSX case.

I'll try building a wheel and/or installing jemalloc from source. Thanks!

@kszucs
Member

kszucs commented Aug 16, 2021

@nesyamun could you please try out pyarrow-6.0.0.dev45-cp38-cp38-manylinux2014_aarch64.whl?

@luther7
Author

luther7 commented Aug 17, 2021

Thanks @kszucs, unfortunately it doesn't work. I've checked the page size of the system I'm running pyarrow on - it's 64 KiB.

Would it be possible to get a wheel with jemalloc configured for this page size, or would that be something I would need to maintain myself? Building from your fork with ARROW_JEMALLOC_LG_PAGE=16 works. If it is merged, would it be possible for a wheel to be published configured as such?
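
The numbers in this thread line up as follows: jemalloc's `--with-lg-page` option takes the base-2 logarithm of the page size, so a 64 KiB page corresponds to 16 and the value 14 used for osx-arm64 corresponds to 16 KiB. A minimal sketch for checking the value on a given machine, where `lg_page` is a hypothetical helper name:

```python
import mmap


def lg_page(page_size: int) -> int:
    """Return log2 of `page_size`, which must be a positive power of two."""
    if page_size <= 0 or page_size & (page_size - 1):
        raise ValueError(f"page size {page_size} is not a power of two")
    return page_size.bit_length() - 1


# Page size of the machine (or container host kernel) running this script.
page_size = mmap.PAGESIZE
print(f"page size: {page_size} bytes, lg(page) = {lg_page(page_size)}")
```

Run inside the container, this reports the page size of the host kernel, since containers share it.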

@luther7
Author

luther7 commented Aug 19, 2021

@kszucs Thanks! I've tested this and it works: #10940 (comment)

Closing as #10940 is merged.

Thanks again.
