Expand AWS Lambda layer pyarrow build? #1969

Closed
nkarpov opened this issue Jan 28, 2023 · 2 comments · Fixed by #1977
Labels
question Further information is requested

Comments

@nkarpov
Contributor

nkarpov commented Jan 28, 2023

We're exploring building a compatible layer for aws-sdk-pandas (delta-io/delta-rs#1108) now that deltalake is integrated via #1834.

Currently, the pyarrow build in ./building/lambda/build-lambda-layer.sh does not build some of the optional components listed in https://github.com/apache/arrow/blob/master/docs/source/developers/cpp/building.rst#optional-components that deltalake requires:

cmake \
    -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DARROW_PYTHON=ON \
    -DARROW_PARQUET=ON \
    -DARROW_WITH_SNAPPY=ON \
    -DARROW_WITH_ZLIB=ON \
    -DARROW_FLIGHT=OFF \
    -DARROW_GANDIVA=OFF \
    -DARROW_ORC=OFF \
    -DARROW_CSV=OFF \
    -DARROW_PLASMA=OFF \
    -DARROW_WITH_BZ2=OFF \
    -DARROW_WITH_ZSTD=OFF \
    -DARROW_WITH_LZ4=OFF \
    -DARROW_WITH_BROTLI=OFF \
    -DARROW_BUILD_TESTS=OFF \
    -GNinja \

The most notable one is dataset, as the error in delta-io/delta-rs#1108 shows, but others may be missing too (this needs investigation).
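One quick way to see which optional components a given pyarrow build actually contains is to try importing the corresponding submodules. The sketch below assumes that each listed pyarrow submodule raises ImportError when its Arrow C++ component was not compiled in; the module list is illustrative and hypothetical, and only pyarrow.dataset is confirmed by this issue as something deltalake needs.

```python
import importlib

# Hypothetical list of optional pyarrow submodules that roughly correspond to
# Arrow C++ components toggled in the cmake invocation above. Only
# pyarrow.dataset is confirmed (by delta-io/delta-rs#1108) as required.
OPTIONAL_MODULES = [
    "pyarrow.parquet",
    "pyarrow.dataset",
    "pyarrow.csv",
    "pyarrow.orc",
    "pyarrow.flight",
    "pyarrow.gandiva",
]


def available_modules(names):
    """Return {module_name: bool} telling whether each module imports."""
    result = {}
    for name in names:
        try:
            importlib.import_module(name)
            result[name] = True
        except ImportError:
            # Raised when the component was not compiled into this build,
            # or when pyarrow itself is not installed at all.
            result[name] = False
    return result


if __name__ == "__main__":
    for name, ok in available_modules(OPTIONAL_MODULES).items():
        print(f"{name}: {'available' if ok else 'missing'}")
```

Running this inside the Lambda runtime (or a matching container) with the layer on the path would list which components the layer provides.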

Ideally, the pyarrow build in this repository would supply all the required libraries (some are already turned on), so that the delta-rs/deltalake repo can publish its own compatible layer.

Thoughts?

@nkarpov nkarpov added the question Further information is requested label Jan 28, 2023
@nkarpov nkarpov changed the title Expand Lambda pyarrow build? Expand AWS Lambda layer pyarrow build? Jan 28, 2023
@jaidisido
Contributor

Hi @nkarpov, as you might have guessed, we turned off some of these parameters to keep the layer below the Lambda size limits (250 MB unzipped, 50 MB zipped). The PyArrow dataset module is one of them, as it was deemed not essential to our methods.

We would be willing to reconsider as long as the impact on the layer size is reasonable. My suggestion, if you agree, is for your team to build our layer with the Arrow arguments your own layer requires. If the resulting layer size is still reasonable, we will look into publishing it.
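When experimenting with extra Arrow components, a rough size check against these limits can be scripted. This is a minimal sketch that assumes the limits are interpreted as MiB (1024 * 1024 bytes) and that the layer contents sit in a single local directory; the function names are hypothetical, and the in-memory deflate only approximates the zip produced by the actual build script.

```python
import io
import os
import zipfile

# Assumption: "250 MB" and "50 MB" are read as MiB (1024 * 1024 bytes).
UNZIPPED_LIMIT = 250 * 1024 * 1024
ZIPPED_LIMIT = 50 * 1024 * 1024


def _files(layer_dir):
    """Yield the path of every file under layer_dir."""
    for root, _dirs, files in os.walk(layer_dir):
        for name in files:
            yield os.path.join(root, name)


def unzipped_size(layer_dir):
    """Sum the on-disk sizes of all files in the layer directory."""
    return sum(os.path.getsize(path) for path in _files(layer_dir))


def zipped_size(layer_dir):
    """Deflate the layer directory into an in-memory zip; return its size."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in _files(layer_dir):
            zf.write(path, os.path.relpath(path, layer_dir))
    return len(buf.getvalue())


def check_layer(layer_dir):
    """Report both sizes and whether each is under its limit."""
    raw = unzipped_size(layer_dir)
    packed = zipped_size(layer_dir)
    return {
        "unzipped_bytes": raw,
        "zipped_bytes": packed,
        "unzipped_ok": raw < UNZIPPED_LIMIT,
        "zipped_ok": packed < ZIPPED_LIMIT,
    }
```

Comparing `check_layer(...)` before and after enabling a component (for example dataset) gives a concrete number for "reasonable impact".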

@nkarpov
Contributor Author

nkarpov commented Jan 31, 2023

That's awesome, thank you @jaidisido!

I've created PR #1977, which I hope is a tolerable increase in size. Other packages will likely benefit from this change in the future too, since it brings this internal PyArrow build closer to the published pip version.

@jaidisido jaidisido linked a pull request Feb 2, 2023 that will close this issue