ARROW-1281: [C++/Python] Add Docker setup for testing HDFS IO in C++ and Python #895

Closed
wants to merge 5 commits into from

@wesm (Member) commented Jul 26, 2017

We aren't testing this in Travis CI because spinning up an HDFS cluster is a bit heavyweight, but this will at least make it easier to validate on an ongoing basis that this functionality works properly.

@xhochy (Member) left a comment

I made some initial comments, but this still seems quite rough, so I'll hold off on further comments until it has settled.

FROM cpcloud86/impala:metastore

RUN sudo apt-add-repository -y ppa:ubuntu-toolchain-r/test && \
sudo apt-get update && \
Member

Since you normally run as root inside a Docker container, sudo should not be required here.

Member Author

Thanks. I had to change the user back to root, because the base image switches away from it.
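A minimal sketch of the change discussed here, written as a shell function that emits the Dockerfile so it can be inspected without a Docker daemon. The `USER root` line reflects the author's remark about switching the user back from the base image. The function name `write_hdfs_dockerfile` is hypothetical, and the installed package names are an assumption, since the original `RUN` line is cut off in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a Dockerfile to the path given as $1.
write_hdfs_dockerfile() {
  cat > "$1" <<'EOF'
FROM cpcloud86/impala:metastore

# The base image leaves a non-root user active; switch back to root so
# package installation needs no privilege escalation.
USER root

# Package names are an assumption; the original RUN line is truncated.
RUN apt-add-repository -y ppa:ubuntu-toolchain-r/test && \
    apt-get update && \
    apt-get install -y gcc-4.9 g++-4.9
EOF
}
```

Calling `write_hdfs_dockerfile hdfs/Dockerfile` before the `docker build` step would produce the file used by the build script.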

export ARROW_TEST_WEBHDFS_USER=ubuntu

docker stop $ARROW_TEST_NN_HOST
docker rm $ARROW_TEST_NN_HOST
Member

Adding --rm to docker run should make this line redundant

Member Author

fixed

Member Author

I had some issues when I removed this line and then added --rm to the docker run command. Perhaps we can tweak this in a subsequent PR when adding more features to these ad hoc Docker tests.
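One possible arrangement that combines both approaches is sketched below: force-remove any leftover container before starting, and start the new one with `--rm` so it cleans up after itself. It is not known which problems the author ran into, so this is only an illustration. The helper name `restart_container` and the overridable `DOCKER` variable are hypothetical.

```shell
#!/usr/bin/env bash
# DOCKER can be overridden (for example with a stub) for dry runs.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: restart_container <name> <image>
restart_container() {
  local name="$1" image="$2"
  # Force-remove any leftover container with this name; ignore the error
  # raised when none exists, so the script also works under `set -e`.
  "$DOCKER" rm -f "$name" >/dev/null 2>&1 || true
  # --rm removes the container once it stops, so no separate `docker rm`
  # step is needed after a clean shutdown.
  "$DOCKER" run --rm -d --name "$name" --hostname "$name" "$image"
}
```

With the names used in this PR, the call would look like `restart_container "$ARROW_TEST_NN_HOST" arrow-hdfs-test`.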

@wesm (Member Author) left a comment

Thanks. I'm adding the Python HDFS tests and will remove the WIP

Change-Id: I40d3acd46802ecb2a37f4d83ed08a841645772ba
@wesm changed the title from "WIP ARROW-1281: [C++/Python] Add Docker setup for testing HDFS IO in C++ and Python" to "ARROW-1281: [C++/Python] Add Docker setup for testing HDFS IO in C++ and Python" on Jul 26, 2017
@wesm (Member Author) commented Jul 26, 2017

This is ready to go. It is good for one-off use, but we should look into refactoring our CI scripts so they can share code more easily with this kind of setup. That is a little tricky because of the various interdependent environment variables.

As part of ARROW-1213 I will add S3 testing to this setup so you can check things out locally against an access/secret key for a cloud bucket.

wesm added 3 commits July 27, 2017 11:18
Change-Id: I9819fb4f79ae202164dc4cf41c8d35961cff2589
Change-Id: I820f8eb707df50c6d12602fe2d816c80b1402ee1
Change-Id: Ib247a679667a40365846507b6ea9795660226272
@wesm (Member Author) commented Jul 27, 2017

@xhochy let me know any more comments on this. I'm going to look at the Parquet RC in the meantime

@wesm (Member Author) commented Jul 28, 2017

+1; I'm going to keep adding some more ad hoc tests

@asfgit asfgit closed this in 8841bc0 Jul 28, 2017
@wesm wesm deleted the ARROW-1281 branch July 28, 2017 14:31
@xhochy (Member) left a comment

Added some comments about best practices I've gathered from developing a lot with Docker containers.

# under the License.

use_gcc() {
export CC=gcc-4.9
Member

It's better to set these variables in the Dockerfile via ENV CC gcc-4.9.

Member Author

I wasn't sure about this as the idea was to be able to easily switch between compilers depending on the test script
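The two positions can coexist: `ENV CC gcc-4.9` in the Dockerfile supplies a default, and small helper functions override it for scripts that need a different compiler. The sketch below assumes that arrangement. `use_gcc` and `CC=gcc-4.9` come from the diff above; the `CXX` values and the `use_clang` helper are assumptions added for illustration.

```shell
#!/usr/bin/env bash
# Select GCC 4.9 for the remainder of the calling script.
use_gcc() {
  export CC=gcc-4.9
  export CXX=g++-4.9   # assumed C++ counterpart of CC
}

# Hypothetical second helper for test scripts that want clang instead.
use_clang() {
  export CC=clang
  export CXX=clang++
}
```

A test script would source the file that defines these helpers and call `use_gcc` or `use_clang` before invoking the build. A script that calls neither keeps whatever `ENV` set in the image.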

set -e

export PATH="$MINICONDA/bin:$PATH"
conda update -y -q conda
Member

In my Dockerfiles, I normally include the conda installation in the image. That way I get faster iteration times on repeated test runs.

Member

It is also helpful to separate the base installation and the project-specific dependencies into different layers of the Docker image so they are shared between similar images.

Member Author

Good point, I will do that in the next patch
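A rough sketch of what that layering could look like, again emitted from a shell function so it can be generated next to the build script. Everything beyond the base image and the `$MINICONDA` variable is an assumption: the install prefix, the availability of `wget` in the base image, the Miniconda installer URL, and the package list are illustrative only, and `write_layered_dockerfile` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a two-layer Dockerfile to the path in $1.
write_layered_dockerfile() {
  cat > "$1" <<'EOF'
FROM cpcloud86/impala:metastore
USER root

# Layer 1: base installation, shared by every image built on this prefix.
# The prefix and installer URL are assumptions.
ENV MINICONDA=/opt/miniconda
RUN wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O /tmp/miniconda.sh && \
    bash /tmp/miniconda.sh -b -p $MINICONDA && \
    rm /tmp/miniconda.sh
ENV PATH=$MINICONDA/bin:$PATH

# Layer 2: project-specific dependencies; only this layer is rebuilt
# when the package list changes. The packages named here are placeholders.
RUN conda install -y -q numpy pandas
EOF
}
```

Because Docker caches each `RUN` instruction as a layer, a change to the package list in the second `RUN` leaves the Miniconda layer cached, which is where the faster iteration comes from.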

set -ex

docker build -t arrow-hdfs-test -f hdfs/Dockerfile .
bash hdfs/restart_docker_container.sh
Member

For multiple docker containers, have a look at docker-compose. This lets you start and plug multiple containers together.

Member Author

Will do, thank you
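For reference, a docker-compose version of this setup might look roughly like the sketch below, written as a shell function that emits the compose file. The split into two services, the service names `arrow-hdfs` and `arrow-test`, and the assumption that the base image starts HDFS on its own are all guesses made for illustration; only the image name, the Dockerfile path, and the two environment variables come from this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a compose file to the path given as $1.
write_compose_file() {
  cat > "$1" <<'EOF'
version: "2"
services:
  # Hypothetical service; plays the role of $ARROW_TEST_NN_HOST.
  arrow-hdfs:
    image: cpcloud86/impala:metastore
    hostname: arrow-hdfs
  # Hypothetical service that builds the test image from this PR.
  arrow-test:
    build:
      context: .
      dockerfile: hdfs/Dockerfile
    depends_on:
      - arrow-hdfs
    environment:
      ARROW_TEST_NN_HOST: arrow-hdfs
      ARROW_TEST_WEBHDFS_USER: ubuntu
EOF
}
```

Note that `depends_on` only controls start order. It does not wait for HDFS to be ready, so a test entry point would still need its own readiness check.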


2 participants