Enable providing own hadoop for pyspark notebook image #220

Open
t92549 opened this issue Apr 20, 2022 · 0 comments
Labels
Docker Issue related to the Docker side of the project good first issue Small, lower complexity and doesn't require pre-existing Gaffer knowledge
t92549 commented Apr 20, 2022

In the hdfs and Accumulo Dockerfiles, users can provide their own builds of Accumulo, ZooKeeper and Hadoop, which are then used instead of downloading the official distributions during the image build:

# Allow users to provide their own builds of Accumulo, ZooKeeper and Hadoop
COPY ./files/ .
# Otherwise, download official distributions
RUN if [ ! -f "./accumulo-${ACCUMULO_VERSION}-bin.tar.gz" ]; then \
(wget -nv -O ./accumulo-${ACCUMULO_VERSION}-bin.tar.gz ${ACCUMULO_DOWNLOAD_URL} || wget -nv -O ./accumulo-${ACCUMULO_VERSION}-bin.tar.gz ${ACCUMULO_BACKUP_DOWNLOAD_URL}); \

This can save a lot of time with repeated builds, because archives that are already provided are not downloaded again.
This is not currently possible, however, for Hadoop in the pyspark notebook Dockerfile, which always downloads it:
ARG HADOOP_VERSION=3.2.2
ARG HADOOP_DOWNLOAD_URL="https://www.apache.org/dyn/closer.cgi?action=download&filename=hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
ARG HADOOP_BACKUP_DOWNLOAD_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
RUN cd /opt && \
(wget -nv -O ./hadoop-${HADOOP_VERSION}.tar.gz ${HADOOP_DOWNLOAD_URL} || wget -nv -O ./hadoop-${HADOOP_VERSION}.tar.gz ${HADOOP_BACKUP_DOWNLOAD_URL}) && \

It would be great if the same option were added to that Dockerfile as well.
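
As a rough illustration of the request, the pattern from the hdfs and Accumulo Dockerfiles might be carried over along these lines. This is only a sketch, not a tested change: it assumes the pyspark notebook build context has (or gains) a `./files/` directory like the other images, that a user-supplied archive is named `hadoop-${HADOOP_VERSION}.tar.gz`, and that the build user can write to `/opt`. The remainder of the original `RUN` step is cut off in the excerpt above and is not reproduced here.

```dockerfile
ARG HADOOP_VERSION=3.2.2
ARG HADOOP_DOWNLOAD_URL="https://www.apache.org/dyn/closer.cgi?action=download&filename=hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"
ARG HADOOP_BACKUP_DOWNLOAD_URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/hadoop-${HADOOP_VERSION}.tar.gz"

# Assumption: ./files/ exists in the build context and may contain a
# user-provided hadoop-${HADOOP_VERSION}.tar.gz (COPY fails if the directory is missing)
COPY ./files/ /opt/

# Download the official distribution only if no archive was provided
RUN cd /opt && \
    if [ ! -f "./hadoop-${HADOOP_VERSION}.tar.gz" ]; then \
        (wget -nv -O "./hadoop-${HADOOP_VERSION}.tar.gz" "${HADOOP_DOWNLOAD_URL}" || \
         wget -nv -O "./hadoop-${HADOOP_VERSION}.tar.gz" "${HADOOP_BACKUP_DOWNLOAD_URL}"); \
    fi
# The rest of the existing RUN step (not shown in this issue) would follow the check
```

With something like this in place, dropping a pre-downloaded archive into `./files/` before running `docker build` would skip the `wget` calls, while an empty `./files/` would keep the current download behaviour.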

@t92549 t92549 added this to the v2_backlog milestone Apr 20, 2022
@GCHQDeveloper314 GCHQDeveloper314 added the good first issue and Docker labels and removed the beginner label Jul 12, 2023