This repository has been archived by the owner on Jun 26, 2020. It is now read-only.

Data volumes for persistence and connect to Hive #68

Open
darrenhaken opened this issue Feb 24, 2017 · 2 comments

Comments

@darrenhaken

I'm new to the Hadoop stack, so forgive me if I'm missing something obvious.

I have two requirements I'm trying to work out with this Docker image:

  1. How do I persist HDFS to a data volume (and is HDFS actually running in this image)? See the sketch below.
  2. How do I connect another container running another part of the Hadoop stack, e.g. Hive?
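
For requirement 1, something like this is what I'm imagining (the container path is a guess on my part; I don't know where this image actually keeps its HDFS data):

    # mount a named volume over wherever HDFS keeps its blocks
    # (/tmp/hadoop-root is just my guess at the default hadoop.tmp.dir)
    docker volume create hdfs-data
    docker run -v hdfs-data:/tmp/hadoop-root -it <this-image> /etc/bootstrap.sh -bash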

Can anyone help?

@patrickneubauer

Dear darrenhaken,

We face similar requirements and wonder whether you were able to resolve yours.

If so, could you please point us toward the steps you took to configure the Hive container to use the HDFS instance running within this Docker image?
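
For context, what we have in mind is roughly the following (the network/container names, the 9000 port, and the -d daemon flag are our assumptions, not something we have verified against this image):

    # give the Hadoop container a resolvable name on a user-defined network
    docker network create hadoop-net
    docker run -d --net hadoop-net --name hadoop <this-image> /etc/bootstrap.sh -d
    # then, in the Hive container (also on hadoop-net), core-site.xml would point at:
    #   fs.defaultFS = hdfs://hadoop:9000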

Cheers, Patrick

@Mehran91z

Hi, I have the same problem with using a volume for my HDFS input/output.

I want to create a directory with $HADOOP_PREFIX/bin/hadoop fs -mkdir mytest, put files into mytest/input, run something on them such as wordcount, and have the input and output data persist across docker run invocations.
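
Concretely, the workflow whose data I want to keep looks like this (the examples jar path follows the standard Hadoop layout, and myfiles/ is a stand-in for wherever my local input lives):

    # inside the running container
    $HADOOP_PREFIX/bin/hadoop fs -mkdir -p mytest/input
    $HADOOP_PREFIX/bin/hadoop fs -put myfiles/* mytest/input
    $HADOOP_PREFIX/bin/hadoop jar \
        $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount mytest/input mytest/output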

How can I achieve this?

What I have done so far:

  1. Added this property to hdfs-site.xml:

    <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:///home/app/hdfs/datanode</value>
      <description>DataNode directory</description>
    </property>

  2. Created a Docker volume named 'myvol'.

  3. Used -v when running the image:

    docker run -v myvol:/home/app -it c29b621ba74a /etc/bootstrap.sh -bash

But the /home/app directory contains only the files I created with vi and another folder named 'hdfs'; this is not persisting the HDFS input/output data.
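
Could the problem be that only the DataNode blocks land under /home/app, while the NameNode metadata still lives in the container's default location, so each new container starts with a fresh namesystem that knows nothing about the persisted blocks? What I plan to try next (untested; the namenode path is my own choice):

    # also keep NameNode metadata on the volume, by adding to hdfs-site.xml:
    #   <property>
    #     <name>dfs.namenode.name.dir</name>
    #     <value>file:///home/app/hdfs/namenode</value>
    #   </property>
    # then format it once, on the empty volume, before the first real run:
    $HADOOP_PREFIX/bin/hdfs namenode -format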
