Troubleshooting
This page presents frequent issues encountered when executing the generator, along with their workarounds.
This error arises especially when generating large datasets. Hadoop internally creates several files and threads, and its memory footprint is often large. This issue usually appears when the Linux ulimit settings are not set appropriately. To solve this problem, try to increase the following system limits (a sketch of the relevant commands follows the list):
- stack size
- number of opened files
- number of maximum threads
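For example, the limits can be inspected and raised for the current shell as follows. This is only a sketch: the values shown are suggestions, and persistent limits are usually configured in /etc/security/limits.conf instead.

# Inspect the current limits
ulimit -s   # stack size
ulimit -n   # number of open files
ulimit -u   # maximum number of user processes (threads count against this)
# Raise them for the current shell (example values)
ulimit -s unlimited
ulimit -n 65535
ulimit -u 65535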
This error can be caused by both of the reasons explained for the error above, but also by not setting the Java options correctly. To increase the amount of memory Java can allocate for each child process, go to the mapred-site.xml file, which can be found in the etc/hadoop folder of Hadoop, and set the following option to a larger value:
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx4G</value>
</property>
If you are running Hadoop locally, you can also set the HADOOP_CLIENT_OPTS environment variable:
export HADOOP_CLIENT_OPTS="-Xmx4G"
Although it may appear that you have enough space in your filesystem for HDFS, some operations are performed (by default) using your temp directory. Be sure that the place Hadoop uses to store these files has enough space available.
An example exception:
java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)
at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:245)
at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:210)
at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$OnDiskMerger.merge(MergeManagerImpl.java:564)
at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224)
... 10 more
Solution: To specify a location with ample space, set the value of hadoop.tmp.dir by adding the following property to the core-site.xml file:
<property>
<name>hadoop.tmp.dir</name>
<value>/path/to/dir</value>
</property>
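Before pointing hadoop.tmp.dir at a new location, it is worth confirming that the directory exists and actually has enough free space. A minimal check (using the placeholder path from the property above) could be:

# Create the directory if needed and check the available space on its volume
mkdir -p /path/to/dir
df -h /path/to/dir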
Hadoop also uses the /tmp directory to some extent, occupying approx. 300 MB of space (for an SF1000 run) in /tmp/hadoop and /tmp/hadoop-unjar.... The latter directory is created by the hadoop jar command and changing it is non-trivial. Potential workarounds include:
- Clean the Hadoop temporary files between runs with rm -rf /tmp/hadoop*.
- Run Hadoop using the Docker image. This way, the /tmp directory will reside within the container.
- Mount /tmp to a different location.
- Experiment with other options to move the location of hadoop jar's temporary directory (see the sketch below).
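One such option, assuming the hadoop-unjar* directories honour the JVM's java.io.tmpdir setting (which is the usual behaviour of the hadoop jar command), is to redirect that temporary directory to a larger volume; the path below is only an example:

# Hypothetical example: move the JVM temp directory used by `hadoop jar`
mkdir -p /data/hadoop-tmp
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Djava.io.tmpdir=/data/hadoop-tmp"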
Error message:
Exception in thread "main" java.lang.IllegalStateException: HadoopPersonGenerator failed
The root cause (along with the suggested solution) can be found in $HADOOP_HOME/logs/hadoop-*-resourcemanager-*.log:
Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
For more detailed output, check the application tracking page: http://localhost:8088/cluster/app/application_1584712919598_0016 Then click on links to logs of each attempt.
. Failing the application. APPID=application_1584712919598_0016
For example, if your $HADOOP_HOME value is /home/ubuntu/hadoop, use:
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=/home/ubuntu/hadoop</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=/home/ubuntu/hadoop</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=/home/ubuntu/hadoop</value>
</property>
Restarting Hadoop is not required.
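To inspect the logs of the failed application from the command line instead of the web UI, the yarn logs command can be used with the application id reported in the error message (provided YARN log aggregation is enabled), for example:

yarn logs -applicationId application_1584712919598_0016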
Error message:
mkdir: Call From localhost/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
This can be caused by (1) an incorrectly set $HADOOP_HOME and/or (2) HDFS not having been formatted. Make sure you have formatted HDFS (a minimal sketch follows).
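Assuming a standard single-node setup with $HADOOP_HOME pointing at your Hadoop distribution, formatting HDFS and starting the daemons looks roughly like this:

$HADOOP_HOME/bin/hdfs namenode -format
$HADOOP_HOME/sbin/start-dfs.sh
$HADOOP_HOME/sbin/start-yarn.sh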
Formatting fails with the following error:
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
Based on our Stack Overflow answer:
One cause behind this problem might be a user-defined HDFS_DIR environment variable. This is picked up by scripts such as the following lines in libexec/hadoop-functions.sh:

HDFS_DIR=${HDFS_DIR:-"share/hadoop/hdfs"}
...
if [[ -z "${HADOOP_HDFS_HOME}" ]] && [[ -d "${HADOOP_HOME}/${HDFS_DIR}" ]]; then
  export HADOOP_HDFS_HOME="${HADOOP_HOME}"
fi

The solution is to avoid defining an environment variable HDFS_DIR.

The recommendations in the comments of the question are correct – use the hadoop classpath command to identify whether hadoop-hdfs-*.jar files are present in the classpath or not. They were missing in my case.
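A quick way to check this (a sketch; the grep pattern simply looks for HDFS-related jars on the reported classpath):

hadoop classpath | tr ':' '\n' | grep hdfs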
The bin/hdfs namenode -format
command prompts the user to type Y to continue.
Re-format filesystem in Storage Directory root=...; location= null ? (Y or N)
You can pipe a Y
character as follows:
$ echo 'Y' | bin/hdfs namenode -format
Or even better, use bin/hadoop instead of bin/hdfs:
$ bin/hadoop namenode -format
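Alternatively, recent Hadoop versions support a -force flag for the format command that skips the prompt; check bin/hdfs namenode -help to confirm it is available in your version:

$ bin/hdfs namenode -format -force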
- The logs have the following exception:
Caused by: java.lang.ClassNotFoundException: javax.activation.DataSource
Cause: you're using Java 11+ instead of Java 8 (loosely related Stack Overflow question). Use JDK 8.
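To check which Java version is active and point Hadoop at a JDK 8 installation, something along the following lines should work; the JDK path is only an example and depends on your system:

java -version
# In etc/hadoop/hadoop-env.sh (or your shell profile), point Hadoop at JDK 8:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64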
- The logs have the following error:
ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs usable space is below configured utilization percentage/no more usable space [ /tmp/nm-local-dir : used space above threshold of 90.0% ] ; 1/1 log-dirs usable space is below configured utilization percentage/no more usable space [ /home/szarnyasg/hadoop-yarn/logs/userlogs : used space above threshold of 90.0% ]
The log message is pretty clear: the location of the Hadoop temp directory has too little free space.
Solution: in core-site.xml, set hadoop.tmp.dir to a location with ample space (see also java.io.IOException: No space left on device).
A typical problem on Ubuntu (Stack Overflow) is that your local hostname gets resolved incorrectly. To amend this, edit your /etc/hosts file so that:
- there is no 127.0.1.1 <hostname> entry,
- there is a 127.0.0.1 localhost <hostname> entry (a minimal example follows).
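For instance, assuming the machine's hostname is myhost (a hypothetical name), the check and the desired entry would look like this:

# Check how the hostname currently resolves
hostname
grep myhost /etc/hosts
# Desired state: a single entry such as the following, and no 127.0.1.1 line
# 127.0.0.1 localhost myhost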
Error message:
Error: java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: /user/ubuntu/hadoop/m0activityFactors.txt (inode 16841) [Lease. Holder: DFSClient_attempt_1584810038683_0015_r_000000_2_-1750671586_1, pending creates: 1]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2898)
...
org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge
It's likely that Hadoop uses the default /tmp directory. Set an alternative temporary directory via the core-site.xml file's hadoop.tmp.dir configuration property.
DiskChecker$DiskErrorException: Could not find any valid local directory for ...
It's likely that Hadoop uses the default /tmp directory. Set an alternative temporary directory via the core-site.xml file's hadoop.tmp.dir configuration property.
java.io.IOException: No valid local directories in property: mapreduce.cluster.local.dir
The hadoop.tmp.dir configuration property in the core-site.xml file points to a location that does not exist.
You can try increasing the timeout (mapreduce.task.timeout) and the limit on job counters (mapreduce.job.counters.limit) in the mapred-site.xml Hadoop configuration file.
<property>
<name>mapreduce.task.timeout</name>
<value>600000000</value>
</property>
<property>
<name>mapreduce.job.counters.limit</name>
<value>20000</value>
</property>
However, these settings might have caused random crashes in our previous setups, so use them with caution.