Troubleshooting

This page collects frequent issues encountered when running the generator, together with their workarounds.

java.lang.Throwable child error

This error arises especially when generating large datasets. Hadoop internally creates many files and threads, and its memory footprint is often large. The issue usually appears when the Linux ulimit settings are set too low. To solve this problem, try increasing the following system limits, e.g. with the ulimit commands shown after the list:

  • stack size
  • number of opened files
  • number of maximum threads
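
As a minimal sketch, these limits can be inspected and raised for the current shell before launching the generator (the values below are only illustrative and are subject to the hard limits configured on your system):

ulimit -s   # current stack size
ulimit -n   # current number of open files
ulimit -u   # current maximum number of user processes/threads

# raise them for this shell session (illustrative values)
ulimit -s unlimited
ulimit -n 65535
ulimit -u 32768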

java.lang.OutOfMemoryError

This error can be caused by the same reasons as the error above, but also by incorrectly set Java options. To increase the amount of memory Java can allocate for each child process, open the mapred-site.xml file in Hadoop's etc/hadoop directory and set the following option to a larger value:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx4G</value>
</property>

If you are running Hadoop locally, you can also set the HADOOP_CLIENT_OPTS environment variable:

export HADOOP_CLIENT_OPTS="-Xmx4G"

java.io.IOException: No space left on device

Even if your filesystem appears to have enough space for HDFS, some operations are (by default) performed in your temporary directory. Make sure that the location Hadoop uses to store these files has enough free space.

An example exception:

java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)                                                                                             
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)                                                                                        
Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in OnDiskMerger - Thread to merge on-disk map-outputs
        at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)                                                                                                     
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)                                                                                                    
        at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)                          
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)                                                                                                   
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)                                                                                                        
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)                                      
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)                                                                                           
        at java.lang.Thread.run(Thread.java:748)                                                                                                                                           
Caused by: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device                                                                                      
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:226)                   
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)                                                                                                         
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)                                                                         
        at java.io.DataOutputStream.write(DataOutputStream.java:107)                                                            
        at org.apache.hadoop.mapred.IFileOutputStream.write(IFileOutputStream.java:88)                                                                                               
        at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)                                                                         
        at java.io.DataOutputStream.write(DataOutputStream.java:107)                                                            
        at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:245)                                                                                                              
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:210)                                                                                                      
        at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl$OnDiskMerger.merge(MergeManagerImpl.java:564)               
        at org.apache.hadoop.mapreduce.task.reduce.MergeThread.run(MergeThread.java:94)                                                                                               
Caused by: java.io.IOException: No space left on device                                                                                                                
        at java.io.FileOutputStream.writeBytes(Native Method)                                                                                                         
        at java.io.FileOutputStream.write(FileOutputStream.java:326)                          
        at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:224)
        ... 10 more

Solution: To specify a location with ample space, set the value of hadoop.tmp.dir by adding the following property into the core-site.xml file:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/path/to/dir</value>
</property>
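
After setting the property, it is worth verifying that the chosen location exists and has enough free space. A quick sanity check, reusing the /path/to/dir placeholder from the snippet above:

df -h /path/to/dir    # free space on the filesystem backing hadoop.tmp.dir
du -sh /path/to/dir   # space currently occupied by Hadoop's temporary files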

⚠️ Note that even with this configuration set, Hadoop will still use /tmp to some extent, occupying approx. 300 MB of space for an SF1000 run in /tmp/hadoop and /tmp/hadoop-unjar.... The latter directory is created by the hadoop jar command and changing it is non-trivial. Potential workarounds include:

  • Clean the Hadoop temporary files between runs with rm -rf /tmp/hadoop*.
  • Run Hadoop using the Docker image. This way, the /tmp directory will reside within the container.
  • Mount /tmp to a different location.
  • Experiment with other options to move the location of hadoop jar's temporary directory.

Job in Hadoop 3 fails due to missing v2 class

Error message:

Exception in thread "main" java.lang.IllegalStateException: HadoopPersonGenerator failed 

The root cause (along with the suggested solution) can be found in $HADOOP_HOME/logs/hadoop-*-resourcemanager-*.log:

Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>

For more detailed output, check the application tracking page: http://localhost:8088/cluster/app/application_1584712919598_0016 Then click on links to logs of each attempt.
. Failing the application.      APPID=application_1584712919598_0016

For example, if your $HADOOP_HOME value is /home/ubuntu/hadoop, use:

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=/home/ubuntu/hadoop</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=/home/ubuntu/hadoop</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=/home/ubuntu/hadoop</value>
</property>

Restarting Hadoop is not required.
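
If you are unsure which directory to use for HADOOP_MAPRED_HOME, the following shell snippet (a sketch, assuming the hadoop binary is on your PATH) prints the installation directory:

echo "$HADOOP_HOME"                                      # if the variable is set, use its value
readlink -f "$(which hadoop)" | sed 's|/bin/hadoop$||'   # otherwise, resolve it from the hadoop binary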

Connection refused

Error message:

mkdir: Call From localhost/127.0.0.1 to localhost:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

This can be caused by (1) an incorrectly set $HADOOP_HOME and/or (2) HDFS not having been formatted. Make sure you have formatted HDFS (a minimal sequence for a local setup is shown below).
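
A minimal sequence for a local single-node setup, run from the Hadoop installation directory, is roughly the following (a sketch; your setup may differ):

bin/hdfs namenode -format   # format HDFS (only needed once)
sbin/start-dfs.sh           # start the NameNode and DataNode daemons
jps                         # should now list NameNode and DataNode processes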

Main class loading error when formatting

Formatting fails with the following error:

Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode

Based on our Stack Overflow answer:

One cause behind this problem might be a user-defined HDFS_DIR environment variable, which is picked up by Hadoop's scripts, e.g. by the following lines in libexec/hadoop-functions.sh:

HDFS_DIR=${HDFS_DIR:-"share/hadoop/hdfs"}
...
if [[ -z "${HADOOP_HDFS_HOME}" ]] &&
   [[ -d "${HADOOP_HOME}/${HDFS_DIR}" ]]; then
  export HADOOP_HDFS_HOME="${HADOOP_HOME}"
fi

The solution is to avoid defining the HDFS_DIR environment variable.

The recommendations in the comments of the question are correct – use the hadoop classpath command to check whether the hadoop-hdfs-*.jar files are present on the classpath. They were missing in my case.
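
A quick way to apply and verify the fix in the current shell (assuming the variable was exported there):

unset HDFS_DIR
hadoop classpath | tr ':' '\n' | grep hdfs   # the HDFS directories/jars should now appear on the classpath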

NameNode formatting requires user input

The bin/hdfs namenode -format command prompts the user to type Y to continue.

Re-format filesystem in Storage Directory root=...; location= null ? (Y or N)

You can pipe a Y character as follows:

$ echo 'Y' | bin/hdfs namenode -format

Or even better, use bin/hadoop instead of bin/hdfs:

$ bin/hadoop namenode -format

Person generation hangs

  • The logs have the following exception:

    Caused by: java.lang.ClassNotFoundException: javax.activation.DataSource

    Cause: you're using Java 11+ instead of Java 8 (loosely related Stack Overflow question). Use JDK 8 (see the version check after this list).

  • The logs have the following error:

    ERROR org.apache.hadoop.yarn.server.nodemanager.LocalDirsHandlerService: Most of the disks failed. 1/1 local-dirs usable space is below configured utilization percentage/no more usable space [ /tmp/nm-local-dir : used space above threshold of 90.0% ] ; 1/1 log-dirs usable space is below configured utilization percentage/no more usable space [ /home/szarnyasg/hadoop-yarn/logs/userlogs : used space above threshold of 90.0% ]

    The log message is pretty clear: the location of the Hadoop temp directory has too little free space.

    Solution: in core-site.xml, set hadoop.tmp.dir to a location with ample space (see also java.io.IOException: No space left on device).
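
To check which Java version Hadoop will pick up (relevant for the first item above), you can run the following; the JDK 8 path is a hypothetical example:

java -version                     # should report version 1.8.x
echo "$JAVA_HOME"                 # Hadoop uses this (or etc/hadoop/hadoop-env.sh) to locate the JDK
export JAVA_HOME=/path/to/jdk8    # hypothetical path to a JDK 8 installation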

127.0.1.1 errors

A typical problem on Ubuntu (Stack Overflow) is that your local hostname gets resolved incorrectly. To fix this, edit your /etc/hosts file so that:

  • there is no 127.0.1.1 <hostname> entry;
  • there is a 127.0.0.1 localhost <hostname> entry (an example follows below).
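
For example, with a (hypothetical) hostname myhost, the relevant part of /etc/hosts would look like this:

# /etc/hosts -- replace myhost with your actual hostname
127.0.0.1   localhost myhost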

java.io.IOException: Filesystem closed exception

⚠️ No idea what's causing this so far.

java.io.FileNotFoundException: File does not exist error

Error message:

Error: java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: /user/ubuntu/hadoop/m0activityFactors.txt (inode 16841) [Lease.  Holder: DFSClient_attempt_1584810038683_0015_r_000000_2_-1750671586_1, pending creates: 1]
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2898)
...

⚠️ No idea what's causing this so far.

Error while doing final merge exception

org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: Error while doing final merge

It's likely that Hadoop uses the default /tmp directory. Set an alternative temporary directory via the core-site.xml file's hadoop.tmp.dir configuration property.

Could not find any valid local directory exception

DiskChecker$DiskErrorException: Could not find any valid local directory for ...

It's likely that Hadoop uses the default /tmp directory. Set an alternative temporary directory via the core-site.xml file's hadoop.tmp.dir configuration property.

Invalid mapreduce.cluster.local.dir directory

java.io.IOException: No valid local directories in property: mapreduce.cluster.local.dir

The hadoop.tmp.dir configuration property in the core-site.xml file points to a location that does not exist.
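
Either point the property to an existing location or create the configured directory, e.g. (reusing the /path/to/dir placeholder from above):

mkdir -p /path/to/dir   # create the directory configured as hadoop.tmp.dir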

Additional settings

You can try increasing the task timeout (mapreduce.task.timeout) and the limit on job counters (mapreduce.job.counters.limit) in the mapred-site.xml Hadoop configuration file.

<property>
  <name>mapreduce.task.timeout</name>
  <value>600000000</value>
</property>


<property>
  <name>mapreduce.job.counters.limit</name>
  <value>20000</value>
</property>

However, these settings have caused random crashes in some of our previous setups, so use them with caution.