Common Big Data Errors Collection #79

m17y · 2018-10-02T03:26:02Z

java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.

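Cause: Hadoop could find neither the HADOOP_HOME environment variable nor the hadoop.home.dir system property. This usually comes from a mistyped command or an error in another configuration file.
When passing options, keep them in the required order.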
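A minimal sketch of the lookup the error message refers to, assuming (as Hadoop's Shell utility appears to do) that the hadoop.home.dir system property is checked before the HADOOP_HOME environment variable. The class name, the resolveHadoopHome helper, and the /opt/hadoop path are hypothetical. Setting the property in code only helps if it happens before any Hadoop or Spark class is loaded.

```java
class HadoopHomeCheck {

    // Assumed precedence (modeled on Hadoop's Shell class): the system property
    // wins, then the environment variable. Returns null when neither is set,
    // which is the situation that produces the IOException above.
    static String resolveHadoopHome(String sysProp, String envVar) {
        if (sysProp != null && !sysProp.isEmpty()) {
            return sysProp;
        }
        if (envVar != null && !envVar.isEmpty()) {
            return envVar;
        }
        return null;
    }

    public static void main(String[] args) {
        // Must run before any Hadoop/Spark class is initialized.
        // "/opt/hadoop" is a hypothetical path; point it at your installation.
        if (System.getenv("HADOOP_HOME") == null && System.getProperty("hadoop.home.dir") == null) {
            System.setProperty("hadoop.home.dir", "/opt/hadoop");
        }
        String home = resolveHadoopHome(System.getProperty("hadoop.home.dir"), System.getenv("HADOOP_HOME"));
        System.out.println("Hadoop home: " + home);
    }
}
```

On Windows, the directory is usually also expected to contain bin\winutils.exe.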

Caused by: com.esotericsoftware.kryo.KryoException:java.lang.NullPointerException

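A null pointer exception. Usually some object in the code is null while the file write does not accept null values, or a .get-style method failed to retrieve a value.
If you hit this, check the code carefully.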
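A plain-Java sketch of the two null sources mentioned above, assuming the data is handled as ordinary Java collections; Kryo and Spark are not involved, and NullSafeRecords, lookup and dropNulls are hypothetical names. The idea is to stop nulls before the write step: replace a failed .get with a default, and filter out null records. On an RDD, the same filter would go right before the write.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

class NullSafeRecords {

    // Replace a failed lookup with a default instead of passing null downstream.
    // Note: getOrDefault falls back only when the key is absent; a key that is
    // explicitly mapped to null still returns null.
    static String lookup(Map<String, String> table, String key, String fallback) {
        return table.getOrDefault(key, fallback);
    }

    // Drop null records before they reach the serialization / write step.
    static List<String> dropNulls(List<String> records) {
        return records.stream().filter(Objects::nonNull).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, String> cityByUser = new HashMap<>();
        cityByUser.put("u1", "Beijing");

        // cityByUser.get("u2") would return null here.
        System.out.println(lookup(cityByUser, "u2", "unknown")); // prints "unknown"

        List<String> rows = Arrays.asList("a", null, "b");
        System.out.println(dropNulls(rows)); // prints "[a, b]"
    }
}
```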

Caused by: com.esotericsoftware.kryo.KryoException: java.io.IOException: Stream is corrupted

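Caused by a serialization failure. As with problem 2 (the NullPointerException above), carefully check the RDD write and any other RDD operations.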

Caused by: org.apache.spark.SparkException: RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.

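Two RDDs cannot be used in a nested computation (one RDD cannot be referenced inside another RDD's transformation).
Solutions: 1. Run an action on one RDD first and keep the result in memory. 2. Use a broadcast variable.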

Using a broadcast variable:
convert the RDD into a Set or Map, then broadcast that collection.
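A sketch of the two fixes, using plain Java collections as stand-ins for rdd1 and rdd2 so it runs without Spark (NestedRddFixSketch and its methods are hypothetical). Fix 1 runs the "action" on the second dataset once and captures only the scalar result; fix 2 materializes the second dataset as a Map and looks it up per element. The comments show the roughly corresponding Spark calls (collectAsMap() on a pair RDD, sc.broadcast(...), and reading it back through .value); treat them as pointers rather than tested code.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedRddFixSketch {

    // Fix 1: compute the value from the second dataset once, outside the per-element function.
    // Rough Spark analogue: val n = rdd2.values.count(); rdd1.map(x => n * x)
    static List<Long> scaleByCount(List<Long> first, Map<String, Integer> second) {
        long n = second.values().size(); // plays the role of rdd2.values.count()
        return first.stream().map(x -> n * x).collect(Collectors.toList());
    }

    // Fix 2: materialize the second dataset as a Map and look it up per element.
    // Rough Spark analogue: val lookup = sc.broadcast(rdd2.collectAsMap()),
    // then lookup.value.getOrElse(k, 0) inside rdd1.map
    static List<Integer> joinByLookup(List<String> firstKeys, Map<String, Integer> lookup) {
        return firstKeys.stream().map(k -> lookup.getOrDefault(k, 0)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Integer> second = Map.of("a", 1, "b", 2, "c", 3);
        System.out.println(scaleByCount(List.of(1L, 2L), second));        // prints [3, 6]
        System.out.println(joinByLookup(List.of("a", "c", "z"), second)); // prints [1, 3, 0]
    }
}
```

For the Set variant, broadcast a Set instead and test membership with contains(...) inside a filter rather than calling getOrDefault.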

Error: java.lang.RuntimeException: Unable to create Thrift Converter for Thrift metadata nul

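Cause: the sc.thriftParquetFile() method (Spark; MapReduce behaves similarly) needs the metadata stored in the Parquet file to determine how to convert records to the corresponding Thrift class, and the file being read lacks that metadata. This is common when using Spark + Thrift + Parquet to read Parquet files generated by Hive or Pig.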

1. Incorrect dependency version:

<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-thrift</artifactId>
    <version>1.7.0-match</version>
</dependency>

m17y added the "Data Analysis" label on Oct 2, 2018