Collection of common big data errors #79

Open
m17y opened this issue Oct 2, 2018 · 0 comments


  1. java.io.IOException: HADOOP_HOME or hadoop.home.dir are not set.

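The message means that neither the HADOOP_HOME environment variable nor the hadoop.home.dir system property is visible to the JVM. In practice this usually comes from a mistyped command or a mistake in one of the configuration files.
When passing options, keep their order consistent.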
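
A quick way to confirm what the JVM actually sees is to print both settings before the job starts. This is only a sketch: `HadoopHomeCheck` and `resolve` are names made up for this example, and it assumes the system property is consulted before the environment variable.

```scala
// Sketch: report where (if anywhere) the Hadoop home directory is configured.
// HadoopHomeCheck and resolve are hypothetical names used only for illustration.
object HadoopHomeCheck {
  // Look at the hadoop.home.dir system property first, then fall back to the
  // HADOOP_HOME environment variable. Blank values are treated as unset.
  def resolve(props: Map[String, String], env: Map[String, String]): Option[String] =
    props.get("hadoop.home.dir")
      .orElse(env.get("HADOOP_HOME"))
      .filter(_.trim.nonEmpty)

  def main(args: Array[String]): Unit =
    resolve(sys.props.toMap, sys.env) match {
      case Some(dir) => println(s"Hadoop home resolved to: $dir")
      case None      => println("Neither hadoop.home.dir nor HADOOP_HOME is set")
    }
}
```

If this prints the "neither is set" line in the same shell that launches the job, the problem is in the environment or launch command rather than in the job code.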

  2. Caused by: com.esotericsoftware.kryo.KryoException: java.lang.NullPointerException

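A null pointer exception. Usually some object in the code is null while the file write does not allow nulls, or a call such as .get fails to return a value.
When you hit this problem, check the code carefully.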
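
One way to rule out the null case before a write is to wrap lookups in `Option` so that a missing value never ends up inside a record. The sketch below is illustrative only: `Record`, `NullGuard`, `lookup` and `buildRecords` are hypothetical names, and whether missing entries should be dropped or given a default depends on the job.

```scala
// Hypothetical record type used only for this illustration.
final case class Record(id: String, value: String)

object NullGuard {
  // Option(...) turns a null returned by a Java-style lookup into None.
  def lookup(table: java.util.Map[String, String], key: String): Option[String] =
    Option(table.get(key))

  // Keep only the keys that resolve to a value, so no Record carries a null field.
  def buildRecords(keys: Seq[String], table: java.util.Map[String, String]): Seq[Record] =
    keys.flatMap(k => lookup(table, k).map(v => Record(k, v)))
}
```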

  3. Caused by: com.esotericsoftware.kryo.KryoException: java.io.IOException: Stream is corrupted

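Caused by a serialization failure. As with problem 2 (the KryoException NullPointerException above), carefully check the RDD write or the other RDD operations.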

  4. Caused by: org.apache.spark.SparkException: RDD transformations and actions can only be
invoked by the driver, not inside of other transformations; for example, rdd1.map(x => 
rdd2.values.count() * x) is invalid because the values transformation and count action 
cannot be performed inside of the rdd1.map transformation. For more information, see 
SPARK-5063.

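Computations on two RDDs cannot be nested inside each other.
Solutions: 1. Run an action on one of the RDDs and keep the result in memory. 2. Use a broadcast variable.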

Using a broadcast variable:
convert the RDD to a Set or a Map first.
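
Both fixes can be sketched roughly as follows, assuming the second RDD is small enough to fit in driver memory. The object name, the sample data, the local master and the app name are placeholders invented for this example, not part of the original job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; data and names are placeholders.
object BroadcastInsteadOfNestedRdd {
  def run(sc: SparkContext): (Array[Long], Array[(String, Option[Int])]) = {
    val rdd1 = sc.parallelize(Seq(1L, 2L, 3L))
    val rdd2 = sc.parallelize(Seq("a" -> 10, "b" -> 20))

    // Option 1: run the action on the driver first, then use the plain value
    // inside the transformation instead of referencing rdd2 there.
    val count = rdd2.values.count()
    val scaled = rdd1.map(x => count * x).collect()

    // Option 2: collect the small RDD to a Map and broadcast it to the executors.
    val lookup = sc.broadcast(rdd2.collectAsMap())
    val keys = sc.parallelize(Seq("a", "b", "c"))
    val resolved = keys.map(k => k -> lookup.value.get(k)).collect()

    (scaled, resolved)
  }

  def main(args: Array[String]): Unit = {
    // Local master is only for trying the sketch out on one machine.
    val conf = new SparkConf().setMaster("local[*]").setAppName("broadcast-sketch")
    val sc = new SparkContext(conf)
    try {
      val (scaled, resolved) = run(sc)
      println(scaled.mkString(", "))
      println(resolved.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

The closures passed to `map` now capture only a `Long` and a broadcast handle, never another RDD, which is the condition the exception message describes.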

  5. Error: java.lang.RuntimeException: Unable to create Thrift Converter for Thrift metadata nul

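Cause: the sc.thriftParquetFile() method (Spark; MR behaves similarly) relies on the metadata stored in the Parquet file to decide how to convert records to the corresponding Thrift class, and the file being read lacks that information. This is common when Parquet files generated by Hive/Pig are read through Spark + Thrift + Parquet.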

1. Incorrect version

<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-thrift</artifactId>
    <version>1.7.0-match</version>
</dependency>