
@LuciferYang
Contributor

What changes were proposed in this pull request?

This PR aims to fix the following Java compilation warnings related to generic types:

2022-10-08T01:43:33.6487078Z /home/runner/work/spark/spark/core/src/main/java/org/apache/spark/SparkThrowable.java:54: warning: [rawtypes] found raw type: HashMap
2022-10-08T01:43:33.6487456Z     return new HashMap();
2022-10-08T01:43:33.6487682Z                ^
2022-10-08T01:43:33.6487957Z   missing type arguments for generic class HashMap<K,V>
2022-10-08T01:43:33.6488617Z   where K,V are type-variables:
2022-10-08T01:43:33.6488911Z     K extends Object declared in class HashMap
2022-10-08T01:43:33.6489211Z     V extends Object declared in class HashMap
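A minimal sketch of the kind of change that clears this warning, assuming the method's declared return type is Map<String, String>. The names HasParameters, parameters and DiamondExample are hypothetical stand-ins, not Spark's actual SparkThrowable code. With the diamond operator the compiler infers the type arguments from the return type, so no raw HashMap is created.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an interface whose default method returns a map.
interface HasParameters {
  // Raw form that triggers [rawtypes]:  return new HashMap();
  // Diamond form: the compiler infers HashMap<String, String> from the return type.
  default Map<String, String> parameters() {
    return new HashMap<>();
  }
}

class DiamondExample implements HasParameters {
  static Map<String, String> emptyParameters() {
    return new DiamondExample().parameters();
  }
}
```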

2022-10-08T01:50:21.5951932Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/catalog/SupportsAtomicPartitionManagement.java:55: warning: [rawtypes] found raw type: Map
2022-10-08T01:50:21.5999993Z       createPartitions(new InternalRow[]{ident}, new Map[]{properties});
2022-10-08T01:50:21.6000343Z                                                      ^
2022-10-08T01:50:21.6000642Z   missing type arguments for generic class Map<K,V>
2022-10-08T01:50:21.6001272Z   where K,V are type-variables:
2022-10-08T01:50:21.6001569Z     K extends Object declared in interface Map
2022-10-08T01:50:21.6002109Z     V extends Object declared in interface Map

2022-10-08T01:50:21.6006655Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/connector/util/V2ExpressionSQLBuilder.java:216: warning: [rawtypes] found raw type: Literal
2022-10-08T01:50:21.6007121Z   protected String visitLiteral(Literal literal) {
2022-10-08T01:50:21.6007395Z                                 ^
2022-10-08T01:50:21.6007673Z   missing type arguments for generic class Literal<T>
2022-10-08T01:50:21.6008032Z   where T is a type-variable:
2022-10-08T01:50:21.6008324Z     T extends Object declared in interface Literal
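One common way to resolve a raw-type parameter like this is an unbounded wildcard. The sketch below uses a hypothetical TypedLiteral<T> interface in place of Spark's Literal<T>; a method taking TypedLiteral<?> accepts a literal of any type, and reading its value as Object needs no cast or unchecked operation.

```java
// Hypothetical generic interface modelled on a typed literal value.
interface TypedLiteral<T> {
  T value();
}

class LiteralPrinter {
  // Accepting TypedLiteral<?> instead of the raw TypedLiteral removes the
  // [rawtypes] warning while still accepting a literal of any element type.
  static String render(TypedLiteral<?> literal) {
    Object v = literal.value(); // the unknown T is still safely readable as Object
    return String.valueOf(v);
  }
}
```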

2022-10-08T01:50:21.6008785Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:56: warning: [rawtypes] found raw type: Comparable
2022-10-08T01:50:21.6009223Z   public static class Coord implements Comparable {
2022-10-08T01:50:21.6009503Z                                        ^
2022-10-08T01:50:21.6009791Z   missing type arguments for generic class Comparable<T>
2022-10-08T01:50:21.6010137Z   where T is a type-variable:
2022-10-08T01:50:21.6010433Z     T extends Object declared in interface Comparable

2022-10-08T01:50:21.6010976Z /home/runner/work/spark/spark/sql/catalyst/src/main/java/org/apache/spark/sql/util/NumericHistogram.java:191: warning: [unchecked] unchecked method invocation: method sort in class Collections is applied to given types
2022-10-08T01:50:21.6011474Z       Collections.sort(tmp_bins);
2022-10-08T01:50:21.6011714Z                       ^
2022-10-08T01:50:21.6012050Z   required: List<T>
2022-10-08T01:50:21.6012296Z   found: ArrayList<Coord>
2022-10-08T01:50:21.6012604Z   where T is a type-variable:
2022-10-08T01:50:21.6012926Z     T extends Comparable<? super T> declared in method <T>sort(List<T>)
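Both NumericHistogram warnings share one cause: with a raw Comparable, the element type does not satisfy the bound T extends Comparable<? super T> required by Collections.sort, so the call is unchecked. A simplified, hypothetical Bin class (standing in for Coord, and assumed to be ordered by its x coordinate) shows how implementing Comparable<Bin> fixes both at once.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified, hypothetical histogram bin. Implementing Comparable<Bin>
// instead of the raw Comparable gives compareTo a typed parameter, so
// Collections.sort(List<Bin>) type-checks without an unchecked warning.
class Bin implements Comparable<Bin> {
  final double x;
  final double y;

  Bin(double x, double y) {
    this.x = x;
    this.y = y;
  }

  @Override
  public int compareTo(Bin other) {
    // Order bins by their x coordinate; no cast from Object is needed.
    return Double.compare(x, other.x);
  }
}

class BinSorter {
  static List<Bin> sorted(List<Bin> bins) {
    List<Bin> tmp = new ArrayList<>(bins);
    Collections.sort(tmp); // T is inferred as Bin, which satisfies Comparable<? super Bin>
    return tmp;
  }
}
```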

2022-10-08T02:13:38.0769617Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/OperationManager.java:85: warning: [rawtypes] found raw type: AbstractWriterAppender
2022-10-08T02:13:38.0770287Z     AbstractWriterAppender ap = new LogDivertAppender(this, OperationLog.getLoggingLevel(loggingMode));
2022-10-08T02:13:38.0770645Z     ^
2022-10-08T02:13:38.0770947Z   missing type arguments for generic class AbstractWriterAppender<M>
2022-10-08T02:13:38.0771330Z   where M is a type-variable:
2022-10-08T02:13:38.0771665Z     M extends WriterManager declared in class AbstractWriterAppender
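When a generic class's type parameter has a bound, as with M extends WriterManager here, declaring the variable with a wildcard removes the raw type while keeping that bound. The classes below (Manager, FileManager, BaseAppender, FileAppender) are hypothetical stand-ins for the Log4j 2 types; this is a sketch of one possible fix, not necessarily the change made in OperationManager.

```java
// Hypothetical stand-ins mirroring the shape "M extends WriterManager".
class Manager {
  String name() { return "base"; }
}

class FileManager extends Manager {
  @Override
  String name() { return "file"; }
}

abstract class BaseAppender<M extends Manager> {
  private final M manager;

  BaseAppender(M manager) { this.manager = manager; }

  M getManager() { return manager; }
}

class FileAppender extends BaseAppender<FileManager> {
  FileAppender() { super(new FileManager()); }
}

class AppenderDemo {
  static String managerName() {
    // Raw form that triggers [rawtypes]:  BaseAppender ap = new FileAppender();
    // With a wildcard the bound is kept, so getManager() is still typed as Manager.
    BaseAppender<?> ap = new FileAppender();
    Manager m = ap.getManager();
    return m.name();
  }
}
```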

2022-10-08T02:13:38.0774487Z /home/runner/work/spark/spark/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/LogDivertAppender.java:268: warning: [rawtypes] found raw type: Layout
2022-10-08T02:13:38.0774940Z         Layout l = ap.getLayout();
2022-10-08T02:13:38.0775173Z         ^
2022-10-08T02:13:38.0775441Z   missing type arguments for generic class Layout<T>
2022-10-08T02:13:38.0775849Z   where T is a type-variable:
2022-10-08T02:13:38.0776359Z     T extends Serializable declared in interface Layout
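For a getter that returns a bounded wildcard type (Log4j 2's Appender.getLayout() is declared as returning Layout<? extends Serializable>), the local variable can use the same wildcard instead of the raw Layout. This sketch uses a hypothetical SimpleLayout<T extends Serializable> interface rather than the real Log4j API, so it compiles without extra dependencies.

```java
import java.io.Serializable;

// Hypothetical stand-in for a layout that formats an event into some Serializable form.
interface SimpleLayout<T extends Serializable> {
  T format(String event);
}

class LayoutDemo {
  // A getter shaped like one that returns a bounded wildcard type.
  static SimpleLayout<? extends Serializable> getLayout() {
    return event -> "[" + event + "]";
  }

  static Serializable render(String event) {
    // Raw form that triggers [rawtypes]:  SimpleLayout l = getLayout();
    SimpleLayout<? extends Serializable> l = getLayout();
    return l.format(event); // the result is known to be at least Serializable
  }
}
```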

2022-10-08T02:19:55.0035795Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:56:17:  [rawtypes] found raw type: SparkAvroKeyRecordWriter
2022-10-08T02:19:55.0037287Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:56:13:  [unchecked] unchecked call to SparkAvroKeyRecordWriter(Schema,GenericData,CodecFactory,OutputStream,int,Map<String,String>) as a member of the raw type SparkAvroKeyRecordWriter
2022-10-08T02:19:55.0038442Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:75:31:  [rawtypes] found raw type: DataFileWriter
2022-10-08T02:19:55.0039370Z [WARNING] /home/runner/work/spark/spark/connector/avro/src/main/java/org/apache/spark/sql/avro/SparkAvroKeyOutputFormat.java:75:27:  [unchecked] unchecked call to DataFileWriter(DatumWriter<D>) as a member of the raw type DataFileWriter
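The Avro warnings come in pairs: a raw constructor call such as new DataFileWriter(...) is both a raw type and an unchecked call, because the constructor's DatumWriter<D> parameter is erased. A sketch with hypothetical RecordSink and FileWriterSketch classes (not the Avro API) shows how the diamond operator infers D and clears both warnings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins: a writer interface and a generic file writer wrapping it.
interface RecordSink<D> {
  void write(D datum, List<String> out);
}

class FileWriterSketch<D> {
  private final RecordSink<D> sink;
  private final List<String> out = new ArrayList<>();

  FileWriterSketch(RecordSink<D> sink) { this.sink = sink; }

  void append(D datum) { sink.write(datum, out); }

  List<String> written() { return out; }
}

class ConstructorDemo {
  static List<String> writeAll(List<Integer> data) {
    RecordSink<Integer> sink = (d, out) -> out.add("record-" + d);
    // Raw form that triggers [rawtypes] and [unchecked]:
    //   FileWriterSketch w = new FileWriterSketch(sink);
    // With the diamond, D is inferred as Integer from the declared type.
    FileWriterSketch<Integer> w = new FileWriterSketch<>(sink);
    for (Integer datum : data) {
      w.append(datum);
    }
    return w.written();
  }
}
```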

Why are the changes needed?

Fix Java compilation warnings.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Pass GitHub Actions.


@SuppressWarnings("unchecked")
@Override
default void createPartition(
Contributor Author


Can't write new Map<String, String>[]{properties} (Java forbids generic array creation), so just suppress the warning.
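To illustrate the limitation: Java rejects new Map<String, String>[]{...} with a generic array creation error, so an array of parameterized maps has to be created with the raw component type and the resulting warnings suppressed. The countEntries and single methods below are hypothetical and only mimic the shape of the createPartitions call. The snippet in this review suppresses only "unchecked"; this sketch also lists "rawtypes", on the assumption that it should stay quiet under -Xlint:all.

```java
import java.util.Map;

class GenericArrayDemo {
  // Hypothetical method taking an array of typed maps, shaped like a bulk
  // createPartitions(idents, properties) call.
  static int countEntries(Map<String, String>[] maps) {
    int total = 0;
    for (Map<String, String> m : maps) {
      total += m.size();
    }
    return total;
  }

  // new Map<String, String>[]{props} would not compile (generic array creation),
  // so the array uses the raw component type and the warnings are suppressed.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static int single(Map<String, String> props) {
    return countEntries(new Map[]{props});
  }
}
```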

Contributor

@zhengruifeng left a comment


LGTM pending CI

int syncInterval,
Map<String, String> metadata) throws IOException {
-this.mAvroFileWriter = new DataFileWriter(dataModel.createDatumWriter(writerSchema));
+this.mAvroFileWriter = new DataFileWriter<>(new GenericDatumWriter<>(writerSchema, dataModel));
Contributor Author

@LuciferYang Oct 11, 2022


}.getMessage
assert(message.contains("Caused by: java.lang.NullPointerException: "))
-assert(message.contains("null in string in field Name"))
+assert(message.contains("null value for (non-nullable) string at test_schema.Name"))
Contributor Author


The error message changes from

Job aborted due to stage failure: Task 1 in stage 1.0 failed 1 times, most recent failure: Lost task 1.0 in stage 1.0 (TID 3) (localhost executor driver): org.apache.spark.SparkException: Task failed while writing rows.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:723)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:310)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$11(FileFormatWriter.scala:217)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:139)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null in test_schema
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:317)
	at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.write(SparkAvroKeyOutputFormat.java:87)
	at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.write(SparkAvroKeyOutputFormat.java:64)
	at org.apache.spark.sql.avro.AvroOutputWriter.write(AvroOutputWriter.scala:86)
	at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
	at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:293)
	at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
	at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:300)
	... 10 more
Caused by: java.lang.NullPointerException: null in test_schema
	at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:208)
	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:160)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
	at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:314)
	... 19 more
Caused by: java.lang.NullPointerException: null in string in field Name
	at org.apache.avro.generic.GenericDatumWriter.npe(GenericDatumWriter.java:208)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:254)
	at org.apache.avro.specific.SpecificDatumWriter.writeField(SpecificDatumWriter.java:117)
	at org.apache.avro.reflect.ReflectDatumWriter.writeField(ReflectDatumWriter.java:184)
	at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:234)
	at org.apache.avro.specific.SpecificDatumWriter.writeRecord(SpecificDatumWriter.java:92)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:145)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:95)
	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:158)
	... 21 more
Caused by: java.lang.NullPointerException: Cannot invoke "Object.getClass()" because "datum" is null
	at org.apache.avro.specific.SpecificDatumWriter.writeString(SpecificDatumWriter.java:73)
	at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:165)
	at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:95)
	at org.apache.avro.reflect.ReflectDatumWriter.write(ReflectDatumWriter.java:158)
	at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:245)
	... 28 more

to

Job aborted due to stage failure: Task 1 in stage 148.0 failed 1 times, most recent failure: Lost task 1.0 in stage 148.0 (TID 252) (localhost executor driver): org.apache.spark.SparkException: Task failed while writing rows.
  at org.apache.spark.sql.errors.QueryExecutionErrors$.taskFailedWhileWritingRowsError(QueryExecutionErrors.scala:723)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:310)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$11(FileFormatWriter.scala:217)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
  at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
  at org.apache.spark.scheduler.Task.run(Task.scala:139)
  at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
  at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
  at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
  at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.NullPointerException: null value for (non-nullable) string at test_schema.Name
  at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:317)
  at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.write(SparkAvroKeyOutputFormat.java:87)
  at org.apache.spark.sql.avro.SparkAvroKeyRecordWriter.write(SparkAvroKeyOutputFormat.java:64)
  at org.apache.spark.sql.avro.AvroOutputWriter.write(AvroOutputWriter.scala:86)
  at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
  at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithMetrics(FileFormatDataWriter.scala:85)
  at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:92)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:293)
  at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
  at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:300)
  ... 10 more
Caused by: java.lang.NullPointerException: null value for (non-nullable) string at test_schema.Name
  at org.apache.avro.path.TracingNullPointException.summarize(TracingNullPointException.java:88)
  at org.apache.avro.path.TracingNullPointException.summarize(TracingNullPointException.java:30)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:84)
  at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:314)
  ... 19 more
Caused by: java.lang.NullPointerException: Cannot invoke "java.lang.CharSequence.toString()" because "charSequence" is null
  at org.apache.avro.io.Encoder.writeString(Encoder.java:130)
  at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:392)
  at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:384)
  at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:165)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:95)
  at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:245)
  at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:234)
  at org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:145)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:95)
  at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
  ... 20 more

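Because of the change in SparkAvroKeyOutputFormat, the exception type is unchanged but the error message differs: the writer is now a GenericDatumWriter rather than the one returned by dataModel.createDatumWriter, which in the old trace is a ReflectDatumWriter.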

Contributor

@amaliujia left a comment


awesome! LGTM

@LuciferYang
Contributor Author

Yeah ~ all passed

@HyukjinKwon
Member

Merged to master.


4 participants