
Commit 76a116a

revise the comments in read path
1 parent: e3d4349

File tree: 2 files changed, +11 -10 lines changed


sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/DataSourceReader.java

Lines changed: 4 additions & 4 deletions
@@ -31,7 +31,7 @@
  * {@link ReadSupport#createReader(DataSourceOptions)} or
  * {@link ReadSupportWithSchema#createReader(StructType, DataSourceOptions)}.
  * It can mix in various query optimization interfaces to speed up the data scan. The actual scan
- * logic is delegated to {@link InputPartition}s that are returned by
+ * logic is delegated to {@link InputPartition}s, which are returned by
  * {@link #planInputPartitions()}.
  *
  * There are mainly 3 kinds of query optimizations:

@@ -65,9 +65,9 @@ public interface DataSourceReader {
   StructType readSchema();

   /**
-   * Returns a list of read tasks. Each task is responsible for creating a data reader to
-   * output data for one RDD partition. That means the number of tasks returned here is same as
-   * the number of RDD partitions this scan outputs.
+   * Returns a list of {@link InputPartition}s. Each {@link InputPartition} is responsible for
+   * creating a data reader to output data of one RDD partition. The number of input partitions
+   * returned here is the same as the number of RDD partitions this scan outputs.
    *
    * Note that, this may not be a full scan if the data source reader mixes in other optimization
    * interfaces like column pruning, filter push-down, etc. These optimizations are applied before
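To make the revised planInputPartitions() contract concrete, here is a minimal, hypothetical sketch of a DataSourceReader, assuming the Spark 2.4-era DataSource V2 API in which the method returns List<InputPartition<InternalRow>>. RangeReader and RangeInputPartition are illustrative names, not part of this commit; RangeInputPartition is sketched after the next file's diff.

import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.types.StructType;

// Hypothetical reader that scans a fixed integer range as two partitions.
public class RangeReader implements DataSourceReader {

  @Override
  public StructType readSchema() {
    // The scan outputs a single int column named "value".
    return new StructType().add("value", "int");
  }

  @Override
  public List<InputPartition<InternalRow>> planInputPartitions() {
    // Two input partitions, so the scan outputs an RDD with exactly
    // two partitions, as the revised Javadoc above describes.
    return Arrays.<InputPartition<InternalRow>>asList(
        new RangeInputPartition(0, 5),
        new RangeInputPartition(5, 10));
  }
}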

sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartition.java

Lines changed: 7 additions & 6 deletions
@@ -23,13 +23,14 @@

 /**
  * An input partition returned by {@link DataSourceReader#planInputPartitions()} and is
- * responsible for creating the actual data reader. The relationship between
- * {@link InputPartition} and {@link InputPartitionReader}
+ * responsible for creating the actual data reader of one RDD partition.
+ * The relationship between {@link InputPartition} and {@link InputPartitionReader}
  * is similar to the relationship between {@link Iterable} and {@link java.util.Iterator}.
  *
- * Note that input partitions will be serialized and sent to executors, then the partition reader
- * will be created on executors and do the actual reading. So {@link InputPartition} must be
- * serializable and {@link InputPartitionReader} doesn't need to be.
+ * Note that {@link InputPartition}s will be serialized and sent to executors, then
+ * {@link InputPartitionReader}s will be created on executors to do the actual reading. So
+ * {@link InputPartition} must be serializable while {@link InputPartitionReader} doesn't need to
+ * be.
  */
 @InterfaceStability.Evolving
 public interface InputPartition<T> extends Serializable {

@@ -42,7 +43,7 @@ public interface InputPartition<T> extends Serializable {
    *
    * Note that if a host name cannot be recognized by Spark, it will be ignored as it was not in
    * the returned locations. By default this method returns empty string array, which means this
-   * task has no location preference.
+   * data reader has no location preference.
    *
    * If this method fails (by throwing an exception), the action would fail and no Spark job was
    * submitted.
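The serialization note above is easiest to see in a sketch. Below is the hypothetical RangeInputPartition referenced earlier, again assuming the Spark 2.4-era API (createPartitionReader() plus a default preferredLocations()); the class and its fields are illustrative, not part of this commit.

import java.io.IOException;

import org.apache.spark.sql.catalyst.InternalRow;
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow;
import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.sources.v2.reader.InputPartitionReader;

// Hypothetical partition over [start, end). It carries only plain serializable
// fields, because the whole object is serialized and shipped to an executor.
public class RangeInputPartition implements InputPartition<InternalRow> {
  private final int start;
  private final int end;

  public RangeInputPartition(int start, int end) {
    this.start = start;
    this.end = end;
  }

  @Override
  public String[] preferredLocations() {
    // An in-memory range has no locality preference, matching the default.
    return new String[0];
  }

  @Override
  public InputPartitionReader<InternalRow> createPartitionReader() {
    // Called on the executor after deserialization. The reader itself is never
    // serialized, so it may hold non-serializable state.
    return new InputPartitionReader<InternalRow>() {
      private int current = start - 1;

      @Override
      public boolean next() {
        current += 1;
        return current < end;
      }

      @Override
      public InternalRow get() {
        return new GenericInternalRow(new Object[] {current});
      }

      @Override
      public void close() throws IOException {
        // Nothing to release for this in-memory example.
      }
    };
  }
}

Keeping the partition's state to a pair of ints keeps serialization cheap; any non-serializable resources such as file handles or connections belong in the reader created on the executor, which is exactly the split the revised Javadoc draws between the two interfaces.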
