@@ -30,7 +30,7 @@ public interface ReadSupport extends DataSourceV2 {
/**
* Creates a {@link DataSourceReader} to scan the data from this data source.
*
- * If this method fails (by throwing an exception), the action would fail and no Spark job was
+ * If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*
* @param options the options for the returned data source reader, which is an immutable
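For context, a minimal implementation of the interface above might look like the following sketch. The SimpleDataSource and SimpleReader names are hypothetical, not part of this patch; only the Spark interfaces and DataSourceOptions.get are from the documented API.

import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
import org.apache.spark.sql.sources.v2.ReadSupport;
import org.apache.spark.sql.sources.v2.reader.DataSourceReader;

public class SimpleDataSource implements DataSourceV2, ReadSupport {
  @Override
  public DataSourceReader createReader(DataSourceOptions options) {
    // Throwing here (e.g. for a missing required option) fails the action
    // before any Spark job is submitted, per the contract above.
    options.get("path")
        .orElseThrow(() -> new IllegalArgumentException("'path' option is required"));
    return new SimpleReader(); // hypothetical reader, sketched under DataSourceReader below
  }
}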
@@ -35,7 +35,7 @@ public interface ReadSupportWithSchema extends DataSourceV2 {
/**
* Create a {@link DataSourceReader} to scan the data from this data source.
*
- * If this method fails (by throwing an exception), the action would fail and no Spark job was
+ * If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*
* @param schema the full schema of this data source reader. Full schema usually maps to the
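The schema-accepting variant differs only in that Spark passes the full schema in. A hedged sketch, again with hypothetical class names:

import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
import org.apache.spark.sql.sources.v2.ReadSupportWithSchema;
import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
import org.apache.spark.sql.types.StructType;

public class SchemaAwareDataSource implements DataSourceV2, ReadSupportWithSchema {
  @Override
  public DataSourceReader createReader(StructType schema, DataSourceOptions options) {
    // The caller supplies the full schema up front; the returned reader
    // should scan data according to it rather than inferring its own.
    return new SchemaAwareReader(schema); // hypothetical reader
  }
}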
@@ -35,7 +35,7 @@ public interface WriteSupport extends DataSourceV2 {
* Creates an optional {@link DataSourceWriter} to save the data to this data source. Data
* sources can return None if there is no writing needed to be done according to the save mode.
*
- * If this method fails (by throwing an exception), the action would fail and no Spark job was
+ * If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*
* @param jobId A unique string for the writing job. It's possible that there are many writing
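A sketch of the write-side entry point. The four-argument createWriter signature (jobId, schema, save mode, options) is assumed from the Spark 2.3-era interface, and the class names are hypothetical:

import java.util.Optional;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.sources.v2.DataSourceOptions;
import org.apache.spark.sql.sources.v2.DataSourceV2;
import org.apache.spark.sql.sources.v2.WriteSupport;
import org.apache.spark.sql.sources.v2.writer.DataSourceWriter;
import org.apache.spark.sql.types.StructType;

public class SimpleWritableDataSource implements DataSourceV2, WriteSupport {
  @Override
  public Optional<DataSourceWriter> createWriter(
      String jobId, StructType schema, SaveMode mode, DataSourceOptions options) {
    if (mode == SaveMode.Ignore && dataAlreadyExists(options)) {
      return Optional.empty(); // nothing to write, so no Spark job is submitted
    }
    return Optional.of(new SimpleWriter(jobId, schema)); // hypothetical writer, sketched below
  }

  private boolean dataAlreadyExists(DataSourceOptions options) {
    return false; // placeholder; a real source would probe its storage here
  }
}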
@@ -31,7 +31,7 @@
* {@link ReadSupport#createReader(DataSourceOptions)} or
* {@link ReadSupportWithSchema#createReader(StructType, DataSourceOptions)}.
* It can mix in various query optimization interfaces to speed up the data scan. The actual scan
- * logic is delegated to {@link InputPartition}s that are returned by
+ * logic is delegated to {@link InputPartition}s, which are returned by
* {@link #planInputPartitions()}.
*
* There are mainly 3 kinds of query optimizations:
@@ -45,8 +45,8 @@
* only one of them would be respected, according to the priority list from high to low:
* {@link SupportsScanColumnarBatch}, {@link SupportsScanUnsafeRow}.
*
- * If an exception was throw when applying any of these query optimizations, the action would fail
- * and no Spark job was submitted.
+ * If an exception was throw when applying any of these query optimizations, the action will fail
+ * and no Spark job will be submitted.
*
* Spark first applies all operator push-down optimizations that this data source supports. Then
* Spark collects information this data source reported for further optimizations. Finally Spark
@@ -59,21 +59,21 @@ public interface DataSourceReader {
* Returns the actual schema of this data source reader, which may be different from the physical
* schema of the underlying storage, as column pruning or other optimizations may happen.
*
- * If this method fails (by throwing an exception), the action would fail and no Spark job was
+ * If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*/
StructType readSchema();

/**
- * Returns a list of read tasks. Each task is responsible for creating a data reader to
- * output data for one RDD partition. That means the number of tasks returned here is same as
- * the number of RDD partitions this scan outputs.
+ * Returns a list of {@link InputPartition}s. Each {@link InputPartition} is responsible for
@rdblue (Contributor) commented on May 21, 2018:

Nit: Instead of {@link InputPartition}s use {@link InputPartition partitions} to add the plural. That avoids awkward formatting and links that are missing 's'.
+ * creating a data reader to output data of one RDD partition. The number of input partitions
+ * returned here is the same as the number of RDD partitions this scan outputs.
*
* Note that, this may not be a full scan if the data source reader mixes in other optimization
* interfaces like column pruning, filter push-down, etc. These optimizations are applied before
* Spark issues the scan request.
*
- * If this method fails (by throwing an exception), the action would fail and no Spark job was
+ * If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*/
List<InputPartition<Row>> planInputPartitions();
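Putting readSchema() and planInputPartitions() together, a toy reader that scans a fixed integer range in two partitions could look like this. RangeInputPartition is a hypothetical class, sketched under InputPartition below:

import java.util.ArrayList;
import java.util.List;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.reader.DataSourceReader;
import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class SimpleReader implements DataSourceReader {
  @Override
  public StructType readSchema() {
    // May be narrower than the storage schema once column pruning applies.
    return new StructType().add("i", DataTypes.IntegerType);
  }

  @Override
  public List<InputPartition<Row>> planInputPartitions() {
    // Two input partitions here means the scan outputs a two-partition RDD.
    List<InputPartition<Row>> partitions = new ArrayList<>();
    partitions.add(new RangeInputPartition(0, 5));
    partitions.add(new RangeInputPartition(5, 10));
    return partitions;
  }
}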
@@ -23,13 +23,14 @@

/**
* An input partition returned by {@link DataSourceReader#planInputPartitions()} and is
- * responsible for creating the actual data reader. The relationship between
- * {@link InputPartition} and {@link InputPartitionReader}
+ * responsible for creating the actual data reader of one RDD partition.
+ * The relationship between {@link InputPartition} and {@link InputPartitionReader}
* is similar to the relationship between {@link Iterable} and {@link java.util.Iterator}.
*
- * Note that input partitions will be serialized and sent to executors, then the partition reader
- * will be created on executors and do the actual reading. So {@link InputPartition} must be
- * serializable and {@link InputPartitionReader} doesn't need to be.
+ * Note that {@link InputPartition}s will be serialized and sent to executors, then
+ * {@link InputPartitionReader}s will be created on executors to do the actual reading. So
+ * {@link InputPartition} must be serializable while {@link InputPartitionReader} doesn't need to
+ * be.
*/
@InterfaceStability.Evolving
public interface InputPartition<T> extends Serializable {
@@ -41,10 +41,10 @@ public interface InputPartition<T> extends Serializable {
* The location is a string representing the host name.
*
* Note that if a host name cannot be recognized by Spark, it will be ignored as it was not in
- * the returned locations. By default this method returns empty string array, which means this
- * task has no location preference.
+ * the returned locations. The default return value is empty string array, which means this
+ * input partition's reader has no location preference.
*
- * If this method fails (by throwing an exception), the action would fail and no Spark job was
+ * If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*/
default String[] preferredLocations() {
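A sketch of the hypothetical RangeInputPartition used above, illustrating the division of labor the Javadoc describes: the partition is serialized to an executor, while the reader it creates there never leaves that executor and so needs no serializability. The createPartitionReader() factory method is assumed from the Spark 2.4-era interface.

import java.io.IOException;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.sources.v2.reader.InputPartition;
import org.apache.spark.sql.sources.v2.reader.InputPartitionReader;

public class RangeInputPartition implements InputPartition<Row> {
  private final int start;
  private final int end;

  public RangeInputPartition(int start, int end) {
    this.start = start;
    this.end = end;
  }

  // preferredLocations() is not overridden, so the default empty array
  // applies: this partition's reader has no location preference.

  @Override
  public InputPartitionReader<Row> createPartitionReader() {
    // Created on the executor after deserialization; the reader itself
    // does not need to be serializable.
    return new InputPartitionReader<Row>() {
      private int current = start - 1;

      @Override
      public boolean next() throws IOException {
        current += 1;
        return current < end;
      }

      @Override
      public Row get() {
        return RowFactory.create(current);
      }

      @Override
      public void close() throws IOException {
        // no resources to release in this toy example
      }
    };
  }
}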
@@ -34,8 +34,8 @@
* It can mix in various writing optimization interfaces to speed up the data saving. The actual
* writing logic is delegated to {@link DataWriter}.
*
- * If an exception was throw when applying any of these writing optimizations, the action would fail
- * and no Spark job was submitted.
+ * If an exception was throw when applying any of these writing optimizations, the action will fail
+ * and no Spark job will be submitted.
*
* The writing procedure is:
* 1. Create a writer factory by {@link #createWriterFactory()}, serialize and send it to all the
Expand All @@ -58,7 +58,7 @@ public interface DataSourceWriter {
/**
* Creates a writer factory which will be serialized and sent to executors.
*
- * If this method fails (by throwing an exception), the action would fail and no Spark job was
+ * If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*/
DataWriterFactory<Row> createWriterFactory();
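A corresponding sketch of the driver-side writer; the commit and abort callbacks are part of the same interface, and the bodies here are placeholders:

import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.writer.DataSourceWriter;
import org.apache.spark.sql.sources.v2.writer.DataWriterFactory;
import org.apache.spark.sql.sources.v2.writer.WriterCommitMessage;

public class SimpleWriter implements DataSourceWriter {
  @Override
  public DataWriterFactory<Row> createWriterFactory() {
    // The factory is serialized and shipped to executors, so it must be
    // Serializable; failing here aborts the action before any job starts.
    return new SimpleWriterFactory(); // hypothetical factory, sketched below
  }

  @Override
  public void commit(WriterCommitMessage[] messages) {
    // Driver-side: make all successfully written data visible atomically.
  }

  @Override
  public void abort(WriterCommitMessage[] messages) {
    // Driver-side: clean up whatever a partially failed write left behind.
  }
}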
@@ -35,7 +35,7 @@ public interface DataWriterFactory<T> extends Serializable {
/**
* Returns a data writer to do the actual writing work.
*
- * If this method fails (by throwing an exception), the action would fail and no Spark job was
+ * If this method fails (by throwing an exception), the action will fail and no Spark job will be
* submitted.
*
* @param partitionId A unique id of the RDD partition that the returned writer will process.
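Finally, a sketch of the executor-side factory. The exact createDataWriter parameter list changed across Spark 2.x releases; the two-argument (partitionId, attemptNumber) form of the Spark 2.3-era interface is assumed here, and SimpleDataWriter is hypothetical.

import org.apache.spark.sql.Row;
import org.apache.spark.sql.sources.v2.writer.DataWriter;
import org.apache.spark.sql.sources.v2.writer.DataWriterFactory;

public class SimpleWriterFactory implements DataWriterFactory<Row> {
  @Override
  public DataWriter<Row> createDataWriter(int partitionId, int attemptNumber) {
    // Runs on an executor after the factory is deserialized there; one
    // writer is created per partition (and per retry attempt).
    return new SimpleDataWriter(partitionId, attemptNumber);
  }
}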