Conversation

stevenzwu (Contributor) commented Mar 8, 2021

This is the first feature PR for FLIP-27 source split from the uber PR #2105

Currently, there are at least two open questions to be addressed. Since I will be out for the rest of the week, I'd like to put this out first.

  1. @openinx suggested that we break the DataIterator into two levels (combined and file tasks). I have a question in the comment thread on the uber PR that maybe @openinx can confirm.

  2. The reader is currently implemented on top of FileSourceSplit and BulkFormat. The original reason is that Jingsong mentioned we might be able to take advantage of the high-performance vectorized readers from Flink. But I am revisiting that decision: it is unlikely that Flink's vectorized readers will support deletes. Iceberg is also adding vectorized readers, and I assume the Iceberg implementations will support deletes.

@openinx @sundargates @tweise


public void seek(CheckpointedPosition checkpointedPosition) {
// skip files
Preconditions.checkArgument(checkpointedPosition.getOffset() < combinedTask.files().size(),
Member:

Nit: could simplify it as:

    Preconditions.checkArgument(checkpointedPosition.getOffset() < combinedTask.files().size(),
        "Checkpointed file offset is %s, while CombinedScanTask has %s files",
        checkpointedPosition.getOffset(), combinedTask.files().size());

import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public interface IcebergSourceOptions {
Member:

We've introduced a FlinkTableOptions. I don't think it's friendly to create separate options classes for source, sink, table, etc. Maybe we could rename FlinkTableOptions to FlinkConfigOptions and put all the options into that class.


@Override
public Reader<T> createReader(Configuration config, IcebergSourceSplit split) throws IOException {
return new ReaderAdaptor<T>(bulkFormatProvider, config, split, false);
Member:

Nit: new ReaderAdaptor<>(...) ?


@Override
public Reader<T> restoreReader(Configuration config, IcebergSourceSplit split) throws IOException {
return new ReaderAdaptor<T>(bulkFormatProvider, config, split, true);
Member:

ditto

final CheckpointedPosition position = icebergSplit.checkpointedPosition();
if (position != null) {
// skip files based on offset in checkpointed position
Preconditions.checkArgument(position.getOffset() < icebergSplit.task().files().size(),
Member:

Nit: use the method

checkArgument(boolean expression, @Nullable String errorMessageTemplate, @Nullable Object... errorMessageArgs)

fileOffset++;
final FileSourceSplit fileSourceSplit = new FileSourceSplit(
"",
new Path(URI.create(fileScanTask.file().path().toString())),
Member:

So the FileSourceSplit will use Flink's filesystem interface to access the underlying files? Iceberg currently has its own FileIO interface, which object storage services implement to read/write data in the cloud. If we introduce the Flink filesystem here, I'm concerned that we will have to implement both the Flink filesystem interfaces and the Iceberg FileIO interfaces to make the experimental unified source work.

Contributor Author:

We extended from FileSourceSplit mainly for the BulkFormat batch reader interface so that we can plug in vectorized readers from Flink. I am also debating if this is the right thing to do as mentioned in the original description.

But this is not really relevant to FileIO, which deals with the underlying filesystem (like S3). Here we are mainly talking about the file format readers (like Parquet and ORC).


private final long fileOffset;
private final RecordIterator<T> iterator;
private final MutableRecordAndPosition mutableRecordAndPosition;
Member:

Nit: use MutableRecordAndPosition<T> here

import org.apache.iceberg.flink.source.split.MutableIcebergSourceSplit;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;

public class IcebergSourceReader<T> extends
Member:

This class doesn't have to be introduced in this PR, does it? I don't see any usage.

Contributor Author:

Yes, it is not used in this PR. It is added for the completeness of the split reader module

Contributor:

Let's drop it for now and do it in a follow-up PR; this one is already really, really large.

Contributor Author:

sounds good. will do

openinx (Member) commented Mar 10, 2021

Actually, I did not fully understand the whole PR (#2105) when reviewing this separate PR. I think I will need more time to understand the whole codebase first.

stevenzwu force-pushed the flip27SplitReader branch 2 times, most recently from 672e638 to cec66f9 (March 22, 2021 18:39)
stevenzwu (Contributor Author):

@openinx I updated the PR with the DataIterator refactoring: using composition instead of inheritance (like the old RowDataIterator). Please take a look when you get a chance.

stevenzwu (Contributor Author):

@openinx if we are committed to having the vectorized readers in iceberg-flink that @zhangjun0x01 is working on (PR #2566), then I will update this PR to avoid the dependency on and extension from FileSourceSplit and BulkFormat. The only reason I did it was to reuse the vectorized readers from Flink.

I am also wondering whether those vectorized readers support delete filtering. I know the Flink vectorized reader implementations certainly won't, since deletes are an Iceberg concept.

return position;
}

public static class Position {
Contributor Author:

@openinx following up on your comment from the uber PR: https://github.com/apache/iceberg/pull/2105/files#r630834205.

The reason I introduced this mutable Position class is to avoid constructing a new <fileOffset, recordOffset> object for every record. It is the current cursor for the iterator.

I didn't track the recordOffset inside the FileIteratorReader for the same reason; otherwise, the position() getter would construct a new object each time.

We can't use CheckpointedPosition from Flink for two reasons: (1) it is immutable, and (2) we want to return the current position (not necessarily a checkpointed position).
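A minimal sketch of the mutable cursor described above (illustrative only; field types follow the diff context quoted elsewhere in this PR, not the exact implementation):

    public static class Position {
      private long fileOffset;
      private long recordOffset;

      Position(long fileOffset, long recordOffset) {
        this.fileOffset = fileOffset;
        this.recordOffset = recordOffset;
      }

      // mutate in place so that reading the current position never allocates a new object per record
      void update(long newFileOffset, long newRecordOffset) {
        this.fileOffset = newFileOffset;
        this.recordOffset = newRecordOffset;
      }

      long fileOffset() {
        return fileOffset;
      }

      long recordOffset() {
        return recordOffset;
      }
    }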

holdenk (Contributor) commented Jun 25, 2021

Hey @stevenzwu, can you rebase or merge in the master branch? GitHub is showing a conflicting file.

holdenk (Contributor) left a comment:

Thanks for working on this. I don't have enough context around this; I'll try to spin up some more context for next week. In the meantime, if any of the people with more context (like @JingsongLi or @rdblue) have some cycles to review, that would be rad :)

import org.apache.flink.configuration.ConfigOptions;

public class FlinkTableOptions {
public class FlinkConfigOptions {
Contributor:

This looks like an existing public API; renaming might have some challenges, or at the very least needs a comment in the migration guide. Another option would be to have a deprecated FlinkTableOptions extend FlinkConfigOptions so as not to break compatibility.

Contributor Author:

The class is used internally. The config keys are the public contract and weren't changed. Let me move this one out to a separate small PR.


* Context object with optional arguments for a Flink Scan.
*/
class ScanContext implements Serializable {
public class ScanContext implements Serializable {
Contributor:

Why do we need to make so much more of this ScanContext public?

Contributor Author:

Because now this class is accessed by classes in a sub-package (like reader/RowDataIteratorBulkFormat.java).

stevenzwu (Contributor Author) commented Jun 25, 2021

@holdenk thx a lot for taking a look. will rebase it after the decision on the vectorized readers below.

@rdblue @JingsongLi please help take a look and move the new FLIP-27 based Iceberg source forward if you can. It is part 3 of the uber PR #2105.

Right now, there is a pending decision before this PR can be reviewed. Currently, this PR is based on the premise of reusing the BulkFormat from Flink for vectorized readers (for Parquet, ORC, etc.), originally suggested by @JingsongLi. I am rethinking that choice. It is unlikely that Flink's (vectorized) readers will support delete filters the way Iceberg readers do. Maybe the iceberg-flink module needs its own vectorized readers to support deletes; @zhangjun0x01 already submitted PR #2566 for ORC. In that case this PR needs to be adjusted to break away from the Flink file source.


public class IcebergSourceSplit extends FileSourceSplit {

public enum Status {


nit: Given that this Status is not a field in the Split itself, wondering if we should move this to a separate file?

Contributor Author:

Makes sense. will move it out

import org.apache.iceberg.relocated.com.google.common.base.Objects;
import org.apache.iceberg.relocated.com.google.common.collect.Iterables;

public class IcebergSourceSplit extends FileSourceSplit {


What's the rationale for extending from FileSourceSplit, given that it is used to represent a single file? Based on the args passed to the constructor in line 56, it doesn't seem to map well to the Iceberg split.

Contributor Author:

I was hoping to extend from flink-connector-files to leverage the vectorized readers in Flink, but I have been revisiting that decision. After discussing with @openinx offline, we agreed that iceberg-flink should probably have its own vectorized readers (mainly for delete filters in the future). flink-connector-files' vectorized readers won't support delete filters, as deletes are a concept for Iceberg only (not the raw file formats like Parquet or ORC). Will update this PR.

return checkpointedPosition;
}

public byte[] serializedFormCache() {


nit: I'm guessing this need not be public given that it's an optimization primarily around making serialization cheap?

Contributor Author:

it is actually used by IcebergSourceSplit


You mean it is used by IcebergSourceSplitSerializer? Since that's in the same package, it doesn't need to be public.

stevenzwu force-pushed the flip27SplitReader branch 2 times, most recently from 134aac0 to 82fd27f (July 1, 2021 06:38)
stevenzwu (Contributor Author):

@openinx @sundargates @holdenk @tweise Now this sub PR is ready for review.

I have changed the code not to extend from FileSourceSplit and BulkFormat, as we are aligned that the Iceberg source reader probably can't reuse the vectorized readers from Flink. The main reason is that the future Iceberg V2 format supports deletes, which is a concept applicable to Iceberg (not to raw file formats like ORC, Parquet, etc.). Hence we can't reuse Flink's vectorized readers with delete filters.

I also moved some unrelated refactoring out of this PR.

build.gradle Outdated
void onOutput(TestDescriptor testDescriptor, TestOutputEvent testOutputEvent) {
if (lastDescriptor != testDescriptor) {
buildLog << "--------\n- Test log for: "<< testDescriptor << "\n--------\n"
buildLog << "--------\n- Test log for: " << testDescriptor << "\n--------\n"


looks like extraneous changes unrelated to the diff?

Contributor Author:

Yeah, annoying auto-formatting from IntelliJ. will fix it.

return inputFiles.get(location);
}

public void seek(CheckpointedPosition checkpointedPosition) {


is there a need to make use of this DS given that BulkFormat is not going to be supported initially?

tasks = null;
}

public Position position() {


It appears that you are using CheckpointedPosition DS to communicate the position that the iterator has to seek to from outside. However, in order to communicate the current position to the outside world, you are using the internal Position DS. Wondering if we can keep this consistent to be either CheckpointedPosition or the mutable Position?

Contributor Author:

That is a good point. It is also related to your question above. Let me see how to unify them and maybe move away from Flink's CheckpointedPosition

public class FileRecords<T> implements RecordsWithSplitIds<RecordAndPosition<T>> {

@Nullable
private final ReaderFactory.RecordIterator recordsForSplit;


why is the RecordIterator type here not parameterized?

Contributor Author:

they should be. will fix

*
* @param <T> output record type
*/
interface Reader<T> {


Would it make sense to have the reader extend CloseableIterable<RecordsWithSplitIds>? This way the intermediate DS called FileRecords can be completely avoided. A lot of the abstraction makes it slightly complex to read otherwise IMO.

interface Reader<T> extends CloseableIterable<RecordsWithSplitIds<T>> {
}

stevenzwu (Contributor Author) Jul 8, 2021:

I would need FileRecords, as it is an implementation of the RecordsWithSplitIds interface.

I see your point on the complexity. let me see how to simplify the abstractions.

*
* @param <T> The type of the record.
*/
interface RecordIterator<T> {


I'm not sure we need this internal interface given RecordsWithSplitIds is pretty much a superset of this DS.

import org.apache.iceberg.flink.source.ScanContext;
import org.apache.iceberg.flink.source.split.IcebergSourceSplit;

public class RowDataIteratorReaderFactory implements ReaderFactory<RowData> {


Can we make this generic by asking the user to provide the factory for generating T (RowData in this specific case) from a given CombinedScanTask, or make that abstract and have a default implementation for RowData that uses the DataIterator?

abstract class BatchDataIteratorFactory<T> implements Function<FileScanTaskSplit, CloseableIterable<RecordsWithSplitIds<T>>> {
  protected abstract DataIterator<T> getIteratorFor(CombinedScanTask task);
}

* @param split Iceberg source split
* @return a batch reader
*/
Reader<T> create(Configuration config, IcebergSourceSplit split);


Why is the configuration passed per split? Wouldn't it be the same for the full table? If so, should it be passed to the constructor as a property of the implementation rather than per split?

Contributor Author:

Makes sense. will change

import org.apache.flink.connector.file.src.util.RecordAndPosition;
import org.apache.iceberg.flink.source.split.IcebergSourceSplit;

public interface ReaderFactory<T> extends Serializable {


I think it might be better to avoid this interface and replace it with an existing Java type such as java.util.function.Function, as it would lead to less need for understanding new types. If you want to have this abstraction, then you can just define the interface as

public interface ReaderFactory<T> extends Function<IcebergSourceSplit, CloseableIterable<RecordsWithSplitIds<T>>> {}

private final Configuration config;
private final DataIteratorBatcher<T> batcher;

public DataIteratorReaderFactory(Configuration config, DataIteratorBatcher<T> batcher) {
stevenzwu (Contributor Author) Jul 12, 2021:

@sundargates I introduced this Batcher interface. RowDataIterator reuses the RowData object, so we need an array pool (and record cloning) in the reader. An Avro iterator doesn't reuse objects, so it wouldn't need the array pool or the clone. A different Batcher can be plugged in for a reader with an Avro record output type.

build.gradle Outdated
compile project(':iceberg-parquet')
compile project(':iceberg-hive-metastore')

compileOnly "org.apache.flink:flink-connector-base"
Member:

Are the interfaces (such as SplitReader, RecordsWithSplitIds, SourceReaderOptions) from this jar stable enough to expose to a downstream project like Iceberg? I raise this question because I don't see them marked with a Public annotation, and they seem to have no compatibility guarantee.

Contributor Author:

SplitReader and RecordsWithSplitIds are part of the core API that a source implementation needs to extend/implement.

Contributor Author:

I did a little research and saw that Stephan recently added the @PublicEvolving annotation to all three interfaces that you pointed out: https://issues.apache.org/jira/browse/FLINK-22358


It would be too restrictive to depend only on @Public interfaces. FLIP-27 connectors are becoming the default, and these interfaces should mature soon, though. The Iceberg project will need a story for supporting multiple Flink versions at some point.

build.gradle Outdated
def buildLog = new File(logFile)
addTestOutputListener(new TestOutputListener() {
def lastDescriptor

Member:

Nit: please don't introduce any unrelated changes, to avoid conflicts when people rebase onto their own repos.

Contributor Author:

sorry. that was a mistake. will fix

build.gradle Outdated
exclude group: 'com.google.code.findbugs', module: 'jsr305'
}

compile "org.apache.flink:flink-connector-base"
Member:

This jar is not included in the flink-dist binary package, so we have to include it in the iceberg-flink-runtime jar explicitly?

Contributor Author:

that is correct


It would be good to add a comment since this question is almost certain to come up again.

}
}

public static List<IcebergSourceSplit> planIcebergSourceSplits(
openinx (Member) Sep 9, 2021:

Should we add a javadoc (or replace it with a clearer name) to indicate why we need an extra planIcebergSourceSplits (compared to createInputSplits)? It's not easy to tell the difference between the names 'InputSplits' and 'IcebergSourceSplits'. I think it's used for implementing FLIP-27's SourceSplit, right?

Member:

It would be good to align with createInputSplits by naming this createIcebergSourceSplits.

stevenzwu (Contributor Author) Sep 10, 2021:

Actually, I think we should rename createInputSplits to planFlinkInputSplit. We are not creating splits out of nowhere; both methods just discover/plan splits from the table by calling the same planTasks. I can add some javadoc on the new planIcebergSourceSplits method.

Since createInputSplits is non-public, we should be safe to rename it.

import org.apache.iceberg.flink.source.DataIterator;
import org.apache.iceberg.io.CloseableIterator;

@FunctionalInterface

javadoc?

Contributor Author:

will add

import org.apache.iceberg.flink.source.split.IcebergSourceSplit;
import org.apache.iceberg.io.CloseableIterator;

public abstract class DataIteratorReaderFunction<T> implements ReaderFunction<T> {

internal or javadoc?

Contributor Author:

Will add Javadoc. In this PR, there is only one implementation, RowDataReaderFunction, but we can extend from this class with an AvroGenericRecordReaderFunction that directly reads Parquet files into Avro GenericRecord. That is why this is left as public.
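For illustration, a rough sketch of the extension point being described, building on the types in this PR; the ReaderFunction signature and the Avro subclass are assumptions, not the exact code:

    public abstract class DataIteratorReaderFunction<T> implements ReaderFunction<T> {
      private final DataIteratorBatcher<T> batcher;

      protected DataIteratorReaderFunction(DataIteratorBatcher<T> batcher) {
        this.batcher = batcher;
      }

      // Subclasses such as RowDataReaderFunction (or a future AvroGenericRecordReaderFunction)
      // decide how a split's scan task is turned into a DataIterator of their output type.
      protected abstract DataIterator<T> createDataIterator(IcebergSourceSplit split);

      @Override
      public CloseableIterator<RecordsWithSplitIds<RecordAndPosition<T>>> apply(IcebergSourceSplit split) {
        return batcher.apply(split.splitId(), createDataIterator(split));
      }
    }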

* A batch of records for one split
*/
@Internal
public class FileRecords<T> implements RecordsWithSplitIds<RecordAndPosition<T>> {

I found the class name not intuitive. From usage, this appears to be a "fetch result"?

Contributor Author:

I am going to rename it to SplitRecords. The Javadoc explains the purpose of this class. Yes, it is the "fetch result".

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class IcebergSourceSplitReader<T> implements SplitReader<RecordAndPosition<T>, IcebergSourceSplit> {

Did you consider the need to subclass the reader for customization? Maybe it should be protected?

Contributor Author:

Right now, I don't anticipate any need to extend from this class.

}

/**
* Each record's {@link RecordAndPosition} will have the same fileOffset (for {@link RecordAndPosition#getOffset()}).

should this move to class level doc?

Contributor Author:

will move


/**
* TODO: use Java serialization for now.
* will switch to more stable serializer from issue-1698.

Please add the full link. With a new version, will such a subsequent change be backward compatible?

Contributor Author:

will add link.

Regarding the future change to a stable serializer, it will be backward compatible: we can bump the serializer version to v2 and continue to deserialize state written by v1.

  @Override
  public IcebergSourceSplit deserialize(int version, byte[] serialized) throws IOException {
    switch (version) {
      case 1:
        return deserializeV1(serialized);
      default:
        throw new IOException("Unknown version: " + version);
    }
  }

rdblue (Contributor) commented Oct 10, 2021

@stevenzwu, I'll try to review this in the next week. Thank you!

FlinkInputSplit[] splits = new FlinkInputSplit[tasks.size()];
for (int i = 0; i < tasks.size(); i++) {
splits[i] = new FlinkInputSplit(i, tasks.get(i));
static FlinkInputSplit[] planInputSplits(Table table, ScanContext context) {
Contributor:

Why change the name of this method?

Contributor Author:

create/generate implies creating something new, while this actually plans/discovers splits from the table; hence the method name change. I also renamed the class from FlinkSplitGenerator to FlinkSplitPlanner. This is an internal class, so it shouldn't break user code.

Preconditions.checkArgument(startingPosition.fileOffset() < combinedTask.files().size(),
"Checkpointed file offset is %d, while CombinedScanTask has %d files",
startingPosition.fileOffset(), combinedTask.files().size());
for (long i = 0L; i < startingPosition.fileOffset(); ++i) {
Contributor:

Is fileOffset() a long? That seems odd to me. When would you need to address more than 2 billion files in a single combined scan task?

stevenzwu (Contributor Author) Nov 1, 2021:

An integer would certainly be sufficient. I was using long to match the type in RecordAndPosition from the flink-connector-files module. Looking at it again, the long offset in Flink's RecordAndPosition actually means the byte offset within a file. I will define our own RecordAndPosition and change fileOffset to an int.
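A minimal sketch of what an Iceberg-specific RecordAndPosition with an int file offset could look like (names and the mutable style are illustrative, mirroring the discussion above):

    public class RecordAndPosition<T> {
      private T record;
      private int fileOffset;
      private long recordOffset;

      public void set(T newRecord, int newFileOffset, long newRecordOffset) {
        this.record = newRecord;
        this.fileOffset = newFileOffset;
        this.recordOffset = newRecordOffset;
      }

      public T record() {
        return record;
      }

      public int fileOffset() {
        return fileOffset;
      }

      // record position within the current file, unlike Flink's byte-based offset
      public long recordOffset() {
        return recordOffset;
      }
    }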

Contributor Author:

same as another comment. will update

this.inputFilesDecryptor = new InputFilesDecryptor(task, io, encryption);
this.combinedTask = task;

this.tasks = task.files().iterator();
Contributor:

I'd probably rename this to fileTasks since there is now a combinedTask.

Contributor Author:

agree. will rename it to fileTasksIterator

startingPosition.recordOffset());
}
}
this.position.update(startingPosition.fileOffset(), startingPosition.recordOffset());
Contributor:

Can position be final since this is using update?

Contributor Author:

yes, will change Position to final


@Internal
public class RowDataFileScanTaskReader implements FileScanTaskReader<RowData> {

Contributor:

Looks like this file doesn't need to change. Can you remove this?

Contributor Author:

will revert

ConfigOptions.key("monitor-interval").durationType().defaultValue(Duration.ofSeconds(10));

private static final ConfigOption<Boolean> INCLUDE_COLUMN_STATS =
ConfigOptions.key("include-column-stats").booleanType().defaultValue(false);
Contributor:

How is this addition related to FLIP-27?

Contributor Author:

This is needed for the event-time-aligned assigner / rough ordering, where we use the min-max stats from the timestamp column to order the splits and assignments. It is not directly related to the reader part; it is just part of this sub PR because I copied the classes from the uber PR #2105.

}

FlinkInputSplit[] splits = FlinkSplitGenerator.createInputSplits(table, newScanContext);
FlinkInputSplit[] splits = FlinkSplitPlanner.planInputSplits(table, newScanContext);
Contributor:

Does this keep the monitor function so that it can maintain the old API? Do we need to maintain the old API?

Contributor Author:

FlinkSplitGenerator/FlinkSplitPlanner is an internal class; it doesn't affect the public API for user code.

import org.apache.iceberg.flink.source.Position;
import org.apache.iceberg.io.CloseableIterator;

class ArrayPoolDataIteratorBatcher<T> implements DataIteratorBatcher<T> {
Contributor:

There isn't much context for me to go on here. Should there be Javadoc to explain what's going on?

Contributor Author:

will add Javadoc to explain

*/
@FunctionalInterface
public interface DataIteratorBatcher<T> extends Serializable {
CloseableIterator<RecordsWithSplitIds<RecordAndPosition<T>>> apply(String splitId, DataIterator<T> inputIterator);
Contributor:

Why name this apply? Is there a more specific verb you could use here?

int num = 0;
while (inputIterator.hasNext() && num < batchSize) {
T nextRecord = inputIterator.next();
recordFactory.clone(nextRecord, batch[num]);
rdblue (Contributor) Oct 29, 2021:

What is this doing? Is clone expensive?

It could be that the record produced by inputIterator is reused, and the clone call is making a copy because we can't call inputIterator.next() again until the copy is made, since the record is not consumed immediately. If that's the case, then I think there should be a comment to point out what's going on.

Contributor Author:

you are exactly right. will add code comment
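For illustration, the batch-filling loop from the diff above with the kind of comment being requested (a simplified sketch; the early-exit condition from the original loop is omitted):

    int num = 0;
    while (inputIterator.hasNext() && num < batchSize) {
      T nextRecord = inputIterator.next();
      // The iterator may reuse the same record instance across next() calls,
      // so copy the record into the pooled batch slot before advancing the iterator again.
      recordFactory.clone(nextRecord, batch[num]);
      num++;
    }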

break;
}
}
if (num == 0) {
Contributor:

Could you add newlines between control flow blocks?

Contributor Author:

will do

implementation project(':iceberg-parquet')
implementation project(':iceberg-hive-metastore')

compileOnly "org.apache.flink:flink-connector-base"
Member:

It's strange that the build for Flink 1.12 & 1.13 passed, because I don't see the same dependency added to the Flink 1.12 and 1.13 build.gradle files. Maybe I need to check 1.12's build.gradle again.

stevenzwu (Contributor Author) Nov 1, 2021:

@openinx Maybe following up on the other comment discussion here.

With the SplitEnumerator API change, it looks like I need to put the FLIP-27 source in the v1.13 folder. What should we do with future versions (like 1.14)? Do we copy the FLIP-27 source code from the v1.13 folder to the v1.14 folder?

Contributor:

@openinx, the tests run against the iceberg-flink module. They aren't present in the 1.12 or 1.13 modules. If you want them to be run for those modules, you'd need to add the source folder like you do for src/main/java. If you choose to do that, let's also remove CI for the common module since we don't need to run the tests outside of 1.12 and 1.13 if they are run in those modules.

Contributor:

@stevenzwu, I think that copying the parts that change is reasonable. And once we remove support for 1.12, you can move the files back into the common module.

Contributor Author:

yeah. that is my plan too. Once 1.12 support is removed, we should be able to move files back to the common module. We just need to be diligent with these efforts.

this.combinedTask = task;
// fileOffset starts at -1 because we started
// from an empty iterator that is not from the split files.
this.position = new Position(-1, 0L);
Member:

The general DataIterator doesn't use the position or seek method to skip tasks or records. Putting all the FLIP-27 related logic into the common Flink read path does not make sense to me, because every time I read this class I need to figure out which part is related to FLIP-27 and which part is not.

I would suggest introducing a separate SeekableDataIterator to isolate the two code paths. I made a simple commit for this: https://github.com/openinx/incubator-iceberg/commit/b08dde86aae0c718d9d72acb347dffb3a836b336 — you may want to take a look.

stevenzwu (Contributor Author) Nov 1, 2021:

I wouldn't say that seek capability is FLIP-27 specific. If we think of DataIterator as reading a list of files/splits from a CombinedScanTask, it is like a file API, where seek is pretty common. It is needed to achieve exactly-once processing semantics; e.g., if we were to implement exactly-once semantics for the current streaming source, I would imagine we would need this as well.

Thanks a lot for the SeekableDataIterator commit. I feel that leaving these two empty abstract methods in the base DataIterator is a little weird:

protected void advanceRecord()
protected void advanceTask()

Overall, I still think adding seek capability to DataIterator is natural (for file-like read APIs)
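For illustration, a rough sketch of the seek behavior under discussion, pieced together from the diff context quoted earlier in this thread; helper names like fileTasksIterator, updateCurrentIterator, and currentIterator are assumptions:

    public void seek(int startingFileOffset, long startingRecordOffset) {
      // skip whole file tasks up to the checkpointed file offset
      Preconditions.checkArgument(startingFileOffset < combinedTask.files().size(),
          "Checkpointed file offset is %s, while CombinedScanTask has %s files",
          startingFileOffset, combinedTask.files().size());
      for (int i = 0; i < startingFileOffset; ++i) {
        fileTasksIterator.next();
      }

      // skip records within the current file up to the checkpointed record offset
      updateCurrentIterator();
      for (long i = 0L; i < startingRecordOffset; ++i) {
        if (currentIterator.hasNext()) {
          currentIterator.next();
        } else {
          throw new IllegalStateException("Not enough records to skip: " + startingRecordOffset);
        }
      }

      position.update(startingFileOffset, startingRecordOffset);
    }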

stevenzwu (Contributor Author):

Closing this PR for now. Will further break it down into smaller PRs as suggested by @openinx.

stevenzwu closed this Nov 8, 2021