Druid with Oak supporting also plain mode v04 #6235

Closed
wants to merge 3 commits

Conversation

sanastas

Hi Everybody,

Please take a look at the following Oak Incremental Index pull request. Below is a list of helpful material. Although we will soon also present performance results, we would first like to hear your thoughts and comments from a correctness and software-engineering point of view.

Thanks,
Anastasia

Following our great talk yesterday (where we agreed to publish a PR for an Oak Incremental Index for Druid), here is some reading material about Oak. The pull request itself will be ready tomorrow. I strongly suggest reading some of the documentation before looking at the code, as it may make the latter task easier and clearer. As I said, Oak is now an open-source library (https://github.com/yahoo/Oak) and its README should help you understand Oak's interface and the new Druid code.

Recall that we had issue #5698 about the Oak Incremental Index (#5698), where you can find more useful documents to start from. We will publish the latest single-threaded ingestion benchmark results tomorrow, together with the pull request. Feel free to ask questions!

Files from the issue:

  1. Oak Introduction: https://github.com/druid-io/druid/files/1946175/OakIntroduction.pdf
  2. Oak Details Presentation: https://github.com/druid-io/druid/files/1946182/OAK.Off-Heap.Allocated.Keys.pptx
  3. Suggested Refactoring: https://github.com/druid-io/druid/files/1947353/Incremental.Index.Refactoring.Suggestion.pdf
  4. Some additional Oak results: https://github.com/druid-io/druid/files/1959106/OakMap.OnHeap.with.Integer.Values.pdf

Here one can see initial ingestion benchmark results:
IndexIngestionBenchmarkWithOak.pdf

@Override
public Row deserialize(ByteBuffer serializedValue)
{
throw new UnsupportedOperationException(); // cannot be deserialized without the IncrementalIndexRow

Shouldn't it be implemented? Iterators may use it.

public class OakValueSerializer implements Serializer<Row>
{

private List<DimensionDesc> dimensionDescsList;

Never used here. Can be removed.


// This is modified on add() in a critical section.
private final ThreadLocal<InputRow> in = new ThreadLocal<>();
private final Supplier<InputRow> rowSupplier = in::get;
protected final ThreadLocal<InputRow> in = new ThreadLocal<>();


It would be better to have private data members and protected accessors.
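A minimal sketch of this suggestion (the class name and import path are assumptions for illustration, not code from the PR): keep the ThreadLocal private and expose it to subclasses through protected accessors instead of widening the field itself to protected.

import io.druid.data.input.InputRow;

public abstract class IncrementalIndexSketch
{
  // This is modified on add() in a critical section.
  private final ThreadLocal<InputRow> in = new ThreadLocal<>();

  // Subclasses work with the in-flight row through these accessors, not the field itself.
  protected InputRow getInFlightRow()
  {
    return in.get();
  }

  protected void setInFlightRow(InputRow row)
  {
    in.set(row);
  }
}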

memoryCapacity = (int) maxDirectMemory;
}

OakMapBuilder builder = new OakMapBuilder()


If there is a builder for IncrementalIndex, maybe it should have its own permanent OakMapBuilder. What is the rationale for creating the builder ad hoc?

}
}

oak = builder.build();


Ideally, the builder should be a singleton, and this line should be the only one in the constructor, no?

}
};

OakMap tmpOakMap = descending ? oak.descendingMap() : oak;


You could keep using "oak"

@Override
public Iterable<Row> iterableWithPostAggregations(List<PostAggregator> postAggs, boolean descending)
{
Function<Map.Entry<ByteBuffer, ByteBuffer>, Row> transformer = new Function<Map.Entry<ByteBuffer, ByteBuffer>, Row>()


Maybe this line should be extracted into a separate function.
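A hedged sketch of that extraction (makeRowTransformer() and deserializeToRow() are hypothetical names, and java.util.function.Function is assumed here; the real code would keep whatever Function type the anonymous class implements):

private Function<Map.Entry<ByteBuffer, ByteBuffer>, Row> makeRowTransformer()
{
  // The body of the anonymous class above moves here, behind a descriptive name;
  // deserializeToRow() stands in for that body.
  return entry -> deserializeToRow(entry.getKey(), entry.getValue());
}

iterableWithPostAggregations() would then simply call makeRowTransformer() to obtain the transformer.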

final Object lhsIdxs = lhs.getDims()[index];
final Object rhsIdxs = rhs.getDims()[index];

if (lhsIdxs == null) {


lhsIdxs == null && rhsIdxs == null
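A sketch of the null handling this hints at (the ordering convention for a single null side is an assumption for illustration, not taken from the PR): treat two null dimensions as equal so the loop moves on, and give a lone null a fixed position relative to non-null values.

if (lhsIdxs == null) {
  if (rhsIdxs != null) {
    retVal = -1; // only lhs is null: order it first (convention assumed)
  }
  // both null: retVal stays 0, so the loop advances to the next dimension
} else if (rhsIdxs == null) {
  retVal = 1;    // only rhs is null
} else {
  // ... existing comparison of two non-null dimension values ...
}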


int index = 0;
while (retVal == 0 && index < numComparisons) {
  final Object lhsIdxs = lhs.getDims()[index];


Consider hoisting lhs.getDims() and rhs.getDims() into local variables (not sure the compiler does this on its own).
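A sketch of the suggested hoisting (the loop body is abbreviated and the increment is shown explicitly for completeness): call getDims() once per row outside the loop, in case the JIT does not hoist the calls by itself.

final Object[] lhsDims = lhs.getDims();
final Object[] rhsDims = rhs.getDims();
int index = 0;
while (retVal == 0 && index < numComparisons) {
  final Object lhsIdxs = lhsDims[index];
  final Object rhsIdxs = rhsDims[index];
  // ... existing per-dimension comparison ...
  index++;
}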

}

@Override
public int compareKeys(IncrementalIndexRow lhs, IncrementalIndexRow rhs)


Is this code a copy-and-paste from somewhere? Any chance for code reuse?

import java.nio.ByteBuffer;
import java.util.List;

public class OakKeysComparator implements OakComparator<IncrementalIndexRow>


There is a lot of similar code further down. Can it be restructured into smaller methods, to be used in a modular way?

aggregate(reportParseExceptions, row, rowContainer, byteBuffer);
}

public void aggregate(


This looks like a copy-and-paste from OakIncrementalIndex; please reuse it instead.

return dimObject;
}

static boolean checkDimsAllNull(ByteBuffer buff, int numComparisons)

Consider moving this to an external DimsUtils class. It is not related to the internals of the incremental index.


static ValueType getDimValueType(int dimIndex, List<DimensionDesc> dimensionDescsList)
{
DimensionDesc dimensionDesc = dimensionDescsList.get(dimIndex);

Consider moving this to an external DimsUtils class. It is not related to the internals of the incremental index.


private Integer addToOak(
InputRow row,
AtomicInteger numEntries,

It may be clearer to let the method use the numEntries field instead.

InputRow row,
AtomicInteger numEntries,
IncrementalIndexRow incrementalIndexRow,
ThreadLocal<InputRow> rowContainer,

Consider using the in field instead. It will reduce the method signature and make the functionality easier to understand.

if (numEntries.get() < maxRowCount || skipMaxRowsInMemoryCheck) {
  oak.putIfAbsentComputeIfPresent(incrementalIndexRow, row, computer);

  int currSize = oak.entries();

The code tries to keep the numEntries field in sync with the oak.entries() value. Consider overriding the methods that query numEntries to return oak.entries() directly, and ignoring numEntries in OakIncrementalIndex.
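A sketch of this idea (the overridden method name is an assumption about the base class's API): answer size queries straight from the OakMap and stop maintaining a separate counter in OakIncrementalIndex.

@Override
public int size()
{
  // Delegate to Oak's own entry count instead of syncing a numEntries field.
  return oak.entries();
}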

return aggsManager.getMetricAggs();
}

public static void aggregate(

It seems that this method is never used.

OakMap tmpOakMap = descending ? oak.descendingMap() : oak;
OakTransformView transformView = tmpOakMap.createTransformView(transformer);
CloseableIterator<Row> valuesIterator = transformView.valuesIterator();
return new Iterable<Row>()

The CloseableIterator's close() cannot be invoked here, which may keep memory from being reclaimed by GC.
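A sketch of one consumer-side way to address this (assuming the CloseableIterator implements AutoCloseable): hold the iterator in a try-with-resources block wherever the rows are actually consumed, so whatever resources it pins are released deterministically.

try (CloseableIterator<Row> rows = transformView.valuesIterator()) {
  while (rows.hasNext()) {
    Row row = rows.next();
    // ... consume the row ...
  }
}

If close() declares a checked exception, the surrounding code would need to catch or propagate it; alternatively, the Iterable could create the iterator lazily inside iterator() and close it once exhausted.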

{
  CloseableIterator<IncrementalIndexRow> keysIterator = oak.keysIterator();

  return new Iterable<IncrementalIndexRow>() {

keysIterator is never closed here, which may prevent garbage collection.

if (descending == true) {
  subMap = subMap.descendingMap();
}
CloseableIterator<IncrementalIndexRow> keysIterator = subMap.keysIterator();

There is no close() call for keysIterator here, which may prevent garbage collection.

this.versionCounter = new AtomicInteger(0);
IncrementalIndexRow minIncrementalIndexRow = getMinIncrementalIndexRow();

long maxDirectMemory = VM.maxDirectMemory();

It may be better to get the capacity in the constructor, e.g. instead of rowsCount.

@fjy
Contributor

fjy commented Aug 30, 2018

@sanastas do you mind fixing merge conflicts?

@KenjiTakahashi
Contributor

Sorry for chiming in early, but is there any way to use this in a real (not just tests/benchmarks) scenario? I've tried "brute force" by modifying realtime/plumber/Sink to use OffheapOak instead of the current Onheap default. But all that happens is the index task spinning the CPU for a looong time and eventually going OOM.
I guess this is not the way it should be :-).
If this is just not ready for real world testing yet, that's fine. I just felt like giving it a spin.

@sanastas
Author

sanastas commented Sep 2, 2018

@fjy, thanks for your comment! The code is also changing constantly, so we will try to resolve the merge conflicts about once a week.

@sanastas
Author

sanastas commented Sep 2, 2018

@KenjiTakahashi, thanks for giving it a try! Unfortunately, Oak is indeed not yet wired up to be tested end-to-end (in a real scenario). Integrating an off-heap index (especially when the keys are also off-heap) requires some baby steps to make it efficient, so we are moving inside-out: checking how the Oak off-heap index can be used and introducing Druid changes so that the usage is (at least) reasonably efficient, since an off-heap (buffer-based) index and an on-heap (object-based) index should be used differently.

However, we clearly understand the need and the urgency to allow real usage of Oak, so we hope to make it work soon. In the meantime, as this pull request keeps growing, we would like to hear your comments on the code so far.

Thanking all reviewers in advance!

@KenjiTakahashi
Contributor

@sanastas Thanks for the explanations. Well, I'll leave the CR here to the ones more familiar with this code :-).
But I'll definitely keep my eye on this. We are currently profiling our ingestion speeds and we've seen that these SkipListMaps are one of the major pain points. So I have high hopes for Oak to improve upon this.

@sanastas
Author

sanastas commented Sep 3, 2018

Thank you, @KenjiTakahashi! We are working to connect Oak end-to-end soon!

@sanastas
Author

As requested, we have fixed the merge conflicts. The result is PR #6327. I am closing this PR; all the comments made here will be addressed there. The discussion has moved to PR #6327.

@sanastas sanastas closed this Sep 12, 2018