New output buffer implementation #5002

dain · 2016-04-12T19:47:30Z

No description provided.

dain · 2016-04-14T00:57:17Z

Note: GitHub, in its infinite wisdom, has shuffled the commit order.

dain · 2016-04-22T17:07:39Z

@haozhun, can you review "Add new PartitionedBuffer" and "Add new BroadcastBuffer"?

haozhun · 2016-04-26T19:24:17Z

presto-main/src/main/java/com/facebook/presto/OutputBuffers.java

+
+    public static OutputBuffers createInitialEmptyOutputBuffers()
+    {
+        return new OutputBuffers(0, false, ImmutableMap.<TaskId, Integer>of());


ImmutableMap.of() will just work
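Since Java 8, a generic method call passed as a method or constructor argument is typed from its target parameter, which is why the explicit `<TaskId, Integer>` witness is redundant. A minimal, dependency-free sketch of the same inference, assuming a hypothetical `EmptyBuffersExample` class in place of `OutputBuffers`, `String` in place of `TaskId`, and `java.util.Collections.emptyMap()` in place of Guava's `ImmutableMap.of()`:

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for OutputBuffers; String replaces TaskId so the
// sketch compiles without Presto or Guava on the classpath.
final class EmptyBuffersExample
{
    private final long version;
    private final boolean noMoreBufferIds;
    private final Map<String, Integer> buffers;

    EmptyBuffersExample(long version, boolean noMoreBufferIds, Map<String, Integer> buffers)
    {
        this.version = version;
        this.noMoreBufferIds = noMoreBufferIds;
        this.buffers = buffers;
    }

    static EmptyBuffersExample createInitialEmpty()
    {
        // No <String, Integer> type witness: Java 8+ infers the map's type
        // arguments from the constructor parameter it is passed to.
        return new EmptyBuffersExample(0, false, Collections.emptyMap());
    }

    long getVersion()
    {
        return version;
    }

    boolean isNoMoreBufferIds()
    {
        return noMoreBufferIds;
    }

    Map<String, Integer> getBuffers()
    {
        return buffers;
    }
}
```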

dain · 2016-06-23T05:11:06Z

I renamed the classes to make it clear which ones are specific to the old SharedBuffer code and which ones are generic. I also renamed the OutputBuffer implementations so their names end with "OutputBuffer", for clarity.

Rename SharedBufferInfo to OutputBufferInfo
Rename SharedBuffer to SharedOutputBuffer
Rename SharedBufferMemoryManager to OutputBufferMemoryManager

I reworked the URI and BufferId management logic in the query scheduler. The key insight was that, with some minor changes, I could force the partition number and the taskId to be the same for partitioned outputs. With that, I didn't need most of my changes, so I restored the old code:

Guarantee task id and output partition are the same
Remove unused partition from OutputBufferManager
Remove unused partition from RemoteTask
Declare out buffers immediately for non-broadcast stages
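Once the task id and the output partition number coincide, a buffer id alone identifies the partition it carries. A minimal sketch of that mapping, assuming the consuming task with id N reads partition N; `BufferId` and `PartitionedBufferAssignment.assign` are hypothetical names, not the classes in this PR:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical value type for an output buffer id (not Presto's OutputBufferId).
final class BufferId
{
    private final int id;

    BufferId(int id)
    {
        if (id < 0) {
            throw new IllegalArgumentException("id is negative");
        }
        this.id = id;
    }

    int getId()
    {
        return id;
    }

    @Override
    public boolean equals(Object o)
    {
        return o instanceof BufferId && ((BufferId) o).id == id;
    }

    @Override
    public int hashCode()
    {
        return Integer.hashCode(id);
    }
}

final class PartitionedBufferAssignment
{
    // Assumes consuming task ids are 0..taskCount-1, one task per partition.
    // Because task id == partition number, buffer N simply carries partition N,
    // so no separate task-to-partition table has to be built or shipped.
    static Map<BufferId, Integer> assign(int taskCount)
    {
        Map<BufferId, Integer> partitionByBuffer = new LinkedHashMap<>();
        for (int taskId = 0; taskId < taskCount; taskId++) {
            partitionByBuffer.put(new BufferId(taskId), taskId);
        }
        return partitionByBuffer;
    }
}
```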

Finally, when working on the new BroadcastOutputBuffer, I discovered that I could reuse the internal PartitionedOutputBuffer.Partition object in BroadcastOutputBuffer by adding reference counting. I extracted this class into ClientBuffer and added tests for the client protocol. I'll squash the first two commits:

Extract PartitionedOutputBuffer.Partition into ClientBuffer
Add reference counting to ClientBuffer
Add new BroadcastOutputBuffer
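Reference counting is what lets one page sit in several client buffers while its memory is accounted only once. A minimal sketch under stated assumptions: `SharedPage`, `distribute`, and the `AtomicLong` standing in for the memory manager are hypothetical, pages start with one reference held by the producer, and `addReference` is assumed to reject a page that has already been released:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: one page shared by several client buffers. Its bytes are
// counted once and released only when the last holder drops its reference.
final class SharedPage
{
    private final long sizeInBytes;
    private final AtomicLong bufferedBytes; // stand-in for a memory manager
    // Starts at 1: the producer holds a reference while distributing the page.
    private final AtomicInteger referenceCount = new AtomicInteger(1);

    SharedPage(long sizeInBytes, AtomicLong bufferedBytes)
    {
        this.sizeInBytes = sizeInBytes;
        this.bufferedBytes = bufferedBytes;
        bufferedBytes.addAndGet(sizeInBytes);
    }

    void addReference()
    {
        // Refuse to revive a page whose count already reached zero (already released).
        while (true) {
            int current = referenceCount.get();
            if (current <= 0) {
                throw new IllegalStateException("page already released");
            }
            if (referenceCount.compareAndSet(current, current + 1)) {
                return;
            }
        }
    }

    void dereference()
    {
        int remaining = referenceCount.decrementAndGet();
        if (remaining < 0) {
            throw new IllegalStateException("page dereferenced too many times");
        }
        if (remaining == 0) {
            bufferedBytes.addAndGet(-sizeInBytes);
        }
    }

    // Hands the page to every client buffer; the producer's own reference is
    // dropped in finally so the page is still released if distribution fails.
    static void distribute(SharedPage page, List<List<SharedPage>> clientBuffers)
    {
        try {
            for (List<SharedPage> buffer : clientBuffers) {
                page.addReference();
                buffer.add(page);
            }
        }
        finally {
            page.dereference();
        }
    }
}
```

Starting the count at one for the producer means a page can never reach zero while it is still being handed out, even if the first client reads and releases it immediately.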

haozhun · 2016-06-28T22:59:24Z

Looks good:
- Move SharedBuffer and related classes to new package
- Add OutputBuffers.createInitialEmptyOutputBuffers
- Rename SharedBufferInfo to OutputBufferInfo
- Rename SharedBuffer to SharedOutputBuffer
- Rename SharedBufferMemoryManager to OutputBufferMemoryManager
- Fix thread safety issue in BroadcastOutputBufferManager
- Change TaskId.id to be an int
- Replace TaskId in output buffers with a generic OutputBufferId
- Remove unused partition from OutputBufferManager
- Remove unused partition from RemoteTask
- Register output buffers before adding splits
- Add LazyOutputBuffer to delay creation until plan is sent
- Add buffer type to OutputBuffers to allow for different buffer implementations
- Extract PartitionedOutputBuffer.Partition into ClientBuffer

Extract interface OutputBuffer from SharedBuffer
- Looks good, except for the comments below
- `OutputBuffer.setNoMorePages` javadoc: ignored

Guarantee task id and output partition are the same
- Looks good
- I would like to understand the callers of `SqlStageExecution.scheduleTask/Splits` better

Declare out buffers immediately for non-broadcast stages
- Looks good, except for the minor comments below
- `PartitionedOutputBufferManager.addOutputBuffers`
  - Validate that `noMoreBuffer` does not change from `true` to `false`, unless there's a valid use case for that.
  - Calling `withBuffer` a bunch of times will allocate a bunch of objects and create the `version` a lot. I guess these don't really matter, but I would create a map and then pass the map in, just to be on the safe side.

Move shared buffer not-full notification to a new thread
- `SharedOutputBuffer`: duplicate `requireNonNull` for `maxBufferSize` and `systemMemoryUsageListener`
- Question about `OutputBufferMemoryManager`:
  - Is there any particular reason `updateMemoryUsage` submits the job outside the sync block and `setNoBlockOnFull` submits the job inside the sync block?

Add new PartitionedOutputBuffer
- Why is calling `checkFlushComplete` necessary/useful in `setNoMorePages` or the constructor?

Add reference counting to ClientBuffer
- In `destroy`, `pendingRead` should be completed with `emptyResults(taskInstanceId, sequenceId, true)`
- In `PageReference.addReference`, `checkState(referenceCount.getAndIncrement() == 0, "...")`
- I didn't know about `FieldAccessNotGuarded`; that's super nice. Let's add `GuardedBy` to `currentSequenceId` and `destroyed`.

Add new BroadcastOutputBuffer
- `enqueue` is not thread safe: a gap before `safeGetBuffersSnapshot` can result in duplicate pages.
- Discussion item: `noMoreBuffers`: put `dereferencePage` in `finally`. Other usages of `dereferencePage` may also benefit from being put in a `finally` block. Should we do that? Or should we care at all?

Note: Please squash "Extract PartitionedOutputBuffer.Partition into ClientBuffer". For multi-threaded code like this, I find it easier to reason about correctness when I have the full picture.
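One plausible reading of the `enqueue` race: if a page is recorded for late-joining buffers and the set of existing buffers is read in two separate steps, a buffer added in between copies the page and then receives it again. A minimal sketch of closing that gap by doing both steps under one lock; `BroadcastSketch`, its fields, and the string pages are hypothetical, not the code in this PR:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical broadcast buffer: pages are strings and each client buffer is a list.
final class BroadcastSketch
{
    private final Object lock = new Object();
    // Every page enqueued so far, replayed into client buffers added later.
    private final List<String> initialPages = new ArrayList<>();
    private final Map<String, List<String>> clientBuffers = new LinkedHashMap<>();

    void enqueue(String page)
    {
        List<List<String>> targets;
        synchronized (lock) {
            // Recording the page and snapshotting the buffers in one critical section
            // leaves no gap: a buffer added before this block is in the snapshot, and a
            // buffer added after it copies the page from initialPages, never both.
            initialPages.add(page);
            targets = new ArrayList<>(clientBuffers.values());
        }
        // Delivery happens outside the shared lock; each buffer has its own monitor.
        for (List<String> buffer : targets) {
            synchronized (buffer) {
                buffer.add(page);
            }
        }
    }

    void addBuffer(String id)
    {
        synchronized (lock) {
            clientBuffers.putIfAbsent(id, new ArrayList<>(initialPages));
        }
    }

    List<String> getPages(String id)
    {
        List<String> buffer;
        synchronized (lock) {
            buffer = clientBuffers.get(id);
        }
        if (buffer == null) {
            return new ArrayList<>();
        }
        synchronized (buffer) {
            return new ArrayList<>(buffer);
        }
    }
}
```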

dain · 2016-07-06T01:06:37Z

* Declare out buffers immediately for non-broadcast stages
  * PartitionedOutputBufferManager.addOutputBuffers
    * Validate that noMoreBuffer does not change from `true` to `false`, unless there's a valid use case to do that.
    * Calling `withBuffer` a bunch of times will allocate a bunch of objects and create the `version` a lot. I guess these don't really matter. But I would create a map and then pass the map in just to be on the safe side.

The noMoreBuffer check is handled by the validate code, and the object allocation is not really a problem when you consider that Map will allocate an object for each entry anyway (also, the number of partitions is typically small).

The real issue is with the version number, which the checkValidTransition code validates. I rewrote the code to check the buffers directly and avoid the checkValidTransition code.
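A minimal sketch of that approach: validate the requested buffers directly against the current ones (no version comparison), reject a `true` to `false` change of `noMoreBuffers`, and apply the whole map in one step. `BufferDeclarations` and its `Integer` buffer ids are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical manager for partitioned output buffers: buffer id -> partition.
final class BufferDeclarations
{
    private final Map<Integer, Integer> partitionByBuffer = new HashMap<>();
    private boolean noMoreBuffers;

    synchronized void addOutputBuffers(Map<Integer, Integer> newBuffers, boolean noMoreBuffers)
    {
        if (this.noMoreBuffers && !noMoreBuffers) {
            throw new IllegalStateException("noMoreBuffers cannot change from true to false");
        }
        // Check the requested buffers directly against the current ones, and
        // validate everything before mutating, so a bad request changes nothing.
        for (Map.Entry<Integer, Integer> entry : newBuffers.entrySet()) {
            Integer existing = partitionByBuffer.get(entry.getKey());
            if (existing == null && this.noMoreBuffers) {
                throw new IllegalStateException("buffer " + entry.getKey() + " added after noMoreBuffers");
            }
            if (existing != null && !existing.equals(entry.getValue())) {
                throw new IllegalStateException("buffer " + entry.getKey() + " changed partition");
            }
        }
        // One bulk update for the whole request.
        partitionByBuffer.putAll(newBuffers);
        // Safe: the first check rules out a true -> false transition.
        this.noMoreBuffers = noMoreBuffers;
    }

    synchronized boolean isNoMoreBuffers()
    {
        return noMoreBuffers;
    }

    synchronized int bufferCount()
    {
        return partitionByBuffer.size();
    }
}
```

Re-declaring an existing buffer with the same partition is accepted, so a scheduler can resend the full set idempotently.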

* Move shared buffer not-full notification to a new thread
  * Question about `OutputBufferMemoryManager`:
    * Is there any particular reason `updateMemoryUsage` submits the job outside the sync block and `setNoBlockOnFull` submits the job inside the sync block?

No, I simplified the code.
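One way to make both paths behave identically: each method only decides under the lock which future to release, then completes it after the lock is dropped, via an `Executor`. The class name `NotFullNotifier` and the use of `CompletableFuture` are assumptions for a self-contained sketch, not the classes in this PR:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical sketch of a memory manager that blocks producers while full.
final class NotFullNotifier
{
    private final long maxBufferedBytes;
    private final Executor notificationExecutor;
    // The fields below are guarded by this.
    private long bufferedBytes;
    private boolean blockOnFull = true;
    // Completed while not full; replaced by an incomplete future when full.
    private CompletableFuture<Void> notFull = CompletableFuture.completedFuture(null);

    NotFullNotifier(long maxBufferedBytes, Executor notificationExecutor)
    {
        this.maxBufferedBytes = maxBufferedBytes;
        this.notificationExecutor = notificationExecutor;
    }

    void updateMemoryUsage(long deltaBytes)
    {
        CompletableFuture<Void> toComplete = null;
        synchronized (this) {
            bufferedBytes += deltaBytes;
            if (isFull() && notFull.isDone()) {
                notFull = new CompletableFuture<>();
            }
            else if (!isFull() && !notFull.isDone()) {
                toComplete = notFull;
                notFull = CompletableFuture.completedFuture(null);
            }
        }
        fireNotFull(toComplete);
    }

    void setNoBlockOnFull()
    {
        CompletableFuture<Void> toComplete = null;
        synchronized (this) {
            blockOnFull = false;
            if (!notFull.isDone()) {
                toComplete = notFull;
                notFull = CompletableFuture.completedFuture(null);
            }
        }
        fireNotFull(toComplete);
    }

    synchronized CompletableFuture<Void> getNotFullFuture()
    {
        return notFull;
    }

    // Caller must hold the lock on this.
    private boolean isFull()
    {
        return blockOnFull && bufferedBytes > maxBufferedBytes;
    }

    // Both paths complete the old future the same way: after releasing the lock and
    // on the notification executor, so listener callbacks never run under the lock.
    private void fireNotFull(CompletableFuture<Void> future)
    {
        if (future != null) {
            notificationExecutor.execute(() -> future.complete(null));
        }
    }
}
```

Swapping in an already-completed future under the lock keeps `notFull.isDone()` in step with the buffer state, even though the old future is completed slightly later on another thread.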

* Add new PartitionedOutputBuffer
  * Why is calling `checkFlushComplete` necessary/useful in `setNoMorePages` or the constructor?

setNoMorePages can free a reader, which will then see the finished flag, and there is a race to destroy the buffers. It is simpler to just double-check at the end of the method than to try to determine whether this is a benign race. As for the constructor, the call is likely not needed, but it doesn't hurt anything (in theory you could have zero partitions, but that seems weird).
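The "just double-check at the end" approach is safe when the completion check is idempotent. A minimal sketch under that assumption; `FlushCheckExample` and its methods are hypothetical, the destroy step is reduced to a counter, and the producer is assumed never to enqueue after `setNoMorePages`:

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical buffer: finished once no more pages will arrive and all pages were read.
// Assumes the producer never calls enqueue after setNoMorePages.
final class FlushCheckExample
{
    private final ConcurrentLinkedQueue<String> pages = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean noMorePages = new AtomicBoolean();
    private final AtomicBoolean destroyed = new AtomicBoolean();
    private final AtomicInteger destroyCount = new AtomicInteger();

    void enqueue(String page)
    {
        pages.add(page);
    }

    // Reader side: take a page, then re-check, because this read may have drained the buffer.
    String poll()
    {
        String page = pages.poll();
        checkFlushComplete();
        return page;
    }

    void setNoMorePages()
    {
        noMorePages.set(true);
        // A reader may have drained the queue before the flag was set and so skipped
        // cleanup; checking again here covers that interleaving without analyzing it.
        checkFlushComplete();
    }

    // Idempotent: compareAndSet lets the cleanup run at most once, so redundant
    // calls from several methods (or a constructor) are harmless.
    private void checkFlushComplete()
    {
        if (noMorePages.get() && pages.isEmpty() && destroyed.compareAndSet(false, true)) {
            destroyCount.incrementAndGet(); // stand-in for releasing the buffers
        }
    }

    boolean isDestroyed()
    {
        return destroyed.get();
    }

    int getDestroyCount()
    {
        return destroyCount.get();
    }
}
```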

* Add reference counting to ClientBuffer
  * In `destroy`, `pendingRead` should be completed with `emptyResults(taskInstanceId, sequenceId, true)`

This doesn't matter. There is only one client and the destroy comes from the client. This has the benefit of keeping the API surface area smaller.

dain · 2016-07-06T01:07:00Z

I will merge once the current release goes out


facebook-github-bot added the CLA Signed label Apr 12, 2016

dain force-pushed the new-output-buffers branch from e175a9a to 4a4e2bf on April 12, 2016 19:50

dain assigned nileema Apr 12, 2016

dain force-pushed the new-output-buffers branch from 4a4e2bf to 4fe40f0 on April 12, 2016 22:29

dain force-pushed the new-output-buffers branch from 4fe40f0 to a4c1581 on April 21, 2016 00:52

dain assigned haozhun and unassigned nileema Apr 22, 2016

haozhun reviewed Apr 26, 2016

dain force-pushed the new-output-buffers branch from f0a86b7 to 8dacc1d on June 23, 2016 05:10

dain force-pushed the new-output-buffers branch from 8dacc1d to b3aa4a8 on July 6, 2016 01:06

dain added the accepted label Jul 6, 2016

dain unassigned haozhun Jul 6, 2016

dain added 19 commits July 6, 2016 20:06

Move SharedBuffer and related classes to new package

7e6f550

Add OutputBuffers.createInitialEmptyOutputBuffers

8d6635d

Extract interface OutputBuffer from SharedBuffer

3b6f8dd

Rename SharedBufferInfo to OutputBufferInfo

b5d0d32

Rename SharedBuffer to SharedOutputBuffer

0f25ed8

Rename PartitionBuffer to SharedOutputBufferPartition

Rename SharedBufferMemoryManager to OutputBufferMemoryManager

0a5845b

Fix thread safety issue in BroadcastOutputBufferManager

cb2bd33

Field was being read in synchronized and unsynchronized contexts.

Change TaskId.id to be an int

f604b4d

Replace TaskId in output buffers with a generic OutputBufferId

26da1f0

Guarantee task id and output partition are the same

ffd80fd

Remove unused partition from OutputBufferManager

f2f2020

Remove unused partition from RemoteTask

ded333e

Declare out buffers immediately for non-broadcast stages

05f904c

Register output buffers before adding splits

f7f11dc

Add LazyOutputBuffer to delay creation until plan is sent

c599230

LazyOutputBuffer delays buffer creation until the query plan is received, so we can select different buffer implementations. Add config option to enable new buffer implementation.
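A minimal sketch of that lazy-delegate idea, assuming a hypothetical `PageSink` interface, a `BufferKind` enum standing in for the buffer type carried in `OutputBuffers`, and a boolean standing in for the config option (its real name is not given here). This sketch simply rejects calls made before the kind is known; how the real class treats early callers is not covered:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal buffer interface; not Presto's OutputBuffer.
interface PageSink
{
    void enqueue(String page);

    List<String> getPages();

    String implementation();
}

enum BufferKind { PARTITIONED, BROADCAST }

final class ListPageSink implements PageSink
{
    private final String implementation;
    private final List<String> pages = new ArrayList<>();

    ListPageSink(String implementation)
    {
        this.implementation = implementation;
    }

    @Override
    public synchronized void enqueue(String page)
    {
        pages.add(page);
    }

    @Override
    public synchronized List<String> getPages()
    {
        return new ArrayList<>(pages);
    }

    @Override
    public String implementation()
    {
        return implementation;
    }
}

final class LazyPageSink implements PageSink
{
    private final boolean newBuffersEnabled; // stand-in for the config option
    private PageSink delegate; // guarded by this

    LazyPageSink(boolean newBuffersEnabled)
    {
        this.newBuffersEnabled = newBuffersEnabled;
    }

    // Called once the plan arrives and the buffer kind is known; later calls are no-ops.
    synchronized void initialize(BufferKind kind)
    {
        if (delegate != null) {
            return;
        }
        if (!newBuffersEnabled) {
            delegate = new ListPageSink("shared");
        }
        else if (kind == BufferKind.BROADCAST) {
            delegate = new ListPageSink("broadcast");
        }
        else {
            delegate = new ListPageSink("partitioned");
        }
    }

    private synchronized PageSink delegate()
    {
        if (delegate == null) {
            throw new IllegalStateException("buffer kind not known yet");
        }
        return delegate;
    }

    @Override
    public void enqueue(String page)
    {
        delegate().enqueue(page);
    }

    @Override
    public List<String> getPages()
    {
        return delegate().getPages();
    }

    @Override
    public String implementation()
    {
        return delegate().implementation();
    }
}
```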

Add buffer type to OutputBuffers to allow for different buffer implementations

53196b8

Move shared buffer not-full notification to a new thread

832b4bd

Add new PartitionedOutputBuffer

12b537d

Add new BroadcastOutputBuffer

800df76

dain force-pushed the new-output-buffers branch from b3aa4a8 to 800df76 on July 7, 2016 03:11

dain merged commit 800df76 into prestodb:master Jul 7, 2016

dain deleted the new-output-buffers branch July 7, 2016 03:11
