[wip] object pooling in the distsql planning path #30556

jordanlewis · 2018-09-24T14:00:25Z

This PR pools many of the expensive top-level objects during DistSQL physical planning and flow setup. I haven't cleaned it up - comments are missing, things might be misplaced, tests might fail - but I was iterating on this to try to get rid of some of the discrepancy performance-wise between auto and 2.0-auto on the kv95 workload. But pushing up for early feedback on the approach. Take a look at the Release and Reset methods and when they're called in the object lifecycles.

I saw some pretty good results on this.

name                           old time/op    new time/op    delta
FlowSetup/distribute=false-24    18.4µs ± 7%    15.2µs ± 3%  -17.52%  (p=0.000 n=9+9)

name                           old alloc/op   new alloc/op   delta
FlowSetup/distribute=false-24    32.3kB ± 0%    24.1kB ± 0%  -25.34%  (p=0.000 n=10+10)

name                           old allocs/op  new allocs/op  delta
FlowSetup/distribute=false-24       299 ± 0%       243 ± 0%  -18.75%  (p=0.000 n=10+10)

Note that this PR is based on #30464 - ignore those commits.

cc @nvanbenschoten @andreimatei

Previously, expressions that needed to be used in processors were written to a buffer with FmtParsable and re-parsed and type checked with the parser, even if the processors were running in the local flow mode. This was inefficient. Now, we use a similar strategy to the "LocalPlanNode" system to avoid this inefficiency. When we go to save an Expr to a processor spec, if we're writing to a spec that will be inflated on the gateway, instead of formatting the Expr to text, we save the Expr in a slice and record its index. Then, at inflation time, instead of parsing the text we would have written, we simply look up the Expr in the slice we saved it to. This should eliminate one of the major inefficiences introduced by the distsql merge. Release note: None

Rather than having to do the confusing dance of keeping around the slice of Exprs, I found a way to have protobuf not generate the type declaration for a message. Using this, I added the local tree.Expr directly to the Expression struct when necessary. This simplifies the code a lot. Leaving this as a separate commit during review, will squash later. Release note: None

The following objects are now kept in memory pools and reset and released instead of letting the GC reclaim them: - TableReaderSpec - copyingRowReceiver - FlowSpec - tableReader These objects tend to have several slice fields on them. Code was changed to resize/reslice these fields instead of always making them anew, and the Release/Reset methods now 0-size-slice these fields instead of letting them be cleared. This has the effect of saving a lot of slice allocations. Release note: None

Release note: None

It always gets allocated eventually anyway. An upcoming commit will introduce pooling of these objects as well. Release note: None

Release note: None

cockroach-teamcity · 2018-09-24T14:00:37Z

This change is

rjnn

Great job! R1 and R2 are in a separate PR, and I'd advocate for you sending R3, R4, and R5+R6 in 3 separate PRs as well to get this merged as smoothly as possible.

I'm surprised that pooling the allocations is this much of a win. Can you please refresh my memory as to how much of an overall win this is? What fraction of our time is spent in flow setup?

Reviewable status: complete! 0 of 0 LGTMs obtained

jordanlewis · 2018-09-24T15:31:07Z

I'm surprised that pooling the allocations is this much of a win. Can you please refresh my memory as to how much of an overall win this is? What fraction of our time is spent in flow setup?

The overall win-ness of this is still up for debate. It takes GC pressure off the system, for sure, but that doesn't always manifest as a throughput improvement.

Flow setup and DistSQL planning are the main extra work steps between the local execution engine and the DistSQL execution engine, though, so it follows that improving their performance should help alleviate the performance difference.

rjnn · 2018-09-24T16:38:01Z

I guess this is a separate conversation, but we need to drill down on why GC pressure reductions are hard to benchmark, especially given that GC time is such a large fraction of CPU time in our profiles.

I otherwise buy the argument that this improvement is good, and we should get it merged. What are your thoughts on its backport-worthiness?

jordanlewis · 2018-09-24T16:54:17Z

Doesn't feel particularly backport worthy, unless for some reason we're desperate for extra performance.

rjnn · 2018-09-24T18:27:34Z

Great, that makes me feel more comfortable with this change. I am now emphatically in favor of this!

asubiotto · 2018-09-25T15:52:13Z

Could you share some kv throughput numbers as well?

30530: opt: Fix bug in function interning. r=andy-kimball a=andy-kimball Some builtin function overloads have a return type that's dependent on the type of its parameters (e.g. unnest). This means that two FuncOpDef structs with the same overload can have different Type fields. Therefore, the FuncOpDef interning needs to incorporate that type. Release note: None 30608: sqlbase: introduce RowFetcher.Reset() r=jordanlewis a=jordanlewis In preparation for pooling tableReader objects and reusing internal allocated memory. Broken out from #30556. Release note: None 30727: cli: Make "unknown sub-command" error clearer for empty subcommands r=a-robinson a=a-robinson I ran `cockroach zone` to try and get the list of available sub-commands printed as help text, but ended up getting confused by what it printed: Error: unknown sub-command: Failed running "zone" I wasn't really sure what this meant. Now it'll print: Error: unknown sub-command: "" Failed running "zone" It may also be better to say something like "unknown or missing sub-command". Let me know if you'd prefer that. Release note: None --- Along these lines, how come the available subcommands don't get printed anymore? I can't get them listed either with `cockroach zone` or `cockroach zone -h`. I could have sworn we used to list them. Did cobra change? Maybe I'm conflating cockroach with other tools. It'd certainly be nice if we did, though. Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com> Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com> Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>

30676: distsqlrun: pool TableReaders r=jordanlewis a=jordanlewis And permit ProcOutputHelper and ProcessorBase to be reset without throwing away all slice memory. Extracted from #30556. Release note: None Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>

30607: sql: pool FlowSpecs and TableReaderSpecs r=jordanlewis a=jordanlewis Broken out from #30556. The first commit is a slightly unrelated but necessary change to refactor plan diagram creation. The explain analyze output changes slightly as a result - the stats diagram lines are moved to the end of the box, instead of always occurring before the post-process lines in the box. I don't think this is a big deal - if we really care it's possible to fix, but it just takes a little more bookkeeping. Previously, plan diagram creation and annotation of plan diagrams with span stats was done in one step that couldn't be decomposed. This commit changes the interface of plan diagrams a bit to permit decomposing the two steps. Now, plan diagrams are represented with an interface, FlowDiagram, that exposes a method to generate JSON and a method to augment the diagram with statistics from tracing spans. This is necessary to permit more granular resource management of FlowSpecs in the next commit - the plan diagram needs to be generated before the flow is cleaned up, but the tracing spans are only available after the flow is cleaned up. n.b. this moves the stats lines of each processor in the diagram below the post processor line. Release note: None Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>

bobvawter · 2020-03-10T16:33:28Z

All committers have signed the CLA.

jordanlewis added 6 commits September 20, 2018 18:50

sql: pool distsql receivers

13a50f9

Release note: None

sql: pass PhysicalPlan by pointer

377b73e

It always gets allocated eventually anyway. An upcoming commit will introduce pooling of these objects as well. Release note: None

sql: pool PhysicalPlan objects

fe5a9e8

Release note: None

jordanlewis requested review from solongordon, rjnn, asubiotto and a team September 24, 2018 14:00

rjnn reviewed Sep 24, 2018

View reviewed changes

This was referenced Sep 24, 2018

sql: pool FlowSpecs and TableReaderSpecs #30607

Merged

sqlbase: introduce RowFetcher.Reset() #30608

Merged

jordanlewis mentioned this pull request Sep 26, 2018

distsqlrun: pool TableReaders #30676

Merged

solongordon removed their request for review May 15, 2019 15:19

jordanlewis added the X-noremind Bots won't notify about PRs with X-noremind label May 22, 2019

jordanlewis closed this Jun 2, 2021

jordanlewis deleted the dpp-allocs branch June 2, 2021 23:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[wip] object pooling in the distsql planning path #30556

[wip] object pooling in the distsql planning path #30556

jordanlewis commented Sep 24, 2018

cockroach-teamcity commented Sep 24, 2018

rjnn left a comment

jordanlewis commented Sep 24, 2018

rjnn commented Sep 24, 2018

jordanlewis commented Sep 24, 2018

rjnn commented Sep 24, 2018

asubiotto commented Sep 25, 2018

bobvawter commented Mar 10, 2020 •

edited by cockroach-teamcity

Loading

[wip] object pooling in the distsql planning path #30556

[wip] object pooling in the distsql planning path #30556

Conversation

jordanlewis commented Sep 24, 2018

cockroach-teamcity commented Sep 24, 2018

rjnn left a comment

Choose a reason for hiding this comment

jordanlewis commented Sep 24, 2018

rjnn commented Sep 24, 2018

jordanlewis commented Sep 24, 2018

rjnn commented Sep 24, 2018

asubiotto commented Sep 25, 2018

bobvawter commented Mar 10, 2020 • edited by cockroach-teamcity Loading

bobvawter commented Mar 10, 2020 •

edited by cockroach-teamcity

Loading