Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] object pooling in the distsql planning path #30556

Closed
wants to merge 6 commits into from

Conversation

jordanlewis
Copy link
Member

This PR pools many of the expensive top-level objects during DistSQL physical planning and flow setup. I haven't cleaned it up - comments are missing, things might be misplaced, tests might fail - but I was iterating on this to try to get rid of some of the discrepancy performance-wise between auto and 2.0-auto on the kv95 workload. But pushing up for early feedback on the approach. Take a look at the Release and Reset methods and when they're called in the object lifecycles.

I saw some pretty good results on this.

name                           old time/op    new time/op    delta
FlowSetup/distribute=false-24    18.4µs ± 7%    15.2µs ± 3%  -17.52%  (p=0.000 n=9+9)

name                           old alloc/op   new alloc/op   delta
FlowSetup/distribute=false-24    32.3kB ± 0%    24.1kB ± 0%  -25.34%  (p=0.000 n=10+10)

name                           old allocs/op  new allocs/op  delta
FlowSetup/distribute=false-24       299 ± 0%       243 ± 0%  -18.75%  (p=0.000 n=10+10)

Note that this PR is based on #30464 - ignore those commits.

cc @nvanbenschoten @andreimatei

Previously, expressions that needed to be used in processors were
written to a buffer with FmtParsable and re-parsed and type checked
with the parser, even if the processors were running in the local flow
mode. This was inefficient.

Now, we use a similar strategy to the "LocalPlanNode" system to avoid
this inefficiency. When we go to save an Expr to a processor spec, if
we're writing to a spec that will be inflated on the gateway, instead of
formatting the Expr to text, we save the Expr in a slice and record its
index. Then, at inflation time, instead of parsing the text we would
have written, we simply look up the Expr in the slice we saved it to.

This should eliminate one of the major inefficiences introduced by the
distsql merge.

Release note: None
Rather than having to do the confusing dance of keeping around the slice
of Exprs, I found a way to have protobuf not generate the type
declaration for a message. Using this, I added the local tree.Expr
directly to the Expression struct when necessary. This simplifies the
code a lot.

Leaving this as a separate commit during review, will squash later.

Release note: None
The following objects are now kept in memory pools and reset and
released instead of letting the GC reclaim them:

- TableReaderSpec
- copyingRowReceiver
- FlowSpec
- tableReader

These objects tend to have several slice fields on them. Code was
changed to resize/reslice these fields instead of always making them
anew, and the Release/Reset methods now 0-size-slice these fields
instead of letting them be cleared. This has the effect of saving a lot
of slice allocations.

Release note: None
Release note: None
It always gets allocated eventually anyway. An upcoming commit will
introduce pooling of these objects as well.

Release note: None
@jordanlewis jordanlewis requested review from solongordon, rjnn, asubiotto and a team September 24, 2018 14:00
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@rjnn rjnn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! R1 and R2 are in a separate PR, and I'd advocate for you sending R3, R4, and R5+R6 in 3 separate PRs as well to get this merged as smoothly as possible.

I'm surprised that pooling the allocations is this much of a win. Can you please refresh my memory as to how much of an overall win this is? What fraction of our time is spent in flow setup?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained

@jordanlewis
Copy link
Member Author

I'm surprised that pooling the allocations is this much of a win. Can you please refresh my memory as to how much of an overall win this is? What fraction of our time is spent in flow setup?

The overall win-ness of this is still up for debate. It takes GC pressure off the system, for sure, but that doesn't always manifest as a throughput improvement.

Flow setup and DistSQL planning are the main extra work steps between the local execution engine and the DistSQL execution engine, though, so it follows that improving their performance should help alleviate the performance difference.

@rjnn
Copy link
Contributor

rjnn commented Sep 24, 2018

I guess this is a separate conversation, but we need to drill down on why GC pressure reductions are hard to benchmark, especially given that GC time is such a large fraction of CPU time in our profiles.

I otherwise buy the argument that this improvement is good, and we should get it merged. What are your thoughts on its backport-worthiness?

@jordanlewis
Copy link
Member Author

Doesn't feel particularly backport worthy, unless for some reason we're desperate for extra performance.

@rjnn
Copy link
Contributor

rjnn commented Sep 24, 2018

Great, that makes me feel more comfortable with this change. I am now emphatically in favor of this!

@asubiotto
Copy link
Contributor

Could you share some kv throughput numbers as well?

craig bot pushed a commit that referenced this pull request Sep 27, 2018
30530: opt: Fix bug in function interning. r=andy-kimball a=andy-kimball

Some builtin function overloads have a return type that's dependent on
the type of its parameters (e.g. unnest). This means that two FuncOpDef
structs with the same overload can have different Type fields. Therefore,
the FuncOpDef interning needs to incorporate that type.

Release note: None

30608: sqlbase: introduce RowFetcher.Reset() r=jordanlewis a=jordanlewis

In preparation for pooling tableReader objects and reusing internal
allocated memory.

Broken out from #30556.

Release note: None

30727: cli: Make "unknown sub-command" error clearer for empty subcommands r=a-robinson a=a-robinson

I ran `cockroach zone` to try and get the list of available sub-commands
printed as help text, but ended up getting confused by what it printed:

Error: unknown sub-command:
Failed running "zone"

I wasn't really sure what this meant. Now it'll print:

Error: unknown sub-command: ""
Failed running "zone"

It may also be better to say something like "unknown or missing
sub-command". Let me know if you'd prefer that.

Release note: None

---

Along these lines, how come the available subcommands don't get printed anymore? I can't get them listed either with `cockroach zone` or `cockroach zone -h`. I could have sworn we used to list them. Did cobra change? Maybe I'm conflating cockroach with other tools. It'd certainly be nice if we did, though.

Co-authored-by: Andrew Kimball <andyk@cockroachlabs.com>
Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Co-authored-by: Alex Robinson <alexdwanerobinson@gmail.com>
craig bot pushed a commit that referenced this pull request Oct 2, 2018
30676: distsqlrun: pool TableReaders r=jordanlewis a=jordanlewis

And permit ProcOutputHelper and ProcessorBase to be reset without
throwing away all slice memory.

Extracted from #30556.

Release note: None

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
craig bot pushed a commit that referenced this pull request Oct 3, 2018
30607: sql: pool FlowSpecs and TableReaderSpecs r=jordanlewis a=jordanlewis

Broken out from #30556.

The first commit is a slightly unrelated but necessary change to refactor
plan diagram creation. The explain analyze output changes slightly as a
result - the stats diagram lines are moved to the end of the box, instead of
always occurring before the post-process lines in the box. I don't think this
is a big deal - if we really care it's possible to fix, but it just takes a little more
bookkeeping.

Previously, plan diagram creation and annotation of plan diagrams with
span stats was done in one step that couldn't be decomposed. This commit
changes the interface of plan diagrams a bit to permit decomposing the
two steps.

Now, plan diagrams are represented with an interface, FlowDiagram, that
exposes a method to generate JSON and a method to augment the diagram
with statistics from tracing spans.

This is necessary to permit more granular resource management of
FlowSpecs in the next commit - the plan diagram needs to be generated
before the flow is cleaned up, but the tracing spans are only available
after the flow is cleaned up.

n.b. this moves the stats lines of each processor in the diagram below
the post processor line.

Release note: None

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
@solongordon solongordon removed their request for review May 15, 2019 15:19
@jordanlewis jordanlewis added the X-noremind Bots won't notify about PRs with X-noremind label May 22, 2019
@bobvawter
Copy link
Contributor

bobvawter commented Mar 10, 2020

CLA assistant check
All committers have signed the CLA.

@jordanlewis jordanlewis closed this Jun 2, 2021
@jordanlewis jordanlewis deleted the dpp-allocs branch June 2, 2021 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
X-noremind Bots won't notify about PRs with X-noremind
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants