Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql: allow deleteRangeNode to autoCommit transactions #41320

Closed
nvanbenschoten opened this issue Oct 4, 2019 · 4 comments · Fixed by #41324
Closed

sql: allow deleteRangeNode to autoCommit transactions #41320

nvanbenschoten opened this issue Oct 4, 2019 · 4 comments · Fixed by #41324
Labels
A-sql-execution Relating to SQL execution. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@nvanbenschoten
Copy link
Member

deleteRangeNode currently has the following restriction:

// Note: deleteRangeNode can't autocommit, because it has to delete in batches,
// and it won't know whether or not there is more work to do until after a batch
// is returned. This property precludes using auto commit.

This has huge performance implications, as it means that no point DELETE statement can run as a 1PC transaction. This results in roughly a 50% performance hit for delete-heavy workloads on tables without secondary indexes.

The comment says that we delete in batches and we won't know whether or not there is more work to do until after a batch is returned. I don't fully understand this. If the comment is talking about b.Header.MaxSpanRequestKeys = TableTruncateChunkSize then I don't think this needs to preclude attempting a 1PC transaction. I think the kv implementation is smart enough to handle 1PC transactions with key limits. If it's not, we should make it work to support this case.

For context, we have heard in the past from people running Cockroach that deletes are slower than inserts, which has never made sense to me. This explains why this is the case. Sysbench adds more weight to this claim. oltp_delete without a secondary index is the only sysbench workload that we are slower on than similarly architected databases. If we could run point DELETE statements as 1PC transactions then we would immediately close this gap.

@jordanlewis I noticed that some of this changed in #36728. Do you have context on this? Did that cause a regression in this behavior?

@nvanbenschoten nvanbenschoten added the A-sql-execution Relating to SQL execution. label Oct 4, 2019
@nvanbenschoten
Copy link
Member Author

It does look like this is new as of v19.2. This is a pretty bad regression. I think it's probably a release blocker.

# v19.1

root@localhost:26257/defaultdb> create table a (k int primary key);
CREATE TABLE

root@localhost:26257/defaultdb> set tracing = on; delete from a where k = 1; set tracing = off;
SET TRACING

root@localhost:26257/defaultdb> show kv trace for session;
             timestamp             |       age       |                      message                       |                        tag                         | location |    operation     | span
+----------------------------------+-----------------+----------------------------------------------------+----------------------------------------------------+----------+------------------+------+
  2019-10-04 06:28:10.893029+00:00 | 00:00:00.000217 | querying next range at /Table/52/1/1               | [n1,client=127.0.0.1:65018,user=root,txn=6f8307c2] |          | dist sender send |    5
  2019-10-04 06:28:10.89305+00:00  | 00:00:00.000238 | r21: sending batch 1 DelRng, 1 EndTxn to (n1,s1):1 | [n1,client=127.0.0.1:65018,user=root,txn=6f8307c2] |          | dist sender send |    5
  2019-10-04 06:28:10.893198+00:00 | 00:00:00.000386 | fast path completed                                | [n1,client=127.0.0.1:65018,user=root]              |          | flow             |    3
  2019-10-04 06:28:10.893215+00:00 | 00:00:00.000403 | rows affected: 0                                   | [n1,client=127.0.0.1:65018,user=root]              |          | sql txn          |    1
(4 rows)


# v19.2

root@localhost:26257/defaultdb> set tracing = on; delete from a where k = 1; set tracing = off;
SET TRACING

root@localhost:26257/defaultdb> show kv trace for session;
             timestamp             |       age       |                 message                  |                        tag                         | location |    operation     | span
+----------------------------------+-----------------+------------------------------------------+----------------------------------------------------+----------+------------------+------+
  2019-10-04 06:29:58.780724+00:00 | 00:00:00.000212 | querying next range at /Table/52/1/1     | [n1,client=127.0.0.1:65040,user=root,txn=b7f909e0] |          | dist sender send |    8
  2019-10-04 06:29:58.780744+00:00 | 00:00:00.000232 | r21: sending batch 1 DelRng to (n1,s1):1 | [n1,client=127.0.0.1:65040,user=root,txn=b7f909e0] |          | dist sender send |    8
  2019-10-04 06:29:58.780882+00:00 | 00:00:00.00037  | fast path completed                      | [n1,client=127.0.0.1:65040,user=root]              |          | flow             |    5
  2019-10-04 06:29:58.780897+00:00 | 00:00:00.000385 | rows affected: 0                         | [n1,client=127.0.0.1:65040,user=root]              |          | exec stmt        |    3
  2019-10-04 06:29:58.780921+00:00 | 00:00:00.000409 | querying next range at /Table/52/1/1     | [n1,client=127.0.0.1:65040,user=root,txn=b7f909e0] |          | dist sender send |   11
  2019-10-04 06:29:58.780931+00:00 | 00:00:00.000419 | r21: sending batch 1 EndTxn to (n1,s1):1 | [n1,client=127.0.0.1:65040,user=root,txn=b7f909e0] |          | dist sender send |   11
(6 rows)

@jordanlewis
Copy link
Member

Wow, good catch, glad you noticed this. We kind of had tests for this, but I overwrote them in my patch because it seemed like they were specific to interleaved tables (the tests were in files named interleaved). I'll try to fix it.

@jordanlewis jordanlewis added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Oct 4, 2019
@jordanlewis
Copy link
Member

Actually, I don't see what you mean - how could you avoid doing a 2pc transaction in this case?

If the comment is talking about b.Header.MaxSpanRequestKeys = TableTruncateChunkSize then I don't think this needs to preclude attempting a 1PC transaction. I think the kv implementation is smart enough to handle 1PC transactions with key limits. If it's not, we should make it work to support this case.

What would happen if you ran CommitInBatch on a batch that didn't finish?

@jordanlewis
Copy link
Member

I put up a strawman PR that doesn't work for the reason that I mentioned. Maybe I'm just missing something about what you said.

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 5, 2019
First commit from cockroachdb#41324.

We expect a selection of simple single-statement mutations to hit the
"1-phase commit" transaction fast-path. cockroachdb#41320 demonstrated how critical
this is, as regressions here can cause major (> 50%) performance hits to
benchmarks and user workloads. This commit adds a logic test to validate
that these statements continue to hit this fast-path.

Release justification: Testing only.

Release note: None
@craig craig bot closed this as completed in f865fe5 Oct 7, 2019
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Oct 8, 2019
First commit from cockroachdb#41324.

We expect a selection of simple single-statement mutations to hit the
"1-phase commit" transaction fast-path. cockroachdb#41320 demonstrated how critical
this is, as regressions here can cause major (> 50%) performance hits to
benchmarks and user workloads. This commit adds a logic test to validate
that these statements continue to hit this fast-path.

Release justification: Testing only.

Release note: None
craig bot pushed a commit that referenced this issue Oct 10, 2019
41356: Revert "opt: disallow mutations under union" r=RaduBerinde a=RaduBerinde

This reverts commit a34d705.

Release justification: recently merge fix no longer needed (thanks
to #41307).

Release note (sql change): Mutations under UNION or UNION ALL are
allowed again.

41358: sql/logictest: add logictest for all expected 1PC txn statements r=nvanbenschoten a=nvanbenschoten

First commit from #41324.

We expect a selection of simple single-statement mutations to hit the
"1-phase commit" transaction fast-path. #41320 demonstrated how critical
this is, as regressions here can cause major (> 50%) performance hits to
benchmarks and user workloads. This commit adds a logic test to validate
that these statements continue to hit this fast-path.

Release justification: Testing only.

41371: kv: avoid allocations in client.Txn constructor r=nvanbenschoten a=nvanbenschoten

This PR contains two small wins that help speed up the client.Txn constructor, which is responsible for **2.90%** of a CPU profile when running Sysbench's `oltp_point_select` workload. The biggest win here will come from addressing #32508.

#### kv: lazily create *RangeIterator in txnPipeliner

This allocation was responsible for **0.34%** of a CPU profile when running Sysbench's `oltp_point_select` workload.

#### kv: only re-alloc refresh spans in augmentMetaLocked if necessary

This was responsible for **0.057%** of a CPU profile when running Sysbench's `oltp_point_select` workload.

Release justification: None. These can wait for the branch to split.

41372: sql/pgwire: statically allocate format code slice for all text args/cols r=nvanbenschoten a=nvanbenschoten

This allocation was responsible for **0.8%** of a CPU profile when running
Sysbench's oltp_point_select workload.

Release justification: Probably none, although this does look very safe.

Release note: None

41379: pkg/sql: use ring.Buffer in StmtBuf r=nvanbenschoten a=nvanbenschoten

The StmtBuf has the exact access patterns typically associated with a ring buffer. Command instances are added to the back of the buffer and trimmed from the front of the buffer. These operations are often performed in lockstep, but that is not always the case, so we need the buffer to be able to grow.

`ring.Buffer` provides exactly these semantics and it allows us to avoid memory allocations on each Command in `StmtBuf.Push`.

Release justification: None. Don't merge until branch is split.

Co-authored-by: Radu Berinde <radu@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-sql-execution Relating to SQL execution. C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants