
sql: improve precision of index spans to scan for multi-column-family single-row fetches #18168

Closed
jordanlewis opened this issue Sep 1, 2017 · 13 comments
Labels
A-sql-optimizer SQL logical planning and optimizations. C-performance Perf of queries or internals. Solution not expected to change functional behavior.

@jordanlewis (Member) commented Sep 1, 2017

Suppose we have a table and a query:

CREATE TABLE a (a INT PRIMARY KEY, b INT, c INT, FAMILY (a), FAMILY (b), FAMILY (c));
INSERT INTO a VALUES(1,2,3);

SELECT b FROM a WHERE a=1;

Currently, this query will cause a scan from /a/primary/1 -> /a/primary/2, even though ideally we'd only need to scan the key at /a/primary/1/b.

We have enough information to improve this scan, since we know that c is not a needed column, a is the primary index and therefore unique, and we have an equality constraint on a.

We should consider improving the algorithm in spanFromLogicalSpan to incorporate all of this information into its final output span.
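The idea can be sketched in a few lines of Go. This is a toy model, not CockroachDB's actual planner code: the names `neededFamilies` and `pointSpans` and the column-to-family map are all hypothetical, and keys are rendered as strings purely for illustration.

```go
package main

import "fmt"

// neededFamilies returns the distinct IDs of the column families that
// store at least one needed column, in first-seen order.
func neededFamilies(colToFamily map[string]int, needed []string) []int {
	seen := map[int]bool{}
	var fams []int
	for _, c := range needed {
		f := colToFamily[c]
		if !seen[f] {
			seen[f] = true
			fams = append(fams, f)
		}
	}
	return fams
}

// pointSpans renders one point key per needed family instead of one
// span covering the whole row.
func pointSpans(rowKey string, fams []int) []string {
	var spans []string
	for _, f := range fams {
		spans = append(spans, fmt.Sprintf("%s/%d", rowKey, f))
	}
	return spans
}

func main() {
	// Table a (a PK, b, c) with one family per column; b lives in family 1.
	colToFamily := map[string]int{"a": 0, "b": 1, "c": 2}
	fams := neededFamilies(colToFamily, []string{"b"})
	fmt.Println(pointSpans("/a/primary/1", fams)) // prints [/a/primary/1/1]
}
```

With an equality constraint on the unique primary key, the single-key lookup replaces the row-wide scan `/a/primary/1 -> /a/primary/2` from the example above.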

cc @danhhz @andreimatei @petermattis based on our earlier conversation.

@jordanlewis jordanlewis self-assigned this Sep 1, 2017
@jordanlewis jordanlewis added this to the 1.2 milestone Sep 1, 2017
@jordanlewis jordanlewis modified the milestones: 2.0, 2.1 Feb 22, 2018
@jordanlewis jordanlewis added the C-performance Perf of queries or internals. Solution not expected to change functional behavior. label Feb 22, 2018
@jordanlewis jordanlewis added the A-sql-execution Relating to SQL execution. label Apr 25, 2018
@jordanlewis jordanlewis added A-sql-optimizer SQL logical planning and optimizations. and removed A-sql-execution Relating to SQL execution. labels Sep 5, 2018
@jordanlewis jordanlewis modified the milestones: 2.1, 2.2 Sep 5, 2018
@nvanbenschoten (Member) commented

I now suspect that this is the reason why multiple column families didn't provide a noticeable improvement in YCSB workload A. If an UPDATE statement only needed to scan the primary key and the columns that it was updating, then splitting every column in YCSB into a separate column family could reduce contention by 10x. I think this means we could scale to 10 times the throughput in the zipfian distribution.

@jordanlewis (Member, Author) commented

The code that needs to be updated is spansFromConstraint in opt_index_selection.go. That code path needs to be passed the set of needed columns from ConstructScan and determine the precise set of column families required to cover all the needed columns.

One complication is that spanFromConstraintSpan currently always returns exactly 1 span. If the required set of column families isn't adjacent within a row, it will need to return more than 1 span. I think the RowFetcher will be okay with seeing more than 1 span per row, based on the results of the investigation from #29374, but that will have to be verified.
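The adjacency complication can be illustrated with a small sketch (a hypothetical helper, not the real spansFromConstraint): group the needed family IDs into maximal contiguous runs, and emit one `[start, end)` span per run.

```go
package main

import (
	"fmt"
	"sort"
)

// contiguousSpans groups sorted family IDs into maximal contiguous
// runs. Each run becomes one half-open [start, end) span, so
// non-adjacent families force more than one span per row.
func contiguousSpans(fams []int) [][2]int {
	sort.Ints(fams)
	var spans [][2]int
	for _, f := range fams {
		n := len(spans)
		if n > 0 && spans[n-1][1] == f {
			spans[n-1][1] = f + 1 // extend the current run
		} else {
			spans = append(spans, [2]int{f, f + 1})
		}
	}
	return spans
}

func main() {
	// Families 0 and 1 are adjacent; family 3 is not: two spans per row.
	fmt.Println(contiguousSpans([]int{0, 1, 3})) // prints [[0 2] [3 4]]
}
```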

@andreimatei (Contributor) commented

I've filed #30656 to track similar improvements to be done even when col fams are not used (and all columns are part of a single col fam).

@solongordon (Contributor) commented

Regarding the complication @jordanlewis mentioned, I'm thinking for a first pass we can just handle the case where only a single column family is needed. Does that seem reasonable, and is it sufficient for TPCC?

@jordanlewis (Member, Author) commented

Unfortunately I don't think it's sufficient for TPCC, which would have tables that need 3 column families total: one with the mutable columns from newOrder, one with the immutable columns, and one with the mutable columns from payment.

@jordanlewis (Member, Author) commented

Actually, I'm not entirely sure if the row fetcher would need to itself emit spans over more than 1 column family, though. @nvanbenschoten I think you thought this through in more detail, didn't you?

@nvanbenschoten (Member) commented

Yes, TPC-C and YCSB will both require us to emit spans over more than 1 column family for the proposed optimization to work correctly. In both cases, there will be a static column family that all requests will read and a set of dynamic column families, out of which any given request will only read and update one.

@jordanlewis (Member, Author) commented

But we're talking about an entire transaction, right? The row fetcher only has to do tight spans for each query in isolation, and it looks like each individual read-only query in payment.go and new_order.go on the shared tables (customer, district, warehouse) only touches static columns.

In new order,

SELECT c_discount, c_last, c_credit FROM customer WHERE c_w_id=$1 and c_d_id=$2 and c_id=$3 is all static
SELECT w_tax FROM warehouse WHERE w_id... is static

In payment,

SELECT c_id FROM customer WHERE c_w_id, c_d_id, c_last = ... order by c_first is static

And the rest of the queries are either mutations or queries on tables that aren't shared across those 2 txns.

So actually I think it's fine to only do this special case for 1 CF for now, and we'll still get a win for TPCC.

@nvanbenschoten (Member) commented

The problem isn't just the read-only queries, it's also the read-write queries. If one of the UPDATE statements touches a column family that it doesn't need to read or update, then we've already lost. For instance, newOrder's first statement UPDATE district SET d_next_o_id = d_next_o_id + 1 WHERE d_w_id = ... needs to read the static column family and one of the dynamic column families. If the limitation means that it won't be able to avoid touching the other dynamic column family (which contains d_ytd) then we're not going to avoid any contention.

@nvanbenschoten (Member) commented

Oh and FWIW, this issue touches the following TODO:

// TODO(radu/knz): currently the optimization always loads the
// entire row from KV and only skips unnecessary decodes to
// Datum. Investigate whether performance is to be gained (e.g. for
// tables with wide rows) by reading only certain columns from KV
// using point lookups instead of a single range lookup for the
// entire row.
valNeededForCol util.FastIntSet

solongordon added a commit to solongordon/cockroach that referenced this issue Sep 27, 2018
For tables with multiple column families, point lookups will now only
scan column families which contain the needed columns. Previously we
would scan the entire row. This optimization allows for faster lookups
and, perhaps more importantly, reduces contention between operations on
the same row but disjoint column families.

Fixes cockroachdb#18168

Release note: None
@nvanbenschoten (Member) commented

Unfortunately, my last post which I deleted was premature. I realized shortly after that we weren't correctly scanning at least one non-nullable column family when fetching the rows to update, so we weren't actually performing any updates at all because all fields in the dataset start as NULL. In other words, this is what the plan looked like.

root@:26257/defaultdb> EXPLAIN UPDATE ycsb SET field4 = 's' WHERE ycsb_key = 's';
         tree         | field |    description
+---------------------+-------+-------------------+
  count               |       |
   └── update         |       |
        │             | table | ycsb
        │             | set   | field4
        └── render    |       |
             └── scan |       |
                      | table | ycsb@primary
                      | spans | /"s"/5/1-/"s"/5/2
(8 rows)

With that issue fixed, the plan correctly looks like:

root@:26257/ycsb> EXPLAIN UPDATE usertable SET field4 = 's' WHERE ycsb_key = 's';
         tree         | field |           description
+---------------------+-------+---------------------------------+
  count               |       |
   └── update         |       |
        │             | table | usertable
        │             | set   | field4
        └── render    |       |
             └── scan |       |
                      | table | usertable@primary
                      | spans | /"s"/0-/"s"/1 /"s"/5/1-/"s"/5/2
(8 rows)

I ran a modified YCSB workload again and the theory still panned out, but it certainly wasn't the 5x throughput improvement we saw in the other test.

[graph: YCSB workload A throughput vs. concurrency, with and without column families]

This is YCSB's workload A with a zipfian load distribution. I ran it on a three-node GCE cluster with n1-highcpu-16 machines and the nobarrier SSD mount option. The concurrency started at 1 thread and doubled every minute up to 1024 threads. We can see that the peak throughput is ~50% higher with column families than without. In one sense, this is disappointing compared to the crazy speedup we saw before. In another, 50% is still an awesome win and clearly justifies this change! It also makes me wonder whether contention is still the primary bottleneck in YCSB.

I also ran a second test where I used the old broken fix for this issue but adjusted the YCSB schema to use non-nullable columns so that we wouldn't miss rows. That resulted in very similar performance to what we see here. That indicates that the extra span in the point lookup has a negligible cost, or at least a cost that is offset by the extra data size required for non-nullable columns in this schema.
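The missed-row bug described above comes from scanning only families whose columns are all NULL: such families store no KVs at all, so the row looks absent. A minimal sketch of the guard follows (all names hypothetical; the fixed plan above reads family 0 in addition to family 5, since in CockroachDB family 0 always carries a sentinel KV for the row):

```go
package main

import "fmt"

// familiesToScan ensures at least one scanned family is guaranteed to
// contain a KV for every existing row. If none of the needed families
// qualifies (e.g. all their columns are nullable), prepend the
// sentinel family 0 so a row of all-NULL values is still found.
func familiesToScan(needed []int, guaranteedNonEmpty map[int]bool) []int {
	for _, f := range needed {
		if guaranteedNonEmpty[f] {
			return needed // some needed family always exists
		}
	}
	return append([]int{0}, needed...)
}

func main() {
	// YCSB: the value fields are nullable, so family 5 alone may hold
	// no KV; family 0 (the sentinel) must be added.
	fmt.Println(familiesToScan([]int{5}, map[int]bool{0: true})) // prints [0 5]
}
```

This mirrors the two spans in the corrected plan, /"s"/0-/"s"/1 and /"s"/5/1-/"s"/5/2.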

@nvanbenschoten (Member) commented

It is also interesting that these new measurements hit a scalability cliff at the same place as the test that wasn't performing any updates at all. That lends further credence to the idea that there's a new dominant bottleneck that's replaced contention once we reach a concurrency of around 64 threads.

@nvanbenschoten (Member) commented Oct 1, 2018

I wonder if we're hitting the same concurrency cliff that we saw in #26178. workload/ycsb uses prepared statements that are scoped to the entire connection pool, so it's possible.

solongordon added a commit to solongordon/cockroach that referenced this issue Oct 2, 2018
solongordon added a commit to solongordon/cockroach that referenced this issue Oct 2, 2018
@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018
solongordon added a commit to solongordon/cockroach that referenced this issue Oct 9, 2018
solongordon added a commit to solongordon/cockroach that referenced this issue Oct 9, 2018
craig bot pushed a commit that referenced this issue Oct 9, 2018
30744: sql: optimize point lookups on column families r=solongordon a=solongordon

For tables with multiple column families, point lookups will now only
scan column families which contain the needed columns. Previously we
would scan the entire row. This optimization allows for faster lookups
and, perhaps more importantly, reduces contention between operations on
the same row but disjoint column families.

Fixes #18168

Release note: None

Co-authored-by: Solon Gordon <solon@cockroachlabs.com>
@craig craig bot closed this as completed in #30744 Oct 9, 2018
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue Nov 29, 2018
craig bot pushed a commit that referenced this issue Nov 29, 2018
32704: workload/ycsb: add flag to use column families r=nvanbenschoten a=nvanbenschoten

This change adds a new `--families` flag to the ycsb workload. Now
that #18168 is addressed, this significantly reduces the contention
present in the workload by avoiding conflicts on updates to different
columns in the same table.

I just confirmed that this still provides a huge speedup. On a 24 cpu machine:
```
workload run ycsb --init --workload='A' --concurrency=128 --duration=1m


--families=false gives:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         103089         1718.0     13.9      0.6      1.8    604.0   1476.4  read
   60.0s        0         102947         1715.6     59.5      5.2     11.5   2281.7   8321.5  update

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   60.0s        0         206036         3433.6     36.7      3.3      8.9   1342.2   8321.5


--families=true gives:

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__total
   60.0s        0         333477         5557.8      9.2      0.6      6.0    302.0   1275.1  read
   60.0s        0         332366         5539.3     13.7      6.8     17.8     54.5   4831.8  update

_elapsed___errors_____ops(total)___ops/sec(cum)__avg(ms)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)__result
   60.0s        0         665843        11097.1     11.5      3.9     16.3    268.4   4831.8
```

cc. @robert-s-lee @drewdeally 

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>