
sql: improve inverted index validation performance #35742

Merged (1 commit) on May 13, 2019

thoszhang (Contributor) commented:

To compute the expected number of keys in an inverted index, we currently
generate every key and count the number of unique keys, since array elements
can produce duplicate keys. This PR avoids some of these allocations by simply
counting the keys for every non-array JSON type instead of generating them. For
a 25GB table of random JSON values generated by the workload, I got a ~15%
speedup on the `select sum(crdb_internal.json_num_index_entries(v)) from json.j`
query.

Release note: None

thoszhang requested a review from dt on March 14, 2019 at 17:17.

vivekmenezes (Contributor) commented:

@lucy-zhang it's probably worth adding a benchmark so that future changes don't cause a regression in performance. Thanks!

thoszhang (Contributor, Author) commented:

I added a benchmark for a JSON object containing mostly scalars and few arrays:

```
name                           old time/op  new time/op  delta
JSONNumInvertedIndexEntries-8  6.93µs ± 0%  2.92µs ± 1%  -57.91%  (p=0.008 n=5+5)
```

thoszhang (Contributor, Author) commented:

bors r+

craig bot pushed a commit that referenced this pull request on May 13, 2019:
35742: sql: improve inverted index validation performance r=lucy-zhang a=lucy-zhang

Co-authored-by: Lucy Zhang <lucy-zhang@users.noreply.github.com>
craig bot commented on May 13, 2019:

Build succeeded

craig bot merged commit a1fc446 into cockroachdb:master on May 13, 2019.
thoszhang deleted the faster-num-entries branch on May 14, 2019 at 18:30.