Check hashes first during probing the aggr hash table #11718

Rachelint · 2024-07-30T08:01:15Z

Which issue does this PR close?

Closes #11717

Rationale for this change

See #11717

What changes are included in this PR?

See title.

Are these changes tested?

By exist tests.

Are there any user-facing changes?

No.

Rachelint · 2024-07-30T08:03:41Z

The clickbench result

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.81ms │           0.77ms │ +1.06x faster │
│ QQuery 1     │    69.11ms │          69.79ms │     no change │
│ QQuery 2     │   169.47ms │         160.89ms │ +1.05x faster │
│ QQuery 3     │   182.60ms │         180.05ms │     no change │
│ QQuery 4     │  1589.41ms │        1595.87ms │     no change │
│ QQuery 5     │  1597.02ms │        1563.24ms │     no change │
│ QQuery 6     │    57.92ms │          59.34ms │     no change │
│ QQuery 7     │    72.33ms │          71.02ms │     no change │
│ QQuery 8     │  2415.02ms │        2293.14ms │ +1.05x faster │
│ QQuery 9     │  1928.42ms │        1912.94ms │     no change │
│ QQuery 10    │   544.39ms │         539.45ms │     no change │
│ QQuery 11    │   605.79ms │         606.26ms │     no change │
│ QQuery 12    │  1767.57ms │        1748.26ms │     no change │
│ QQuery 13    │  4073.33ms │        3979.57ms │     no change │
│ QQuery 14    │  2583.14ms │        2518.41ms │     no change │
│ QQuery 15    │  1784.13ms │        1777.43ms │     no change │
│ QQuery 16    │  5028.55ms │        4898.04ms │     no change │
│ QQuery 17    │  4956.14ms │        4796.22ms │     no change │
│ QQuery 18    │ 10436.51ms │       10168.34ms │     no change │
│ QQuery 19    │   144.11ms │         147.18ms │     no change │
│ QQuery 20    │  3310.77ms │        3286.34ms │     no change │
│ QQuery 21    │  3887.09ms │        3867.43ms │     no change │
│ QQuery 22    │  9398.96ms │        9008.04ms │     no change │
│ QQuery 23    │ 23087.26ms │       22804.51ms │     no change │
│ QQuery 24    │  1168.15ms │        1139.59ms │     no change │
│ QQuery 25    │  1046.92ms │        1010.22ms │     no change │
│ QQuery 26    │  1352.80ms │        1317.86ms │     no change │
│ QQuery 27    │  4711.92ms │        4698.67ms │     no change │
│ QQuery 28    │ 21891.92ms │       22870.99ms │     no change │
│ QQuery 29    │   920.19ms │         901.89ms │     no change │
│ QQuery 30    │  2075.81ms │        2036.71ms │     no change │
│ QQuery 31    │  2961.03ms │        2844.67ms │     no change │
│ QQuery 32    │ 16167.05ms │       15106.28ms │ +1.07x faster │
│ QQuery 33    │  9418.20ms │        9429.24ms │     no change │
│ QQuery 34    │  9388.74ms │        9431.36ms │     no change │
│ QQuery 35    │  3108.34ms │        3021.89ms │     no change │
│ QQuery 36    │   270.02ms │         269.25ms │     no change │
│ QQuery 37    │   166.63ms │         156.78ms │ +1.06x faster │
│ QQuery 38    │   158.33ms │         157.94ms │     no change │
│ QQuery 39    │   834.47ms │         844.51ms │     no change │
│ QQuery 40    │    63.22ms │          62.05ms │     no change │
│ QQuery 41    │    59.97ms │          58.34ms │     no change │
│ QQuery 42    │    70.34ms │          72.41ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘

Dandandan

Nice find!

alamb

Thank you @Rachelint

alamb · 2024-07-30T10:12:30Z

datafusion/physical-plan/src/aggregates/group_values/row.rs

                // verify that a group that we are inserting with hash is
                // actually the same key value as the group in
                // existing_idx  (aka group_values @ row)
-                group_rows.row(row) == group_values.row(*group_idx)
+                target_hash == *exist_hash


I am somewhat confused about how this makes things faster -- I thought that the check for equal hash was done as part of self.map.get_mut (aka the closure is only called when the hashes are equal, so I would expct this comparison to always be true)

However, if the benchmark results show it is an improvement wonderful.

I'll run the numbers as well to confirm.

I don't get it either.
I thought the closure (row values equal check) is triggered only if the hash is matched 😕

I am somewhat confused about how this makes things faster -- I thought that the check for equal hash was done as part of self.map.get_mut (aka the closure is only called when the hashes are equal, so I would expct this comparison to always be true)

However, if the benchmark results show it is an improvement wonderful.

I'll run the numbers as well to confirm.

This is my thought about why faster.

I read the source code in hashbrown, and the get_mut procedure is like this:

Use the hash value to find the first bucket.

If the bucket is filled, it will use the eq function passed by us to check if it is the target, for example, the prev eq function.

|(exist_hash, group_idx)| { // verify that a group that we are inserting with hash is // actually the same key value as the group in // existing_idx (aka group_values @ row) group_rows.row(row) == group_values.row(*group_idx) })

If eq return true, it is the target, otherwise we need to prob next and check.

In the high cardinality aggr scenario, the entry often actually not exist in the hash table.

And after the hash table grow too large(many buckets are filled), the prob will perform many times and finally find nothing...

In this sitution, check the hash first can reduce the random memory accesses compared to directy check the group value through group index.

cc @jayzhan211 , can see the guess above.

@Rachelint
My understanding is the similar, collision will happen quite often in get_mut, equality of u64 hashes will be faster than retrieving / comparing rows.
Hash collisions are usually very low, even for high cardinality, but RawTable::get_mut doesn't check for equality itself, just finds a first match without guaranteeing hash values are the same (equality check should be in the provided equality function). In other implementations we also check for hash values to be equal first.

I was curious -- I think the code @Rachelint is referring to in hashbrown is here

https://github.com/rust-lang/hashbrown/blob/ac00a0bbef46f02f555e235f57ce263aefa361e0/src/raw/mod.rs#L2183-L2199

It does appear that collisions could happen (it is doing some sort of abbreviated check by condensing the actual hash value to a byte or something), though I don't fully understand how it works.

I wonder if there are other places we could use this observation 🤔

Good idea! I am trying to find it.

The idea to check hash before expensive arrow row comparison makes sense to me.

I was curious -- I think the code @Rachelint is referring to in hashbrown is here

https://github.com/rust-lang/hashbrown/blob/ac00a0bbef46f02f555e235f57ce263aefa361e0/src/raw/mod.rs#L2183-L2199

It does appear that collisions could happen (it is doing some sort of abbreviated check by condensing the actual hash value to a byte or something), though I don't fully understand how it works.

I wonder if there are other places we could use this observation 🤔

This talk explains the details of this one-byte abbreviation trick https://www.youtube.com/watch?v=ncHmEUmJZf4 , I vaguely remember they said when this 1 byte check is done, it's very likely to find the correct slot.
Looks like it's not working well when the hash table size grows over some threshold?

I was curious -- I think the code @Rachelint is referring to in hashbrown is here
https://github.com/rust-lang/hashbrown/blob/ac00a0bbef46f02f555e235f57ce263aefa361e0/src/raw/mod.rs#L2183-L2199
It does appear that collisions could happen (it is doing some sort of abbreviated check by condensing the actual hash value to a byte or something), though I don't fully understand how it works.
I wonder if there are other places we could use this observation 🤔

This talk explains the details of this one-byte abbreviation trick https://www.youtube.com/watch?v=ncHmEUmJZf4 , I vaguely remember they said when this 1 byte check is done, it's very likely to find the correct slot. Looks like it's not working well when the hash table size grows over some threshold?

Seems make sense, maybe we should only add this check when found the hash table size is larger than a threshold 🤔 .

I was thinking this rationale is very non obvious, so I proposed adding some comments on th rationale here #11750

alamb · 2024-07-30T11:09:43Z

I am running benchmarks against this PR now

alamb · 2024-07-30T12:13:29Z

Interestingly my benchmarks against this PR showed more mixed results:

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.80ms │           0.81ms │     no change │
│ QQuery 1     │   102.55ms │         100.88ms │     no change │
│ QQuery 2     │   207.48ms │         213.19ms │     no change │
│ QQuery 3     │   207.76ms │         209.69ms │     no change │
│ QQuery 4     │  2222.82ms │        2289.69ms │     no change │
│ QQuery 5     │  2009.01ms │        2087.26ms │     no change │
│ QQuery 6     │    89.41ms │          87.13ms │     no change │
│ QQuery 7     │   103.94ms │         105.40ms │     no change │
│ QQuery 8     │  3246.71ms │        3017.22ms │ +1.08x faster │
│ QQuery 9     │  2435.85ms │        2449.21ms │     no change │
│ QQuery 10    │   854.22ms │         856.77ms │     no change │
│ QQuery 11    │   924.61ms │         936.51ms │     no change │
│ QQuery 12    │  2130.15ms │        2198.85ms │     no change │
│ QQuery 13    │  4641.38ms │        4552.21ms │     no change │
│ QQuery 14    │  2883.17ms │        2883.05ms │     no change │
│ QQuery 15    │  2488.04ms │        2461.62ms │     no change │
│ QQuery 16    │  5893.04ms │        5836.59ms │     no change │
│ QQuery 17    │  5863.02ms │        5676.28ms │     no change │
│ QQuery 18    │ 11877.60ms │       11578.52ms │     no change │
│ QQuery 19    │   168.32ms │         181.08ms │  1.08x slower │
│ QQuery 20    │  2734.34ms │        2747.95ms │     no change │
│ QQuery 21    │  3457.31ms │        3545.68ms │     no change │
│ QQuery 22    │  8679.75ms │        9621.59ms │  1.11x slower │
│ QQuery 23    │ 21872.12ms │       22592.60ms │     no change │
│ QQuery 24    │  1351.77ms │        1418.37ms │     no change │
│ QQuery 25    │  1119.32ms │        1201.18ms │  1.07x slower │
│ QQuery 26    │  1483.02ms │        1541.26ms │     no change │
│ QQuery 27    │  4027.53ms │        4104.40ms │     no change │
│ QQuery 28    │ 28326.59ms │       28429.20ms │     no change │
│ QQuery 29    │  1051.56ms │        1078.07ms │     no change │
│ QQuery 30    │  2557.22ms │        2551.44ms │     no change │
│ QQuery 31    │  3240.11ms │        3164.86ms │     no change │
│ QQuery 32    │ 17175.99ms │       16627.27ms │     no change │
│ QQuery 33    │  9793.02ms │        9651.29ms │     no change │
│ QQuery 34    │  9684.26ms │        9709.83ms │     no change │
│ QQuery 35    │  3850.71ms │        3729.01ms │     no change │
│ QQuery 36    │   353.62ms │         352.64ms │     no change │
│ QQuery 37    │   223.88ms │         235.19ms │  1.05x slower │
│ QQuery 38    │   202.00ms │         208.92ms │     no change │
│ QQuery 39    │  1167.41ms │        1173.25ms │     no change │
│ QQuery 40    │    94.37ms │          97.81ms │     no change │
│ QQuery 41    │    85.27ms │          86.30ms │     no change │
│ QQuery 42    │   101.90ms │         102.17ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (main_base)          │ 170982.96ms │
│ Total Time (check-hash-first)   │ 171692.24ms │
│ Average Time (main_base)        │   3976.35ms │
│ Average Time (check-hash-first) │   3992.84ms │
│ Queries Faster                  │           1 │
│ Queries Slower                  │           4 │
│ Queries with No Change          │          38 │
└─────────────────────────────────┴─────────────┘

Rachelint · 2024-07-30T12:19:00Z

@alamb Possible due to my main branch is not the latest?
I am running it again in local after pulling.

alamb · 2024-07-30T12:25:59Z

@alamb Possible due to my main branch is not the latest? I am running it again in local after pulling.

Thanks @Rachelint -- I am also trying again to see if I did something wrong -- I also filed #11722 to see if that could be related 🤔

Rachelint · 2024-07-30T12:28:09Z

@alamb Possible due to my main branch is not the latest? I am running it again in local after pulling.

Thanks @Rachelint -- I am also trying again to see if I did something wrong -- I also filed #11722 to see if that could be related 🤔

Seems so strange, I run it after pulling and rebasing, it still seems faster in the expected case q32

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │ 16194.66ms │       15067.19ms │ +1.07x faster │
└──────────────┴────────────┴──────────────────┴───────────────┘

Rachelint · 2024-07-30T12:32:11Z

@alamb Possible due to my main branch is not the latest? I am running it again in local after pulling.

Thanks @Rachelint -- I am also trying again to see if I did something wrong -- I also filed #11722 to see if that could be related 🤔

Another possibility I could think is that maybe it is due to cache size? This the cpu info about my benchmark machine.

CPU MHz：             993.713
CPU max MHz:           4400.0000
CPU min MHz:           800.0000

L1d cache：          32K
L1i cache：          32K
L2 cache：           256K
L3 cache：           9216K

jayzhan211 · 2024-07-30T12:35:01Z

I can run the benchmark too

jayzhan211 · 2024-07-30T12:49:51Z

release-nonlto mode

--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.48ms │           0.44ms │ +1.09x faster │
│ QQuery 1     │    42.21ms │          41.12ms │     no change │
│ QQuery 2     │    74.19ms │          81.59ms │  1.10x slower │
│ QQuery 3     │    62.76ms │          72.01ms │  1.15x slower │
│ QQuery 4     │   421.24ms │         475.74ms │  1.13x slower │
│ QQuery 5     │   702.16ms │         683.13ms │     no change │
│ QQuery 6     │    37.37ms │          40.15ms │  1.07x slower │
│ QQuery 7     │    40.82ms │          40.23ms │     no change │
│ QQuery 8     │   765.87ms │         747.78ms │     no change │
│ QQuery 9     │   663.02ms │         643.98ms │     no change │
│ QQuery 10    │   197.35ms │         196.56ms │     no change │
│ QQuery 11    │   220.60ms │         213.23ms │     no change │
│ QQuery 12    │   726.93ms │         703.93ms │     no change │
│ QQuery 13    │  1419.25ms │        1464.85ms │     no change │
│ QQuery 14    │   993.33ms │         993.17ms │     no change │
│ QQuery 15    │   496.78ms │         483.43ms │     no change │
│ QQuery 16    │  2026.78ms │        1970.75ms │     no change │
│ QQuery 17    │  1843.15ms │        1886.30ms │     no change │
│ QQuery 18    │  4766.27ms │        4960.04ms │     no change │
│ QQuery 19    │    58.23ms │          56.11ms │     no change │
│ QQuery 20    │  1518.53ms │        1480.85ms │     no change │
│ QQuery 21    │  1736.85ms │        1750.68ms │     no change │
│ QQuery 22    │  4067.41ms │        4129.24ms │     no change │
│ QQuery 23    │  8258.31ms │        8458.29ms │     no change │
│ QQuery 24    │   483.89ms │         502.11ms │     no change │
│ QQuery 25    │   492.77ms │         489.45ms │     no change │
│ QQuery 26    │   549.74ms │         560.24ms │     no change │
│ QQuery 27    │  1317.31ms │        1382.42ms │     no change │
│ QQuery 28    │ 10165.79ms │       10165.27ms │     no change │
│ QQuery 29    │   409.71ms │         399.79ms │     no change │
│ QQuery 30    │   857.91ms │         847.23ms │     no change │
│ QQuery 31    │   983.91ms │         947.09ms │     no change │
│ QQuery 32    │  9653.47ms │        9369.41ms │     no change │
│ QQuery 33    │  4134.36ms │        3266.48ms │ +1.27x faster │
│ QQuery 34    │  3872.37ms │        3991.76ms │     no change │
│ QQuery 35    │  1055.52ms │        1039.10ms │     no change │
│ QQuery 36    │   144.10ms │         148.25ms │     no change │
│ QQuery 37    │   100.81ms │         101.38ms │     no change │
│ QQuery 38    │   105.72ms │         107.32ms │     no change │
│ QQuery 39    │   387.52ms │         389.59ms │     no change │
│ QQuery 40    │    34.77ms │          34.62ms │     no change │
│ QQuery 41    │    32.97ms │          32.96ms │     no change │
│ QQuery 42    │    43.08ms │          42.63ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)               │ 65965.62ms │
│ Total Time (check-hash-first)   │ 65390.71ms │
│ Average Time (main)             │  1534.08ms │
│ Average Time (check-hash-first) │  1520.71ms │
│ Queries Faster                  │          2 │
│ Queries Slower                  │          4 │
│ Queries with No Change          │         37 │
└─────────────────────────────────┴────────────┘

hmm...

Rachelint · 2024-07-30T12:56:38Z

@jayzhan211 interesting... q33 1.27x faster...
Maybe it is really related to hardware...

I run q32 in release mode and the result still same as release-nonlto mode...

I am running the whole queries now.

jayzhan211 · 2024-07-30T13:10:10Z

release mode

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃       main ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.40ms │           0.41ms │     no change │
│ QQuery 1     │    38.61ms │          37.93ms │     no change │
│ QQuery 2     │    73.86ms │          73.49ms │     no change │
│ QQuery 3     │    63.24ms │          64.05ms │     no change │
│ QQuery 4     │   425.21ms │         390.92ms │ +1.09x faster │
│ QQuery 5     │   694.98ms │         650.10ms │ +1.07x faster │
│ QQuery 6     │    37.20ms │          38.77ms │     no change │
│ QQuery 7     │    38.30ms │          37.54ms │     no change │
│ QQuery 8     │   757.20ms │         663.71ms │ +1.14x faster │
│ QQuery 9     │   654.19ms │         628.15ms │     no change │
│ QQuery 10    │   193.07ms │         184.71ms │     no change │
│ QQuery 11    │   215.94ms │         209.10ms │     no change │
│ QQuery 12    │   728.18ms │         689.16ms │ +1.06x faster │
│ QQuery 13    │  1386.26ms │        1318.11ms │     no change │
│ QQuery 14    │   975.30ms │         955.42ms │     no change │
│ QQuery 15    │   487.11ms │         472.50ms │     no change │
│ QQuery 16    │  1799.89ms │        1582.73ms │ +1.14x faster │
│ QQuery 17    │  1673.83ms │        1550.87ms │ +1.08x faster │
│ QQuery 18    │  4557.05ms │        3911.50ms │ +1.17x faster │
│ QQuery 19    │    57.09ms │          57.37ms │     no change │
│ QQuery 20    │  1540.04ms │        1538.60ms │     no change │
│ QQuery 21    │  1764.33ms │        1767.64ms │     no change │
│ QQuery 22    │  4006.80ms │        4007.95ms │     no change │
│ QQuery 23    │  8261.65ms │        7982.00ms │     no change │
│ QQuery 24    │   492.90ms │         474.27ms │     no change │
│ QQuery 25    │   472.54ms │         480.76ms │     no change │
│ QQuery 26    │   539.63ms │         537.50ms │     no change │
│ QQuery 27    │  1291.92ms │        1291.64ms │     no change │
│ QQuery 28    │ 10082.99ms │        9922.57ms │     no change │
│ QQuery 29    │   403.20ms │         403.61ms │     no change │
│ QQuery 30    │   833.31ms │         816.01ms │     no change │
│ QQuery 31    │   966.80ms │         905.63ms │ +1.07x faster │
│ QQuery 32    │  9874.88ms │        9189.33ms │ +1.07x faster │
│ QQuery 33    │  3504.33ms │        3956.20ms │  1.13x slower │
│ QQuery 34    │  3501.75ms │        3635.95ms │     no change │
│ QQuery 35    │  1069.16ms │         957.72ms │ +1.12x faster │
│ QQuery 36    │   148.00ms │         140.89ms │     no change │
│ QQuery 37    │   100.97ms │          99.02ms │     no change │
│ QQuery 38    │   103.56ms │         103.36ms │     no change │
│ QQuery 39    │   378.87ms │         373.10ms │     no change │
│ QQuery 40    │    33.78ms │          33.72ms │     no change │
│ QQuery 41    │    32.01ms │          32.04ms │     no change │
│ QQuery 42    │    40.22ms │          40.03ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)               │ 64300.55ms │
│ Total Time (check-hash-first)   │ 62206.11ms │
│ Average Time (main)             │  1495.36ms │
│ Average Time (check-hash-first) │  1446.65ms │
│ Queries Faster                  │         10 │
│ Queries Slower                  │          1 │
│ Queries with No Change          │         32 │
└─────────────────────────────────┴────────────┘

I guess the change is not really related to those queries, we might need specialized query that could really benefit from it.
I think the query that has high cost on arrow::Row comparison could shows the difference.

Rachelint · 2024-07-30T13:21:15Z

@jayzhan211 Maybe hash table size should be considered, too.
If few collision exist, this pr added an extra u64 comparison actually 🤔 .

alamb

I ran again with release mode (#11722) and it seems to be showing an improvement in Q32 as well and several of the other

--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.69ms │           0.71ms │     no change │
│ QQuery 1     │    91.81ms │          89.52ms │     no change │
│ QQuery 2     │   195.10ms │         190.79ms │     no change │
│ QQuery 3     │   203.15ms │         198.03ms │     no change │
│ QQuery 4     │  2230.05ms │        2237.94ms │     no change │
│ QQuery 5     │  1984.18ms │        2032.57ms │     no change │
│ QQuery 6     │    78.88ms │          79.93ms │     no change │
│ QQuery 7     │    94.95ms │          91.81ms │     no change │
│ QQuery 8     │  3227.81ms │        3011.43ms │ +1.07x faster │
│ QQuery 9     │  2384.85ms │        2360.24ms │     no change │
│ QQuery 10    │   847.94ms │         847.07ms │     no change │
│ QQuery 11    │   910.10ms │         915.36ms │     no change │
│ QQuery 12    │  2139.05ms │        2156.18ms │     no change │
│ QQuery 13    │  4697.20ms │        4549.50ms │     no change │
│ QQuery 14    │  2952.47ms │        2902.65ms │     no change │
│ QQuery 15    │  2450.43ms │        2445.44ms │     no change │
│ QQuery 16    │  5974.80ms │        5777.67ms │     no change │
│ QQuery 17    │  5895.89ms │        5650.60ms │     no change │
│ QQuery 18    │ 12165.50ms │       11615.76ms │     no change │
│ QQuery 19    │   168.33ms │         169.17ms │     no change │
│ QQuery 20    │  2749.31ms │        2709.84ms │     no change │
│ QQuery 21    │  3535.36ms │        3495.37ms │     no change │
│ QQuery 22    │  9567.54ms │        9511.10ms │     no change │
│ QQuery 23    │ 22300.42ms │       22589.13ms │     no change │
│ QQuery 24    │  1365.25ms │        1376.12ms │     no change │
│ QQuery 25    │  1176.10ms │        1176.65ms │     no change │
│ QQuery 26    │  1487.44ms │        1506.23ms │     no change │
│ QQuery 27    │  4022.80ms │        3984.70ms │     no change │
│ QQuery 28    │ 30287.63ms │       29893.82ms │     no change │
│ QQuery 29    │  1059.72ms │        1045.76ms │     no change │
│ QQuery 30    │  2503.34ms │        2480.10ms │     no change │
│ QQuery 31    │  3208.06ms │        3070.93ms │     no change │
│ QQuery 32    │ 17948.66ms │       16576.11ms │ +1.08x faster │
│ QQuery 33    │  9600.79ms │        9552.98ms │     no change │
│ QQuery 34    │  9703.76ms │        9574.08ms │     no change │
│ QQuery 35    │  3786.65ms │        3743.23ms │     no change │
│ QQuery 36    │   345.94ms │         344.00ms │     no change │
│ QQuery 37    │   232.55ms │         229.18ms │     no change │
│ QQuery 38    │   193.74ms │         199.95ms │     no change │
│ QQuery 39    │  1149.55ms │        1174.42ms │     no change │
│ QQuery 40    │    87.70ms │          92.48ms │  1.05x slower │
│ QQuery 41    │    79.73ms │          81.89ms │     no change │
│ QQuery 42    │    95.45ms │          98.37ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (main_base)          │ 175180.68ms │
│ Total Time (check-hash-first)   │ 171828.82ms │
│ Average Time (main_base)        │   4073.97ms │
│ Average Time (check-hash-first) │   3996.02ms │
│ Queries Faster                  │           2 │
│ Queries Slower                  │           1 │
│ Queries with No Change          │          40 │
└─────────────────────────────────┴─────────────┘

So I think this looks good to me 👍

Thanks a lot @Rachelint and everyone

alamb · 2024-07-30T18:47:48Z

Here is another run showing improvement:

Details

Comparing main_base and check-hash-first
--------------------
Benchmark clickbench_1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃  main_base ┃ check-hash-first ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     0.70ms │           0.70ms │     no change │
│ QQuery 1     │    91.56ms │          90.72ms │     no change │
│ QQuery 2     │   191.85ms │         186.74ms │     no change │
│ QQuery 3     │   200.63ms │         198.96ms │     no change │
│ QQuery 4     │  2211.34ms │        2238.22ms │     no change │
│ QQuery 5     │  2055.51ms │        2071.21ms │     no change │
│ QQuery 6     │    85.87ms │          85.66ms │     no change │
│ QQuery 7     │    93.26ms │          92.07ms │     no change │
│ QQuery 8     │  3213.88ms │        2974.12ms │ +1.08x faster │
│ QQuery 9     │  2332.27ms │        2351.33ms │     no change │
│ QQuery 10    │   860.69ms │         845.38ms │     no change │
│ QQuery 11    │   936.60ms │         909.28ms │     no change │
│ QQuery 12    │  2126.25ms │        2133.29ms │     no change │
│ QQuery 13    │  4685.59ms │        4528.49ms │     no change │
│ QQuery 14    │  2958.13ms │        2864.44ms │     no change │
│ QQuery 15    │  2508.42ms │        2440.92ms │     no change │
│ QQuery 16    │  6054.02ms │        5737.46ms │ +1.06x faster │
│ QQuery 17    │  5929.43ms │        5760.15ms │     no change │
│ QQuery 18    │ 12280.25ms │       11745.12ms │     no change │
│ QQuery 19    │   170.05ms │         165.38ms │     no change │
│ QQuery 20    │  2726.60ms │        2716.05ms │     no change │
│ QQuery 21    │  3501.18ms │        3519.52ms │     no change │
│ QQuery 22    │  9514.93ms │        9516.19ms │     no change │
│ QQuery 23    │ 22818.47ms │       22238.19ms │     no change │
│ QQuery 24    │  1380.11ms │        1375.95ms │     no change │
│ QQuery 25    │  1176.97ms │        1165.53ms │     no change │
│ QQuery 26    │  1494.63ms │        1493.36ms │     no change │
│ QQuery 27    │  4064.91ms │        4014.09ms │     no change │
│ QQuery 28    │ 31117.67ms │       30759.59ms │     no change │
│ QQuery 29    │  1045.69ms │        1047.90ms │     no change │
│ QQuery 30    │  2551.21ms │        2483.07ms │     no change │
│ QQuery 31    │  3252.34ms │        3092.13ms │     no change │
│ QQuery 32    │ 17868.98ms │       16585.30ms │ +1.08x faster │
│ QQuery 33    │  9423.88ms │        9830.09ms │     no change │
│ QQuery 34    │  9699.31ms │        9704.12ms │     no change │
│ QQuery 35    │  3835.63ms │        3776.00ms │     no change │
│ QQuery 36    │   350.32ms │         340.48ms │     no change │
│ QQuery 37    │   229.44ms │         230.08ms │     no change │
│ QQuery 38    │   198.22ms │         194.55ms │     no change │
│ QQuery 39    │  1190.97ms │        1166.81ms │     no change │
│ QQuery 40    │    94.33ms │          85.10ms │ +1.11x faster │
│ QQuery 41    │    83.11ms │          80.14ms │     no change │
│ QQuery 42    │    96.20ms │          98.17ms │     no change │
└──────────────┴────────────┴──────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Benchmark Summary               ┃             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ Total Time (main_base)          │ 176701.41ms │
│ Total Time (check-hash-first)   │ 172932.04ms │
│ Average Time (main_base)        │   4109.34ms │
│ Average Time (check-hash-first) │   4021.68ms │
│ Queries Faster                  │           4 │
│ Queries Slower                  │           0 │
│ Queries with No Change          │          39 │
└─────────────────────────────────┴─────────────┘

alamb · 2024-07-31T12:32:19Z

Thanks again @Rachelint @Dandandan and @2010YOUY01

Rachelint · 2024-07-31T16:53:32Z

Duckdb has the similar check(but just use u16 prefix of hash):
https://github.com/duckdb/duckdb/blob/f92559f42c118075baaa8daafc437954eb5c85ec/src/execution/aggregate_hashtable.cpp#L379-L386

And mentioned by @alamb , the related pr in duckdb is interesting to see
duckdb/duckdb#9575

Rachelint and others added 2 commits July 30, 2024 15:59

check hashes first during probing.

a0b8009

fix style.

9458a24

Rachelint marked this pull request as ready for review July 30, 2024 08:06

Dandandan approved these changes Jul 30, 2024

View reviewed changes

alamb reviewed Jul 30, 2024

View reviewed changes

alamb mentioned this pull request Jul 30, 2024

Use cargo release in bench.sh #11722

Merged

alamb approved these changes Jul 30, 2024

View reviewed changes

alamb merged commit 89677ae into apache:main Jul 31, 2024
24 checks passed

alamb mentioned this pull request Jul 31, 2024

Minor: Add comment explaining rationale for hash check #11750

Merged

alamb mentioned this pull request Aug 5, 2024

DataFusion weekly project plan (Andrew Lamb) - Aug 5, 2024 #11826

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Check hashes first during probing the aggr hash table #11718

Check hashes first during probing the aggr hash table #11718

Rachelint commented Jul 30, 2024

Rachelint commented Jul 30, 2024

Dandandan left a comment

alamb left a comment

alamb Jul 30, 2024

jayzhan211 Jul 30, 2024

Rachelint Jul 30, 2024 •

edited

Loading

Rachelint Jul 30, 2024

Dandandan Jul 30, 2024 •

edited

Loading

Rachelint Jul 30, 2024 •

edited

Loading

jayzhan211 Jul 30, 2024

2010YOUY01 Jul 30, 2024

Rachelint Jul 30, 2024

alamb Jul 31, 2024

alamb commented Jul 30, 2024

alamb commented Jul 30, 2024

Rachelint commented Jul 30, 2024 •

edited

Loading

alamb commented Jul 30, 2024

Rachelint commented Jul 30, 2024 •

edited

Loading

Rachelint commented Jul 30, 2024 •

edited

Loading

jayzhan211 commented Jul 30, 2024

jayzhan211 commented Jul 30, 2024 •

edited

Loading

Rachelint commented Jul 30, 2024 •

edited

Loading

jayzhan211 commented Jul 30, 2024 •

edited

Loading

Rachelint commented Jul 30, 2024 •

edited

Loading

alamb left a comment

alamb commented Jul 30, 2024

alamb commented Jul 31, 2024

Rachelint commented Jul 31, 2024

Check hashes first during probing the aggr hash table #11718

Check hashes first during probing the aggr hash table #11718

Conversation

Rachelint commented Jul 30, 2024

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

Rachelint commented Jul 30, 2024

Dandandan left a comment

Choose a reason for hiding this comment

alamb left a comment

Choose a reason for hiding this comment

alamb Jul 30, 2024

Choose a reason for hiding this comment

jayzhan211 Jul 30, 2024

Choose a reason for hiding this comment

Rachelint Jul 30, 2024 • edited Loading

Choose a reason for hiding this comment

Rachelint Jul 30, 2024

Choose a reason for hiding this comment

Dandandan Jul 30, 2024 • edited Loading

Choose a reason for hiding this comment

Rachelint Jul 30, 2024 • edited Loading

Choose a reason for hiding this comment

jayzhan211 Jul 30, 2024

Choose a reason for hiding this comment

2010YOUY01 Jul 30, 2024

Choose a reason for hiding this comment

Rachelint Jul 30, 2024

Choose a reason for hiding this comment

alamb Jul 31, 2024

Choose a reason for hiding this comment

alamb commented Jul 30, 2024

alamb commented Jul 30, 2024

Rachelint commented Jul 30, 2024 • edited Loading

alamb commented Jul 30, 2024

Rachelint commented Jul 30, 2024 • edited Loading

Rachelint commented Jul 30, 2024 • edited Loading

jayzhan211 commented Jul 30, 2024

jayzhan211 commented Jul 30, 2024 • edited Loading

Rachelint commented Jul 30, 2024 • edited Loading

jayzhan211 commented Jul 30, 2024 • edited Loading

Rachelint commented Jul 30, 2024 • edited Loading

alamb left a comment

Choose a reason for hiding this comment

alamb commented Jul 30, 2024

alamb commented Jul 31, 2024

Rachelint commented Jul 31, 2024

Rachelint Jul 30, 2024 •

edited

Loading

Dandandan Jul 30, 2024 •

edited

Loading

Rachelint Jul 30, 2024 •

edited

Loading

Rachelint commented Jul 30, 2024 •

edited

Loading

Rachelint commented Jul 30, 2024 •

edited

Loading

Rachelint commented Jul 30, 2024 •

edited

Loading

jayzhan211 commented Jul 30, 2024 •

edited

Loading

Rachelint commented Jul 30, 2024 •

edited

Loading

jayzhan211 commented Jul 30, 2024 •

edited

Loading

Rachelint commented Jul 30, 2024 •

edited

Loading