Use hash values to speed-up to comparing entities #104

Merged on Jun 6, 2024 (16 commits)

uatuko (Owner) commented Jun 4, 2024

Comparing int64s (8 bytes, fixed length) is more efficient than comparing variable-length strings. This change attempts to utilise that efficiency to reduce latency when checking relations (especially when using the set strategy).

Pre-computed int64 hash values for each tuple's left and right entities are stored in the DB, along with an index created specifically to yield index-only scans when retrieving data for the set strategy. This index is also used to achieve a favourable query plan when listing tuples for the graph strategy.

The additional computation, hash data and index writes result in slightly slower writes but faster reads.
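
To make the idea concrete, here is a minimal sketch of folding an entity's type and id into a single int64 key. The hash function used by this change is not stated in the description, so FNV-1a is only a stand-in, and `fnv1a` and `entity_hash` are hypothetical names, not taken from the codebase.

```cpp
#include <cstdint>
#include <string_view>

// FNV-1a (64-bit) over a byte sequence. A stand-in for illustration only;
// the hash actually used by this change is not shown in the description.
constexpr std::uint64_t fnv1a(std::string_view data,
                              std::uint64_t hash = 14695981039346656037ULL) {
    for (unsigned char c : data) {
        hash ^= c;                 // mix in the next byte
        hash *= 1099511628211ULL;  // multiply by the 64-bit FNV prime
    }
    return hash;
}

// Hypothetical helper: fold an entity's type and id into one signed 64-bit
// value, suitable for storing in a bigint column.
std::int64_t entity_hash(std::string_view type, std::string_view id) {
    std::uint64_t h = fnv1a(type);
    // A NUL separator keeps ("ab", "c") and ("a", "bc") from feeding
    // identical byte streams into the hash.
    h = fnv1a(std::string_view("\0", 1), h);
    h = fnv1a(id, h);
    return static_cast<std::int64_t>(h);
}
```

Two entities can then be compared with a single integer comparison, for example `entity_hash("user", "a") == entity_hash("user", "a")`. Distinct entities can still collide on the same hash, so a hash match on its own is not proof of equality.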

Benchmarks

b1. Spot algorithm

Load Average: 1.84, 2.01, 2.03
------------------------------------------------------------------------------------------
Benchmark                                Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------------------
bm_spot_intersection/8                 160 ns          160 ns      4358872 comparisons=93.6867M/s ops=6.24578M/s
bm_spot_intersection/64               1848 ns         1845 ns       378323 comparisons=68.8411M/s ops=542.056k/s
bm_spot_intersection/512             14806 ns        14786 ns        47356 comparisons=69.185M/s ops=67.6295k/s
bm_spot_intersection/4096           118449 ns       118280 ns         5911 comparisons=69.251M/s ops=8.45453k/s
bm_spot_intersection/8192           236818 ns       236432 ns         2948 comparisons=69.2926M/s ops=4.22954k/s
bm_spot_intersection_int64/8          21.3 ns         21.3 ns     32957461 comparisons=705.842M/s ops=47.0561M/s
bm_spot_intersection_int64/64          371 ns          370 ns      2101030 comparisons=343.293M/s ops=2.7031M/s
bm_spot_intersection_int64/512        2725 ns         2721 ns       257593 comparisons=375.953M/s ops=367.501k/s
bm_spot_intersection_int64/4096      21324 ns        21291 ns        32302 comparisons=384.722M/s ops=46.9689k/s
bm_spot_intersection_int64/8192      42979 ns        42921 ns        16160 comparisons=381.697M/s ops=23.2984k/s
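
The spot algorithm itself is not shown in this description. As a rough illustration of why the key type matters, the sketch below assumes it is an early-exit intersection over two sorted sequences; `spot` is a hypothetical name. The scan is identical for both key types, and only the cost of each comparison changes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Two-pointer scan over two ascending sequences. Returns the first element
// present in both, or std::nullopt when they do not intersect.
template <typename T>
std::optional<T> spot(const std::vector<T>& left, const std::vector<T>& right) {
    assert(std::is_sorted(left.begin(), left.end()));
    assert(std::is_sorted(right.begin(), right.end()));

    auto l = left.begin();
    auto r = right.begin();
    while (l != left.end() && r != right.end()) {
        if (*l < *r) {
            ++l;        // left is behind, advance it
        } else if (*r < *l) {
            ++r;        // right is behind, advance it
        } else {
            return *l;  // common element found, stop early
        }
    }
    return std::nullopt;
}
```

Instantiated with `std::int64_t`, each step is a fixed-width integer comparison. With `std::string`, each step may walk the characters of both operands.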

b2. Check relations using set strategy

Before
Load Average: 3.00, 2.84, 2.78
--------------------------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------
bm_relations/check_set/8        196064 ns        35581 ns        19489 comparisons=56.2097k/s ops=28.1049k/s
bm_relations/check_set/64      1003334 ns       141064 ns         4932 comparisons=14.178k/s ops=7.08899k/s
bm_relations/check_set/512     8286319 ns       948926 ns          741 comparisons=2.10765k/s ops=1.05382k/s
bm_relations/check_set/4096   77921608 ns      7398617 ns           94 comparisons=270.321/s ops=135.16/s
bm_relations/check_set/8192  163626824 ns     15178500 ns           46 comparisons=131.765/s ops=65.8827/s
After
Load Average: 1.87, 2.05, 2.36
--------------------------------------------------------------------------------------
Benchmark                            Time             CPU   Iterations UserCounters...
--------------------------------------------------------------------------------------
bm_relations/check_set/8        256227 ns        38917 ns        10000 comparisons=308.351k/s ops=25.6959k/s
bm_relations/check_set/64       336578 ns        64024 ns        11156 comparisons=281.145k/s ops=15.6192k/s
bm_relations/check_set/512      898412 ns       277908 ns         2510 comparisons=892.38k/s ops=3.59831k/s
bm_relations/check_set/4096    5327552 ns      1957718 ns          358 comparisons=77.6414k/s ops=510.799/s
bm_relations/check_set/8192   10625457 ns      3873122 ns          181 comparisons=58.8672k/s ops=258.19/s

b3. Check relations using graph strategy

Before
Load Average: 5.95, 3.78, 3.13
-------------------------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------
bm_relations/check_graph/4/128       639445 ns        57550 ns        12220 ops=17.3762k/s vertices=104.257k/s
bm_relations/check_graph/8/128      1501169 ns        91245 ns         7583 ops=10.9595k/s vertices=109.595k/s
bm_relations/check_graph/32/128    14705872 ns       330233 ns         1000 ops=3.02816k/s vertices=102.958k/s
bm_relations/check_graph/4/512      1451171 ns        58645 ns        11816 ops=17.0517k/s vertices=102.31k/s
bm_relations/check_graph/8/512      4344049 ns        96089 ns         1000 ops=10.407k/s vertices=104.07k/s
bm_relations/check_graph/32/512    53038020 ns       353580 ns          100 ops=2.82821k/s vertices=96.1593k/s
bm_relations/check_graph/4/2048     4869842 ns        61965 ns         1000 ops=16.1381k/s vertices=96.8289k/s
bm_relations/check_graph/8/2048    15515533 ns       106336 ns         1000 ops=9.40415k/s vertices=94.0415k/s
bm_relations/check_graph/32/2048  217419540 ns       709720 ns          100 ops=1.40901k/s vertices=47.9062k/s
After
Load Average: 2.57, 2.33, 2.56
-------------------------------------------------------------------------------------------
Benchmark                                 Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------
bm_relations/check_graph/4/128       424975 ns        60051 ns        11776 ops=16.6526k/s vertices=99.9157k/s
bm_relations/check_graph/8/128       658312 ns        93717 ns         7311 ops=10.6704k/s vertices=106.704k/s
bm_relations/check_graph/32/128     2067346 ns       320007 ns         2165 ops=3.12493k/s vertices=106.248k/s
bm_relations/check_graph/4/512       423596 ns        58971 ns        11884 ops=16.9576k/s vertices=101.746k/s
bm_relations/check_graph/8/512       658667 ns        93952 ns         7430 ops=10.6437k/s vertices=106.437k/s
bm_relations/check_graph/32/512     2091182 ns       328202 ns         2085 ops=3.0469k/s vertices=103.595k/s
bm_relations/check_graph/4/2048      431698 ns        60988 ns        11462 ops=16.3966k/s vertices=98.3795k/s
bm_relations/check_graph/8/2048      680199 ns        98905 ns         7242 ops=10.1107k/s vertices=101.107k/s
bm_relations/check_graph/32/2048    2118427 ns       325959 ns         2195 ops=3.06787k/s vertices=104.307k/s

b4. Create relations 🔻

Before
Load Average: 1.53, 2.37, 2.66
------------------------------------------------------------------------------
Benchmark                    Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------
bm_relations/create      89585 ns         7293 ns        95761 ops=137.118k/s writes=137.118k/s
After
Load Average: 1.61, 1.88, 1.98
------------------------------------------------------------------------------
Benchmark                    Time             CPU   Iterations UserCounters...
------------------------------------------------------------------------------
bm_relations/create     102199 ns         7479 ns        93777 ops=133.703k/s writes=133.703k/s
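
To put the read/write trade-off in numbers, the sketch below computes wall-clock ratios from the Time column of the tables above. The `speedup` helper and the constant names are illustrative only. The figures come from single benchmark runs taken under different load averages, so treat the ratios as indicative.

```cpp
// Ratio of wall-clock times: values above 1 mean the second run is faster.
constexpr double speedup(double before_ns, double after_ns) {
    return before_ns / after_ns;
}

// Times (ns) copied from the first and second runs in each section.
constexpr double check_set_8192      = speedup(163626824.0, 10625457.0);  // b2, check_set/8192
constexpr double check_graph_32_2048 = speedup(217419540.0, 2118427.0);   // b3, check_graph/32/2048
constexpr double create              = speedup(89585.0, 102199.0);        // b4, create
```

This works out to roughly 15x for `check_set/8192`, roughly 100x for `check_graph/32/2048`, and about 0.88x for `create`, which is around 14% more time per write.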

Query plans

p1. Listing tuplets right

explain select
  _id,
  _r_hash as _hash,
  relation,
  null as strand
from tuples
where space_id = '' and _l_hash = 8357126990540548874
order by _hash desc
limit 1000;
                                                QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..130.18 rows=1000 width=68)
   ->  Index Only Scan Backward using "tuples.idx-rtl" on tuples  (cost=0.29..1064.36 rows=8192 width=68)
         Index Cond: ((space_id = ''::text) AND (_l_hash = '8357126990540548874'::bigint))

p2. Listing tuplets left

explain select
  _id,
  _l_hash as _hash,
  relation,
  strand as strand
from tuples
where space_id = '' and _r_hash = 2410070456889820883 and relation = 'reader'
order by _hash desc
limit 1000;
                                                        QUERY PLAN                                                         
---------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.28..8.30 rows=1 width=43)
   ->  Index Only Scan Backward using "tuples.idx-rtl" on tuples  (cost=0.28..8.30 rows=1 width=43)
         Index Cond: ((space_id = ''::text) AND (_r_hash = '2410070456889820883'::bigint) AND (relation = 'reader'::text))

p3. Listing tuples left (with relation)

explain select
  space_id,
  strand,
  l_entity_type, l_entity_id,
  relation,
  r_entity_type, r_entity_id,
  attrs,
  l_principal_id, r_principal_id,
  _id, _rev,
  _l_hash, _r_hash,
  _rid_l, _rid_r
from tuples
where
  space_id = ''
  and _r_hash = 1704957511755688144
  and r_entity_type = 'bm_relations.check_graph' and r_entity_id = 'cpg94bbjuspgs1abd9pg'
  and relation = 'reader'
order by l_entity_id desc
limit 1000;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.46..8.46 rows=1 width=306)
   ->  Sort  (cost=8.46..8.46 rows=1 width=306)
         Sort Key: l_entity_id DESC
         ->  Index Scan using "tuples.idx-rtl" on tuples  (cost=0.42..8.45 rows=1 width=306)
               Index Cond: ((space_id = ''::text) AND (_r_hash = '1704957511755688144'::bigint) AND (relation = 'reader'::text))
               Filter: ((r_entity_type = 'bm_relations.check_graph'::text) AND (r_entity_id = 'cpg94bbjuspgs1abd9pg'::text))

p4. Listing tuples left (without relation)

explain select
  space_id,
  strand,
  l_entity_type, l_entity_id,
  relation,
  r_entity_type, r_entity_id,
  attrs,
  l_principal_id, r_principal_id,
  _id, _rev,
  _l_hash, _r_hash,
  _rid_l, _rid_r
from tuples
where
  space_id = ''
  and _r_hash = 7809478143556808948
  and r_entity_type = 'bm_relations.check_graph' and r_entity_id = 'cpg94cjjuspgs1acv98g'
order by l_entity_id desc
limit 1000;
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.45..8.46 rows=1 width=306)
   ->  Sort  (cost=8.45..8.46 rows=1 width=306)
         Sort Key: l_entity_id DESC
         ->  Index Scan using "tuples.idx-rtl" on tuples  (cost=0.42..8.44 rows=1 width=306)
               Index Cond: ((space_id = ''::text) AND (_r_hash = '7809478143556808948'::bigint))
               Filter: ((r_entity_type = 'bm_relations.check_graph'::text) AND (r_entity_id = 'cpg94cjjuspgs1acv98g'::text))
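
In p3 and p4 the hash appears in the Index Cond, while the entity type and id are applied afterwards as a Filter. One plausible reading, which is an assumption and not stated in this description, is that the string comparison guards against hash collisions. The in-memory sketch below mirrors those two stages; the `Tuple` struct and `list_left` are hypothetical and much smaller than the real row.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, trimmed-down row; the real table has many more columns.
struct Tuple {
    std::int64_t r_hash;
    std::string  r_entity_type;
    std::string  r_entity_id;
    std::string  l_entity_id;
};

// Narrow by the cheap integer comparison first, then confirm with the
// string comparison only for rows whose hash already matched.
std::vector<Tuple> list_left(const std::vector<Tuple>& rows, std::int64_t r_hash,
                             const std::string& type, const std::string& id) {
    std::vector<Tuple> out;
    for (const auto& row : rows) {
        if (row.r_hash != r_hash) {
            continue;  // analogue of the Index Cond on _r_hash
        }
        if (row.r_entity_type == type && row.r_entity_id == id) {
            out.push_back(row);  // analogue of the Filter on type and id
        }
    }
    return out;
}
```

With this shape, a row that shares the hash but belongs to a different entity is dropped at the second stage.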

p5. Listing tuples right

explain select
  space_id,
  strand,
  l_entity_type, l_entity_id,
  relation,
  r_entity_type, r_entity_id,
  attrs,
  l_principal_id, r_principal_id,
  _id, _rev,
  _l_hash, _r_hash,
  _rid_l, _rid_r
from tuples
where
  space_id = ''
  and 0 = 0
  and l_entity_type = 'bm_relations.check_graph' and l_entity_id = 'cpgcqn3juspiar5r9esg'
order by r_entity_id desc
limit 1000;
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.79..13.80 rows=4 width=306)
   ->  Sort  (cost=13.79..13.80 rows=4 width=306)
         Sort Key: r_entity_id DESC
         ->  Index Scan using "tuples.unique" on tuples  (cost=0.41..13.75 rows=4 width=306)
               Index Cond: ((space_id = ''::text) AND (l_entity_type = 'bm_relations.check_graph'::text) AND (l_entity_id = 'cpgcqn3juspiar5r9esg'::text))

codecov bot commented Jun 4, 2024

Codecov Report

Attention: Patch coverage is 96.72131% with 4 lines in your changes missing coverage. Please review.

Project coverage is 93.50%. Comparing base (6442c2b) to head (e1c4f58).

Files                   Patch %   Lines
src/svc/relations.cpp   80.00%    1 Missing and 2 partials ⚠️
src/db/tuples.cpp       97.82%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #104      +/-   ##
==========================================
+ Coverage   93.17%   93.50%   +0.32%     
==========================================
  Files          18       20       +2     
  Lines        1304     1401      +97     
  Branches      160      168       +8     
==========================================
+ Hits         1215     1310      +95     
- Misses         64       65       +1     
- Partials       25       26       +1     


uatuko changed the title from "Use hash values to speed to comparing entities" to "Use hash values to speed-up to comparing entities" on Jun 4, 2024
uatuko marked this pull request as ready for review on June 5, 2024 16:50
neculalaura previously approved these changes on Jun 6, 2024
uatuko merged commit 41a32a3 into main on Jun 6, 2024
4 checks passed
uatuko deleted the feature/hash-compare branch on June 6, 2024 08:58