Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Caching of sharding function #82

Closed
Totktonada opened this issue Oct 23, 2021 · 1 comment · Fixed by #85
Closed

Caching of sharding function #82

Totktonada opened this issue Oct 23, 2021 · 1 comment · Fixed by #85
Assignees
Labels
feature A new functionality

Comments

@Totktonada
Copy link
Member

Totktonada commented Oct 23, 2021

(There is more general task about caching #69, but I want to eat this pie piece by piece.)

We're going to add ddl.bucket_id() function (see #76). The function may be called quite frequently, so it worth to take care to its performance.

The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space, mainly due to those actions:

  1. MsgPack decoding.
  2. loadstring() is the function is declared as code ({body = <...>}).
  3. Extra Lua GC pressure on re-cretion of usually same objects.

Ideally obtaining of the function should be just Lua table lookup. And it is possible to achieve.

The only way to track _ddl_sharding_func changes is to set a trigger on the space to track modifications (on_replace)1. Since it is not always possible to set a trigger when the module is just loaded2, I propose a trick.

The key idea is to generate an initial cache value and set the trigger when we access the sharding function information first time. After this the cache will be updated 'in background' (by the trigger) and we can just access the cache.

What to consider:

  • Take care to access from different fibers (see the locked() function and the synchronized module for examples).
  • But implement this locking in a way that works good with hot reload (including cartridge's one, which cleans up all globals).
  • How to remove the old trigger if our code was unloaded (especially if _G was cleaned up by cartridge)? There are some ideas how to keep a state between reloads in https://github.com/tarantool/conf/issues/2 (at end of the issue description). A Lua/C module may also use Lua registry, but it is not our case.
  • Don't use the cache in a transaction? That's slow, but how else we can ensure that the cache will correspond to the state visible in the transaction?
  • Perform loadstring() when the function is defined as code (to don't do that each time).
  • But don't save the function itself if it is defined by name: a user may replace it. Maybe just parse 'dot.notation' into something like {'dot', 'notation'}.

Optimization trick:

We can use the trick with two implementations of the cache access function (see src/box/lua/load_cfg.lua in tarantool for example). The first function doing all the work: check whether the trigger is set (and the initial cache is generated), set the trigger and generate the cache if necessary, access cache, replace itself with the second function. The second function skips extra checks and just access the cache.

I'll note that the first function must not set the trigger unconditionally, because it may be called after hot reload. See tarantool/tarantool#5826.


Looks a bit tricky, but doable. Opinions?

Footnotes

  1. I filed https://github.com/tarantool/tarantool/issues/6544 and https://github.com/tarantool/tarantool/issues/6545 to track it using the database schema version in a future. I think it may simplify some future code.

  2. box may be unconfigured (or not fully loaded), when the module is required first time.

@Totktonada Totktonada added the feature A new functionality label Oct 23, 2021
@ligurio
Copy link
Member

ligurio commented Oct 27, 2021

Alexander Turenko initiated a discussion in #77 regarding to replace of pcall() with raw function call.
Because we can see some performance drawbacks here. Need to check hypothesis.

0x501D added a commit that referenced this issue Dec 14, 2021
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.
This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_space_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Closes #82
0x501D added a commit that referenced this issue Dec 15, 2021
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.
This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_space_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Closes #82
0x501D added a commit that referenced this issue Dec 15, 2021
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.
This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_space_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Closes #82
0x501D added a commit that referenced this issue Dec 16, 2021
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Processed sharding_func cache entry may be:
* table with parsed dot notation (like {'foo', 'bar'})
* function ready to call, this offloads using of loadstring()
* string with an error

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not server hot reload techniques.
This entails an on_replace trigger duplication if hot reload occures.
Hot reload support will be done in separate task.

Closes #82
0x501D added a commit that referenced this issue Dec 16, 2021
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Processed sharding_func cache entry may be:
* table with parsed dot notation (like {'foo', 'bar'})
* function ready to call, this offloads using of loadstring()
* string with an error

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task.

Closes #82
0x501D added a commit that referenced this issue Dec 16, 2021
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Processed sharding_func cache entry may be:
* table with parsed dot notation (like {'foo', 'bar'})
* function ready to call, this offloads using of loadstring()
* string with an error

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Closes #82
0x501D added a commit that referenced this issue Dec 23, 2021
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Processed sharding_func cache entry may be:
* table with parsed dot notation (like {'foo', 'bar'})
* function ready to call, this offloads using of loadstring()
* string with an error

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Closes #82
0x501D added a commit that referenced this issue Dec 27, 2021
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Processed sharding_func cache entry may be:
* table with parsed dot notation (like {'foo', 'bar'})
* function ready to call, this offloads using of loadstring()
* string with an error

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Closes #82
0x501D added a commit that referenced this issue Dec 28, 2021
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Processed sharding_func cache entry may be:
* table with parsed dot notation (like {'foo', 'bar'})
* function ready to call, this offloads using of loadstring()
* string with an error

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Closes #82
0x501D added a commit that referenced this issue Dec 30, 2021
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Processed sharding_func cache entry may be:
* table with parsed dot notation (like {'foo', 'bar'})
* function ready to call, this offloads using of loadstring()
* string with an error

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Closes #82
0x501D added a commit that referenced this issue Dec 30, 2021
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Processed sharding_func cache entry may be:
* table with parsed dot notation (like {'foo', 'bar'})
* function ready to call, this offloads using of loadstring()
* string with an error

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Closes #82
0x501D added a commit that referenced this issue Jan 14, 2022
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

This cache adds sharding function cache divided into two parts:
raw and processed. Raw part is used for get_schema() method.
Raw cache stored as is. Processed part is used for bucket_id().

Processed sharding_func cache entry may be:
* table with parsed dot notation (like {'foo', 'bar'})
* function ready to call, this offloads using of loadstring()
* string with an error

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua):
Baseline (no DDL):     3.38s user 0.01s system 99% cpu 3.389 total
After this patch:
DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total
DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total
Before patch:
DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total
DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total

Closes #82
0x501D added a commit that referenced this issue Jan 14, 2022
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

Cache contains:
raw_tuple - raw sharding metadata, used for get_schema()
parsed_func_name - parsed dot notation (like {'foo', 'bar'})
callable - function ready to call, this offloads using of loadstring()
error - string with an error: not nil only if setting callable fails

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua):
Baseline (no DDL):     3.38s user 0.01s system 99% cpu 3.389 total
After this patch:
DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total
DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total
Before patch:
DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total
DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total

Closes #82
0x501D added a commit that referenced this issue Jan 21, 2022
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

Cache contains:
raw_tuple - raw sharding metadata, used for get_schema()
parsed_func_name - parsed dot notation (like {'foo', 'bar'})
callable - function ready to call, this offloads using of loadstring()
error - string with an error: not nil only if setting callable fails

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua):
Baseline (no DDL):     3.38s user 0.01s system 99% cpu 3.389 total
After this patch:
DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total
DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total
Before patch:
DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total
DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total

Closes #82
0x501D added a commit that referenced this issue Jan 21, 2022
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

Cache contains:
raw_tuple - raw sharding metadata, used for get_schema()
parsed_func_name - parsed dot notation (like {'foo', 'bar'})
callable - function ready to call, this offloads using of loadstring()
error - string with an error: not nil only if setting callable fails

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua):
Baseline (no DDL):     3.38s user 0.01s system 99% cpu 3.389 total
After this patch:
DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total
DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total
Before patch:
DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total
DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total

Closes #82
0x501D added a commit that referenced this issue Jan 24, 2022
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

Cache contains:
raw_tuple - raw sharding metadata, used for get_schema()
parsed_func_name - parsed dot notation (like {'foo', 'bar'})
callable - function ready to call, this offloads using of loadstring()
error - string with an error: not nil only if setting callable fails

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua):
Baseline (no DDL):     3.38s user 0.01s system 99% cpu 3.389 total
After this patch:
DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total
DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total
Before patch:
DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total
DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total

Closes #82
0x501D added a commit that referenced this issue Jan 26, 2022
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

Cache contains:
raw_tuple - raw sharding metadata, used for get_schema()
parsed_func_name - parsed dot notation (like {'foo', 'bar'})
callable - function ready to call, this offloads using of loadstring()
error - string with an error: not nil only if setting callable fails

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua):
Baseline (no DDL):     3.38s user 0.01s system 99% cpu 3.389 total
After this patch:
DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total
DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total
Before patch:
DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total
DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total

Closes #82
Totktonada pushed a commit that referenced this issue Jan 31, 2022
The ddl.bucket_id() function needs to know a sharding function.
It is costly to obtain the function declaration /
definition stored in the _ddl_sharding_func space.

Cache contains:
raw_tuple - raw sharding metadata, used for get_schema()
parsed_func_name - parsed dot notation (like {'foo', 'bar'})
callable - function ready to call, this offloads using of loadstring()
error - string with an error: not nil only if setting callable fails

Cache will be rebuilded if:
* _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace
  trigger
* schema changed: cache checks box.internal.schema_version changes

This patch does not serve hot reload techniques.
This entails an on_replace trigger duplication if hot reload occurs.
Hot reload support will be done in separate task:
#87

Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua):
Baseline (no DDL):     3.38s user 0.01s system 99% cpu 3.389 total
After this patch:
DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total
DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total
Before patch:
DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total
DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total

Closes #82
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature A new functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants