-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Caching of sharding function #82
Labels
feature
A new functionality
Comments
Alexander Turenko initiated a discussion in #77 regarding to replace of |
0x501D
added a commit
that referenced
this issue
Dec 14, 2021
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_space_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Closes #82
0x501D
added a commit
that referenced
this issue
Dec 15, 2021
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_space_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Closes #82
0x501D
added a commit
that referenced
this issue
Dec 15, 2021
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_space_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Closes #82
0x501D
added a commit
that referenced
this issue
Dec 16, 2021
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Processed sharding_func cache entry may be: * table with parsed dot notation (like {'foo', 'bar'}) * function ready to call, this offloads using of loadstring() * string with an error Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not server hot reload techniques. This entails an on_replace trigger duplication if hot reload occures. Hot reload support will be done in separate task. Closes #82
0x501D
added a commit
that referenced
this issue
Dec 16, 2021
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Processed sharding_func cache entry may be: * table with parsed dot notation (like {'foo', 'bar'}) * function ready to call, this offloads using of loadstring() * string with an error Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task. Closes #82
0x501D
added a commit
that referenced
this issue
Dec 16, 2021
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Processed sharding_func cache entry may be: * table with parsed dot notation (like {'foo', 'bar'}) * function ready to call, this offloads using of loadstring() * string with an error Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Closes #82
0x501D
added a commit
that referenced
this issue
Dec 23, 2021
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Processed sharding_func cache entry may be: * table with parsed dot notation (like {'foo', 'bar'}) * function ready to call, this offloads using of loadstring() * string with an error Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Closes #82
0x501D
added a commit
that referenced
this issue
Dec 27, 2021
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Processed sharding_func cache entry may be: * table with parsed dot notation (like {'foo', 'bar'}) * function ready to call, this offloads using of loadstring() * string with an error Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Closes #82
0x501D
added a commit
that referenced
this issue
Dec 28, 2021
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Processed sharding_func cache entry may be: * table with parsed dot notation (like {'foo', 'bar'}) * function ready to call, this offloads using of loadstring() * string with an error Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Closes #82
0x501D
added a commit
that referenced
this issue
Dec 30, 2021
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Processed sharding_func cache entry may be: * table with parsed dot notation (like {'foo', 'bar'}) * function ready to call, this offloads using of loadstring() * string with an error Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Closes #82
0x501D
added a commit
that referenced
this issue
Dec 30, 2021
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Processed sharding_func cache entry may be: * table with parsed dot notation (like {'foo', 'bar'}) * function ready to call, this offloads using of loadstring() * string with an error Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Closes #82
0x501D
added a commit
that referenced
this issue
Jan 14, 2022
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. This cache adds sharding function cache divided into two parts: raw and processed. Raw part is used for get_schema() method. Raw cache stored as is. Processed part is used for bucket_id(). Processed sharding_func cache entry may be: * table with parsed dot notation (like {'foo', 'bar'}) * function ready to call, this offloads using of loadstring() * string with an error Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua): Baseline (no DDL): 3.38s user 0.01s system 99% cpu 3.389 total After this patch: DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total Before patch: DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total Closes #82
0x501D
added a commit
that referenced
this issue
Jan 14, 2022
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. Cache contains: raw_tuple - raw sharding metadata, used for get_schema() parsed_func_name - parsed dot notation (like {'foo', 'bar'}) callable - function ready to call, this offloads using of loadstring() error - string with an error: not nil only if setting callable fails Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua): Baseline (no DDL): 3.38s user 0.01s system 99% cpu 3.389 total After this patch: DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total Before patch: DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total Closes #82
This was referenced Jan 20, 2022
0x501D
added a commit
that referenced
this issue
Jan 21, 2022
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. Cache contains: raw_tuple - raw sharding metadata, used for get_schema() parsed_func_name - parsed dot notation (like {'foo', 'bar'}) callable - function ready to call, this offloads using of loadstring() error - string with an error: not nil only if setting callable fails Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua): Baseline (no DDL): 3.38s user 0.01s system 99% cpu 3.389 total After this patch: DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total Before patch: DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total Closes #82
0x501D
added a commit
that referenced
this issue
Jan 21, 2022
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. Cache contains: raw_tuple - raw sharding metadata, used for get_schema() parsed_func_name - parsed dot notation (like {'foo', 'bar'}) callable - function ready to call, this offloads using of loadstring() error - string with an error: not nil only if setting callable fails Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua): Baseline (no DDL): 3.38s user 0.01s system 99% cpu 3.389 total After this patch: DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total Before patch: DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total Closes #82
0x501D
added a commit
that referenced
this issue
Jan 24, 2022
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. Cache contains: raw_tuple - raw sharding metadata, used for get_schema() parsed_func_name - parsed dot notation (like {'foo', 'bar'}) callable - function ready to call, this offloads using of loadstring() error - string with an error: not nil only if setting callable fails Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua): Baseline (no DDL): 3.38s user 0.01s system 99% cpu 3.389 total After this patch: DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total Before patch: DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total Closes #82
0x501D
added a commit
that referenced
this issue
Jan 26, 2022
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. Cache contains: raw_tuple - raw sharding metadata, used for get_schema() parsed_func_name - parsed dot notation (like {'foo', 'bar'}) callable - function ready to call, this offloads using of loadstring() error - string with an error: not nil only if setting callable fails Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua): Baseline (no DDL): 3.38s user 0.01s system 99% cpu 3.389 total After this patch: DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total Before patch: DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total Closes #82
Totktonada
pushed a commit
that referenced
this issue
Jan 31, 2022
The ddl.bucket_id() function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the _ddl_sharding_func space. Cache contains: raw_tuple - raw sharding metadata, used for get_schema() parsed_func_name - parsed dot notation (like {'foo', 'bar'}) callable - function ready to call, this offloads using of loadstring() error - string with an error: not nil only if setting callable fails Cache will be rebuilded if: * _ddl_sharding_func space changed: cache sets _ddl_sharding_func:on_replace trigger * schema changed: cache checks box.internal.schema_version changes This patch does not serve hot reload techniques. This entails an on_replace trigger duplication if hot reload occurs. Hot reload support will be done in separate task: #87 Benchmarks - 10000000 bucket_id() calls (file test/bench_cache.lua): Baseline (no DDL): 3.38s user 0.01s system 99% cpu 3.389 total After this patch: DDL with function body: 3.81s user 0.01s system 99% cpu 3.818 total DDL with function name: 5.49s user 0.00s system 99% cpu 5.495 total Before patch: DDL with function body: 55.95s user 0.40s system 99% cpu 56.354 total DDL with function name: 13.68s user 0.13s system 99% cpu 13.807 total Closes #82
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
(There is more general task about caching #69, but I want to eat this pie piece by piece.)
We're going to add
ddl.bucket_id()
function (see #76). The function may be called quite frequently, so it worth to take care to its performance.The
ddl.bucket_id()
function needs to know a sharding function. It is costly to obtain the function declaration / definition stored in the_ddl_sharding_func
space, mainly due to those actions:loadstring()
is the function is declared as code ({body = <...>}
).Ideally obtaining of the function should be just Lua table lookup. And it is possible to achieve.
The only way to track
_ddl_sharding_func
changes is to set a trigger on the space to track modifications (on_replace)1. Since it is not always possible to set a trigger when the module is just loaded2, I propose a trick.The key idea is to generate an initial cache value and set the trigger when we access the sharding function information first time. After this the cache will be updated 'in background' (by the trigger) and we can just access the cache.
What to consider:
loadstring()
when the function is defined as code (to don't do that each time).'dot.notation'
into something like{'dot', 'notation'}
.Optimization trick:
We can use the trick with two implementations of the cache access function (see src/box/lua/load_cfg.lua in tarantool for example). The first function doing all the work: check whether the trigger is set (and the initial cache is generated), set the trigger and generate the cache if necessary, access cache, replace itself with the second function. The second function skips extra checks and just access the cache.
I'll note that the first function must not set the trigger unconditionally, because it may be called after hot reload. See tarantool/tarantool#5826.
Looks a bit tricky, but doable. Opinions?
Footnotes
I filed https://github.com/tarantool/tarantool/issues/6544 and https://github.com/tarantool/tarantool/issues/6545 to track it using the database schema version in a future. I think it may simplify some future code. ↩
box may be unconfigured (or not fully loaded), when the module is
require
d first time. ↩The text was updated successfully, but these errors were encountered: