Add a method to provide statistics about CRUD operations #224
Comments
I don't want to rely on box.stat() information here, because I don't control all iproto calls to storages: for example, the vshard rebalancer may perform them in the background. Instead, I wrapped the particular storage function I'm interested in. The goal is to determine how many storages are involved in a select/pairs request. It is implemented as a helper for testing, but hopefully we'll implement proper statistics as part of the module in the future (see #224). Part of #220
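The wrapping approach mentioned in this commit message can be illustrated with a minimal sketch. The names `wrap_with_counter` and `storage_select` are hypothetical stand-ins and not the actual test helper or crud internals; the sketch only shows how a function can be wrapped to count its calls.

```lua
-- Hypothetical helper: wraps a function so that every call is counted.
local function wrap_with_counter(func)
    local stat = { calls = 0 }
    local function wrapped(...)
        stat.calls = stat.calls + 1
        return func(...)
    end
    return wrapped, stat
end

-- Stand-in for a storage-side select function (not the real crud code).
local function storage_select(space_name)
    return { space = space_name, tuples = {} }
end

local counted_select, select_stat = wrap_with_counter(storage_select)
counted_select('customers')
counted_select('customers')
print(select_stat.calls) -- prints 2
```

Because only the wrapped function is counted, background calls made by other components do not affect the number.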
NB: Whatever we implement, add it to the default metrics.
Raw thoughts about implementation.

We can decouple the statistics implementation from the main crud code for readability. We should provide the ability to set a callback on a storage and on a router that is called before/after executing a function that serves insert/delete/select/etc. (or one callback for all of them, with the operation type passed to the function). The return value of the 'before' callback should be passed to the 'after' one, so we can store the start time and calculate the request duration. Alternatively, a context table can be created per request and passed to both the 'before' and 'after' callbacks.

After this we can place all statistics code in its own module within crud or even implement it entirely in metrics. I would prefer the former, because it may be useful to inspect crud's statistics interactively from the tarantool console (say, if monitoring is not configured for the application).

Do we need to pass statistics information from storage to router together with a response? If all nodes are monitored, it does not look required.
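A rough sketch of the 'before'/'after' idea, under the assumption that the 'before' callback returns a context that is handed to 'after'. All names here (`with_callbacks`, `callbacks`, `fake_clock`) are hypothetical and exist only for illustration; a real implementation would likely use a monotonic clock instead of the injected fake one.

```lua
local unpack = table.unpack or unpack -- Lua 5.1/LuaJIT and 5.2+ compatibility

local function pack(...)
    return { n = select('#', ...), ... }
end

-- Hypothetical wrapper: calls `before`, runs `func`, then calls `after`
-- with the context returned by `before`.
local function with_callbacks(op_name, func, callbacks, clock)
    clock = clock or os.clock
    return function(...)
        local ctx = callbacks.before(op_name, clock())
        local res = pack(pcall(func, ...))
        local ok = res[1]
        callbacks.after(op_name, ctx, clock(), ok)
        if not ok then
            error(res[2], 0) -- rethrow the original error
        end
        return unpack(res, 2, res.n)
    end
end

local durations = {}
local callbacks = {
    before = function(op_name, now)
        return { started_at = now }
    end,
    after = function(op_name, ctx, now, ok)
        table.insert(durations, {
            op = op_name, ok = ok, duration = now - ctx.started_at,
        })
    end,
}

-- Fake clock for a deterministic demo: advances by 5 on each reading.
local ticks = 0
local function fake_clock()
    ticks = ticks + 5
    return ticks
end

local insert = with_callbacks('insert', function(tuple)
    return tuple
end, callbacks, fake_clock)

local result = insert({ 1, 'Alice' })
print(durations[1].op, durations[1].duration)
```

Passing a context from 'before' to 'after' keeps per-request state out of shared tables, which matters when several requests are in flight at once.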
One of the main questions is how There are two possible options:
The plan is to support crud statistics, such as requests count and requests latency. If requests count can be manually organized with a plain So I think the answer is
The other one is how to enable/disable metrics for
Since there are already "optional" groups of default metrics (like cartridge metrics or luajit metrics) that work whenever they can (that is, when cartridge/Tarantool has the required version), I don't think there will be any problems with the second one. Both solutions should work fine, but both integrations should be discussed with Considering the handles, we can describe them in
As for sending info between storages and routers, I don't think it is necessary. We can measure the full request execution time on the router without getting info from the storage, since each request starts and finishes on a router. I see the following model:
This info then could be visualized separately. So we may analyze
Add statistics module for collecting metrics of CRUD operations on router. Wrap all CRUD operation calls in statistics collector. Statistics may be disabled and re-enabled. Some internal methods of select/pairs were reworked or extended to provide statistics info. `cursor` returned from storage on select/pairs now contains stats of tuple count and lookup count. All changes are backward-compatible and should work even with older versions of crud routers and storages. Part of #224
Use built-in `crud.stats()` info instead of the `storage_stat` helper in tests to track map reduce calls. Part of #224
Add statistics module for collecting metrics of CRUD operations on router. Wrap all CRUD operation calls in statistics collector. Statistics may be disabled and re-enabled. Some internal methods of select/pairs were reworked or extended to provide statistics info. `cursor` returned from storage on select/pairs now contains stats of tuple count and lookup count. All changes are backward-compatible and should work between older versions of crud routers or storages. Part of #224
If `metrics` [1] is found, metrics collectors are used to store statistics. Version `>= 0.5.0` is required, while at least `0.9.0` is recommended to support age buckets in summary. The metrics are part of the global registry and can be exported together (e.g. to Prometheus) with default tools without any additional configuration. Disabling stats destroys the collectors.

If `metrics` is found, `latency` statistics are changed to the 0.99 quantile of request execution time (with aging).

Add CI matrix to run tests with `metrics` installed.

1. https://github.com/tarantool/metrics

Closes #224
Since the CI matrix for `metrics` installation was introduced, there is more than one pipeline to take into account when computing coverage. See more in the documentation [1].

1. https://docs.coveralls.io/parallel-build-webhook

Follows up #224
Before this patch, performance tests ran together with unit and integration tests with the `--coverage` flag. Coverage analysis degraded the results of performance tests by a factor of 10-15. For metrics integration it resulted in timeout errors and a performance drop that is not reproduced with coverage disabled. Moreover, before this patch log capture was disabled and performance tests did not display any results after a run. Now performance tests also run in a separate CI job.

After this patch, `make -C build coverage` will run a lightweight version of the performance test. `make -C build performance` will run real performance tests. You can paste the output table to GitHub [1].

This patch also reworks the current performance test. It adds new cases to compare module performance with and without statistics and statistics wrappers, compares different metrics drivers, and reports new info: average call time and max call time.

Performance test result: overhead is 3-10% in case of the `local` driver and 5-15% in case of the `metrics` driver, up to 20% for `metrics` with quantiles. Based on several runs on HP ProBook 440 G7 i7/16Gb/256SSD.

1. https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-tables

Closes #233, follows up #224
If `metrics` [1] is found, you can use metrics collectors to store statistics. `metrics >= 0.10.0` is required to use the metrics driver. (`metrics >= 0.9.0` is required to use summary quantiles with age buckets. `metrics >= 0.5.0, < 0.9.0` is unsupported due to a quantile overflow bug [2]. `metrics == 0.9.0` has a bug that does not permit creating a summary collector without quantiles [3]. In fact, users may use `metrics >= 0.5.0`, `metrics != 0.9.0` if they want to use metrics without quantiles, and `metrics >= 0.9.0` if they want to use metrics with quantiles. But this is confusing, so let's use a single restriction for both cases.)

The metrics are part of the global registry and can be exported together (e.g. to Prometheus) with default tools without any additional configuration. Disabling stats destroys the collectors.

Metrics collectors are used by default if supported. To explicitly set the driver, call `crud.cfg{ stats = true, stats_driver = driver }` ('local' or 'metrics'). To enable quantiles, call

```
crud.cfg{
    stats = true,
    stats_driver = 'metrics',
    stats_quantiles = true,
}
```

With quantiles, `latency` statistics are changed to the 0.99 quantile of request execution time (with aging). Quantile computation increases performance overhead by up to 10% when used in statistics.

Add CI matrix to run tests with `metrics` installed. To get full coverage on coveralls, #248 must be resolved.

1. https://github.com/tarantool/metrics
2. tarantool/metrics#235
3. tarantool/metrics#262

Closes #224
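The single version restriction described in this commit message can be sketched as a numeric comparison of version components. The helpers `parse_version`, `is_version_ge` and `choose_driver` are hypothetical names, and how the installed `metrics` version string is obtained is outside the scope of this sketch. Note that a plain string comparison would wrongly treat '0.9.0' as newer than '0.10.0'.

```lua
-- Hypothetical helper: parse "major.minor.patch" into three numbers.
local function parse_version(version)
    local major, minor, patch = version:match('^(%d+)%.(%d+)%.(%d+)')
    if major == nil then
        return nil
    end
    return { tonumber(major), tonumber(minor), tonumber(patch) }
end

-- Returns true if `version` is greater than or equal to `required`.
local function is_version_ge(version, required)
    local v, r = parse_version(version), parse_version(required)
    if v == nil or r == nil then
        return false
    end
    for i = 1, 3 do
        if v[i] ~= r[i] then
            return v[i] > r[i]
        end
    end
    return true
end

-- Assumed policy from the text above: the 'metrics' driver is used only
-- with metrics >= 0.10.0, otherwise fall back to the 'local' driver.
local function choose_driver(metrics_version)
    if metrics_version ~= nil and is_version_ge(metrics_version, '0.10.0') then
        return 'metrics'
    end
    return 'local'
end

print(choose_driver('0.9.0'))  -- prints local
print(choose_driver('0.10.0')) -- prints metrics
```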
After this patch, the statistics `select` section additionally contains `details` collectors.

```
crud.stats('my_space').select.details
---
- map_reduces: 4
  tuples_fetched: 10500
  tuples_lookup: 238000
...
```

`map_reduces` is the count of planned map reduces (including those not executed successfully). `tuples_fetched` is the count of tuples fetched from storages during execution, `tuples_lookup` is the count of tuples looked up on storages while collecting responses for calls (including scrolls for multibatch requests). Details data is updated as part of the request process, so you may get new details before a `select`/`pairs` call is finished and observed with count, latency and time collectors.

Part of #224
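How such `details` counters could be accumulated is sketched below. This is only an illustration: the functions `observe_map_reduce` and `observe_cursor_stats` and the shape of the per-storage stats table are assumptions, not the module's actual internals.

```lua
-- Counters mirroring the `details` fields shown above.
local details = { map_reduces = 0, tuples_fetched = 0, tuples_lookup = 0 }

-- Called when a map reduce is planned, before it is executed,
-- so failed executions are counted too.
local function observe_map_reduce()
    details.map_reduces = details.map_reduces + 1
end

-- Called for each storage response; `cursor_stats` is a hypothetical
-- table carrying per-response tuple counts.
local function observe_cursor_stats(cursor_stats)
    details.tuples_fetched = details.tuples_fetched + cursor_stats.tuples_fetched
    details.tuples_lookup = details.tuples_lookup + cursor_stats.tuples_lookup
end

-- One planned map reduce answered by two storages.
observe_map_reduce()
observe_cursor_stats({ tuples_fetched = 10, tuples_lookup = 200 })
observe_cursor_stats({ tuples_fetched = 5, tuples_lookup = 100 })

print(details.map_reduces, details.tuples_fetched, details.tuples_lookup)
```

Since the counters are updated per response, they can change while the request is still running, which matches the note that details may be visible before the call finishes.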
Add statistics module for collecting metrics of CRUD operations on router. Wrap all CRUD operation calls in the statistics collector. Statistics must be enabled manually with `crud.cfg`. They can be disabled, restarted or re-enabled later.

This patch introduces `crud.cfg`. `crud.cfg` is a tool to set module configuration. It is similar to Tarantool `box.cfg`, although we don't need to call it to bootstrap the module; it is used only to change configuration. `crud.cfg` is a callable table. To change configuration, call it: `crud.cfg{ stats = true }`. You can read the table contents as with an ordinary table, but do not change them directly: use a call instead. Table contents are immutable and use the proxy approach (see [1, 2]). Iterating through `crud.cfg` with pairs is not supported yet, refer to #265.

`crud.stats()` returns

    ---
    - spaces:
        my_space:
          insert:
            ok:
              latency: 0.002
              count: 19800
              time: 39.6
            error:
              latency: 0.000001
              count: 4
              time: 0.000004
    ...

The `spaces` section contains statistics for each observed space. If an operation has never been called for a space, the corresponding field will be empty. If no requests have been called for a space, it will not be represented. Space data is based on client requests rather than the storages' schema, so requests for non-existing spaces are also collected.

Possible statistics operation labels are `insert` (for `insert` and `insert_object` calls), `get`, `replace` (for `replace` and `replace_object` calls), `update`, `upsert` (for `upsert` and `upsert_object` calls), `delete`, `select` (for `select` and `pairs` calls), `truncate`, `len`, `count` and `borders` (for `min` and `max` calls).

Each operation section consists of different collectors for successful calls and error returns (both a thrown error and `nil, err`). `count` is the total request count since instance start or stats restart. `latency` is the average request execution time, `time` is the total request execution time.
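The relation between `count`, `time` and `latency` can be shown with a minimal local collector. This is a sketch under assumptions: `registry` and `observe` are hypothetical names, and the real module's storage layout may differ.

```lua
-- Hypothetical in-memory registry mirroring the `crud.stats()` layout.
local registry = { spaces = {} }

-- Record one finished request; `status` is 'ok' or 'error'.
local function observe(space_name, op, status, duration)
    local space = registry.spaces[space_name]
    if space == nil then
        space = {}
        registry.spaces[space_name] = space
    end

    local op_stats = space[op]
    if op_stats == nil then
        op_stats = {
            ok = { count = 0, time = 0, latency = 0 },
            error = { count = 0, time = 0, latency = 0 },
        }
        space[op] = op_stats
    end

    local collector = op_stats[status]
    collector.count = collector.count + 1
    collector.time = collector.time + duration
    -- Average latency is the total time divided by the request count.
    collector.latency = collector.time / collector.count
end

observe('my_space', 'insert', 'ok', 0.5)
observe('my_space', 'insert', 'ok', 1.5)
observe('my_space', 'insert', 'error', 0.25)

local insert_stats = registry.spaces.my_space.insert
print(insert_stats.ok.count, insert_stats.error.count)
```

A space appears in the registry only after its first observed request, so requests for non-existing spaces would be recorded as well.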
Since `pairs` request behavior differs from any other crud request, its statistics collection also has specific behavior. Statistics (`select` section) are updated after the `pairs` cycle is finished: either you have iterated through all records or an error was thrown. If your pairs cycle was interrupted with `break`, statistics will be collected when pairs objects are cleaned up by the Lua garbage collector.

Statistics are preserved between package reloads. Statistics are preserved between Tarantool Cartridge role reloads [3] if CRUD Cartridge roles are used.

1. http://lua-users.org/wiki/ReadOnlyTables
2. tarantool/tarantool#2867
3. https://www.tarantool.io/en/doc/latest/book/cartridge/cartridge_api/modules/cartridge.roles/#reload

Part of #224
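The callable, read-only configuration table that this commit message describes for `crud.cfg` can be approximated with a metatable proxy. The sketch below is not the module's actual code: `new_cfg` is a hypothetical constructor and the option validation is deliberately simplistic.

```lua
-- Hypothetical constructor for a callable read-only config table.
local function new_cfg(defaults)
    local values = {}
    for key, value in pairs(defaults) do
        values[key] = value
    end

    -- The returned proxy stays empty: reads go through __index,
    -- direct writes are rejected, and calls update the hidden values.
    return setmetatable({}, {
        __index = values,
        __newindex = function(_, key)
            error(('Use cfg{%s = ...} instead of direct assignment')
                :format(tostring(key)), 2)
        end,
        __call = function(self, opts)
            for key, value in pairs(opts or {}) do
                if defaults[key] == nil then
                    error(('Unknown option %q'):format(tostring(key)), 2)
                end
                values[key] = value
            end
            return self
        end,
    })
end

local cfg = new_cfg({ stats = false })
cfg{ stats = true }
print(cfg.stats) -- prints true

-- Direct assignment is rejected and leaves the value unchanged.
local assigned = pcall(function() cfg.stats = false end)
print(assigned) -- prints false
```

Because the proxy table itself holds no keys, iterating it with `pairs` yields nothing, which is consistent with the iteration limitation mentioned above.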
In some cases LuaJIT optimizes the use of the gc_observer table that handles pairs object gc. This led to incorrect behavior (some pairs interrupted with break were ignored in stats) and to test failures in some cases (for example, if you run only stats unit tests). Part of #224
After this patch, statistics `select` section additionally contains `details` collectors. ``` crud.stats('my_space').select.details --- - map_reduces: 4 tuples_fetched: 10500 tuples_lookup: 238000 ... ``` `map_reduces` is the count of planned map reduces (including those not executed successfully). `tuples_fetched` is the count of tuples fetched from storages during execution, `tuples_lookup` is the count of tuples looked up on storages while collecting responses for calls (including scrolls for multibatch requests). Details data is updated as part of the request process, so you may get new details before `select`/`pairs` call is finished and observed with count, latency and time collectors. Part of #224
Use in-built `crud.stats()` info instead on `storage_stat` helper in tests to track map reduce calls. Part of #224
If `metrics` [1] found, you can use metrics collectors to store statistics. `metrics >= 0.10.0` is required to use metrics driver. (`metrics >= 0.9.0` is required to use summary quantiles with age buckets. `metrics >= 0.5.0, < 0.9.0` is unsupported due to quantile overflow bug [2]. `metrics == 0.9.0` has bug that do not permits to create summary collector without quantiles [3]. In fact, user may use `metrics >= 0.5.0`, `metrics != 0.9.0` if he wants to use metrics without quantiles, and `metrics >= 0.9.0` if he wants to use metrics with quantiles. But this is confusing, so let's use a single restriction for both cases.) The metrics are part of global registry and can be exported together (e.g. to Prometheus) with default tools without any additional configuration. Disabling stats destroys the collectors. Metrics collectors are used by default if supported. To explicitly set driver, call `crud.cfg{ stats = true, stats_driver = driver }` ('local' or 'metrics'). To enable quantiles, call ``` crud.cfg{ stats = true, stats_driver = 'metrics', stats_quantiles = true, } ``` With quantiles, `latency` statistics are changed to 0.99 quantile of request execution time (with aging). Quantiles computations increases performance overhead up to 10% when used in statistics. Add CI matrix to run tests with `metrics` installed. To get full coverage on coveralls, #248 must be resolved. 1. https://github.com/tarantool/metrics 2. tarantool/metrics#235 3. tarantool/metrics#262 Closes #224
Before this patch, performance tests ran together with unit and integration with `--coverage` flag. Coverage analysis cropped the result of performance tests to 10-15 times. For metrics integration it resulted in timeout errors and drop of performance which is not reproduces with coverage disabled. Moreover, before this patch log capture was disabled and performance tests did not displayed any results after run. Now performance tests also run is separate CI job. After this patch, `make -C build coverage` will run lightweight version of performance test. `make -C build performance` will run real performance tests. You can paste output table to GitHub [1]. This path also reworks current performance test. It adds new cases to compare module performance with or without statistics, statistic wrappers and compare different metrics drivers and reports new info: average call time and max call time. Performance test result: overhead is 3-10% in case of `local` driver and 5-15% in case of `metrics` driver, up to 20% for `metrics` with quantiles. Based on several runs on HP ProBook 440 G7 i7/16Gb/256SSD. 1. https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-tables Closes #233, follows up #224
Add statistics module for collecting metrics of CRUD operations on router. Wrap all CRUD operation calls in the statistics collector. Statistics must be enabled manually with `crud.cfg`. They can be disabled, restarted or re-enabled later. This patch introduces `crud.cfg`. `crud.cfg` is a tool to set module configuration. It is similar to Tarantool `box.cfg`, although we don't need to call it to bootstrap the module -- it is used only to change configuration. `crud.cfg` is a callable table. To change configuration, call it: `crud.cfg{ stats = true }`. You can check table contents as with ordinary table, but do not change them directly -- use call instead. Table contents is immutable and use proxy approach (see [1, 2]). Iterating through `crud.cfg` with pairs is not supported yet, refer to #265. `crud.stats()` returns --- - spaces: my_space: insert: ok: latency: 0.002 count: 19800 time: 39.6 error: latency: 0.000001 count: 4 time: 0.000004 ... `spaces` section contains statistics for each observed space. If operation has never been called for a space, the corresponding field will be empty. If no requests has been called for a space, it will not be represented. Space data is based on client requests rather than storages schema, so requests for non-existing spaces are also collected. Possible statistics operation labels are `insert` (for `insert` and `insert_object` calls), `get`, `replace` (for `replace` and `replace_object` calls), `update`, `upsert` (for `upsert` and `upsert_object` calls), `delete`, `select` (for `select` and `pairs` calls), `truncate`, `len`, `count` and `borders` (for `min` and `max` calls). Each operation section consists of different collectors for success calls and error (both error throw and `nil, err`) returns. `count` is the total requests count since instance start or stats restart. `latency` is the average time of requests execution, `time` is the total time of requests execution. 
Since `pairs` request behavior differs from any other crud request, its statistics collection also has specific behavior. Statistics (the `select` section) are updated after the `pairs` cycle is finished: either you have iterated through all records or an error was thrown. If your pairs cycle was interrupted with `break`, statistics will be collected when the pairs object is cleaned up by the Lua garbage collector.

Statistics are preserved between package reloads. Statistics are preserved between Tarantool Cartridge role reloads [3] if CRUD Cartridge roles are used.

1. http://lua-users.org/wiki/ReadOnlyTables
2. tarantool/tarantool#2867
3. https://www.tarantool.io/en/doc/latest/book/cartridge/cartridge_api/modules/cartridge.roles/#reload

Part of #224
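The callable, read-only `crud.cfg` table described in this commit message follows the read-only tables idea from [1]. A minimal pure-Lua sketch of that proxy approach might look like the following. The names `new_cfg`, `defaults` and `values` are hypothetical, and the real module may validate options differently.

```lua
-- Hypothetical sketch of a callable, read-only configuration table.
local defaults = { stats = false }

local function new_cfg()
    -- Actual values live in a hidden table; the proxy itself stays empty.
    local values = {}
    for key, value in pairs(defaults) do
        values[key] = value
    end

    return setmetatable({}, {
        -- Reads are forwarded to the hidden table.
        __index = function(_, key)
            return values[key]
        end,
        -- Direct assignment is rejected.
        __newindex = function(_, key)
            error(('Use cfg{ %s = value } instead of direct assignment')
                :format(tostring(key)), 2)
        end,
        -- Calling the table is the only way to change configuration.
        __call = function(self, opts)
            for key, value in pairs(opts or {}) do
                if defaults[key] == nil then
                    error('Unknown option: ' .. tostring(key), 2)
                end
                values[key] = value
            end
            return self
        end,
    })
end

local cfg = new_cfg()
cfg{ stats = true }
print(cfg.stats) -- true

local assigned = pcall(function() cfg.stats = false end)
print(assigned) -- false
```

In this sketch the proxy table holds no keys of its own, so a plain `pairs` over it yields nothing in Lua 5.1 and LuaJIT. That may be related to the iteration limitation mentioned above, though the sketch does not claim to reproduce the module's actual behavior.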
In some cases LuaJIT optimizations affect the `gc_observer` table used to handle garbage collection of pairs objects. This led to incorrect behavior (some pairs interrupted with `break` were ignored in stats) and to test failures in some cases (for example, if you run only the stats unit tests). Part of #224
After this patch, the statistics `select` section additionally contains `details` collectors.

```
crud.stats('my_space').select.details
---
- map_reduces: 4
  tuples_fetched: 10500
  tuples_lookup: 238000
...
```

`map_reduces` is the count of planned map reduces (including those not executed successfully). `tuples_fetched` is the count of tuples fetched from storages during execution, `tuples_lookup` is the count of tuples looked up on storages while collecting responses for calls (including scrolls for multibatch requests). Details data is updated as part of the request process, so you may get new details before the `select`/`pairs` call is finished and observed with the count, latency and time collectors.

Part of #224
Use the built-in `crud.stats()` info instead of the `storage_stat` helper in tests to track map reduce calls. Part of #224
If `metrics` [1] is found, you can use metrics collectors to store statistics. `metrics >= 0.10.0` is required to use the metrics driver. (`metrics >= 0.9.0` is required to use summary quantiles with age buckets. `metrics >= 0.5.0, < 0.9.0` is unsupported due to a quantile overflow bug [2]. `metrics == 0.9.0` has a bug that does not permit creating a summary collector without quantiles [3]. In fact, users may use `metrics >= 0.5.0`, `metrics != 0.9.0` if they want metrics without quantiles, and `metrics >= 0.9.0` if they want metrics with quantiles. But this is confusing, so let's use a single restriction for both cases.)

The metrics are part of the global registry and can be exported together (e.g. to Prometheus) with default tools without any additional configuration. Disabling stats destroys the collectors.

Metrics collectors are used by default if supported. To set the driver explicitly, call `crud.cfg{ stats = true, stats_driver = driver }` ('local' or 'metrics'). To enable quantiles, call

```
crud.cfg{
    stats = true,
    stats_driver = 'metrics',
    stats_quantiles = true,
}
```

With quantiles, `latency` statistics are changed to the 0.99 quantile of request execution time (with aging). Quantile computation increases performance overhead by up to 10% when used in statistics.

Add a CI matrix to run tests with `metrics` installed. To get full coverage on coveralls, #248 must be resolved.

1. https://github.com/tarantool/metrics
2. tarantool/metrics#235
3. tarantool/metrics#262

Closes #224
Description
This is a raw idea, but I hope it would be useful for CRUD users.
CRUD performs operations on storage servers, and sometimes we need to know how many operations were done on each storage (in tests for #222 and #166). Probably (it's a hypothesis for now) our users need such information too. However, we cannot use `box.stat` on storage servers because CRUD is not the only user of the storage servers: at least vshard can move buckets between servers in the background.

In this ticket we need to understand
Stat counters in Tarantool
CRUD implements the same interface as Tarantool (create, replace, update, delete, etc.), so we can probably inherit the stat interface from Tarantool too. See module `box.stat`:
Stat counters in graphql
Also graphql.0 has a method to provide statistics: