From ad556cc434ee723eb2d817af531eedc6e0fb2f2c Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Wed, 9 Sep 2020 16:08:36 +0300 Subject: [PATCH 01/13] Tablet throttler documentation (references/features) Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- .../reference/features/tablet-throttler.md | 148 ++++++++++++++++++ 1 file changed, 148 insertions(+) create mode 100644 content/en/docs/reference/features/tablet-throttler.md diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md new file mode 100644 index 000000000..e1ab8377a --- /dev/null +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -0,0 +1,148 @@ +--- +title: Tablet throttler +aliases: ['/docs/user-guides/tablet-throttler/','/docs/reference/tablet-throttler/'] +--- + +VTTablet runs a cooperative throttling service, that probes the shard's MySQL topology and observes replication lag on servers. This throttler is derived from GitHub's [freno](https://github.com/github/freno). + +## Why throttler + +Vitess uses MySQL with asynchronous or semi-synchronous replication. In these modes, the `primary` applies changes and logs them to the binary log. The replicas will get binary log entires from the primary, potentially acknowledge getting them, and apply them in their own good time. A running replica normally applies the entires as soon as possile, unless it's stopped or configured to delay. However, if the replica is busy (e.g. by serving traffic), then it may not have the resources (disk IO, CPU) to apply events in a timely fashion, and can therefore start lagging. + +Maintaining low replication lag is important in production: + +- A lagging replica does not represent well the data on the `primary`. Reads from the replica reflect data that is not consistent with the master's. 
This is noticeable on web services where read-after-write can produce results not reflecting the write. +- An up-to-date replica makes for a good failover experience. If all replicas are lagging, then a failover process must choose between waiting for a replica to catch up, or losing data. + +Some common database operations include mass writes to the database: + +- Online schema migrations, duplicating entire tables. +- Mass population of columns (e.g. following a `ADD COLUMN` migration, populate the new column with derived value). +- Purging of old data. +- Purging of tables as part of safe table `DROP` operation. + +These operations can easily incur replication lag. However, these operations are typically not time limited. It is possible to spread them a bit so as to reduce database load. + +This is where a throttler gets in. A throttler can tell "replication lag is low, cluster is healthy, go ahead and do some work" or it may say "replication lag is high, please hold your next operation". + +Applications are expected to break down their tasks into small sub-tasks (e.g. instead of deleting `1,000,000` rows, only delete `50` at a time), and check in with the throttler in between. + +The throttler is to be used by such apps only. It should not be used for ongoing, normal OLTP queries (aka transactional workload). + +## Throttler overview + +Each `vttablet` runs an internal throttler service, and provides API endpoints to the throttler. Only the `primary` throttler is doing actual work, though. The throttlers on the replicas are mostly dormant, and wait for their turn to become "leaders", i.e. for the tablet to transition into `MASTER` type. + +The `primary` tablet's throttler does the following things, continuously: + +- Confirm it's still the primary tablet for its shard. +- Every `10sec`, use topology server to refresh the shard's tablets listing +- Probe all `REPLICA` tablets for their replication lag. This is done by querying `_vt.heartbeat` table. 
+ - Throttler begins in dormant probe mode. As long as no app/client is actually looking for metrics, it probes the servers in multi-second interval. + - When apps check for throttle advice, it begins probing servers in subsecond intervals. It reverts to dormant probe mode if no requests are made in the duration of `1min`. +- Aggregate last probed value from all relevant tablets; this is _the cluster's metric _. + +The cluster's metric is only as accurate as: + +- The probe interval, +- The heartbeat injection interval, and +- The aggregation interval + +The error margin is about the sum of the above values, plus additional overhead. Default probe interval is `100ms`, aggregatoin interval is `100ms` and default heartbeat interval is `250ms`. The latter may be overriden by the user via `-heartbeat_interval` flag to `vttablet`. + +Thus, the aggregated interval can be off, by default, by some `500ms`. This makes it inaccurate for evaluations that require high resolution lag evaluation. Fortunately, for throttling purposes, this resolution is fine. + + +The throttler allows clients/apps to `check` for throttle advice. The check is a `HTTP` request, `HEAD` or `GET` method. Throttler returns a HTTP response code as an answer: + +- `200` (OK): Application may write to data store. This is the desired response. +- `404` (Not Found): Unknown metric name. This can take place immediately upon startup or immediately after failover. +- `417` (Expectation Failed): Requesting application is explicitly forbidden to write. Tablet throttler does not implement this at this time. +- `429` (Too Many Requests): Do not write. A normal, expected state indicating there is replication lag. This is the hint for apps/clients to withhold writes. +- `500` (Internal Server Error): Internal error. Do not write. + +Normally, apps will see either `200` or `429`. An app should only ever proceed to write to the database when it receives a `200` response code. 
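
The check-in loop described above can be sketched as follows. This is a minimal illustration and not part of the patch or of Vitess itself: the helper names `check_throttler` and `run_batches`, the back-off value, and the choice to treat network failures like a `500` are all assumptions made for the example. The host `tablet1:15100` is borrowed from the `curl` examples in this document, and the `/throttler/check?app=backfill` path is the one the document gives for a backfill app.

```python
import time
import urllib.error
import urllib.request

# Address borrowed from the document's curl examples; adjust for a real tablet.
CHECK_URL = "http://tablet1:15100/throttler/check?app=backfill"


def check_throttler(url=CHECK_URL, timeout=1.0):
    """Send a HEAD request to the throttler and return the HTTP status code.

    urllib raises HTTPError for non-2xx answers (404, 417, 429, 500), so the
    code is read from the exception. Network failures are reported as 500
    (an assumption of this sketch), which the caller treats as "do not write".
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except (urllib.error.URLError, OSError):
        return 500


def run_batches(batches, apply_batch, check=check_throttler, backoff=0.25,
                sleep=time.sleep):
    """Apply each small batch only after the throttler answers 200.

    Returns the number of times the throttler refused a write.
    """
    refusals = 0
    for batch in batches:
        # Any answer other than 200 means: hold the write and ask again later.
        while check() != 200:
            refusals += 1
            sleep(backoff)
        apply_batch(batch)
    return refusals
```

An application would call `run_batches` with its own list of small sub-tasks (for example, groups of 50 row IDs to delete) and a function that executes one sub-task. The `check` and `sleep` parameters are injectable so the loop can be exercised without a running tablet.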
+ +The throttler chooses the response by comparing the replication lag with a pre-defined _threshold_. If the lag is lower than the threshold, response can be `200` (OK). I fthe lag is higher than the threshold, response would be `429` (Too Many Requests). + +Throttler only collects and evaluates lag on `REPLICA` tablets. It ignores any lag on other tables, such as `RDONLY`. It requires at least one `REPLICA` tablets or else it responds with `500` code. + + +## Configuration + +The default threshold is `1sec` and is set upon tablet startup. + +Use `vttablet -throttle_threshold` command line flag to set a different value, e.g. `-throttle_threshold=0.5s` for a half second. + +## API & usage + +Apps will use `/throttler/check` + +- Apps may indicate their identity via `?app=` param. +- Apps may further declare themselves to be _low priority_ via `?p=low` param. Managed online schema migrations (`gh-ost`, `pt-online-schema-change`) do so, as does the table purge process. + +Examples: + +- `gh-ost` uses this throttler endpoint: `/throttler/check?app=gh-ost&p=low` +- A data backfill app may use: `/throttler/check?app=backfill` (using _normal_ priority) + +A `HEAD` request is sufficient. A `GET` request also provides a `JSON` output. Examples: + +- `{"StatusCode":200,"Value":0.207709,"Threshold":1,"Message":""}` +- `{"StatusCode":429,"Value":3.494452,"Threshold":1,"Message":"Threshold exceeded"}` +- `{"StatusCode":404,"Value":0,"Threshold":0,"Message":"No such metric"}` + +In the above we can see that the tablet is configured to throttle at `1sec` + +Tablet also provides `/throttler/status` endpoint. This is useful for monitoring/management purposes. Examples: + +On a `primary`, healthy tablet: + +```shell +$ curl -s http://tablet1:15100/throttler/status | jq . 
+``` +```json +{ + "Keyspace": "commerce", + "Shard": "80-c0", + "IsLeader": true, + "IsOpen": true, + "IsDormant": false, + "AggregatedMetrics": { + "mysql/local": { + "Value": 0.193576 + } + }, + "MetricsHealth": {} +} + +``` + +Notable: + +- `"IsLeader": true` indicates this tablet is active, is the `primary`, is running probes +- `"IsDormant": false,` means an app has recently issues a `check`, and the throttler is probing for lag at high frequency. + +On a `REPLICA` tablet: + +```shell +$ curl -s http://tablet2:15100/throttler/status | jq . +``` +```json +{ + "Keyspace": "commerce", + "Shard": "80-c0", + "IsLeader": false, + "IsOpen": true, + "IsDormant": true, + "AggregatedMetrics": {}, + "MetricsHealth": {} +} +``` + + +## Resources + +- [freno](https://github.com/github/freno) project page +- [Mitigating replication lag and reducing read load with freno](https://github.blog/2017-10-13-mitigating-replication-lag-and-reducing-read-load-with-freno/), a GitHub Engineering blog post + From e5f54aea5c7960faa1c9b45c27cb480dc7396053 Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Wed, 30 Sep 2020 14:25:14 +0300 Subject: [PATCH 02/13] introducing -throttle_threshold flag; clarify behavior where no replicas exist Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- content/en/docs/reference/features/tablet-throttler.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index e1ab8377a..508f66fe6 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -56,7 +56,7 @@ Thus, the aggregated interval can be off, by default, by some `500ms`. This make The throttler allows clients/apps to `check` for throttle advice. The check is a `HTTP` request, `HEAD` or `GET` method. 
Throttler returns a HTTP response code as an answer: - `200` (OK): Application may write to data store. This is the desired response. -- `404` (Not Found): Unknown metric name. This can take place immediately upon startup or immediately after failover. +- `404` (Not Found): Unknown metric name. This can take place immediately upon startup or immediately after failover, and should resovle within 10 seconds. - `417` (Expectation Failed): Requesting application is explicitly forbidden to write. Tablet throttler does not implement this at this time. - `429` (Too Many Requests): Do not write. A normal, expected state indicating there is replication lag. This is the hint for apps/clients to withhold writes. - `500` (Internal Server Error): Internal error. Do not write. @@ -65,14 +65,16 @@ Normally, apps will see either `200` or `429`. An app should only ever proceed t The throttler chooses the response by comparing the replication lag with a pre-defined _threshold_. If the lag is lower than the threshold, response can be `200` (OK). I fthe lag is higher than the threshold, response would be `429` (Too Many Requests). -Throttler only collects and evaluates lag on `REPLICA` tablets. It ignores any lag on other tables, such as `RDONLY`. It requires at least one `REPLICA` tablets or else it responds with `500` code. +Throttler only collects and evaluates lag on predefined types of tbles. These are, by default, `REPLICA` tablets. See configuration, following. +When the throttler sees no relevant replicas in the shard, the behavior is to allow writes (respond with `HTTP 200 OK`). ## Configuration The default threshold is `1sec` and is set upon tablet startup. -Use `vttablet -throttle_threshold` command line flag to set a different value, e.g. `-throttle_threshold=0.5s` for a half second. +- Use `vttablet -throttle_threshold` command line flag to set a different value, e.g. `-throttle_threshold=0.5s` for a half second. 
+- Use `vttablet -throttle_tablet_types="replica,rdonly"` to set the tablet types which are queried for lag and considered by the throttler. `replica` is always implicitly included, and you may add any other tablet type. Any type not specified is ignored by the throttler. ## API & usage From e699becafb1e1828a732c2be75eb1c8cd2e5223d Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Sun, 4 Oct 2020 11:06:21 +0300 Subject: [PATCH 03/13] documenting '-enable-lag-throttler' feature flag Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- content/en/docs/reference/features/tablet-throttler.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index 508f66fe6..73ac023b7 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -71,9 +71,11 @@ When the throttler sees no relevant replicas in the shard, the behavior is to al ## Configuration -The default threshold is `1sec` and is set upon tablet startup. -- Use `vttablet -throttle_threshold` command line flag to set a different value, e.g. `-throttle_threshold=0.5s` for a half second. +- throttler is currently disabled by default. Use `-enable-lag-throttler` to enable the throttler. + When the throttler is disabled, it still serves `/throttler/check` API and responds with `HTTP 200 OK` to all requests. + When the throttler is enabled, it implicitly also runs heartbeat injections. +- Use `vttablet -throttle_threshold` command line flag to set a lag threshold value, e.g. `-throttle_threshold=0.5s` for a half second. The default threshold is `1sec` and is set upon tablet startup. - Use `vttablet -throttle_tablet_types="replica,rdonly"` to set the tablet types which are queried for lag and considered by the throttler. 
`replica` is always implicitly included, and you may add any other tablet type. Any type not specified is ignored by the throttler. ## API & usage From f5aac8da9c5f516a23dc35e6472ab2568d476561 Mon Sep 17 00:00:00 2001 From: Jacques Grove Date: Tue, 6 Oct 2020 09:21:02 -0700 Subject: [PATCH 04/13] Review, slight cleanups. Signed-off-by: Jacques Grove --- .../reference/features/tablet-throttler.md | 37 +++++++++---------- 1 file changed, 18 insertions(+), 19 deletions(-) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index 73ac023b7..75ac83b4e 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -7,11 +7,11 @@ VTTablet runs a cooperative throttling service, that probes the shard's MySQL to ## Why throttler -Vitess uses MySQL with asynchronous or semi-synchronous replication. In these modes, the `primary` applies changes and logs them to the binary log. The replicas will get binary log entires from the primary, potentially acknowledge getting them, and apply them in their own good time. A running replica normally applies the entires as soon as possile, unless it's stopped or configured to delay. However, if the replica is busy (e.g. by serving traffic), then it may not have the resources (disk IO, CPU) to apply events in a timely fashion, and can therefore start lagging. +Vitess uses MySQL with asynchronous or semi-synchronous replication. In these modes, each shard has a primary that applies changes and logs them to the binary log. The replicas for that shard will get binary log entires from the primary, potentially acknowledge them (if semi-synchronous replication is enabled), and apply them. A running replica normally applies the entires as soon as possile, unless it is stopped or configured to delay. However, if the replica is busy (e.g. 
by serving traffic), then it may not have the resources (disk IO, CPU) to apply events in a timely fashion, and can therefore start lagging. Maintaining low replication lag is important in production: -- A lagging replica does not represent well the data on the `primary`. Reads from the replica reflect data that is not consistent with the master's. This is noticeable on web services where read-after-write can produce results not reflecting the write. +- A lagging replica may not be representative of the data on the primary. Reads from the replica reflect data that is not consistent with the primary's. This is noticeable on web services following read-after-write from the replica, and this then can produce results not reflecting the write. - An up-to-date replica makes for a good failover experience. If all replicas are lagging, then a failover process must choose between waiting for a replica to catch up, or losing data. Some common database operations include mass writes to the database: @@ -21,23 +21,23 @@ Some common database operations include mass writes to the database: - Purging of old data. - Purging of tables as part of safe table `DROP` operation. -These operations can easily incur replication lag. However, these operations are typically not time limited. It is possible to spread them a bit so as to reduce database load. +These operations can easily incur replication lag. However, these operations are typically not time-limited. It is possible to rate-limit them to reduce database load. This is where a throttler gets in. A throttler can tell "replication lag is low, cluster is healthy, go ahead and do some work" or it may say "replication lag is high, please hold your next operation". -Applications are expected to break down their tasks into small sub-tasks (e.g. instead of deleting `1,000,000` rows, only delete `50` at a time), and check in with the throttler in between. +Applications are expected to break down their tasks into small sub-tasks (e.g. 
instead of deleting `1,000,000` rows, only delete `50` at a time), and check in with the throttler in-between. -The throttler is to be used by such apps only. It should not be used for ongoing, normal OLTP queries (aka transactional workload). +The throttler is intended for use only by application such as the above mass write cases. It should not be used for ongoing, normal OLTP queries. ## Throttler overview -Each `vttablet` runs an internal throttler service, and provides API endpoints to the throttler. Only the `primary` throttler is doing actual work, though. The throttlers on the replicas are mostly dormant, and wait for their turn to become "leaders", i.e. for the tablet to transition into `MASTER` type. +Each `vttablet` runs an internal throttler service, and provides API endpoints to the throttler. Only the primary throttler is doing actual work at any given time. The throttlers on the replicas are mostly dormant, and wait for their turn to become "leaders", i.e. for the tablet to transition into `MASTER` (primary) type. -The `primary` tablet's throttler does the following things, continuously: +The primary tablet's throttler does the following things, continuously: - Confirm it's still the primary tablet for its shard. -- Every `10sec`, use topology server to refresh the shard's tablets listing -- Probe all `REPLICA` tablets for their replication lag. This is done by querying `_vt.heartbeat` table. +- Every `10sec`, use topology server to refresh the shard's tablets list +- Probe all `REPLICA` tablets for their replication lag. This is done by querying the `_vt.heartbeat` table. - Throttler begins in dormant probe mode. As long as no app/client is actually looking for metrics, it probes the servers in multi-second interval. - When apps check for throttle advice, it begins probing servers in subsecond intervals. It reverts to dormant probe mode if no requests are made in the duration of `1min`. 
- Aggregate last probed value from all relevant tablets; this is _the cluster's metric _. @@ -48,10 +48,9 @@ The cluster's metric is only as accurate as: - The heartbeat injection interval, and - The aggregation interval -The error margin is about the sum of the above values, plus additional overhead. Default probe interval is `100ms`, aggregatoin interval is `100ms` and default heartbeat interval is `250ms`. The latter may be overriden by the user via `-heartbeat_interval` flag to `vttablet`. - -Thus, the aggregated interval can be off, by default, by some `500ms`. This makes it inaccurate for evaluations that require high resolution lag evaluation. Fortunately, for throttling purposes, this resolution is fine. +The error margin is about the sum of the above values, plus additional overhead. Default probe interval is `100ms`, aggregation interval is `100ms` and default heartbeat interval is `250ms`. The latter may be overriden by the user via `-heartbeat_interval` flag to `vttablet`. +Thus, the aggregated interval can be off, by default, by some `500ms`. This makes it inaccurate for evaluations that require high resolution lag evaluation. Fortunately, for throttling purposes, this resolution is sufficient. The throttler allows clients/apps to `check` for throttle advice. The check is a `HTTP` request, `HEAD` or `GET` method. Throttler returns a HTTP response code as an answer: @@ -63,20 +62,20 @@ The throttler allows clients/apps to `check` for throttle advice. The check is a Normally, apps will see either `200` or `429`. An app should only ever proceed to write to the database when it receives a `200` response code. -The throttler chooses the response by comparing the replication lag with a pre-defined _threshold_. If the lag is lower than the threshold, response can be `200` (OK). I fthe lag is higher than the threshold, response would be `429` (Too Many Requests). 
+The throttler chooses the response by comparing the replication lag with a pre-defined _threshold_. If the lag is lower than the threshold, response can be `200` (OK). I the lag is higher than the threshold, response would be `429` (Too Many Requests). -Throttler only collects and evaluates lag on predefined types of tbles. These are, by default, `REPLICA` tablets. See configuration, following. +The throttler only collects and evaluates lag on a set of predefined tablet types. By default, this tablet type set is `REPLICA`. See configuration, below. When the throttler sees no relevant replicas in the shard, the behavior is to allow writes (respond with `HTTP 200 OK`). ## Configuration -- throttler is currently disabled by default. Use `-enable-lag-throttler` to enable the throttler. +- The throttler is currently disabled by default. Use the `vttablet` option `-enable-lag-throttler` to enable the throttler. When the throttler is disabled, it still serves `/throttler/check` API and responds with `HTTP 200 OK` to all requests. When the throttler is enabled, it implicitly also runs heartbeat injections. -- Use `vttablet -throttle_threshold` command line flag to set a lag threshold value, e.g. `-throttle_threshold=0.5s` for a half second. The default threshold is `1sec` and is set upon tablet startup. -- Use `vttablet -throttle_tablet_types="replica,rdonly"` to set the tablet types which are queried for lag and considered by the throttler. `replica` is always implicitly included, and you may add any other tablet type. Any type not specified is ignored by the throttler. +- Use the `vttablet` flag `-throttle_threshold` to set a lag threshold value, e.g. `-throttle_threshold=0.5s` for a half second. The default threshold is `1sec` and is set upon tablet startup. +- Use the `vttablet` flag `-throttle_tablet_types="replica,rdonly"` to set the tablet types which are queried for lag and considered by the throttler. 
`replica` is always implicitly included (and the default), and you may add any other tablet type. Any type not specified is ignored by the throttler. ## API & usage @@ -124,7 +123,7 @@ $ curl -s http://tablet1:15100/throttler/status | jq . Notable: -- `"IsLeader": true` indicates this tablet is active, is the `primary`, is running probes +- `"IsLeader": true` indicates this tablet is active, is the `primary`, and is running probes - `"IsDormant": false,` means an app has recently issues a `check`, and the throttler is probing for lag at high frequency. On a `REPLICA` tablet: @@ -145,7 +144,7 @@ $ curl -s http://tablet2:15100/throttler/status | jq . ``` -## Resources +## Resources - [freno](https://github.com/github/freno) project page - [Mitigating replication lag and reducing read load with freno](https://github.blog/2017-10-13-mitigating-replication-lag-and-reducing-read-load-with-freno/), a GitHub Engineering blog post From 758c085d10d18477d0aa9ed244c3c9857a94a5d5 Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Wed, 7 Oct 2020 09:17:16 +0300 Subject: [PATCH 05/13] fix typos per review Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- content/en/docs/reference/features/tablet-throttler.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index 75ac83b4e..822aa4c05 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -7,7 +7,7 @@ VTTablet runs a cooperative throttling service, that probes the shard's MySQL to ## Why throttler -Vitess uses MySQL with asynchronous or semi-synchronous replication. In these modes, each shard has a primary that applies changes and logs them to the binary log. 
The replicas for that shard will get binary log entires from the primary, potentially acknowledge them (if semi-synchronous replication is enabled), and apply them. A running replica normally applies the entires as soon as possile, unless it is stopped or configured to delay. However, if the replica is busy (e.g. by serving traffic), then it may not have the resources (disk IO, CPU) to apply events in a timely fashion, and can therefore start lagging. +Vitess uses MySQL with asynchronous or semi-synchronous replication. In these modes, each shard has a primary that applies changes and logs them to the binary log. The replicas for that shard will get binary log entries from the primary, potentially acknowledge them (if semi-synchronous replication is enabled), and apply them. A running replica normally applies the entires as soon as possile, unless it is stopped or configured to delay. However, if the replica is busy (e.g. by serving traffic), then it may not have the resources (disk IO, CPU) to apply events in a timely fashion, and can therefore start lagging. Maintaining low replication lag is important in production: @@ -27,7 +27,7 @@ This is where a throttler gets in. A throttler can tell "replication lag is low, Applications are expected to break down their tasks into small sub-tasks (e.g. instead of deleting `1,000,000` rows, only delete `50` at a time), and check in with the throttler in-between. -The throttler is intended for use only by application such as the above mass write cases. It should not be used for ongoing, normal OLTP queries. +The throttler is intended for use only for operations such as the above mass write cases. It should not be used for ongoing, normal OLTP queries. ## Throttler overview @@ -62,7 +62,7 @@ The throttler allows clients/apps to `check` for throttle advice. The check is a Normally, apps will see either `200` or `429`. An app should only ever proceed to write to the database when it receives a `200` response code. 
-The throttler chooses the response by comparing the replication lag with a pre-defined _threshold_. If the lag is lower than the threshold, response can be `200` (OK). I the lag is higher than the threshold, response would be `429` (Too Many Requests). +The throttler chooses the response by comparing the replication lag with a pre-defined _threshold_. If the lag is lower than the threshold, response can be `200` (OK). If the lag is higher than the threshold, response would be `429` (Too Many Requests). The throttler only collects and evaluates lag on a set of predefined tablet types. By default, this tablet type set is `REPLICA`. See configuration, below. @@ -124,7 +124,7 @@ $ curl -s http://tablet1:15100/throttler/status | jq . Notable: - `"IsLeader": true` indicates this tablet is active, is the `primary`, and is running probes -- `"IsDormant": false,` means an app has recently issues a `check`, and the throttler is probing for lag at high frequency. +- `"IsDormant": false,` means an app has recently issued a `check`, and the throttler is probing for lag at high frequency. 
On a `REPLICA` tablet: From 9d16f21af8b770cbe1171ee71a27d7e4b6cf85b8 Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Thu, 8 Oct 2020 08:04:06 +0300 Subject: [PATCH 06/13] fixes per review by @bnlandry, reapplying so as to sign the commit Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- .../reference/features/tablet-throttler.md | 34 ++++++++++--------- 1 file changed, 18 insertions(+), 16 deletions(-) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index 822aa4c05..00e40e0c7 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -3,31 +3,31 @@ title: Tablet throttler aliases: ['/docs/user-guides/tablet-throttler/','/docs/reference/tablet-throttler/'] --- -VTTablet runs a cooperative throttling service, that probes the shard's MySQL topology and observes replication lag on servers. This throttler is derived from GitHub's [freno](https://github.com/github/freno). +VTTablet runs a cooperative throttling service. This service probes the shard's MySQL topology and observes replication lag on servers. This throttler is derived from GitHub's [freno](https://github.com/github/freno). ## Why throttler -Vitess uses MySQL with asynchronous or semi-synchronous replication. In these modes, each shard has a primary that applies changes and logs them to the binary log. The replicas for that shard will get binary log entries from the primary, potentially acknowledge them (if semi-synchronous replication is enabled), and apply them. A running replica normally applies the entires as soon as possile, unless it is stopped or configured to delay. However, if the replica is busy (e.g. by serving traffic), then it may not have the resources (disk IO, CPU) to apply events in a timely fashion, and can therefore start lagging. 
+Vitess uses MySQL with asynchronous or semi-synchronous replication. In these modes, each shard has a primary instance that applies changes and logs them to the binary log. The replicas for that shard will get binary log entries from the primary, potentially acknowledge them (if semi-synchronous replication is enabled), and apply them. A running replica normally applies the entries as soon as possible, unless it is stopped or configured to delay. However, if the replica is busy, then it may not have the resources to apply events in a timely fashion, and can therefore start lagging. For example, if the replica is serving traffic, it may lack the necessary disk I/O or CPU to avoid lagging behind the primary. -Maintaining low replication lag is important in production: +Maintaining low replication lag is important in production for two reasons: -- A lagging replica may not be representative of the data on the primary. Reads from the replica reflect data that is not consistent with the primary's. This is noticeable on web services following read-after-write from the replica, and this then can produce results not reflecting the write. -- An up-to-date replica makes for a good failover experience. If all replicas are lagging, then a failover process must choose between waiting for a replica to catch up, or losing data. +- A lagging replica may not be representative of the data on the primary. Reads from the replica reflect data that is not consistent with the data on the primary. This is noticeable on web services following read-after-write from the replica, and this can produce results not reflecting the write. +- An up-to-date replica makes for a good failover experience. If all replicas are lagging, then a failover process must choose between waiting for a replica to catch up or losing data. 
-Some common database operations include mass writes to the database: +Some common database operations include mass writes to the database, including the following: -- Online schema migrations, duplicating entire tables. -- Mass population of columns (e.g. following a `ADD COLUMN` migration, populate the new column with derived value). -- Purging of old data. -- Purging of tables as part of safe table `DROP` operation. +- Online schema migrations duplicating entire tables +- Mass population of columns, such as populating the new column with derived values following an `ADD COLUMN` migration +- Purging of old data +- Purging of tables as part of safe table `DROP` operation These operations can easily incur replication lag. However, these operations are typically not time-limited. It is possible to rate-limit them to reduce database load. -This is where a throttler gets in. A throttler can tell "replication lag is low, cluster is healthy, go ahead and do some work" or it may say "replication lag is high, please hold your next operation". +This is where a throttler becomes useful. A throttler can detect when replication lag is low, a cluster is healthy, and operations can proceed. It can also detect when replication lag is high and advise applications to hold the next operation. -Applications are expected to break down their tasks into small sub-tasks (e.g. instead of deleting `1,000,000` rows, only delete `50` at a time), and check in with the throttler in-between. +Applications are expected to break down their tasks into small sub-tasks. For example, instead of deleting `1,000,000` rows, an application should only delete `50` at a time. Between these sub-tasks, the application should check in with the throttler. -The throttler is intended for use only for operations such as the above mass write cases. It should not be used for ongoing, normal OLTP queries. +The throttler is only intended for use with operations such as the above mass write cases. 
It should not be used for ongoing, normal OLTP queries. ## Throttler overview @@ -123,14 +123,17 @@ $ curl -s http://tablet1:15100/throttler/status | jq . Notable: -- `"IsLeader": true` indicates this tablet is active, is the `primary`, and is running probes -- `"IsDormant": false,` means an app has recently issued a `check`, and the throttler is probing for lag at high frequency. +`"IsLeader": true` indicates this tablet is active, is the `primary`, and is running probes. +`"IsDormant": false,` means that an application has recently issued a `check`, and the throttler is probing for lag at high frequency. On a `REPLICA` tablet: ```shell $ curl -s http://tablet2:15100/throttler/status | jq . ``` + +This API call returns the following JSON object: + ```json { "Keyspace": "commerce", @@ -148,4 +151,3 @@ $ curl -s http://tablet2:15100/throttler/status | jq . - [freno](https://github.com/github/freno) project page - [Mitigating replication lag and reducing read load with freno](https://github.blog/2017-10-13-mitigating-replication-lag-and-reducing-read-load-with-freno/), a GitHub Engineering blog post - From e7d6d04a9ebd9ed87bfcf95f3d90fa6b026b9fdb Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Thu, 8 Oct 2020 08:35:44 +0300 Subject: [PATCH 07/13] fixes per review by @bnlandry, reapplying so as to sign the commit Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- .../reference/features/tablet-throttler.md | 68 +++++++++++-------- 1 file changed, 38 insertions(+), 30 deletions(-) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index 00e40e0c7..f33727c52 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -31,42 +31,47 @@ The throttler is only intended for use with operations such as the above mass wr ## Throttler overview -Each 
`vttablet` runs an internal throttler service, and provides API endpoints to the throttler. Only the primary throttler is doing actual work at any given time. The throttlers on the replicas are mostly dormant, and wait for their turn to become "leaders", i.e. for the tablet to transition into `MASTER` (primary) type. +Each `vttablet` runs an internal throttler service, and provides API endpoints to the throttler. Only the primary throttler is doing actual work at any given time. The throttlers on the replicas are mostly dormant, and wait for their turn to become "leaders," such as when the tablet transitions into `MASTER` (primary) type. -The primary tablet's throttler does the following things, continuously: +The primary tablet's throttler continuously does the following things: -- Confirm it's still the primary tablet for its shard. -- Every `10sec`, use topology server to refresh the shard's tablets list -- Probe all `REPLICA` tablets for their replication lag. This is done by querying the `_vt.heartbeat` table. - - Throttler begins in dormant probe mode. As long as no app/client is actually looking for metrics, it probes the servers in multi-second interval. - - When apps check for throttle advice, it begins probing servers in subsecond intervals. It reverts to dormant probe mode if no requests are made in the duration of `1min`. -- Aggregate last probed value from all relevant tablets; this is _the cluster's metric _. +- The throttler confirms it is still the primary tablet for its shard. +- Every `10sec`, the throttler uses the topology server to refresh the shard's tablets list. +- The throttler probes all `REPLICA` tablets for their replication lag. This is done by querying the `_vt.heartbeat` table. + - The throttler begins in dormant probe mode. As long as no application or client is actually looking for metrics, it probes the servers at multi-second intervals. 
+ - When applications check for throttle advice, the throttler begins probing servers in subsecond intervals. It reverts to dormant probe mode if no requests are made in the duration of `1min`. +- The throttler aggregates the last probed values from all relevant tablets. This is _the cluster's metric_. -The cluster's metric is only as accurate as: +The cluster's metric is only as accurate as the following intervals allow: -- The probe interval, -- The heartbeat injection interval, and +- The probe interval +- The heartbeat injection interval - The aggregation interval -The error margin is about the sum of the above values, plus additional overhead. Default probe interval is `100ms`, aggregation interval is `100ms` and default heartbeat interval is `250ms`. The latter may be overriden by the user via `-heartbeat_interval` flag to `vttablet`. +The error margin equals approximately the sum of the above values, plus additional overhead. The defaults for these intervals are as follows: ++ Probe interval: `100ms` ++ Aggregation interval: `100ms` ++ Heartbeat interval: `250ms` -Thus, the aggregated interval can be off, by default, by some `500ms`. This makes it inaccurate for evaluations that require high resolution lag evaluation. Fortunately, for throttling purposes, this resolution is sufficient. +The user may override the heartbeat interval by passing the `-heartbeat_interval` flag to `vttablet`. -The throttler allows clients/apps to `check` for throttle advice. The check is a `HTTP` request, `HEAD` or `GET` method. Throttler returns a HTTP response code as an answer: +Thus, the aggregated metric can be off, by default, by some `500ms`. This makes it inaccurate for evaluations that require high resolution lag evaluation. This resolution is sufficient for throttling purposes. -- `200` (OK): Application may write to data store. This is the desired response. -- `404` (Not Found): Unknown metric name.
This can take place immediately upon startup or immediately after failover, and should resovle within 10 seconds. -- `417` (Expectation Failed): Requesting application is explicitly forbidden to write. Tablet throttler does not implement this at this time. -- `429` (Too Many Requests): Do not write. A normal, expected state indicating there is replication lag. This is the hint for apps/clients to withhold writes. -- `500` (Internal Server Error): Internal error. Do not write. +The throttler allows clients and applications to `check` for throttle advice. The check is an `HTTP` request using either the `HEAD` or the `GET` method. The throttler returns one of the following HTTP response codes as an answer: + +- `200` (OK): The application may write to the data store. This is the desired response. +- `404` (Not Found): The check contains an unknown metric name. This can take place immediately upon startup or immediately after failover, and should resolve within 10 seconds. +- `417` (Expectation Failed): The requesting application is explicitly forbidden to write. The throttler does not implement this at this time. +- `429` (Too Many Requests): Do not write. This is a normal, expected state indicating there is replication lag. This is the hint for applications or clients to withhold writes. +- `500` (Internal Server Error): An internal error has occurred. Do not write. Normally, apps will see either `200` or `429`. An app should only ever proceed to write to the database when it receives a `200` response code. -The throttler chooses the response by comparing the replication lag with a pre-defined _threshold_. If the lag is lower than the threshold, response can be `200` (OK). If the lag is higher than the threshold, response would be `429` (Too Many Requests). +The throttler chooses the response by comparing the replication lag with a pre-defined _threshold_. If the lag is lower than the threshold, the response can be `200` (OK).
If the lag is higher than the threshold, the response would be `429` (Too Many Requests). -The throttler only collects and evaluates lag on a set of predefined tablet types. By default, this tablet type set is `REPLICA`. See configuration, below. +The throttler only collects and evaluates lag on a set of predefined tablet types. By default, this tablet type set is `REPLICA`. See [Configuration](#Configuration). -When the throttler sees no relevant replicas in the shard, the behavior is to allow writes (respond with `HTTP 200 OK`). +When the throttler sees no relevant replicas in the shard, it allows writes by responding with `HTTP 200 OK`. ## Configuration @@ -74,28 +79,31 @@ When the throttler sees no relevant replicas in the shard, the behavior is to al - The throttler is currently disabled by default. Use the `vttablet` option `-enable-lag-throttler` to enable the throttler. When the throttler is disabled, it still serves `/throttler/check` API and responds with `HTTP 200 OK` to all requests. When the throttler is enabled, it implicitly also runs heartbeat injections. -- Use the `vttablet` flag `-throttle_threshold` to set a lag threshold value, e.g. `-throttle_threshold=0.5s` for a half second. The default threshold is `1sec` and is set upon tablet startup. -- Use the `vttablet` flag `-throttle_tablet_types="replica,rdonly"` to set the tablet types which are queried for lag and considered by the throttler. `replica` is always implicitly included (and the default), and you may add any other tablet type. Any type not specified is ignored by the throttler. +- Use the `vttablet` flag `-throttle_threshold` to set a lag threshold value. The default threshold is `1sec` and is set upon tablet startup. For example, to set a half-second lag threshold, use the flag `-throttle_threshold=0.5s`. + + + +- To set the tablet types that the throttler queries for lag, use the `vttablet` flag `-throttle_tablet_types="replica,rdonly"`. 
The default tablet type is `replica`; this type is always implicitly included in the tablet types list. You may add any other tablet type. Any type not specified is ignored by the throttler. ## API & usage -Apps will use `/throttler/check` +Applications use the API `/throttler/check`. -- Apps may indicate their identity via `?app=` param. -- Apps may further declare themselves to be _low priority_ via `?p=low` param. Managed online schema migrations (`gh-ost`, `pt-online-schema-change`) do so, as does the table purge process. +- Applications may indicate their identity via the `?app=` parameter. +- Applications may also declare themselves to be _low priority_ via the `?p=low` parameter. Managed online schema migrations (`gh-ost`, `pt-online-schema-change`) do so, as does the table purge process. Examples: - `gh-ost` uses this throttler endpoint: `/throttler/check?app=gh-ost&p=low` -- A data backfill app may use: `/throttler/check?app=backfill` (using _normal_ priority) +- A data backfill application may use this endpoint: `/throttler/check?app=backfill` (using _normal_ priority) -A `HEAD` request is sufficient. A `GET` request also provides a `JSON` output. Examples: +A `HEAD` request is sufficient. A `GET` request also provides a `JSON` output. For example: - `{"StatusCode":200,"Value":0.207709,"Threshold":1,"Message":""}` - `{"StatusCode":429,"Value":3.494452,"Threshold":1,"Message":"Threshold exceeded"}` - `{"StatusCode":404,"Value":0,"Threshold":0,"Message":"No such metric"}` -In the above we can see that the tablet is configured to throttle at `1sec` +In the first two above examples we can see that the tablet is configured to throttle at `1sec` Tablet also provides `/throttler/status` endpoint. This is useful for monitoring/management purposes.
Examples: From 4349669fc4a8f1fe54c1979675c61d71ce46833d Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Thu, 8 Oct 2020 08:41:06 +0300 Subject: [PATCH 08/13] fixes per review by @bnlandry, reapplying so as to sign the commit Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- content/en/docs/reference/features/tablet-throttler.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index f33727c52..49f1293b8 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -105,13 +105,18 @@ A `HEAD` request is sufficient. A `GET` request also provides a `JSON` output. F In the first two above examples we can see that the tablet is configured to throttle at `1sec` -Tablet also provides `/throttler/status` endpoint. This is useful for monitoring/management purposes. Examples: +Tablet also provides `/throttler/status` endpoint. This is useful for monitoring and management purposes. -On a `primary`, healthy tablet: +**Example: Healthy primary tablet** + +The following command gets throttler status on a tablet hosted on `tablet1`, serving on port `15100`. ```shell $ curl -s http://tablet1:15100/throttler/status | jq . ``` + +This API call returns the following JSON object: + ```json { "Keyspace": "commerce", @@ -129,7 +134,6 @@ $ curl -s http://tablet1:15100/throttler/status | jq . ``` -Notable: `"IsLeader": true` indicates this tablet is active, is the `primary`, and is running probes. `"IsDormant": false,` means that an application has recently issued a `check`, and the throttler is probing for lag at high frequency. 
From 4aff55a0c0290de279d5c27ecf851fea3e45347e Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Thu, 8 Oct 2020 08:44:02 +0300 Subject: [PATCH 09/13] fixes per review Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- content/en/docs/reference/features/tablet-throttler.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index 49f1293b8..6f4a63eb6 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -138,7 +138,9 @@ This API call returns the following JSON object: `"IsLeader": true` indicates this tablet is active, is the `primary`, and is running probes. `"IsDormant": false,` means that an application has recently issued a `check`, and the throttler is probing for lag at high frequency. -On a `REPLICA` tablet: +**Example: replica tablet** + +The following command gets throttler status on a tablet hosted on `tablet2`, serving on port `15100`. ```shell $ curl -s http://tablet2:15100/throttler/status | jq . 
From 6546d8dc539156601a18c81c19fa219bbb9abc29 Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Thu, 8 Oct 2020 08:45:14 +0300 Subject: [PATCH 10/13] fixes per review Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- content/en/docs/reference/features/tablet-throttler.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index 6f4a63eb6..319538a91 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -5,7 +5,7 @@ aliases: ['/docs/user-guides/tablet-throttler/','/docs/reference/tablet-throttle VTTablet runs a cooperative throttling service. This service probes the shard's MySQL topology and observes replication lag on servers. This throttler is derived from GitHub's [freno](https://github.com/github/freno). -## Why throttler +## Why throttler: maintaining low replication lag Vitess uses MySQL with asynchronous or semi-synchronous replication. In these modes, each shard has a primary instance that applies changes and logs them to the binary log. The replicas for that shard will get binary log entries from the primary, potentially acknowledge them (if semi-synchronous replication is enabled), and apply them. A running replica normally applies the entries as soon as possible, unless it is stopped or configured to delay. However, if the replica is busy, then it may not have the resources to apply events in a timely fashion, and can therefore start lagging. For example, if the replica is serving traffic, it may lack the necessary disk I/O or CPU to avoid lagging behind the primary. 
From e19de0a5da75d3192ee057460cbadb30c87ea8b0 Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Sun, 11 Oct 2020 12:45:34 +0300 Subject: [PATCH 11/13] clarify master-primary transition Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- content/en/docs/reference/features/tablet-throttler.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index 319538a91..9164047ad 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -5,6 +5,8 @@ aliases: ['/docs/user-guides/tablet-throttler/','/docs/reference/tablet-throttle VTTablet runs a cooperative throttling service. This service probes the shard's MySQL topology and observes replication lag on servers. This throttler is derived from GitHub's [freno](https://github.com/github/freno). +_Note: the Vitess documentation is transitioning the term "Master" (with regard to MySQL replicaiton) to "Primary". this document reflects this transition._ + ## Why throttler: maintaining low replication lag Vitess uses MySQL with asynchronous or semi-synchronous replication. In these modes, each shard has a primary instance that applies changes and logs them to the binary log. The replicas for that shard will get binary log entries from the primary, potentially acknowledge them (if semi-synchronous replication is enabled), and apply them. A running replica normally applies the entries as soon as possible, unless it is stopped or configured to delay. However, if the replica is busy, then it may not have the resources to apply events in a timely fashion, and can therefore start lagging. For example, if the replica is serving traffic, it may lack the necessary disk I/O or CPU to avoid lagging behind the primary. 
From aecdd5de4f29e00cdde577a831c5dc2c5ee9bcce Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Sun, 11 Oct 2020 12:47:57 +0300 Subject: [PATCH 12/13] grammar Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- content/en/docs/reference/features/tablet-throttler.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index 9164047ad..4804abfcc 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -5,7 +5,7 @@ aliases: ['/docs/user-guides/tablet-throttler/','/docs/reference/tablet-throttle VTTablet runs a cooperative throttling service. This service probes the shard's MySQL topology and observes replication lag on servers. This throttler is derived from GitHub's [freno](https://github.com/github/freno). -_Note: the Vitess documentation is transitioning the term "Master" (with regard to MySQL replicaiton) to "Primary". this document reflects this transition._ +_Note: the Vitess documentation is transitioning from the term "Master" (with regard to MySQL replicaiton) to "Primary". 
this document reflects this transition._ ## Why throttler: maintaining low replication lag From c3c905ebb454e66919bdf9c3c263984696ca2bfd Mon Sep 17 00:00:00 2001 From: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> Date: Sun, 11 Oct 2020 12:49:18 +0300 Subject: [PATCH 13/13] typo Signed-off-by: Shlomi Noach <2607934+shlomi-noach@users.noreply.github.com> --- content/en/docs/reference/features/tablet-throttler.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/en/docs/reference/features/tablet-throttler.md b/content/en/docs/reference/features/tablet-throttler.md index 4804abfcc..12d805fbf 100644 --- a/content/en/docs/reference/features/tablet-throttler.md +++ b/content/en/docs/reference/features/tablet-throttler.md @@ -5,7 +5,7 @@ aliases: ['/docs/user-guides/tablet-throttler/','/docs/reference/tablet-throttler/'] --- VTTablet runs a cooperative throttling service. This service probes the shard's MySQL topology and observes replication lag on servers. This throttler is derived from GitHub's [freno](https://github.com/github/freno). -_Note: the Vitess documentation is transitioning from the term "Master" (with regard to MySQL replicaiton) to "Primary". this document reflects this transition._ +_Note: the Vitess documentation is transitioning from the term "Master" (with regard to MySQL replication) to "Primary". This document reflects this transition._ ## Why throttler: maintaining low replication lag
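For reviewers who want to see the check protocol documented in this series in action, here is a minimal sketch of a cooperating client. It is an illustration under assumptions, not part of the patch and not Vitess code: the function names `check_throttler` and `run_throttled`, the `0.25` second back-off, and the choice to treat an unreachable tablet as `500` are all hypothetical. The host `tablet1`, port `15100`, and `app=backfill` are borrowed from the examples in the page.

```python
import time
import urllib.error
import urllib.request

# Placeholder address reused from the documentation's examples; adjust for a real tablet.
CHECK_URL = "http://tablet1:15100/throttler/check?app=backfill"


def check_throttler(url=CHECK_URL, timeout=1.0):
    """Issue a HEAD request to the check endpoint and return the HTTP status code."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers such as 404, 417, 429 or 500 arrive as HTTPError.
        return err.code
    except OSError:
        # Assumption: an unreachable tablet is treated like an internal error (do not write).
        return 500


def run_throttled(batches, check=check_throttler, sleep=time.sleep, backoff=0.25):
    """Run each small batch only after the throttler answers 200; otherwise wait and re-check."""
    completed = 0
    for batch in batches:
        while check() != 200:
            sleep(backoff)
        batch()
        completed += 1
    return completed
```

Each element of `batches` is a callable that performs one small sub-task, such as deleting `50` rows. The `check` and `sleep` parameters are injectable so that the loop can be exercised without a running tablet.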