Multi tenant monitoring #6725
Conversation
Codecov Report

@@           Coverage Diff            @@
##             main    #6725      +/-   ##
==========================================
- Coverage   93.24%   91.43%   -1.81%
==========================================
  Files         269      271       +2
  Lines       57366    57632     +266
==========================================
- Hits        53489    52698     -791
- Misses       3877     4934    +1057
@@ -21,7 +21,7 @@ SELECT citus.clear_network_traffic();
 ---- at each significant point. These transactions are 2pc

 -- fail at DELETE
-SELECT citus.mitmproxy('conn.onQuery(query="^DELETE").kill()');
+SELECT citus.mitmproxy('conn.onQuery(query="DELETE").kill()');
Why remove all these ^ characters?
Now the queries have the tenant attribute comment at the beginning (the annotation built with ATTRIBUTE_STRING_FORMAT, e.g. /* attributeTo: 1,2 */ DELETE ...), so they don't start with, for example, DELETE anymore.
@@ -316,7 +318,9 @@ ORDER BY 1;
 view citus_shards_on_worker
 view citus_stat_activity
 view citus_stat_statements
+view citus_stats_tenants
+view citus_stats_tenants_storage
I think this one and the related function should be removed
Yes, it is
src/test/regress/bin/normalize.sed
@@ -303,3 +303,5 @@ s/(NOTICE: issuing SET LOCAL application_name TO 'citus_rebalancer gpid=)[0-9]+

 # shard_rebalancer output, flaky improvement number
 s/improvement of 0.1[0-9]* is lower/improvement of 0.1xxxxx is lower/g

+s/\/\* attributeTo.*\*\///g
Could use a comment explaining what this replacement is for
I should've mentioned this.
Gökhan has another PR that fixes the string parsing, so this part will be changed a bit: #6753
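A comment along these lines (a suggestion only, written in the style of the surrounding normalize.sed entries) would address this:

# tenant attribution annotations contain variable tenant values, so strip them
s/\/\* attributeTo.*\*\///g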
int selectsInLastPeriod;
int selectsInThisPeriod;

int insertCount;
I probably missed something. Why are we specifically tracking inserts and not other writes?
I think I should add them. Should it be INSERT/UPDATE/DELETE for writes and SELECT for reads?
Updated my PR and added UPDATE and DELETE queries.
}
TenantStats *tenantStats = &monitor->tenants[tenantIndex];

LWLockAcquire(&tenantStats->lock, LW_EXCLUSIVE);
could this cause contention?
(mainly wondering whether atomic integers would be better, though that might result in multiple locks, so might be worse overall)
Had offline discussion. We are keeping this for now, until we run some benchmarks.
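For reference, the atomic-integer alternative being weighed would look roughly like this (a sketch only; it assumes the counter fields were changed to pg_atomic_uint64 from Postgres' port/atomics.h, which is not what this PR does):

#include "port/atomics.h"

/* each counter is bumped independently, with no LWLock around the update;
 * this is the "multiple locks" (here: multiple atomic ops) trade-off noted above */
pg_atomic_fetch_add_u64(&tenantStats->readCount, 1);
pg_atomic_fetch_add_u64(&tenantStats->readsInThisPeriod, 1);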
if (MultiTenantMonitoringLogLevel != CITUS_LOG_LEVEL_OFF)
{
	ereport(NOTICE, (errmsg("total select count = %d, total CPU time = %f "
NOTICE -> MultiTenantMonitoringLogLevel?
Is this log really necessary? I think I can remove it
Overall I think this is a good start and we can use this as a good base for other sub-PRs to be merged against.
The main issue I see is the locking behaviour. I think we probably want to rethink that a bit. Because the current approach is quite complex and thus unsurprisingly has bugs, some of which I don't know if we can easily fix without changing the logic significantly.
I think the biggest problems stem from the fact that it's trying to divide priorities a bit too much. For example, each backend tries to reorder its own rank all the time. I think the locking logic would become much simpler if we don't do that and instead order the tenants only when we need to truncate the list.
This refactoring of the locking can be done in a separate sub-PR though, to keep it easy to review. Let's use the current branch as a sort-of main branch for now.
INSERT INTO dist_tbl VALUES (1, 'abcd');
INSERT INTO dist_tbl VALUES (2, 'abcd');
UPDATE dist_tbl SET b = a + 1 WHERE a = 3;
UPDATE dist_tbl SET b = a + 1 WHERE a = 4;
Why is there no tenant 4 in the output below?
I set the citus.stats_tenants_limit to 2. Tenant 4 didn't run enough queries yet to be in the top 2.
Ah, it's set to 2 in pg_regress_multi.pl. Any reason why it's set there instead of in this file? I think having it in this file would be more discoverable, and clearer when reading the test what is going on. I do realize that these tests change anyway in #6761, so maybe changing the default isn't even necessary anymore then.
void
CitusAttributeToEnd(QueryDesc *queryDesc)
Missing comment
/*
 * AttributeQueryIfAnnotated assigns the attributes of tenant if the query is annotated.
This sentence is a bit strange and it could use a longer description.
typedef struct TenantStats
{
	char tenantAttribute[100];

	int colocationGroupId;

	int readCount;
	double totalReadTime;
	int readsInLastPeriod;
	int readsInThisPeriod;

	int writeCount;
	double totalWriteTime;
	int writesInLastPeriod;
	int writesInThisPeriod;

	time_t lastQueryTime;

	long long score;
	time_t lastScoreReduction;

	NamedLWLockTranche namedLockTranche;
	LWLock lock;
} TenantStats;

typedef struct MultiTenantMonitor
{
	time_t periodStart;

	NamedLWLockTranche namedLockTranche;
	LWLock lock;

	int tenantCount;
	TenantStats tenants[FLEXIBLE_ARRAY_MEMBER];
} MultiTenantMonitor;
These structs could use some comments. Some things that would be useful:
- What do the structs represent?
- What are the different locks for?
- How large is the tenants array?
/*
 * FindTenantStats finds the dsm (dynamic shared memory) handle for the current tenant's statistics.
- * FindTenantStats finds the dsm (dynamic shared memory) handle for the current tenant's statistics.
+ * FindTenantStats finds the index for the current tenant's statistics.
Good catch 👍
if (tenantIndex == -1)
{
	tenantIndex = CreateTenantStats(monitor);
Also a race condition if two tenants are created at the same time. One way to resolve this would be by incrementing tenantCount atomically in CreateTenantStats. That way concurrent calls would get a different index.
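A minimal sketch of that suggestion (assuming tenantCount were changed to a pg_atomic_uint32; the zero-initialization detail is illustrative, not the PR's code):

#include "port/atomics.h"

static int
CreateTenantStats(MultiTenantMonitor *monitor)
{
	/*
	 * fetch-and-add returns the previous value, so two concurrent callers
	 * are guaranteed to receive distinct slot indices
	 */
	uint32 tenantIndex = pg_atomic_fetch_add_u32(&monitor->tenantCount, 1);

	memset(&monitor->tenants[tenantIndex], 0, sizeof(TenantStats));

	return (int) tenantIndex;
}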
TenantStats tempTenant = monitor->tenants[tenantIndex];
monitor->tenants[tenantIndex] = monitor->tenants[tenantIndex - 1];
monitor->tenants[tenantIndex - 1] = tempTenant;
This is copying the tenant lock. Copying locks is not an operation that's supported on them. They require that people look at the same memory locations. So if some other process is waiting on &monitor->tenants[tenantIndex - 1].lock while we do this swap then that's going to cause many problems.
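One way around this (a sketch only; the tenantRanks index array is an illustrative addition, not something this PR has) is to keep the TenantStats slots fixed in shared memory and reorder a separate index array, so the embedded LWLocks never move:

/* swap rank entries instead of the stats structs themselves; the
 * per-tenant locks stay at stable shared-memory addresses */
int tempRank = monitor->tenantRanks[rankIndex];
monitor->tenantRanks[rankIndex] = monitor->tenantRanks[rankIndex - 1];
monitor->tenantRanks[rankIndex - 1] = tempRank;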
#define ATTRIBUTE_PREFIX "/* attributeTo: "
#define ATTRIBUTE_STRING_FORMAT "/* attributeTo: %s,%d */"
#define CITUS_STATS_TENANTS_COLUMNS 7
#define ONE_QUERY_SCORE 1000000000
Why is this such a high number?
This value will be halved every period, which is by default 1 minute. So in about half an hour this number will become 0 (10^9 is roughly 2^30, so it takes about 30 halvings, i.e. 30 one-minute periods, to reach 0).
 * After updating the score we might need to change the rank of the tenant in the monitor
 */
while (tenantIndex != 0 &&
	   monitor->tenants[tenantIndex - 1].score <
This is not threadsafe to do since we don't have a read lock on this tenant.
MultiTenantMonitor *monitor = GetMultiTenantMonitor();

LWLockAcquire(&monitor->lock, LW_EXCLUSIVE);
This is a really broad lock that we shouldn't acquire exclusively when not necessary. Now we acquire it for every update of the stats, effectively making the finer grained per-tenant lock completely unused. We should only take this lock exclusively when modifying the order of the tenants.
Notes from meeting:

acquire SHARE lock
find tenant
if tenant found -> modify tenant
else:
    release SHARE lock
    acquire EXCLUSIVE lock
    if tenant still not found -> create tenant
    release EXCLUSIVE lock
    acquire SHARE lock
    find tenant
    if tenant found -> modify tenant
    else -> stop
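Spelled out in C, the pattern from these notes would look roughly like the following (a sketch that assumes FindTenantStats/CreateTenantStats/RecordTenantStats behave as described elsewhere in this PR):

LWLockAcquire(&monitor->lock, LW_SHARED);
int tenantIndex = FindTenantStats(monitor);

if (tenantIndex == -1)
{
	/* not found: retry the lookup under the exclusive lock */
	LWLockRelease(&monitor->lock);
	LWLockAcquire(&monitor->lock, LW_EXCLUSIVE);

	/* re-check, since another backend may have created the tenant meanwhile */
	tenantIndex = FindTenantStats(monitor);
	if (tenantIndex == -1)
	{
		tenantIndex = CreateTenantStats(monitor);
	}

	/* drop back to shared mode for the actual stats update */
	LWLockRelease(&monitor->lock);
	LWLockAcquire(&monitor->lock, LW_SHARED);
	tenantIndex = FindTenantStats(monitor);
}

if (tenantIndex != -1)
{
	TenantStats *tenantStats = &monitor->tenants[tenantIndex];

	LWLockAcquire(&tenantStats->lock, LW_EXCLUSIVE);
	RecordTenantStats(tenantStats);
	LWLockRelease(&tenantStats->lock);
}
/* else: the tenant was evicted between the locks, so stop */

LWLockRelease(&monitor->lock);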
After discussing with @marcocitus: It's probably nice to only sort the array in the exclusive lock some percentage of the time, for the many tenants case (e.g. deviceids). This can be done either probabilistically, or based on time since last taking the lock to create a tenant, or a combination of both. So e.g. if (lastAdditionByThisProcess > 5 seconds ago || random() > 0.001). Not necessarily now, more as something to try out if benchmarks show perf issues.
 *
 * Every time tenant count hits CitusStatsTenantsLimit * 3, we reduce it back to CitusStatsTenantsLimit * 2.
 */
if (monitor->tenantCount >= CitusStatsTenantsLimit * 3)
Should be part of the create-tenant check
/*
 * The time that the monitor's latest period started.
 * This value is incremented by citus.stats_tenants_period each time.
 * The increment is done lazily.
 */
time_t periodStart;
After discussion: We don't need to store this in the monitor anymore. It can be a local variable everywhere that we use it.
 * and is 3 * citus.stats_tenants_limit
 */
int tenantCount;
TenantStats tenants[FLEXIBLE_ARRAY_MEMBER];
After discussing with @marcocitus: Should this be a hashtable? Hashtables are O(1) for lookups, but small arrays can be faster in practice due to cache locality. We use a shared memory hash table in SharedConnectionStatsShmemInit. If benchmarks show that the current approach is slow, we could try out a hashtable.
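For reference, the shared-memory hash variant would be initialized roughly like this (a sketch modeled on ShmemInitHash usage elsewhere in Postgres; TenantKey and the table name are illustrative):

#include "storage/shmem.h"
#include "utils/hsearch.h"

HASHCTL info;
memset(&info, 0, sizeof(info));
info.keysize = sizeof(TenantKey);      /* illustrative key type */
info.entrysize = sizeof(TenantStats);

HTAB *tenantStatsHash =
	ShmemInitHash("Tenant Stats Hash",        /* illustrative name */
				  CitusStatsTenantsLimit * 3, /* init size */
				  CitusStatsTenantsLimit * 3, /* max size */
				  &info,
				  HASH_ELEM | HASH_BLOBS);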
/*
 * If the tenant count reached 3 * CitusStatsTenantsLimit, we evict the tenants
 * with the lowest score.
 */
EvictTenantsIfNecessary(queryTime);
Let's move this to before we insert the new tenant.
static void
RecordTenantStats(TenantStats *tenantStats)
{
	tenantStats->score += ONE_QUERY_SCORE;
We should make sure this addition doesn't overflow. Making it clamp to MAX_INT64 seems fine.
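A minimal sketch of that clamp (assuming score stays a signed 64-bit value; PG_INT64_MAX is from Postgres' c.h):

if (tenantStats->score > PG_INT64_MAX - ONE_QUERY_SCORE)
{
	/* saturate instead of wrapping around */
	tenantStats->score = PG_INT64_MAX;
}
else
{
	tenantStats->score += ONE_QUERY_SCORE;
}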
 * The tenant monitoring score of this tenant. This value is increased by ONE_QUERY_SCORE at every query
 * and halved after every period.
 */
long long score;
is this a particular established algorithm? (it looks vaguely familiar)
I did a bit of research and then made this algorithm up. I'm halving just to make it more efficient by using the bit shift operator >>. This is open to any suggestions.
I think explaining the goals of the algorithm in the comment would be good. Also probably good to mention that it's a custom algorithm. Mainly in case we realise it doesn't do what we want in some scenarios.
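As an illustration of what such a comment could explain (exponential decay of the score, applied lazily; the bookkeeping here is a sketch, not the PR's exact code):

time_t currentTime = time(0);
long long periodsPassed =
	(currentTime - tenantStats->lastScoreReduction) / CitusStatsTenantsPeriod;

if (periodsPassed > 0)
{
	/*
	 * halving once per elapsed period == right shift by the period count;
	 * clamp the shift amount because shifting a 64-bit value by 64 or more
	 * bits is undefined in C
	 */
	if (periodsPassed >= 63)
	{
		tenantStats->score = 0;
	}
	else
	{
		tenantStats->score >>= periodsPassed;
	}

	tenantStats->lastScoreReduction = currentTime;
}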
src/test/regress/pg_regress_multi.pl
@@ -487,6 +487,7 @@ sub generate_hba
 push(@pgOptions, "citus.enable_manual_changes_to_shards=on");
 push(@pgOptions, "citus.allow_unsafe_locks_from_workers=on");
 push(@pgOptions, "citus.stat_statements_track = 'all'");
+push(@pgOptions, "citus.stats_tenants_limit = 2");
Any reason to have this change globally instead of only in the test that we care about it?
Since we set the shared memory at the beginning, I thought I had to set this up before setting up the cluster.
ah yeah makes sense
LWLockAcquire(&monitor->lock, LW_SHARED);

int currentTenantIndex = FindTenantStats(monitor);

if (currentTenantIndex != -1)
{
locking logic could definitely use some comments, because it's not trivial
/*
 * CreateTenantStats creates the data structure for a tenant's statistics.
  * CreateTenantStats creates the data structure for a tenant's statistics.
+ * NOTE: Calling this function requires that the monitor->lock is held in LW_EXCLUSIVE mode
		partitionColumnType);
}

task->partitionColumn = partitionColumnString;
partitionColumn suggests that this contains a description/Var of the partition column
I can also use partitionColumnString then
or maybe partitionColumnValueString?
I changed this to partitionKeyValue and it is now a Const * value like in the Job struct.
gettext_noop("monitor limit"),
NULL,
&CitusStatsTenantsLimit,
10, 1, 100,
These values seem pretty low, which does not make me super confident that we will actually be able to capture the top tenants.
For comparison, our TopN extension defaults to 1000.
Let's set the default to 100, and the maximum to 10000.
@@ -157,6 +158,8 @@ distributed_planner(Query *parse,
 	bool fastPathRouterQuery = false;
 	Node *distributionKeyValue = NULL;

+	AttributeQueryIfAnnotated(query_string, parse->commandType);
could we move it to the end of this function? the logic here is quite important to understanding the planner and already quite messy and verbose
 * for the tenant statistics monitoring this function records the tenant attributes.
 */
void
AttributeQueryIfAnnotated(const char *query_string, CmdType commandType)
is there a way to disable the annotation injection & parsing?
Just merged #6769. I think it does what you're asking for.
 * sleep_until_next_period sleeps until the next monitoring period starts.
 */
Datum
sleep_until_next_period(PG_FUNCTION_ARGS)
move to a test directory
 * clean_citus_stats_tenants cleans the citus_stats_tenants monitor.
 */
Datum
clean_citus_stats_tenants(PG_FUNCTION_ARGS)
to be consistent with pg/citus_stat_statements, should we call this citus_stat_tenants_reset and make it a properly exposed function?
@@ -0,0 +1,66 @@
+-- cts in the query is an abbreviation for citus_stats_tenants
+CREATE OR REPLACE FUNCTION pg_catalog.citus_stats_tenants (
could we use citus_stat_tenants (without the extra s) to be consistent with existing pg_stat_ and citus_stat_ views?
@mulander this seems like a typo?
Yes it is. I have this in the task list in the description. I was waiting for other PRs to be merged to fix this. I'll get to it.
Merged a PR to fix this
	partitionColumnType = workerJob->partitionKeyValue->consttype;
	partitionColumnString = DatumToString(partitionColumnValue,
										  partitionColumnType);
}
maybe wrap in something like task->partitionColumnValueString = PartitionColumnValueStringFromJob(workerJob), since this seems to repeat in a few places
related, do we need to do the string conversion at this particular moment, or could we do it more lazily?
I moved it to the end functions
Datum partitionColumnValue;
Oid partitionColumnType = 0;
char *partitionColumnString = NULL;
if (workerJob->partitionKeyValue != NULL)
would it be possible to refactor partitionKeyValue out of workerJob? it seems like it's not the right place, since a job can span across many partition column values
Do you mean Task? It was already in the Job before this PR.
if (MultiTenantMonitoringLogLevel != CITUS_LOG_LEVEL_OFF)
{
	ereport(NOTICE, (errmsg("attributing query to tenant: %s",
NOTICE -> MultiTenantMonitoringLogLevel ?
(otherwise, why is this a log level?)
Deleted this log
 * AnnotateQuery annotates the query with tenant attributes.
 */
char *
AnnotateQuery(char *queryString, char *partitionColumn, int colocationId)
partitionColumn sounds like it contains the name of the partition column
Changed this to partitionKeyValue
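For context, after the rename the annotation step might look roughly like this (a sketch using ATTRIBUTE_STRING_FORMAT from above and Postgres' psprintf; not necessarily the PR's exact code):

char *
AnnotateQuery(char *queryString, char *partitionKeyValue, int colocationId)
{
	/* build the attributeTo prefix defined by ATTRIBUTE_STRING_FORMAT */
	char *annotation = psprintf(ATTRIBUTE_STRING_FORMAT,
								partitionKeyValue, colocationId);

	/* prepend it to the original query text */
	return psprintf("%s %s", annotation, queryString);
}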
static void
AttributeMetricsIfApplicable()
{
	if (strcmp(attributeToTenant, "") != 0)
maybe
if (attributeToTenant[0] == '\0')
{
/* no tenant ID to attribute */
return;
}
clock_t end = { 0 };

end = clock();
time_t queryTime = time(0);
we usually use postgres' time functions like GetCurrentTimestamp()
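A sketch of that suggestion (GetCurrentTimestamp and TimestampDifference are from Postgres' utils/timestamp.h; queryStart is assumed to be a TimestampTz captured when the query began):

TimestampTz queryEnd = GetCurrentTimestamp();

long secs = 0;
int microsecs = 0;
TimestampDifference(queryStart, queryEnd, &secs, &microsecs);

/* elapsed wall-clock time in seconds */
double queryTime = secs + microsecs / 1000000.0;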
{
	LWLockRelease(&monitor->lock);

	LWLockAcquire(&monitor->lock, LW_EXCLUSIVE);
we should be careful that this does not happen often for typical CRUD workloads (e.g. distribution column is bigserial), otherwise we get significant contention
I'm turning the tracking off by default
char attributeToTenant[MAX_TENANT_ATTRIBUTE_LENGTH] = "";
CmdType attributeToCommandType = CMD_UNKNOWN;
int attributeToColocationGroupId = INVALID_COLOCATION_ID;

const char *SharedMemoryNameForMultiTenantMonitor =
	"Shared memory for multi tenant monitor";

char *tenantTrancheName = "Tenant Tranche";
char *monitorTrancheName = "Multi Tenant Monitor Tranche";
nit: These could all be static afaict, because they are not used outside of this file. Also in general we try to use CapitalizedCamelCase for global variables to make it clear that they are special.
@@ -2283,6 +2304,37 @@ RegisterCitusConfigVariables(void)
 		GUC_STANDARD,
 		NULL, NULL, NULL);

+	DefineCustomEnumVariable(
+		"citus.stat_tenants_track",
+		gettext_noop("enable disable"),
The descriptions of these GUCs should be improved
gettext_noop("monitor period"),
NULL,
&CitusStatsTenantsPeriod,
60, 1, 1000000000,
The max seems excessively high. I think 1 day is probably the most that is useful to anyone.
I initially made this 60 * 60, so 1 hour. But in testing I needed to make sure a period doesn't pass when I don't want it to. That's why I increased it to this number.
What do you think I can do instead?
60*60*24 seems like a good maximum
Add a view that collects statistics from all nodes
Creates log message for citus_stats_tenants
This pull request modifies our query annotation and parsing logic by using a JSON-structured annotation string. Basically, it prepends a JSON string in a multiline comment that contains `tenantId` and `colocationId` to a query string to be able to track query statistics on the worker nodes. It also parses the received annotation in the query string and sets the relevant tenantId and colocationId on the worker nodes. --------- Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
Adds a GUC for turning tenant monitoring on and off
(#6812) This PR revokes permissions from stats views and functions from the public role and grants `SELECT` and `EXECUTE` to pg_monitor.
Rename some objects:
- `*citus_stats_tenants*` to `*citus_stat_tenants*`
- `citus_stats_tenants*` and `citus_stat_tenants*` GUCs to `stat_tenants*`
- `MultiTenantMonitoringLogLevel` to `CitusStatTenantsLogLevel`
- `multi_tenant_monitoring_log_level` to `citus.stat_tenants_log_level`
- `attribute` files to `citus_stat_tenants`
Turn it back on in tests. Also address some reviews.
DESCRIPTION: Adds views that monitor statistics on tenant usages

This PR adds a `citus_stats_tenants` view that monitors the tenants on the cluster. `citus_stats_tenants` shows the node id, colocation id, tenant attribute, read count in this period and last period, and query count in this period and last period of the tenant. The tenant attribute currently is the tenant's distribution column value; later, when schema based sharding is introduced, this meaning might change.

A period is a time bucket the queries are counted by. Read and query counts for this period can increase until the current period ends. After that those counts are moved to last period's counts, which cannot change. The period length can be set using `citus.stats_tenants_period`.

`SELECT` queries are counted as read queries; `INSERT`, `UPDATE` and `DELETE` queries are counted as write queries. So in the view, read counts are `SELECT` counts and query counts are `SELECT`, `INSERT`, `UPDATE` and `DELETE` counts.

The data is stored in shared memory, in a struct named `MultiTenantMonitor`. `citus_stats_tenants` shows the data from local tenants, and shows up to `citus.stats_tenant_limit` number of tenants.

The tenants are scored based on the number of queries they run and the recency of those queries. Every query run increases the tenant's score by `ONE_QUERY_SCORE`, and after every period ends the scores are halved. Halving is done lazily.

To retain information longer, the monitor keeps up to 3 times `citus.stats_tenant_limit` tenants. When the tenant count hits `3 * citus.stats_tenant_limit`, the lowest-scoring `citus.stats_tenant_limit` tenants are removed. To see all stored tenants you can use `citus_stats_tenants(return_all_tenants := true)`.