
Multi tenant monitoring #6725

Merged: 13 commits merged into main on Apr 5, 2023
Conversation

@halilozanakgul (Contributor) commented Feb 17, 2023

DESCRIPTION: Adds views that monitor statistics on tenant usages

This PR adds the citus_stats_tenants view, which monitors the tenants on the cluster.

citus_stats_tenants shows each tenant's node id, colocation id, tenant attribute, and read and query counts for this period and the last period.
The tenant attribute is currently the tenant's distribution column value; later, when schema based sharding is introduced, this meaning might change.
A period is the time bucket queries are counted by. Read and query counts for this period can increase until the current period ends; after that, those counts are moved to the last period's counts, which no longer change. The period length can be set using 'citus.stats_tenants_period'.

SELECT queries are counted as read queries; INSERT, UPDATE and DELETE queries are counted as write queries. So in the view, read counts are SELECT counts, and query counts are SELECT, INSERT, UPDATE and DELETE counts combined.

The data is stored in shared memory, in a struct named MultiTenantMonitor.

citus_stats_tenants shows the data from local tenants.

citus_stats_tenants shows up to citus.stats_tenants_limit tenants.
The tenants are scored based on the number of queries they run and the recency of those queries. Every query run increases the tenant's score by ONE_QUERY_SCORE, and after every period ends the scores are halved. Halving is done lazily.
To retain information longer, the monitor keeps up to 3 times citus.stats_tenants_limit tenants. When the tenant count hits 3 * citus.stats_tenants_limit, the lowest-scoring citus.stats_tenants_limit tenants are removed. To see all stored tenants you can use citus_stats_tenants(return_all_tenants := true)
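A minimal C sketch of the lazy halving described above (based on the TenantStats fields quoted later in this PR, with a hypothetical periodLength parameter; not the exact merged code):

static void
ReduceScoreIfNecessary(TenantStats *tenantStats, time_t queryTime, int periodLength)
{
	/* number of whole periods since the score was last halved */
	long long periodsPassed = (queryTime - tenantStats->lastScoreReduction) / periodLength;

	if (periodsPassed > 0)
	{
		/* halving once per period is a right shift; cap the shift amount to avoid undefined behaviour */
		tenantStats->score >>= (periodsPassed > 62 ? 62 : periodsPassed);
		tenantStats->lastScoreReduction = queryTime;
	}
}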

@halilozanakgul force-pushed the multi-tenant-monitoring branch from 484a6aa to 01eb68a on March 6, 2023 14:25
codecov bot commented Mar 10, 2023

Codecov Report

Merging #6725 (5980d30) into main (d04d32b) will decrease coverage by 1.81%.
The diff coverage is 98.51%.

@@            Coverage Diff             @@
##             main    #6725      +/-   ##
==========================================
- Coverage   93.24%   91.43%   -1.81%     
==========================================
  Files         269      271       +2     
  Lines       57366    57632     +266     
==========================================
- Hits        53489    52698     -791     
- Misses       3877     4934    +1057     

@@ -21,7 +21,7 @@ SELECT citus.clear_network_traffic();
---- at each significant point. These transactions are 2pc

-- fail at DELETE
-SELECT citus.mitmproxy('conn.onQuery(query="^DELETE").kill()');
+SELECT citus.mitmproxy('conn.onQuery(query="DELETE").kill()');
Contributor:

Why remove all these ^ characters?

Contributor Author:

Now the queries have the tenant attribute comment (of the form /* attributeTo: <tenant attribute>,<colocation id> */) at the beginning, so they don't start with, for example, DELETE anymore.

@@ -316,7 +318,9 @@ ORDER BY 1;
view citus_shards_on_worker
view citus_stat_activity
view citus_stat_statements
+view citus_stats_tenants
+view citus_stats_tenants_storage
Contributor:

I think this one and the related function should be removed

Contributor Author:

Yes, it is

@@ -303,3 +303,5 @@ s/(NOTICE: issuing SET LOCAL application_name TO 'citus_rebalancer gpid=)[0-9]+

# shard_rebalancer output, flaky improvement number
s/improvement of 0.1[0-9]* is lower/improvement of 0.1xxxxx is lower/g

+s/\/\* attributeTo.*\*\///g
@JelteF (Contributor), Mar 14, 2023:

Could use a comment explaining what this replacement is for.

Contributor Author:

I should've mentioned this.
Gökhan has another PR, #6753, that fixes the string parsing. So this part will be changed a bit.

int selectsInLastPeriod;
int selectsInThisPeriod;

int insertCount;
Member:

I probably missed something. Why are we specifically tracking inserts and not other writes?

Contributor Author:

I think I should add them. Should it be INSERT/UPDATE/DELETE for writes and SELECT for reads?

Contributor Author:

Updated my PR and added UPDATE and DELETE queries

}
TenantStats *tenantStats = &monitor->tenants[tenantIndex];

LWLockAcquire(&tenantStats->lock, LW_EXCLUSIVE);
@marcocitus (Member), Mar 14, 2023:

could this cause contention?

(mainly wondering whether atomic integers would be better, though that might result in multiple locks, so might be worse overall)

Contributor Author:

Had offline discussion. We are keeping this for now, until we run some benchmarks.


if (MultiTenantMonitoringLogLevel != CITUS_LOG_LEVEL_OFF)
{
ereport(NOTICE, (errmsg("total select count = %d, total CPU time = %f "
Member:

NOTICE -> MultiTenantMonitoringLogLevel?

Contributor Author:

Is this log really necessary? I think I can remove it

@JelteF (Contributor) left a comment:

Overall I think this is a good start and we can use this as a good base for other sub-PRs to be merged against.

The main issue I see is the locking behaviour. I think we probably want to rethink that a bit, because the current approach is quite complex and thus unsurprisingly has bugs, some of which I'm not sure we can easily fix without changing the logic significantly.

I think the biggest problems stem from the fact that it's trying to divide priorities a bit too much. For example, each backend tries to reorder its own rank all the time. I think the locking logic would become much simpler if we don't do that and instead order the tenants only when we need to truncate the list.

This refactoring of the locking can be done in a separate sub-PR though, to keep it easy to review. Let's use the current branch as a sort-of main branch for now.

INSERT INTO dist_tbl VALUES (1, 'abcd');
INSERT INTO dist_tbl VALUES (2, 'abcd');
UPDATE dist_tbl SET b = a + 1 WHERE a = 3;
UPDATE dist_tbl SET b = a + 1 WHERE a = 4;
Contributor:

Why is there no tenant 4 in the output below?

Contributor Author:

I set the citus.stats_tenants_limit to 2. Tenant 4 didn't run enough queries yet to be in top 2.

Contributor:

Ah, it's set to 2 in pg_regress_multi.pl. Any reason why it's set there instead of in this file? I think having it in this file would be more discoverable, and clearer when reading the test what is going on. I do realize that these tests change anyway in #6761, so maybe changing the default isn't even necessary anymore then.

Comment on lines 255 to 267
void
CitusAttributeToEnd(QueryDesc *queryDesc)
Contributor:

Missing comment



/*
* AttributeQueryIfAnnotated assigns the attributes of tenant if the query is annotated.
Contributor:

This sentence is a bit strange and it could use a longer description.

Comment on lines 18 to 89
typedef struct TenantStats
{
	char tenantAttribute[100];

	int colocationGroupId;

	int readCount;
	double totalReadTime;
	int readsInLastPeriod;
	int readsInThisPeriod;

	int writeCount;
	double totalWriteTime;
	int writesInLastPeriod;
	int writesInThisPeriod;

	time_t lastQueryTime;

	long long score;
	time_t lastScoreReduction;

	NamedLWLockTranche namedLockTranche;
	LWLock lock;
} TenantStats;

typedef struct MultiTenantMonitor
{
	time_t periodStart;

	NamedLWLockTranche namedLockTranche;
	LWLock lock;

	int tenantCount;
	TenantStats tenants[FLEXIBLE_ARRAY_MEMBER];
} MultiTenantMonitor;
Contributor:

These structs could use some comments. Some things that would be useful:

  1. What do the structs represent?
  2. What are the different locks for?
  3. How large is the tenants array?



/*
* FindTenantStats finds the dsm (dynamic shared memory) handle for the current tenant's statistics.
Contributor:

Suggested change:
- * FindTenantStats finds the dsm (dynamic shared memory) handle for the current tenant's statistics.
+ * FindTenantStats finds the index for the current tenant's statistics.

Contributor Author:

Good catch 👍


if (tenantIndex == -1)
{
tenantIndex = CreateTenantStats(monitor);
Contributor:

Also a race condition if two tenants are created at the same time. One way to resolve this would be by incrementing tenantCount atomically in CreateTenantStats. That way concurrent calls would get a different index.
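One possible shape for that fix, as a sketch (it assumes tenantCount were changed to a pg_atomic_uint32; this is not the merged code):

#include "port/atomics.h"

static int
ReserveTenantSlot(pg_atomic_uint32 *tenantCount)
{
	/* fetch-and-add returns the old value, so concurrent callers get distinct slots */
	return (int) pg_atomic_fetch_add_u32(tenantCount, 1);
}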

Comment on lines 338 to 340
TenantStats tempTenant = monitor->tenants[tenantIndex];
monitor->tenants[tenantIndex] = monitor->tenants[tenantIndex - 1];
monitor->tenants[tenantIndex - 1] = tempTenant;
Contributor:

This is copying the tenant lock. Copying locks is not a supported operation; all waiters have to look at the same memory location. So if some other process is waiting on &monitor->tenants[tenantIndex - 1].lock while we do this swap, that's going to cause many problems.

#define ATTRIBUTE_PREFIX "/* attributeTo: "
#define ATTRIBUTE_STRING_FORMAT "/* attributeTo: %s,%d */"
#define CITUS_STATS_TENANTS_COLUMNS 7
#define ONE_QUERY_SCORE 1000000000
Contributor:

Why is this such a high number?

Contributor Author:

This value will be halved every period, which is by default 1 minute. Since 10^9 is roughly 2^30, about 30 halvings bring it to zero, so in about half an hour this number will become 0.

* After updating the score we might need to change the rank of the tenant in the monitor
*/
while (tenantIndex != 0 &&
monitor->tenants[tenantIndex - 1].score <
Contributor:

This is not thread-safe, since we don't have a read lock on this tenant.


MultiTenantMonitor *monitor = GetMultiTenantMonitor();

LWLockAcquire(&monitor->lock, LW_EXCLUSIVE);
@JelteF (Contributor), Mar 24, 2023:

This is a really broad lock that we shouldn't acquire exclusively when not necessary. Now we acquire it for every update of the stats, effectively making the finer-grained per-tenant lock completely unused. We should only take this lock exclusively when modifying the order of the tenants.

@JelteF (Contributor), Mar 24, 2023:

Notes from meeting:

acquire SHARE lock
find tenant
if tenant found -> modify tenant
else
    release SHARE lock
    acquire EXCLUSIVE lock
    if tenant still not found -> create tenant
    release EXCLUSIVE lock
    acquire SHARE lock
    find tenant
    if tenant found -> modify tenant
    else -> stop
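
A C sketch of this double-checked pattern (using the FindTenantStats/CreateTenantStats helpers from this PR; a sketch, not the merged code):

static void
UpdateTenantStats(MultiTenantMonitor *monitor)
{
	LWLockAcquire(&monitor->lock, LW_SHARED);
	int tenantIndex = FindTenantStats(monitor);

	if (tenantIndex == -1)
	{
		/* LWLocks cannot be upgraded in place, so release and re-acquire */
		LWLockRelease(&monitor->lock);
		LWLockAcquire(&monitor->lock, LW_EXCLUSIVE);

		/* re-check: another backend may have created the tenant meanwhile */
		if (FindTenantStats(monitor) == -1)
		{
			CreateTenantStats(monitor);
		}
		LWLockRelease(&monitor->lock);

		LWLockAcquire(&monitor->lock, LW_SHARED);
		tenantIndex = FindTenantStats(monitor);
	}

	if (tenantIndex != -1)
	{
		/* modify the tenant's counters under its own per-tenant lock */
	}
	LWLockRelease(&monitor->lock);
}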

Contributor:

After discussing with @marcocitus: It's probably nice to only sort the array in the exclusive lock some percentage of the time, for the many tenants case (e.g. deviceids). This can be done either probabilistically, or based on time since last taking the lock to create a tenant, or a combination of both. So e.g. if (lastAdditionByThisProcess > 5 seconds ago || random() > 0.001). Not necessarily now, more as something to try out if benchmarks show perf issues.
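A sketch of that throttle (the helper name and thresholds are hypothetical, not from this PR):

#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

static time_t lastAdditionByThisProcess = 0;	/* set wherever this backend creates a tenant */

static bool
ShouldSortTenants(void)
{
	time_t now = time(NULL);

	/* sort if this backend hasn't added a tenant recently, or rarely at random */
	return (now - lastAdditionByThisProcess > 5) ||
		   ((double) rand() / RAND_MAX < 0.001);
}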

*
* Every time tenant count hits CitusStatsTenantsLimit * 3, we reduce it back to CitusStatsTenantsLimit * 2.
*/
if (monitor->tenantCount >= CitusStatsTenantsLimit * 3)
Contributor:

Should be part of the create-tenant check.

Comment on lines 73 to 78
/*
* The time that the monitor's latest period started.
* This value is incremented by citus.stats_tenants_period each time.
* The increment is done lazily.
*/
time_t periodStart;
Contributor:

After discussion: We don't need to store this in the monitor anymore. It can be a local variable everywhere that we use it.

* and is 3 * citus.stats_tenants_limit
*/
int tenantCount;
TenantStats tenants[FLEXIBLE_ARRAY_MEMBER];
Contributor:

After discussing with @marcocitus: Should this be a hashtable? Hashtables are O(1) for lookups, but small arrays can be faster in practice due to cache locality. We already use a shared-memory hash table in SharedConnectionStatsShmemInit. If benchmarks show that the current approach is slow, we could try out a hashtable.
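A sketch of what that could look like (modelled loosely on Postgres shared-memory hash usage; the entry layout and sizes are assumptions, not the merged code):

#include <string.h>
#include "postgres.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

typedef struct TenantStatsHashKey
{
	char tenantAttribute[100];
	int colocationGroupId;
} TenantStatsHashKey;

typedef struct TenantStatsHashEntry
{
	TenantStatsHashKey key;		/* hash key must be the first field */
	int readsInThisPeriod;
	int writesInThisPeriod;
} TenantStatsHashEntry;

static HTAB *TenantStatsHash = NULL;

static void
InitTenantStatsHash(long maxTenants)
{
	HASHCTL info;

	memset(&info, 0, sizeof(info));
	info.keysize = sizeof(TenantStatsHashKey);
	info.entrysize = sizeof(TenantStatsHashEntry);

	/* HASH_BLOBS: the fixed-size key is compared as raw bytes */
	TenantStatsHash = ShmemInitHash("multi tenant monitor hash",
									maxTenants, maxTenants,
									&info, HASH_ELEM | HASH_BLOBS);
}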

Comment on lines 635 to 639
/*
* If the tenant count reached 3 * CitusStatsTenantsLimit, we evict the tenants
* with the lowest score.
*/
EvictTenantsIfNecessary(queryTime);
Contributor:

Let's move this to before we insert the new tenant.

static void
RecordTenantStats(TenantStats *tenantStats)
{
tenantStats->score += ONE_QUERY_SCORE;
@JelteF (Contributor), Mar 24, 2023:

We should make sure this addition doesn't overflow. Making it clamp to MAX_INT64 seems fine.
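A sketch of the suggested clamp (assuming the score is a signed 64-bit value; not the merged code):

#include "postgres.h"

static inline void
AddToScoreClamped(int64 *score, int64 increment)
{
	/* saturate at PG_INT64_MAX instead of wrapping around on overflow */
	if (*score > PG_INT64_MAX - increment)
	{
		*score = PG_INT64_MAX;
	}
	else
	{
		*score += increment;
	}
}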

@halilozanakgul force-pushed the multi-tenant-monitoring branch 2 times, most recently from 7707875 to 51d167b on March 27, 2023 09:16
* The tenant monitoring score of this tenant. This value is increased by ONE_QUERY_SCORE at every query
* and halved after every period.
*/
long long score;
@marcocitus (Member), Mar 27, 2023:

is this a particular established algorithm? (it looks vaguely familiar)

Contributor Author:

I did a bit of research and then made this algorithm up. I'm halving so the reduction can be done efficiently with a bit shift (>>).
This is open to any suggestions.

Contributor:

I think explaining the goals of the algorithm in the comment would be good. Also probably good to mention that it's a custom algorithm. Mainly in case we realise it doesn't do what we want in some scenarios.

@@ -487,6 +487,7 @@ sub generate_hba
push(@pgOptions, "citus.enable_manual_changes_to_shards=on");
push(@pgOptions, "citus.allow_unsafe_locks_from_workers=on");
push(@pgOptions, "citus.stat_statements_track = 'all'");
push(@pgOptions, "citus.stats_tenants_limit = 2");
Contributor:

Any reason to have this change globally instead of only in the test where we care about it?

Contributor Author:

Since the shared memory is set up at startup, I thought I had to set this before setting up the cluster.

Contributor:

ah yeah makes sense

@halilozanakgul force-pushed the multi-tenant-monitoring branch from d786b2f to d660339 on March 27, 2023 14:14
Comment on lines 347 to 371
LWLockAcquire(&monitor->lock, LW_SHARED);

int currentTenantIndex = FindTenantStats(monitor);

if (currentTenantIndex != -1)
{
Contributor:

locking logic could definitely use some comments, because it's not trivial



/*
* CreateTenantStats creates the data structure for a tenant's statistics.
Contributor:

Suggested change:
  * CreateTenantStats creates the data structure for a tenant's statistics.
+ * NOTE: Calling this function requires that the monitor->lock is held in LW_EXCLUSIVE mode

partitionColumnType);
}

task->partitionColumn = partitionColumnString;
@marcocitus (Member), Mar 28, 2023:

partitionColumn suggests that this contains a description/Var of the partition column

Contributor Author:

I can also use partitionColumnString then

Contributor Author:

or maybe partitionColumnValueString?

Contributor Author:

I changed this to partitionKeyValue and it is now a Const* value like in Job struct

gettext_noop("monitor limit"),
NULL,
&CitusStatsTenantsLimit,
10, 1, 100,
@marcocitus (Member), Mar 28, 2023:

These values seem pretty low, which does not make me super confident that we will actually be able to capture the top tenants.

For comparison, our TopN extension defaults to 1000.

@JelteF (Contributor), Apr 4, 2023:

Let's set the default to 100, and the maximum to 10000.

@@ -157,6 +158,8 @@ distributed_planner(Query *parse,
bool fastPathRouterQuery = false;
Node *distributionKeyValue = NULL;

+AttributeQueryIfAnnotated(query_string, parse->commandType);
Member:

Could we move it to the end of this function? The logic here is quite important to understanding the planner and is already quite messy and verbose.

* for the tenant statistics monitoring this function records the tenant attributes.
*/
void
AttributeQueryIfAnnotated(const char *query_string, CmdType commandType)
Member:

is there a way to disable the annotation injection & parsing?

Contributor Author:

Just merged #6769; I think it does what you're asking for.

* sleep_until_next_period sleeps until the next monitoring period starts.
*/
Datum
sleep_until_next_period(PG_FUNCTION_ARGS)
Member:

move to a test directory

* clean_citus_stats_tenants cleans the citus_stats_tenants monitor.
*/
Datum
clean_citus_stats_tenants(PG_FUNCTION_ARGS)
Member:

to be consistent with pg/citus_stat_statements, should we call this citus_stat_tenants_reset and make it a properly exposed function?

@@ -0,0 +1,66 @@
+-- cts in the query is an abbreviation for citus_stats_tenants
+CREATE OR REPLACE FUNCTION pg_catalog.citus_stats_tenants (
Member:

could we use citus_stat_tenants (without the extra s) to be consistent with existing pg_stat_ and citus_stat_ views?

@mulander this seems like a typo?

Contributor Author:

Yes it is. I have this in the task list in the description. I was waiting for other PRs to be merged to fix this. I'll get to it.

Contributor Author:

Merged a PR to fix this

partitionColumnType = workerJob->partitionKeyValue->consttype;
partitionColumnString = DatumToString(partitionColumnValue,
partitionColumnType);
}
Member:

maybe wrap in something like task->partitionColumnValueString = PartitionColumnValueStringFromJob(workerJob) since this seems to repeat in a few places

Member:

related, do we need to do the string conversion at this particular moment, or could we do it more lazily?

Contributor Author:

I moved it to the end functions.

Datum partitionColumnValue;
Oid partitionColumnType = 0;
char *partitionColumnString = NULL;
if (workerJob->partitionKeyValue != NULL)
Member:

would it be possible to refactor partitionKeyValue out of workerJob? it seems like it's not the right place, since a job can span across many partition column values

Contributor Author:

Do you mean Task? It was already in the Job before this PR.


if (MultiTenantMonitoringLogLevel != CITUS_LOG_LEVEL_OFF)
{
ereport(NOTICE, (errmsg("attributing query to tenant: %s",
Member:

NOTICE -> MultiTenantMonitoringLogLevel ?

(otherwise, why is this a log level?)

Contributor Author:

Deleted this log

* AnnotateQuery annotates the query with tenant attributes.
*/
char *
AnnotateQuery(char *queryString, char *partitionColumn, int colocationId)
Member:

partitionColumn sounds like it contains the name of the partition column

Contributor Author:

Changed this to partitionKeyValue

static void
AttributeMetricsIfApplicable()
{
if (strcmp(attributeToTenant, "") != 0)
Member:

maybe

if (attributeToTenant[0] == '\0')
{
    /* no tenant ID to attribute */
    return;
}

clock_t end = { 0 };

end = clock();
time_t queryTime = time(0);
Member:

We usually use Postgres' time functions like GetCurrentTimestamp().
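For illustration, a sketch with Postgres' timestamp helpers (not the merged code):

#include "postgres.h"
#include "utils/timestamp.h"

static void
MeasureElapsedTime(void)
{
	TimestampTz start = GetCurrentTimestamp();

	/* ... the work being timed ... */

	long secs;
	int microsecs;
	TimestampDifference(start, GetCurrentTimestamp(), &secs, &microsecs);
}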

{
LWLockRelease(&monitor->lock);

LWLockAcquire(&monitor->lock, LW_EXCLUSIVE);
Member:

we should be careful that this does not happen often for typical CRUD workloads (e.g. distribution column is bigserial), otherwise we get significant contention

Contributor Author:

I'm turning the tracking off by default

Comment on lines 44 to 53
char attributeToTenant[MAX_TENANT_ATTRIBUTE_LENGTH] = "";
CmdType attributeToCommandType = CMD_UNKNOWN;
int attributeToColocationGroupId = INVALID_COLOCATION_ID;

const char *SharedMemoryNameForMultiTenantMonitor =
"Shared memory for multi tenant monitor";

char *tenantTrancheName = "Tenant Tranche";
char *monitorTrancheName = "Multi Tenant Monitor Tranche";
@JelteF (Contributor), Mar 30, 2023:

nit: These could all be static afaict, because they are not used outside of this file. Also in general we try to use CapitalizedCamelCase for global variables to make it clear that they are special.

Suggested change
char attributeToTenant[MAX_TENANT_ATTRIBUTE_LENGTH] = "";
CmdType attributeToCommandType = CMD_UNKNOWN;
int attributeToColocationGroupId = INVALID_COLOCATION_ID;
const char *SharedMemoryNameForMultiTenantMonitor =
"Shared memory for multi tenant monitor";
char *tenantTrancheName = "Tenant Tranche";
char *monitorTrancheName = "Multi Tenant Monitor Tranche";

@@ -2283,6 +2304,37 @@ RegisterCitusConfigVariables(void)
GUC_STANDARD,
NULL, NULL, NULL);

+DefineCustomEnumVariable(
+	"citus.stat_tenants_track",
+	gettext_noop("enable disable"),
Contributor:

The descriptions of these GUCs should be improved

gettext_noop("monitor period"),
NULL,
&CitusStatsTenantsPeriod,
60, 1, 1000000000,
Contributor:

The max seems quite excessively high. I think 1 day is probably the most that is useful to anyone.

Contributor Author:

I initially made this 60 * 60, so 1 hour. But in testing I needed to make sure a period doesn't pass when I don't want it to. That's why I increased it to this number.
What do you think I can do instead?

@JelteF (Contributor), Apr 4, 2023:

60*60*24 seems like a good maximum

@halilozanakgul force-pushed the multi-tenant-monitoring branch from 3e2cf73 to e302be1 on April 3, 2023 15:14
@halilozanakgul marked this pull request as ready for review on April 4, 2023 10:52
@halilozanakgul force-pushed the multi-tenant-monitoring branch from aeb604b to 14ad3ef on April 4, 2023 10:54
halilozanakgul and others added 12 commits April 5, 2023 17:06
Add a view that collects statistics from all nodes
Creates log message for citus_stats_tenants
This pull request modifies our query annotation and parsing logic by
using a JSON-structured annotation string. Basically, it prepends a JSON
string in a multiline comment that contains `tenantId` and
`colocationId` to a query string to be able to track query statistics on
the worker nodes. It also parses the received annotation in the query
string and sets the relevant tenantId and colocationId on the worker
nodes.

---------

Co-authored-by: Jelte Fennema <github-tech@jeltef.nl>
Adds a GUC for turning tenant monitoring on and off
#6812)

This PR revokes permissions from stats views and functions from the
public role and grants `SELECT` and `EXECUTE` to pg_monitor.
Rename some objects
`*citus_stats_tenants*` to `*citus_stat_tenants*`
`citus_stats_tenants*` and `citus_stat_tenants*` GUCs to `stat_tenants*`
`MultiTenantMonitoringLogLevel` to `CitusStatTenantsLogLevel`
`multi_tenant_monitoring_log_level` to `citus.stat_tenants_log_level`
`attribute` files to `citus_stat_tenants`
Turn it back on in tests
Also address some reviews
@halilozanakgul force-pushed the multi-tenant-monitoring branch from 3df5a55 to 83bfe90 on April 5, 2023 14:06
@halilozanakgul merged commit 52ad2d0 into main on Apr 5, 2023
@halilozanakgul deleted the multi-tenant-monitoring branch on April 5, 2023 14:44