-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Usage Counters] Enhancements to the APIs #187665
[Usage Counters] Enhancements to the APIs #187665
Conversation
9596133
to
e2eb96d
Compare
usage-counters
e2eb96d
to
9385678
Compare
usage-counters
Pinging @elastic/kibana-core (Team:Core) |
c99c82a
to
f0904a4
Compare
@@ -24,6 +24,9 @@ | |||
"@kbn/logging", | |||
"@kbn/ebt", | |||
"@kbn/core-saved-objects-server", | |||
"@kbn/core-saved-objects-api-server-internal", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this internal Core module import (@kbn/core-saved-objects-api-server-internal
)? Is that only to stub getCurrentTime
during the integration tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I'm afraid so, it bothers me too.
Perhaps I can try to find a cleaner way to insert old counters for testing purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In f90542c I used the standard incrementCounter
to create the counters, and then I used esClient.updateByQuery()
to modify their updated_at
dates. Not ideal, but cleaner than the mock IMO, and addresses your feedback above.
a115044
to
dfd4335
Compare
) => Collector<TFetchReturn, ExtraOptions>; | ||
} | ||
/** Plugin's setup API **/ | ||
export type UsageCollectionSetup = ICollectorSet & UsageCountersServiceSetup; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f9c4b4f
to
f90542c
Compare
@@ -128,11 +127,6 @@ export class KibanaUsageCollectionPlugin implements Plugin { | |||
|
|||
registerUiCountersUsageCollector(usageCollection, this.logger); | |||
|
|||
registerUsageCountersRollups( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the rollups
logic (aka delete counters older than 5 days) to the usage_collection
plugin, IMO it makes more sense to have it there:
- It is specific of usage counters,
kibana_usage_collection
handles plenty of other collectors. - If someone disables
kibana_usage_collection
, usage counters would be captured and persisted indefinitely.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me! Thanks!
8937136
to
b185053
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implementation looks fine to me, but I'm not the one with the most knowledge of this area, so a second review would probably make sense
public search = async ( | ||
params: UsageCountersSearchParams, | ||
options: UsageCountersSearchOptions = {} | ||
): Promise<UsageCountersSearchResult> => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: we're not following our service pattern here, this method shouldn't be public / used directly. But it's not really that significant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ ATM I am working on some enhancements that include making this private.
UPDATE: Addressed with fe9fb62
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. I added a few comments that I'd like to discuss before approving.
@@ -128,11 +127,6 @@ export class KibanaUsageCollectionPlugin implements Plugin { | |||
|
|||
registerUiCountersUsageCollector(usageCollection, this.logger); | |||
|
|||
registerUsageCountersRollups( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me! Thanks!
search: () => { | ||
throw new Error('Usage Counters are not enabled.'); | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: how about returning an empty response instead? I wonder if throwing this error will create the need for plugins to check if it's enabled (and we don't provide an API to share if it's enabled or not).
Alternatively, we could allow searching, only that, when enabled: false
, we don't store more data. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is I only obtain the search method when calling the service.start()
. Either:
- I call start (and we'll have a few RxJS timers running without buffering any events)
- Or I make the
search
method public (we break consistency with other services). - Or I expose the
search
in the response of thestop()
hook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose the 3rd option. Fixed in f58e23b
await internalRepository.find<UsageCountersSavedObjectAttributes>({ | ||
type: USAGE_COUNTERS_SAVED_OBJECT_TYPE, | ||
namespaces: ['*'], | ||
perPage: 1000, // Process 1000 at a time as a compromise of speed and overload | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I don't think this approach is valid anymore...
If we have 12 dashboards viewed every day during the last 90 days, they'll show up in the first page and we won't remove others...
I wonder if we should change this to group per retention days (domainId: 'dashboard' and updated_at < retentionPeriodDays
vs. not domainId: 'dashboard' and updated_at < USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! we'll have to think about a better strategy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After discussion, we can do a search by domainId, and filter by updated_at < (now - retentionPeriodDays)
.
We can then loop through the different domain IDs.
Will assess bulkDelete
following @TinaHeiligers's latest comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? internalRepository.delete(type, id, { namespace: namespaces[0] }) | ||
: internalRepository.delete(type, id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we can do it in a follow up: but we can now use the bulkDelete API :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll see about that, cause I believe the bulkDelete
does not allow deleting from multiple namespaces
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will when you add 'force'. Please check that the option hasn't changed.
perPage, | ||
page, | ||
}; | ||
const res = await repository.find<UsageCountersSavedObjectAttributes>(findParams); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we use PIT search?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally, added with #187665 (comment)
/** | ||
* Defines custom retention period for the counters under this domain. | ||
* This is the number of days worth of counters that must be kept in the system indices. | ||
* Defaults to 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I don't see us defaulting it anywhere here.
How about defaulting retentionPeriodDays = USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS
if not provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's defaulted in the rollup logic, this way the in-memory UsageCounter
's are lighter (no need to store the property if it matches the default value).
const res = await repository.find<UsageCountersSavedObjectAttributes>(findParams); | ||
|
||
const countersMap = new Map<string, UsageCounterSnapshot>(); | ||
res.saved_objects.forEach(({ attributes, updated_at: updatedAt, namespaces }) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, aggregations might better achieve what we're after here.
I imagine an API like:
usageCounters.search({
filters: {
domainId,
counterName,
counterType,
source,
timestamp: { [lt|lte|gt|gte]: Date }
},
aggregation_keys: [
'domainId',
'counterName',
'counterType',
'source',
'timestamp'
]
});
That we internally map to an aggregation call and return flattened.
So... for a counter with this structure:
{
"domainId": "dashboard",
"counterName": "<dashboard-id>",
"counterType": "views",
"source": "server"
}
- If I only care about the grand total of my N-day retention period for all my dashboards, I'd call the API like
usageCounters.search({
filters: {
domainId: "dashboard",
counterType: "views",
source: "server",
},
aggregation_keys: []
});
And I'd get a response like
{
counters: [
{
domainId: "dashboard",
counterType: "views",
source: "server",
count: 9999999,
}
]
}
- If I want the grand total for each dashboard, I'd call the API like
usageCounters.search({
filters: {
domainId: "dashboard",
counterType: "views",
source: "server",
},
aggregation_keys: [
'counterName'
]
});
And I'd get a response like
{
counters: [
{
domainId: "dashboard",
counterType: "views",
source: "server",
counterName: "dashboard-1"
count: 10,
},
{
domainId: "dashboard",
counterType: "views",
source: "server",
counterName: "dashboard-2"
count: 9999989,
}
]
}
- If I want the histogram for a specific dashboard and time range, I'd call the API like
usageCounters.search({
filters: {
domainId: "dashboard",
counterType: "views",
counterName: "dashboard-2",
source: "server",
timestamp: {
gte: "2024-07-01T00:00:00.000Z",
lte: "2024-07-05T00:00:00.000Z",
}
},
aggregation_keys: [
'timestamp'
]
});
And I'd get a response like
{
counters: [
{
domainId: "dashboard",
counterType: "views",
counterName: "dashboard-2"
source: "server",
timestamp: "2024-07-01T00:00:00.000Z"
count: 10,
},
{
domainId: "dashboard",
counterType: "views",
counterName: "dashboard-2"
source: "server",
timestamp: "2024-07-03T00:00:00.000Z"
count: 9999989,
}
]
}
The benefits of the aggregations are:
- No pagination required
- Can be requested at any level
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an interesting proposal!
In the current use case, Anton still has to retrieve individual days counters to show them in the UI, so I went for the simplest approach possible, and let him aggregate counts on his side (saving 1 call in the process).
But I agree that it seems desirable to perform the aggregations on our side long term, makes for a more elegant API. Regarding pagination, when retrieving individual results you might still have plenty, but we currently circumvent this by allowing the from: string
parameter. This way we can filter and only get counters that are more recent than a certain date (e.g. now - 90d
).
Let's discuss this offline!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding pagination, when retrieving individual results you might still have plenty, but we currently circumvent this by allowing the
from: string
parameter. This way we can filter and only get counters that are more recent than a certain date (e.g.now - 90d
).
AFAIK, the recommended way to paginate is via PIT for various reasons:
- The
from
+size
technique is limited to 10_000 entries: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html (no matter if they are 10 pages of 1000 or 1000 pages of 10) - If updates occur in the process, the list will be resorted, so getting the 2nd+ pages may return previously fetched documents if new documents are indexed during the pagination.
AFAIK, we'll always retrieve all values queried, since the intention is not to show these values in a table that the user can paginate. So I don't think we're saving ourselves from any potential issues.
Anton still has to retrieve individual days counters to show them in the UI
@Dosant, just FMI, will you retrieve all days for the histogram, and add them up to get the total? Or will the histogram be of the last 7 days and the total will account for the entire retention period (30d? 90d?)
Let's discuss this offline!
Sure! Happy to meet when you're back :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
54c6a12
to
0240e57
Compare
💛 Build succeeded, but was flaky
Failed CI StepsMetrics [docs]Public APIs missing comments
Public APIs missing exports
History
|
0240e57
to
f58e23b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -32,7 +32,7 @@ interface CloudUsage { | |||
} | |||
|
|||
export function createCloudUsageCollector( | |||
usageCollection: UsageCollectionSetup, | |||
usageCollection: ICollectorSet, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think this is still UsageCollectionSetup
. It's getting the entire plugin contract.
Prob what triggered is the name ICollectorSet
. It sounds a bit confusing for a CollectorSet to be needed to create a usage collector...
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that for a while, and I think the root problem is "naming is hard".
IMO it would make more sense to name it CollectorManager
instead of ICollectorSet
.
Then, UsageCollectionSetup
would have both the CollectorManager & UsageCountersServiceSetup
.
NB the code above does not need anything about the UsageCounters
.
The problem is that lots of plugins are already using the global UsageCollectionSetup
, so changing all the references to the more specific CollectorManager
would take some work + codeowners approvals.
I only changed this one cause it falls under our ownership.
I can rollback these changes and we can tackle this on a separate PR.
UPDATE: rolled back with b8abddb
if (toDelete.length === ROLLUP_BATCH_SIZE) { | ||
// we found a lot of old Usage Counters, put the counter back in the queue, as there might be more | ||
counterQueue.push(counter); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧡
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Data Discovery changes LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x-pack/plugins/reporting/server/routes/
change due to interface changes lgtm.
I haven't rebased my frontend work yet to check if everything is still looking good, I'll try today or tomorrow. but don't want to block, up to you if you'd like to wait Looks good!
💛 Build succeeded, but was flaky
Failed CI StepsTest Failures
Metrics [docs]Public APIs missing comments
Public APIs missing exports
History
To update your PR or re-run it, just comment with: |
Summary
Part of #186530
Follow-up of #187064
The goal of this PR is to provide the necessary means to allow implementing the Counting views part of the Dashboards++ initiative.
We do this by extending the capabilities of the usage counters APIs: