[Usage Counters] Enhancements to the APIs #187665

Merged
merged 20 commits into from
Aug 5, 2024

Conversation

gsoldevila
Contributor

@gsoldevila gsoldevila commented Jul 5, 2024

Summary

Part of #186530
Follow-up of #187064

This PR provides the groundwork needed to implement the "Counting views" part of the Dashboards++ initiative.
It does so by extending the capabilities of the usage counters APIs:

  • Custom retention periods are now supported. Previously, counter data was kept in SO (saved objects) indices for only 5 days, whereas Dashboards++ requires 90 days' worth of counts.
  • A new Search API allows retrieving persisted counters.

@gsoldevila gsoldevila added Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes backport:skip This commit does not require backporting v8.16.0 labels Jul 5, 2024
@gsoldevila gsoldevila force-pushed the kbn-usage-counters-search-api branch 2 times, most recently from 9596133 to e2eb96d Compare July 8, 2024 11:04
@gsoldevila gsoldevila changed the title Draft [Usage Counters] Add API to support searching / retrieving persisted usage-counters Jul 8, 2024
@gsoldevila gsoldevila force-pushed the kbn-usage-counters-search-api branch from e2eb96d to 9385678 Compare July 8, 2024 14:56
@gsoldevila gsoldevila changed the title [Usage Counters] Add API to support searching / retrieving persisted usage-counters [Usage Counters] Enhancements to the APIs Jul 9, 2024
@gsoldevila gsoldevila marked this pull request as ready for review July 9, 2024 14:24
@gsoldevila gsoldevila requested a review from a team as a code owner July 9, 2024 14:24
@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@gsoldevila gsoldevila force-pushed the kbn-usage-counters-search-api branch from c99c82a to f0904a4 Compare July 10, 2024 10:34
@@ -24,6 +24,9 @@
"@kbn/logging",
"@kbn/ebt",
"@kbn/core-saved-objects-server",
"@kbn/core-saved-objects-api-server-internal",
Contributor

Why do we need this internal Core module import (@kbn/core-saved-objects-api-server-internal)? Is that only to stub getCurrentTime during the integration tests?

Contributor Author

Yes I'm afraid so, it bothers me too.
Perhaps I can try to find a cleaner way to insert old counters for testing purposes.

Contributor Author

In f90542c I used the standard incrementCounter to create the counters, and then I used esClient.updateByQuery() to modify their updated_at dates. Not ideal, but cleaner than the mock IMO, and addresses your feedback above.

@gsoldevila gsoldevila force-pushed the kbn-usage-counters-search-api branch 2 times, most recently from a115044 to dfd4335 Compare July 12, 2024 12:26
) => Collector<TFetchReturn, ExtraOptions>;
}
/** Plugin's setup API **/
export type UsageCollectionSetup = ICollectorSet & UsageCountersServiceSetup;
Contributor Author

@afharo @Bamieh I have extracted all of the CollectorSet methods into a separate interface ICollectorSet, in an effort to make it cleaner / clearer that the usage-collection plugin offers 2 different things in the setup contract.

@gsoldevila gsoldevila force-pushed the kbn-usage-counters-search-api branch 4 times, most recently from f9c4b4f to f90542c Compare July 16, 2024 16:27
@@ -128,11 +127,6 @@ export class KibanaUsageCollectionPlugin implements Plugin {

registerUiCountersUsageCollector(usageCollection, this.logger);

registerUsageCountersRollups(
Contributor Author

@gsoldevila gsoldevila Jul 16, 2024

I moved the rollups logic (i.e. deleting counters older than 5 days) to the usage_collection plugin. IMO it makes more sense to have it there:

  • It is specific to usage counters, whereas kibana_usage_collection handles plenty of other collectors.
  • If someone disabled kibana_usage_collection, usage counters would still be captured, and they would be persisted indefinitely.

WDYT?

Member

Makes sense to me! Thanks!

@gsoldevila gsoldevila force-pushed the kbn-usage-counters-search-api branch 2 times, most recently from 8937136 to b185053 Compare July 17, 2024 07:42
Contributor

@pgayvallet pgayvallet left a comment

Implementation looks fine to me, but I'm not the one with the most knowledge of this area, so a second review would probably make sense.

Comment on lines 209 to 231
public search = async (
params: UsageCountersSearchParams,
options: UsageCountersSearchOptions = {}
): Promise<UsageCountersSearchResult> => {
Contributor

NIT: we're not following our service pattern here, this method shouldn't be public / used directly. But it's not really that significant.

Contributor Author

@gsoldevila gsoldevila Jul 17, 2024

++ ATM I am working on some enhancements that include making this private.
UPDATE: Addressed with fe9fb62

Member

@afharo afharo left a comment

Overall LGTM. I added a few comments that I'd like to discuss before approving.

Comment on lines 107 to 109
search: () => {
throw new Error('Usage Counters are not enabled.');
},
Member

nit: how about returning an empty response instead? I wonder if throwing this error will create the need for plugins to check if it's enabled (and we don't provide an API to share if it's enabled or not).

Alternatively, we could allow searching, only that, when enabled: false, we don't store more data. WDYT?
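
For illustration, the first suggestion (returning an empty response when usage counters are disabled) could look roughly like the sketch below. The interfaces are simplified, hypothetical stand-ins rather than the real Kibana types, and this is not necessarily what the PR ended up doing.

```typescript
// Hypothetical, simplified shapes; the real Kibana types carry more fields.
interface UsageCountersSearchResult {
  counters: Array<{ domainId: string; counterName: string; count: number }>;
}

interface UsageCountersSearch {
  search(filters: { domainId: string }): Promise<UsageCountersSearchResult>;
}

// Stand-in used when usage counters are disabled: callers receive an empty
// result instead of an exception, so they need no "is it enabled?" check.
const disabledUsageCounters: UsageCountersSearch = {
  search: async () => ({ counters: [] }),
};

// A consumer can sum counts without special-casing the disabled state.
async function totalViews(service: UsageCountersSearch): Promise<number> {
  const { counters } = await service.search({ domainId: 'dashboard' });
  return counters.reduce((sum, counter) => sum + counter.count, 0);
}
```

With this shape, a consuming plugin simply sees a total of zero when the feature is turned off.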

Contributor Author

@gsoldevila gsoldevila Jul 19, 2024

The problem is that I only obtain the search method when calling service.start(). The options are:

  • I call start() (and we'll have a few RxJS timers running without buffering any events).
  • I make the search method public (which breaks consistency with other services).
  • I expose the search in the response of the stop() hook.

Contributor Author

@gsoldevila gsoldevila Aug 1, 2024

I chose the 3rd option. Fixed in f58e23b

Comment on lines 33 to 37
await internalRepository.find<UsageCountersSavedObjectAttributes>({
type: USAGE_COUNTERS_SAVED_OBJECT_TYPE,
namespaces: ['*'],
perPage: 1000, // Process 1000 at a time as a compromise of speed and overload
});
Member

hmm, I don't think this approach is valid anymore...

If we have 12 dashboards viewed every day during the last 90 days, they'll show up in the first page and we won't remove others...

I wonder if we should change this to group per retention days (domainId: 'dashboard' and updated_at < retentionPeriodDays vs. not domainId: 'dashboard' and updated_at < USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS).

Contributor Author

Good catch! We'll have to think about a better strategy.

Contributor Author

After discussion, we can do a search by domainId, and filter by updated_at < (now - retentionPeriodDays).
We can then loop through the different domain IDs.
Will assess bulkDelete following @TinaHeiligers's latest comment.
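
A minimal sketch of that strategy, assuming hypothetical type and function names (RegisteredDomain, RollupQuery, buildRollupQueries) and using an in-memory predicate as a stand-in for the saved-objects query:

```typescript
const USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS = 5; // default retention mentioned in this PR
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical shapes, for illustration only.
interface RegisteredDomain {
  domainId: string;
  retentionPeriodDays?: number; // omitted when the default applies
}

interface RollupQuery {
  domainId: string;
  updatedBefore: string; // ISO timestamp: now - retentionPeriodDays
}

interface StoredCounter {
  id: string;
  domainId: string;
  updatedAt: string; // ISO timestamp
}

// One query per domain, each with its own cutoff date.
function buildRollupQueries(domains: RegisteredDomain[], now: Date): RollupQuery[] {
  return domains.map(({ domainId, retentionPeriodDays }) => {
    const days = retentionPeriodDays ?? USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS;
    const cutoff = new Date(now.getTime() - days * DAY_MS);
    return { domainId, updatedBefore: cutoff.toISOString() };
  });
}

// In-memory equivalent of "domainId matches AND updated_at < cutoff".
function matchesRollupQuery(counter: StoredCounter, query: RollupQuery): boolean {
  return (
    counter.domainId === query.domainId &&
    Date.parse(counter.updatedAt) < Date.parse(query.updatedBefore)
  );
}
```

Because each domain is queried separately, recently updated counters from a long-retention domain can no longer crowd older counters from other domains out of a results page.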

Contributor Author

@afharo updated with 357b474

Comment on lines 52 to 53
? internalRepository.delete(type, id, { namespace: namespaces[0] })
: internalRepository.delete(type, id)
Member

nit: we can do it in a follow up: but we can now use the bulkDelete API :)

Contributor Author

We'll see about that, because I believe bulkDelete does not allow deleting from multiple namespaces.

Contributor

It will when you add 'force'. Please check that the option hasn't changed.

perPage,
page,
};
const res = await repository.find<UsageCountersSavedObjectAttributes>(findParams);
Member

Should we use PIT (point-in-time) search?

Contributor Author

Totally, added with #187665 (comment)

/**
* Defines custom retention period for the counters under this domain.
* This is the number of days worth of counters that must be kept in the system indices.
* Defaults to 5
Member

nit: I don't see us defaulting it anywhere here.

How about defaulting retentionPeriodDays = USAGE_COUNTERS_KEEP_DOCS_FOR_DAYS if not provided?

Contributor Author

It's defaulted in the rollup logic. This way the in-memory UsageCounter instances are lighter (no need to store the property if it matches the default value).

const res = await repository.find<UsageCountersSavedObjectAttributes>(findParams);

const countersMap = new Map<string, UsageCounterSnapshot>();
res.saved_objects.forEach(({ attributes, updated_at: updatedAt, namespaces }) => {
Member

@afharo afharo Jul 19, 2024

IMO, aggregations might better achieve what we're after here.

I imagine an API like:

usageCounters.search({
  filters: {
    domainId,
    counterName,
    counterType,
    source,
    timestamp: { [lt|lte|gt|gte]: Date }
  },
  aggregation_keys: [
    'domainId',
    'counterName',
    'counterType',
    'source',
    'timestamp'
  ]
});

That we internally map to an aggregation call and return flattened.

So... for a counter with this structure:

{
  "domainId": "dashboard",
  "counterName": "<dashboard-id>",
  "counterType": "views",
  "source": "server"
}

  1. If I only care about the grand total of my N-day retention period for all my dashboards, I'd call the API like
usageCounters.search({ 
  filters: {
    domainId: "dashboard",
    counterType: "views",
    source: "server",
  },
  aggregation_keys: []
});

And I'd get a response like

{
  counters: [
    { 
      domainId: "dashboard",
      counterType: "views",
      source: "server",
      count: 9999999,
    }
  ]
}

  2. If I want the grand total for each dashboard, I'd call the API like
usageCounters.search({ 
  filters: {
    domainId: "dashboard",
    counterType: "views",
    source: "server",
  },
  aggregation_keys: [
    'counterName'
  ]
});

And I'd get a response like

{
  counters: [
    { 
      domainId: "dashboard",
      counterType: "views",
      source: "server",
      counterName: "dashboard-1",
      count: 10,
    },
    { 
      domainId: "dashboard",
      counterType: "views",
      source: "server",
      counterName: "dashboard-2",
      count: 9999989,
    }
  ]
}

  3. If I want the histogram for a specific dashboard and time range, I'd call the API like
usageCounters.search({ 
  filters: {
    domainId: "dashboard",
    counterType: "views",
    counterName: "dashboard-2",
    source: "server",
    timestamp: {
      gte: "2024-07-01T00:00:00.000Z",
      lte: "2024-07-05T00:00:00.000Z",
    }
  },
  aggregation_keys: [
    'timestamp'
  ]
});

And I'd get a response like

{
  counters: [
    { 
      domainId: "dashboard",
      counterType: "views",
      counterName: "dashboard-2",
      source: "server",
      timestamp: "2024-07-01T00:00:00.000Z",
      count: 10,
    },
    { 
      domainId: "dashboard",
      counterType: "views",
      counterName: "dashboard-2",
      source: "server",
      timestamp: "2024-07-03T00:00:00.000Z",
      count: 9999989,
    }
  ]
}

The benefits of the aggregations are:

  • No pagination required
  • Can be requested at any level

WDYT?
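
To make the "map to an aggregation and return flattened" idea concrete, here is a rough in-memory sketch. It groups plain objects rather than issuing an Elasticsearch aggregation, omits the timestamp key for brevity, and all names (CounterKey, aggregateCounters, ...) are hypothetical.

```typescript
type CounterKey = 'domainId' | 'counterName' | 'counterType' | 'source';
type CounterDoc = Record<CounterKey, string> & { count: number };
type CounterFilters = Partial<Record<CounterKey, string>>;
type AggregatedCounter = CounterFilters & { count: number };

function aggregateCounters(
  docs: CounterDoc[],
  filters: CounterFilters,
  aggregationKeys: CounterKey[]
): AggregatedCounter[] {
  const filterKeys = Object.keys(filters) as CounterKey[];
  const buckets: Record<string, AggregatedCounter> = {};

  for (const doc of docs) {
    // Keep only documents that match every provided filter.
    if (!filterKeys.every((key) => doc[key] === filters[key])) continue;

    // Documents sharing the same values for the aggregation keys share a bucket.
    const bucketId = JSON.stringify(aggregationKeys.map((key) => doc[key]));
    if (!buckets[bucketId]) {
      const bucket: AggregatedCounter = { ...filters, count: 0 };
      for (const key of aggregationKeys) bucket[key] = doc[key];
      buckets[bucketId] = bucket;
    }
    buckets[bucketId].count += doc.count;
  }

  // Flatten the buckets into the `counters` array shown in the examples above.
  return Object.keys(buckets).map((id) => buckets[id]);
}
```

Passing an empty aggregationKeys array yields a single grand-total entry, while passing ['counterName'] yields one entry per dashboard.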

Contributor Author

@gsoldevila gsoldevila Jul 19, 2024

It's an interesting proposal!

In the current use case, Anton still has to retrieve the individual daily counters to show them in the UI, so I went for the simplest approach possible and let him aggregate counts on his side (saving 1 call in the process).

But I agree that it seems desirable to perform the aggregations on our side long term, makes for a more elegant API. Regarding pagination, when retrieving individual results you might still have plenty, but we currently circumvent this by allowing the from: string parameter. This way we can filter and only get counters that are more recent than a certain date (e.g. now - 90d).

Let's discuss this offline!

Member

> Regarding pagination, when retrieving individual results you might still have plenty, but we currently circumvent this by allowing the from: string parameter. This way we can filter and only get counters that are more recent than a certain date (e.g. now - 90d).

AFAIK, the recommended way to paginate is via PIT for various reasons:

  1. The from + size technique is limited to 10_000 entries: https://www.elastic.co/guide/en/elasticsearch/reference/current/paginate-search-results.html (no matter if they are 10 pages of 1000 or 1000 pages of 10)
  2. If documents are indexed or updated while paginating, the result list is re-sorted, so fetching the 2nd and later pages may return documents that were already fetched.

AFAIK, we'll always retrieve all values queried, since the intention is not to show these values in a table that the user can paginate. So I don't think we're saving ourselves from any potential issues.
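
In Elasticsearch, a PIT is normally combined with search_after, where each request resumes from the sort values of the last hit of the previous page. The loop below sketches only that cursor logic: fetchPage is a hypothetical callback standing in for the actual search call, and opening and closing the PIT is left out.

```typescript
// Hypothetical shapes: a hit exposes its sort values alongside the document.
interface Hit<T> {
  sort: Array<string | number>;
  doc: T;
}

type FetchPage<T> = (
  searchAfter: Array<string | number> | undefined,
  size: number
) => Promise<Array<Hit<T>>>;

// Retrieves every document by resuming after the last hit of each page,
// rather than using from + size offsets.
async function fetchAll<T>(fetchPage: FetchPage<T>, pageSize: number): Promise<T[]> {
  const all: T[] = [];
  let searchAfter: Array<string | number> | undefined;

  while (true) {
    const hits = await fetchPage(searchAfter, pageSize);
    for (const hit of hits) all.push(hit.doc);

    // A short (or empty) page means there is nothing left to fetch.
    if (hits.length < pageSize) return all;
    searchAfter = hits[hits.length - 1].sort;
  }
}
```

Since the cursor is a sort value rather than an offset, the loop is not subject to the 10,000-entry window that applies to from + size.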

> Anton still has to retrieve individual days counters to show them in the UI

@Dosant, just FMI, will you retrieve all days for the histogram, and add them up to get the total? Or will the histogram be of the last 7 days and the total will account for the entire retention period (30d? 90d?)

> Let's discuss this offline!

Sure! Happy to meet when you're back :)

Contributor

@Dosant Dosant Jul 23, 2024

@afharo, I retrieve the last 90 days using from, sum them up to get a total to display as "Views in last 90 days", and bucket them into weeks to display a weekly histogram (#187993).
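
A rough sketch of that client-side computation, under assumed details: the DailyCounter shape, the function name and the exact bucketing rule (whole 7-day windows counted back from now) are illustrative and are not taken from #187993.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const WEEK_MS = 7 * DAY_MS;

// Hypothetical shape of one retrieved counter: a day and its view count.
interface DailyCounter {
  day: string; // ISO timestamp
  count: number;
}

function summarizeViews(
  counters: DailyCounter[],
  now: Date,
  retentionDays: number = 90
): { total: number; weekly: number[] } {
  const from = now.getTime() - retentionDays * DAY_MS;
  const weeks = Math.ceil(retentionDays / 7);
  const weekly: number[] = [];
  for (let i = 0; i < weeks; i++) weekly.push(0);

  let total = 0;
  for (const { day, count } of counters) {
    const time = Date.parse(day);
    if (time < from || time > now.getTime()) continue; // outside the window

    total += count;
    // Bucket 0 is the oldest week; the last bucket is the most recent one.
    const weeksAgo = Math.min(Math.floor((now.getTime() - time) / WEEK_MS), weeks - 1);
    weekly[weeks - 1 - weeksAgo] += count;
  }
  return { total, weekly };
}
```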

Contributor Author

After discussion with @afharo, we agreed that:

  • We can leave aggregations for a later phase. I prepared the ground by encapsulating the current filters under a filters property.
  • We must implement PIT search.

These changes have been implemented in d8e86e4

@gsoldevila gsoldevila force-pushed the kbn-usage-counters-search-api branch from 54c6a12 to 0240e57 Compare July 19, 2024 15:35
@elasticmachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
usageCollection 16 14 -2

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
usageCollection 2 4 +2

Unknown metric groups

API count

id before after diff
usageCollection 56 50 -6

History

  • 💚 Build #222027 succeeded 54c6a1252d2fc537072540eacedd2634a64a1b47
  • 💚 Build #221977 succeeded b18505344cc7c115388daa9a5bb2dff9253c081b
  • 💔 Build #221842 failed 27676f964e053139b5abb1ecca44df3eb525b3d4
  • 💚 Build #221380 succeeded f9c4b4fcdee5c35a085e7a9c3e11c2cda4aae1ec
  • 💔 Build #221330 failed f4e8e60e6bb69907d8b88cbdcdb428ea4fc759c4
  • 💔 Build #221124 failed bd5f093367e32b29e1acc80b3c98096cb623d6fe

@gsoldevila gsoldevila force-pushed the kbn-usage-counters-search-api branch from 0240e57 to f58e23b Compare August 1, 2024 11:57
Member

@afharo afharo left a comment

LGTM

@@ -32,7 +32,7 @@ interface CloudUsage {
}

export function createCloudUsageCollector(
- usageCollection: UsageCollectionSetup,
+ usageCollection: ICollectorSet,
Member

nit: I think this is still UsageCollectionSetup. It's getting the entire plugin contract.
Probably what triggered the change is the name ICollectorSet. It sounds a bit confusing for a CollectorSet to be needed to create a usage collector...

WDYT?

Contributor Author

@gsoldevila gsoldevila Aug 2, 2024

I thought about that for a while, and I think the root problem is "naming is hard".

IMO it would make more sense to name it CollectorManager instead of ICollectorSet.
Then, UsageCollectionSetup would have both the CollectorManager & UsageCountersServiceSetup.
NB the code above does not need anything about the UsageCounters.

The problem is that lots of plugins are already using the global UsageCollectionSetup, so changing all the references to the more specific CollectorManager would take some work + codeowners approvals.
I only changed this one cause it falls under our ownership.

I can rollback these changes and we can tackle this on a separate PR.

UPDATE: rolled back with b8abddb

@gsoldevila gsoldevila requested a review from a team as a code owner August 2, 2024 08:10
@gsoldevila gsoldevila requested a review from a team as a code owner August 2, 2024 10:27
Comment on lines +88 to +91
if (toDelete.length === ROLLUP_BATCH_SIZE) {
// we found a lot of old Usage Counters, put the counter back in the queue, as there might be more
counterQueue.push(counter);
}
Member

🧡
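
For readers skimming the thread, the requeue idea in the snippet above can be sketched in isolation. The store interface and rollUp function are hypothetical and synchronous for simplicity (the real code goes through the saved-objects repository asynchronously), and the batch size is passed in rather than read from ROLLUP_BATCH_SIZE.

```typescript
// Hypothetical store: returns up to `limit` expired counter ids for a domain.
interface ExpiredCounterStore {
  findExpired(domainId: string, limit: number): string[];
  deleteMany(ids: string[]): void;
}

// Processes each domain in turn. A full batch suggests there may be more
// expired counters, so the domain goes back to the end of the queue.
function rollUp(domainIds: string[], store: ExpiredCounterStore, batchSize: number): number {
  const queue = domainIds.slice();
  let deleted = 0;

  while (queue.length > 0) {
    const domainId = queue.shift() as string;
    const toDelete = store.findExpired(domainId, batchSize);
    store.deleteMany(toDelete);
    deleted += toDelete.length;

    if (toDelete.length === batchSize) {
      queue.push(domainId); // there might be more; revisit this domain later
    }
  }
  return deleted;
}
```

Requeueing instead of looping on the same domain lets other domains make progress between batches.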

Contributor

@davismcphee davismcphee left a comment

Data Discovery changes LGTM

@gsoldevila gsoldevila enabled auto-merge (squash) August 5, 2024 09:01
Contributor

@Dosant Dosant left a comment

x-pack/plugins/reporting/server/routes/ change due to interface changes lgtm.

I haven't rebased my frontend work yet to check that everything still looks good; I'll try today or tomorrow. I don't want to block, though, so it's up to you whether you'd like to wait. Looks good!

@gsoldevila gsoldevila disabled auto-merge August 5, 2024 09:10
@gsoldevila gsoldevila enabled auto-merge (squash) August 5, 2024 13:11
@kibana-ci
Collaborator

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #19 / Dataset Quality Dataset quality table filters shows full dataset names when toggled

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
usageCollection 16 14 -2

Public APIs missing exports

Total count of every type that is part of your API that should be exported but is not. This will cause broken links in the API documentation system. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats exports for more detailed information.

id before after diff
usageCollection 2 4 +2

Unknown metric groups

API count

id before after diff
usageCollection 56 51 -5

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@gsoldevila gsoldevila merged commit d9c1f97 into elastic:main Aug 5, 2024
22 checks passed
Labels
backport:skip This commit does not require backporting Feature:Telemetry release_note:skip Skip the PR/issue when compiling release notes Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc v8.16.0
9 participants