Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ES|QL] CSV Reporting in Discover #174511

Merged
merged 20 commits into from
Jan 17, 2024
Merged

Conversation

Dosant
Copy link
Contributor

@Dosant Dosant commented Jan 9, 2024

Summary

close #173390
This PR enables CSV report generation with ES|QL in Discover.

Pre this PR, there are two report types for generating CSV reports with Discover:

Since we plan to migrate to the v2 report type and search source is not needed for es|ql, this PR implements es|ql csv reporting based on the v2 report type.

Initially, I wanted to create new new report type similar to v2 just for es|ql, but it turned out to be a lot more boilerplate code without a significant benefit. you can see it in the PR #174448. So I changed my mind and the current PR adds es|ql capabilities inside the existing csv_v2 report. This is convenient as the input is the same (Discover locator), the output is the same csv file and meta information, and telemetry is also the same.

As of this PR, the ES|QL report is capable of:

  • Using es|ql query from the locator
  • Adding time range filter if present in the locator. time field is picked from the data view (it is available in the locator, but otherwise is not used for es|ql). Other filters are also passed if available in the locator
  • Keeps into account "columns" from the locator.
  • Similar to current non-es|ql reports from discover UI, it doesn't use saved searches but only relies on the state from the URL. This probably will be improved in [Discover/CSV Reporting] Use the new CSV export endpoint in Discover UI #151190 to support both.
  • Uses different CSV settings for functionality like checking formulas, escapes, bom, max size, etc...
  • Keeps regular CSV features like cancelation and giving event loop a break (even though those are not needed for now for es|ql since the number of results is limited)

Some notable differences compared to regular discover search / csv reports:

  • A lot simpler, as it doesn't use search source and field formats
  • No pagination, less CPU heavy as esql responses are limited to 10000 results and a single request

Questions

esql timeouts?

Esql search strategy has custom default timeout logic. But csv reporting has a config for request timeout overrides. Do we need to use the reporting overrides or should we keep default esql timeout logic?

@Dosant
Copy link
Contributor Author

Dosant commented Jan 9, 2024

/ci

@@ -665,3 +665,13 @@ export interface ESQLSearchReponse {
columns: ESQLColumn[];
values: ESQLRow[];
}

export interface ESQLSearchParams {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was just moved from common/search/expressions/esql

uiSettings: IUiSettingsClient;
}

export class CsvESQLGenerator {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a copy of CsvGenerator with changes for es|ql generation. I kept most of the CSV building-specific things like maxSizeBytes, bom, escapeFormulaValues, cancelation, and CPU breaks, and reused the same CSV settings. Inputs/Outputs/errors/warnings are mostly the same

// https://github.com/elastic/elasticsearch/pull/102767
// timezone
// TODO: pass locale from browser?
// locale
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This wasn't available in the locator, I wasn't eager to add it as wasn't sure if this was significant. Happy to add if needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would locale be used for? Is it different than timezone?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I wonder if we should not use field formatters for the non-ES|QL CSV export as well. Do you know if any plans are changing with field formatters?

Copy link
Contributor Author

@Dosant Dosant Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know if any plans are changing with field formatters?

don't know :(

I wonder if we should not use field formatters for the non-ES|QL CSV export as well.

I thought that the idea was to try to match the Discover table as closely as possible. I think it makes sense to keep the formatters for non-ESQL until they are there in the Discover table.

@@ -183,7 +183,9 @@ export const getTopNavLinks = ({
objectId: savedSearch.id,
objectType: 'search',
sharingData: {
...sharingData,
isTextBased,
locatorParams: [{ id: locator.id, params }],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

locatorParams will be used for es|ql report (csv_v2), searchSourceSharingData is still used for other reports.
Hopefully we will only need locatorParams when fully migrated to csv_v2 #151190

@@ -251,7 +251,8 @@ class ReportingPanelContentUi extends Component<Props, State> {
case PDF_REPORT_TYPE_V2:
return 'PDF';
case CSV_REPORT_TYPE:
return 'csv';
case CSV_REPORT_TYPE_V2:
return 'CSV';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixing #173837

@@ -87,14 +87,10 @@ export class CsvV2ExportType extends ExportType<
throw Boom.badRequest('Invalid Job params: must contain a single Discover App locator');
}

if (!params || !params.savedSearchId || typeof params.savedSearchId !== 'string') {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the non es|ql reports csv_v2 still needs the savedSearchId. this is asserted deeper inside the code branch.

es|ql reports are working without savedSearchId

full support and alignment for saved and non saved searches should be implemented in #151190

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ on the idea to implement ES|QL CSV export within the csv_v2 export type.

@Dosant Dosant changed the title esql csv 2 [ES|QL] CSV Reporting in Discover Jan 9, 2024
const escape = this.escapeValues(settings);

for (const column of columns) {
rowDefinition.push(escape(`${dataTableRow[column]}`));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe as you had it was intentional, but this looks the same as:

Suggested change
rowDefinition.push(escape(`${dataTableRow[column]}`));
rowDefinition.push(escape(dataTableRow[column]));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

safe typecasting from unknown to string + if there is anything something then string, would be safely converted to string

@@ -8,7 +8,7 @@
import { i18n } from '@kbn/i18n';
import React from 'react';

import { CSV_JOB_TYPE } from '@kbn/reporting-export-types-csv-common';
import { CSV_JOB_TYPE, CSV_JOB_TYPE_V2 } from '@kbn/reporting-export-types-csv-common';
Copy link
Member

@tsullivan tsullivan Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just like to give a head's up that this file might be changing quite a bit in #171253

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😢

@Dosant Dosant marked this pull request as ready for review January 11, 2024 10:49
@Dosant Dosant requested review from a team as code owners January 11, 2024 10:49
@Dosant Dosant added (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead release_note:feature Makes this part of the condensed release notes Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) Feature:ES|QL ES|QL related features in Kibana labels Jan 11, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/appex-sharedux (Team:SharedUX)

@Dosant Dosant requested a review from stratoula January 11, 2024 10:49
@stratoula
Copy link
Contributor

/ci

@stratoula
Copy link
Contributor

From my tests so far it works great! About your questions:

  • Locale is used as of now by some date functions eg. date_format() to localize the output. I am passing the i18n locale so we could do the same here.

Esql search strategy has custom default timeout logic. But csv reporting has a config for request timeout overrides. Do we need to use the reporting overrides or should we keep default esql timeout logic?

This is a very good question 🤔 I vote for keeping the default esql timeout logic. @lukasolson wdyt?

A question from my side. We are going to use async query in Discover #172131. Does it affect the reporting service for ES|QL? And if yes, how?

Copy link
Contributor

@jloleysens jloleysens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Core changes LGTM!

@Dosant
Copy link
Contributor Author

Dosant commented Jan 11, 2024

@stratoula,

A question from my side. We are going to use async query in Discover #172131. Does it affect the reporting service for ES|QL? And if yes, how?

Currently, for regular searches, we use async search in Discover, but sync search in reporting. I think this is mostly for historical reasons and it wasn't considered if it is worth switching to async search there, but I am not sure.
In theory, we can switch to async search in reporting and then use reporting-specific timeouts or task manager's task cancelation to cancel the query. Still, I am not sure what the benefits of those are compared to using sync search + requestTimeout.

I think we should, indeed, discuss this when migrating to async search for esql in Discover

@stratoula
Copy link
Contributor

@Dosant got it. @ppisljar is currently working on it. I hope that we are going to merge it in 8.13. Maybe we need to sync online before?

@Dosant Dosant requested a review from tsullivan January 11, 2024 15:55
@tsullivan
Copy link
Member

tsullivan commented Jan 11, 2024

@Dosant @stratoula

Async search topic

In theory, we can switch to async search in reporting and then use reporting-specific timeouts or task manager's task cancelation to cancel the query.

For non-ES|QL CSV export, I investigated using async search (as an alternative to paging) and found it not to be suitable for export use cases, due to there being a size limit of results within Elasticsearch. The size limit would be a hurdle for Reporting since it is common to export multi-gigabytes of data. I'm not sure this would affect ES|QL searches since the size of exported data will be similar to what the user has already seen loaded into their browser.

Timeouts override topic

This is a very good question 🤔 I vote for keeping the default esql timeout logic. @lukasolson wdyt?

I don't know how the built-in timeout logic for ES|QL queries works, but I'd like to offer advice here. We should assume that some users will use the Reporting APIs directly and craft report job parameters, such as the time range, for things like monthly reports. If frozen tiers are involved when the query goes back that far it will be slower by magnitudes. Therefore, there could be higher pressure on the timeouts for reports, because reports may querying data more expansively than data analysts would in a browser.

I think we do want those timeout overrides to be available for any kind of report.

@stratoula
Copy link
Contributor

/ci

Copy link
Contributor

@stratoula stratoula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I tested it locally and works as expected! Thanx @Dosant ❤️

@Dosant
Copy link
Contributor Author

Dosant commented Jan 16, 2024

@elasticmachine merge upstream

Copy link
Contributor

@jughosta jughosta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, @Dosant thanks! Adding some question below.

  1. I wonder whether it's a temporary solution that we are not using formatters for the ES|QL CVS report? The response from ES request has types for columns which we could use.

  2. Noticed that for non-timebased index patterns, the CSV report still highlights the first column. Is it expected? 3. Also, is it expected to have "null" for missing values?

Screenshot 2024-01-16 at 11 55 47
  1. Custom sorting on ES|QL columns in Discover seems to be ignored in the generated CSV reports. Not sure if it's a blocker for this PR or it can be addressed later. cc @stratoula

Regarding the changes to search strategy, it would be great to get @lukasolson review too.

@stratoula
Copy link
Contributor

@jughosta

I wonder whether it's a temporary solution that we are not using formatters for the ES|QL CVS report? The response from ES request has types for columns which we could use.

We don't use them in Discover table either so I guess this is ok.

Noticed that for non-timebased index patterns, the CSV report still highlights the first column. Is it expected? 3. Also, is it expected to have "null" for missing values?

Nice catch, it would be great if we could address it

Custom sorting on ES|QL columns in Discover seems to be ignored in the generated CSV reports. Not sure if it's a blocker for this PR or it can be addressed later.

We don't save it on the saved search either so I guess we could address it later. I leave this to Anton to decide.

@Dosant
Copy link
Contributor Author

Dosant commented Jan 16, 2024

@jughosta, thank you!

I wonder whether it's a temporary solution that we are not using formatters for the ES|QL CVS report? The response from ES request has types for columns which we could use.

Same what @stratoula said

Noticed that for non-timebased index patterns, the CSV report still highlights the first column. Is it expected?

I think the highlighting you mean is just how by default when you opened the CSV in Numbers apps it applied the template where the first column is highlighted. Is this what you mean? The csv itself doesn't have any formatting.

Also, is it expected to have "null" for missing values?

Agree, I'll try to improve it now

Custom sorting on ES|QL columns in Discover seems to be ignored in the generated CSV reports. Not sure if it's a blocker for this PR or it can be addressed later.
We don't save it on the saved search either so I guess we could address it later. I leave this to Anton to decide.

Right, the thing is that it isn't part of the state or locator or a saved search, I didn't want to change it for this pr. But I don't think this is that important in an CSV as it can be done with one click in Excel/Sheets

@stratoula
Copy link
Contributor

I think the highlighting you mean is just how by default when you opened the CSV in Numbers apps it applied the template where the first column is highlighted. Is this what you mean? The csv itself doesn't have any formatting.

Aaa yes Anton! I am also always confused with this but this is a numbers "feature" :D

@Dosant Dosant requested review from jughosta and lukasolson January 16, 2024 12:29
@Dosant
Copy link
Contributor Author

Dosant commented Jan 16, 2024

/ci

Copy link
Contributor

@jughosta jughosta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM 👍

Copy link
Member

@lukasolson lukasolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Search changes LGTM

@Dosant
Copy link
Contributor Author

Dosant commented Jan 17, 2024

@elasticmachine merge upstream

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
@kbn/es-types 26 30 +4
@kbn/generate-csv 11 25 +14
data 2579 2582 +3
discover 91 95 +4
total +25

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
discover 564.1KB 564.2KB +49.0B
reporting 71.2KB 71.2KB +9.0B
total +58.0B

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
data 414.7KB 415.0KB +297.0B
reporting 43.6KB 43.6KB +65.0B
total +362.0B
Unknown metric groups

API count

id before after diff
@kbn/es-types 26 30 +4
@kbn/generate-csv 11 25 +14
data 3230 3233 +3
discover 137 141 +4
total +25

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@Dosant Dosant merged commit f38f87b into elastic:main Jan 17, 2024
19 checks passed
@kibanamachine kibanamachine added v8.13.0 backport:skip This commit does not require backporting labels Jan 17, 2024
@Dosant Dosant self-assigned this Jan 17, 2024
CoenWarmer pushed a commit to CoenWarmer/kibana that referenced this pull request Feb 15, 2024
## Summary

close elastic#173390
This PR enables CSV report generation with ES|QL in Discover. 

Pre this PR, there are two report types for generating CSV reports with
Discover:
-
https://github.com/elastic/kibana/blob/main/packages/kbn-reporting/export_types/csv/csv_searchsource.ts
- old deprecated report type that relies on
`SerializedSearchSourceFields`. This type is still currently used by
Discover UI and we plan to migrate away from it
elastic#151190
-
https://github.com/elastic/kibana/blob/main/packages/kbn-reporting/export_types/csv/csv_v2.ts
- new report type that relies on Discover locator. Currently, it can
only generate reports from searches backed by saved search and this
report type was implemented as a public-facing API for simple report
generation outside Kibana.

Since we plan to migrate to the v2 report type and search source is not
needed for es|ql, this PR implements es|ql csv reporting based on the v2
report type.

Initially, I wanted to create new new report type similar to v2 just for
es|ql, but it turned out to be a lot more boilerplate code without a
significant benefit. you can see it in the PR
elastic#174448. So I changed my mind and
the current PR adds es|ql capabilities inside the existing csv_v2
report. This is convenient as the input is the same (Discover locator),
the output is the same csv file and meta information, and telemetry is
also the same.

As of this PR, the ES|QL report is capable of: 
- Using es|ql query from the locator 
- Adding time range filter if present in the locator. time field is
picked from the data view (it is available in the locator, but otherwise
is not used for es|ql). Other filters are also passed if available in
the locator
- Keeps into account "columns" from the locator.
- Similar to current non-es|ql reports from discover UI, it doesn't use
saved searches but only relies on the state from the URL. This probably
will be improved in elastic#151190 to
support both.
- Uses different CSV settings for functionality like checking formulas,
escapes, bom, max size, etc...
- Keeps regular CSV features like cancelation and giving event loop a
break (even though those are not needed for now for es|ql since the
number of results is limited)


Some notable differences compared to regular discover search / csv
reports:
- A lot simpler, as it doesn't use search source and field formats 
- No pagination, less CPU heavy as esql responses are limited to 10000
results and a single request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting (Deprecated) Feature:Reporting Use Reporting:Screenshot, Reporting:CSV, or Reporting:Framework instead Feature:ES|QL ES|QL related features in Kibana release_note:feature Makes this part of the condensed release notes Team:SharedUX Team label for AppEx-SharedUX (formerly Global Experience) v8.13.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[ES|QL] CSV Reporting in Discover
10 participants