-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] CSV Reporting in Discover #174511
Conversation
/ci |
@@ -665,3 +665,13 @@ export interface ESQLSearchReponse { | |||
columns: ESQLColumn[]; | |||
values: ESQLRow[]; | |||
} | |||
|
|||
export interface ESQLSearchParams { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this was just moved from common/search/expressions/esql
uiSettings: IUiSettingsClient; | ||
} | ||
|
||
export class CsvESQLGenerator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a copy of CsvGenerator
with changes for es|ql generation. I kept most of the CSV building-specific things like maxSizeBytes, bom, escapeFormulaValues, cancelation, and CPU breaks, and reused the same CSV settings. Inputs/Outputs/errors/warnings are mostly the same
// https://github.com/elastic/elasticsearch/pull/102767 | ||
// timezone | ||
// TODO: pass locale from browser? | ||
// locale |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This wasn't available in the locator, I wasn't eager to add it as wasn't sure if this was significant. Happy to add if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would locale
be used for? Is it different than timezone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I wonder if we should not use field formatters for the non-ES|QL CSV export as well. Do you know if any plans are changing with field formatters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if any plans are changing with field formatters?
don't know :(
I wonder if we should not use field formatters for the non-ES|QL CSV export as well.
I thought that the idea was to try to match the Discover table as closely as possible. I think it makes sense to keep the formatters for non-ESQL until they are there in the Discover table.
@@ -183,7 +183,9 @@ export const getTopNavLinks = ({ | |||
objectId: savedSearch.id, | |||
objectType: 'search', | |||
sharingData: { | |||
...sharingData, | |||
isTextBased, | |||
locatorParams: [{ id: locator.id, params }], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
locatorParams will be used for es|ql report (csv_v2), searchSourceSharingData is still used for other reports.
Hopefully we will only need locatorParams when fully migrated to csv_v2 #151190
@@ -251,7 +251,8 @@ class ReportingPanelContentUi extends Component<Props, State> { | |||
case PDF_REPORT_TYPE_V2: | |||
return 'PDF'; | |||
case CSV_REPORT_TYPE: | |||
return 'csv'; | |||
case CSV_REPORT_TYPE_V2: | |||
return 'CSV'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixing #173837
@@ -87,14 +87,10 @@ export class CsvV2ExportType extends ExportType< | |||
throw Boom.badRequest('Invalid Job params: must contain a single Discover App locator'); | |||
} | |||
|
|||
if (!params || !params.savedSearchId || typeof params.savedSearchId !== 'string') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the non es|ql reports csv_v2 still needs the savedSearchId. this is asserted deeper inside the code branch.
es|ql reports are working without savedSearchId
full support and alignment for saved and non saved searches should be implemented in #151190
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ on the idea to implement ES|QL CSV export within the csv_v2 export type.
const escape = this.escapeValues(settings); | ||
|
||
for (const column of columns) { | ||
rowDefinition.push(escape(`${dataTableRow[column]}`)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe as you had it was intentional, but this looks the same as:
rowDefinition.push(escape(`${dataTableRow[column]}`)); | |
rowDefinition.push(escape(dataTableRow[column])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
safe typecasting from unknown to string + if there is anything something then string, would be safely converted to string
@@ -8,7 +8,7 @@ | |||
import { i18n } from '@kbn/i18n'; | |||
import React from 'react'; | |||
|
|||
import { CSV_JOB_TYPE } from '@kbn/reporting-export-types-csv-common'; | |||
import { CSV_JOB_TYPE, CSV_JOB_TYPE_V2 } from '@kbn/reporting-export-types-csv-common'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just like to give a head's up that this file might be changing quite a bit in #171253
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😢
Pinging @elastic/appex-sharedux (Team:SharedUX) |
/ci |
From my tests so far it works great! About your questions:
This is a very good question 🤔 I vote for keeping the default esql timeout logic. @lukasolson wdyt? A question from my side. We are going to use async query in Discover #172131. Does it affect the reporting service for ES|QL? And if yes, how? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Core changes LGTM!
Currently, for regular searches, we use async search in Discover, but sync search in reporting. I think this is mostly for historical reasons and it wasn't considered if it is worth switching to async search there, but I am not sure. I think we should, indeed, discuss this when migrating to async search for esql in Discover |
Async search topic
For non-ES|QL CSV export, I investigated using async search (as an alternative to paging) and found it not to be suitable for export use cases, due to there being a size limit of results within Elasticsearch. The size limit would be a hurdle for Reporting since it is common to export multi-gigabytes of data. I'm not sure this would affect ES|QL searches since the size of exported data will be similar to what the user has already seen loaded into their browser. Timeouts override topic
I don't know how the built-in timeout logic for ES|QL queries works, but I'd like to offer advice here. We should assume that some users will use the Reporting APIs directly and craft report job parameters, such as the time range, for things like monthly reports. If frozen tiers are involved when the query goes back that far it will be slower by magnitudes. Therefore, there could be higher pressure on the timeouts for reports, because reports may querying data more expansively than data analysts would in a browser. I think we do want those timeout overrides to be available for any kind of report. |
/ci |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I tested it locally and works as expected! Thanx @Dosant ❤️
…to d/2024-01-09-esql-csv-2
@elasticmachine merge upstream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work, @Dosant thanks! Adding some question below.
-
I wonder whether it's a temporary solution that we are not using formatters for the ES|QL CVS report? The response from ES request has types for columns which we could use.
-
Noticed that for non-timebased index patterns, the CSV report still highlights the first column. Is it expected? 3. Also, is it expected to have "null" for missing values?
- Custom sorting on ES|QL columns in Discover seems to be ignored in the generated CSV reports. Not sure if it's a blocker for this PR or it can be addressed later. cc @stratoula
Regarding the changes to search strategy, it would be great to get @lukasolson review too.
We don't use them in Discover table either so I guess this is ok.
Nice catch, it would be great if we could address it
We don't save it on the saved search either so I guess we could address it later. I leave this to Anton to decide. |
@jughosta, thank you!
Same what @stratoula said
I think the highlighting you mean is just how by default when you opened the CSV in Numbers apps it applied the template where the first column is highlighted. Is this what you mean? The csv itself doesn't have any formatting.
Agree, I'll try to improve it now
Right, the thing is that it isn't part of the state or locator or a saved search, I didn't want to change it for this pr. But I don't think this is that important in an CSV as it can be done with one click in Excel/Sheets |
Aaa yes Anton! I am also always confused with this but this is a numbers "feature" :D |
…to d/2024-01-09-esql-csv-2
/ci |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Search changes LGTM
@elasticmachine merge upstream |
💚 Build Succeeded
Metrics [docs]Public APIs missing comments
Async chunks
Page load bundle
Unknown metric groupsAPI count
History
To update your PR or re-run it, just comment with: |
## Summary close elastic#173390 This PR enables CSV report generation with ES|QL in Discover. Pre this PR, there are two report types for generating CSV reports with Discover: - https://github.com/elastic/kibana/blob/main/packages/kbn-reporting/export_types/csv/csv_searchsource.ts - old deprecated report type that relies on `SerializedSearchSourceFields`. This type is still currently used by Discover UI and we plan to migrate away from it elastic#151190 - https://github.com/elastic/kibana/blob/main/packages/kbn-reporting/export_types/csv/csv_v2.ts - new report type that relies on Discover locator. Currently, it can only generate reports from searches backed by saved search and this report type was implemented as a public-facing API for simple report generation outside Kibana. Since we plan to migrate to the v2 report type and search source is not needed for es|ql, this PR implements es|ql csv reporting based on the v2 report type. Initially, I wanted to create new new report type similar to v2 just for es|ql, but it turned out to be a lot more boilerplate code without a significant benefit. you can see it in the PR elastic#174448. So I changed my mind and the current PR adds es|ql capabilities inside the existing csv_v2 report. This is convenient as the input is the same (Discover locator), the output is the same csv file and meta information, and telemetry is also the same. As of this PR, the ES|QL report is capable of: - Using es|ql query from the locator - Adding time range filter if present in the locator. time field is picked from the data view (it is available in the locator, but otherwise is not used for es|ql). Other filters are also passed if available in the locator - Keeps into account "columns" from the locator. - Similar to current non-es|ql reports from discover UI, it doesn't use saved searches but only relies on the state from the URL. This probably will be improved in elastic#151190 to support both. - Uses different CSV settings for functionality like checking formulas, escapes, bom, max size, etc... - Keeps regular CSV features like cancelation and giving event loop a break (even though those are not needed for now for es|ql since the number of results is limited) Some notable differences compared to regular discover search / csv reports: - A lot simpler, as it doesn't use search source and field formats - No pagination, less CPU heavy as esql responses are limited to 10000 results and a single request
Summary
close #173390
This PR enables CSV report generation with ES|QL in Discover.
Pre this PR, there are two report types for generating CSV reports with Discover:
SerializedSearchSourceFields
. This type is still currently used by Discover UI and we plan to migrate away from it [Discover/CSV Reporting] Use the new CSV export endpoint in Discover UI #151190Since we plan to migrate to the v2 report type and search source is not needed for es|ql, this PR implements es|ql csv reporting based on the v2 report type.
Initially, I wanted to create new new report type similar to v2 just for es|ql, but it turned out to be a lot more boilerplate code without a significant benefit. you can see it in the PR #174448. So I changed my mind and the current PR adds es|ql capabilities inside the existing csv_v2 report. This is convenient as the input is the same (Discover locator), the output is the same csv file and meta information, and telemetry is also the same.
As of this PR, the ES|QL report is capable of:
Some notable differences compared to regular discover search / csv reports:
Questions
esql timeouts?
Esql search strategy has custom default timeout logic. But csv reporting has a config for request timeout overrides. Do we need to use the reporting overrides or should we keep default esql timeout logic?