-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add export capabilities to MSQ with SQL syntax #15689
Conversation
sql/src/main/java/org/apache/druid/sql/calcite/parser/ExternalDestinationSqlIdentifier.java
Fixed
Show fixed
Hide fixed
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskSqlEngine.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/catalog/model/table/export/ExportDestination.java
Outdated
Show resolved
Hide resolved
...tage-query/src/main/java/org/apache/druid/msq/indexing/destination/MSQSelectDestination.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/parser/ExternalDestinationSqlIdentifier.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/parser/ExternalDestinationSqlIdentifier.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/parser/ExternalDestinationSqlIdentifier.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/apache/druid/catalog/model/table/export/ExportDestination.java
Outdated
Show resolved
Hide resolved
sql/src/main/java/org/apache/druid/sql/calcite/parser/ExternalDestinationSqlIdentifier.java
Fixed
Show fixed
Hide fixed
...re/s3-extensions/src/main/java/org/apache/druid/storage/s3/output/S3StorageExportConfig.java
Fixed
Show fixed
Hide fixed
...re/s3-extensions/src/main/java/org/apache/druid/storage/s3/output/S3StorageExportConfig.java
Fixed
Show fixed
Hide fixed
...nsions/src/main/java/org/apache/druid/storage/s3/output/S3ExportStorageConnectorFactory.java
Fixed
Show fixed
Hide fixed
...nsions/src/main/java/org/apache/druid/storage/s3/output/S3ExportStorageConnectorFactory.java
Fixed
Show fixed
Hide fixed
integration-tests-ex/cases/src/test/java/org/apache/druid/testsEx/msq/ITMultiStageQuery.java
Dismissed
Show dismissed
Hide dismissed
processing/src/main/java/org/apache/druid/storage/local/LocalFileExportStorageProvider.java
Fixed
Show fixed
Hide fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for incorporating my feedback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made some copyedits to the docs as suggestions. They can either be merged as part of this PR, or I can open a followup PR with the changes.
This variation of EXTERN requires one argument, the details of the destination as specified below. | ||
This variation additionally requires an `AS` clause to specify the format of the exported rows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This variation of EXTERN requires one argument, the details of the destination as specified below. | |
This variation additionally requires an `AS` clause to specify the format of the exported rows. | |
This variation of EXTERN has two required parts: an argument that details the destination and an `AS` clause to specify the format of the exported rows. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AS
clause would not be an argument to extern, it's present elsewhere in the query. Would it be confusing to call it an argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about the change I just made?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments around removal of dead code and error messages.
} else { | ||
throw new ISE("Unsupported destination [%s]", querySpec.getDestination()); | ||
shuffleSpecFactory = querySpec.getDestination() | ||
.getShuffleSpecFactory(MultiStageQueryContext.getRowsPerPage(querySpec.getQuery().context())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the refactor. Its much cleaner now.
We should add a comment saying all select partitions are controlled by a context value rowsPerPage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean a comment every where the function is being called? We don't pass the whole context to getShuffleSpecFactory(), just the integer, so would this need to be specifically mentioned somewhere?
if (Intervals.ONLY_ETERNITY.equals(exportMSQDestination.getReplaceTimeChunks())) { | ||
StorageConnector storageConnector = storageConnectorProvider.get(); | ||
try { | ||
storageConnector.deleteRecursively(""); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I think code flow wise, make query definition may not be the correct place to delete the file.
Maybe it can be done after we create the query definition object. (Clear files if needed)
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java
Outdated
Show resolved
Hide resolved
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/sql/MSQTaskQueryMaker.java
Outdated
Show resolved
Hide resolved
return; | ||
} | ||
} | ||
throw DruidException.forPersona(DruidException.Persona.USER) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This error is user facing error. Please mention that the user should reach out to the cluster admin for the paths for export. The paths are controlled via xxx property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the error be better addressed for Persona.ADMIN
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error message is more likely to be due to user error in specifying the path than a permission issue, so keeping it as user makes sense.
processing/src/main/java/org/apache/druid/storage/local/LocalFileExportStorageProvider.java
Show resolved
Hide resolved
processing/src/main/java/org/apache/druid/storage/local/LocalFileExportStorageProvider.java
Fixed
Show fixed
Hide fixed
* Add test * Parser changes to support export statements * Fix builds * Address comments * Add frame processor * Address review comments * Fix builds * Update syntax * Webconsole workaround * Refactor * Refactor * Change export file path * Update docs * Remove webconsole changes * Fix spelling mistake * Parser changes, add tests * Parser changes, resolve build warnings * Fix failing test * Fix failing test * Fix IT tests * Add tests * Cleanup * Fix unparse * Fix forbidden API * Update docs * Update docs * Address review comments * Address review comments * Fix tests * Address review comments * Fix insert unparse * Add external write resource action * Fix tests * Add resource check to overlord resource * Fix tests * Add IT * Update syntax * Update tests * Update permission * Address review comments * Address review comments * Address review comments * Add tests * Add check for runtime parameter for bucket and path * Add check for runtime parameter for bucket and path * Add tests * Update docs * Fix NPE * Update docs, remove deadcode * Fix formatting
Support for exporting msq results to gcs bucket. This is essentially copying the logic of s3 export for gs, originally done by @adarshsanjeev in this PR - #15689
Problem
Druid currently does not allow export of tables in a programmatic manner. While is is possible to download results from a SELECT query, this relies on writing the results to a single query report, which cannot support large datasets. An export syntax which writes the results in a desired format directly to an external location (such as s3 or hdfs) would be useful.
For example: A statement to export all rows from a table into S3 as CSV files would look like
Initially, only CSV is supported as an export format, but this can be expanded to support other formats easily.
Release note
Key changed/added classes in this PR
sql/src/main/codegen/includes/common.ftl
sql/src/main/codegen/includes/replace.ftl
IngestHandler
This PR has: