Description
Motivation
Doris' current data export method is relatively simple: data can only be exported through the Export feature, which exports data only at the partition level and does not support SQL features such as expressions, joins, and GROUP BY.
In some cases, Doris users need to export query results:
- Export query results so that other users can download them.
- Export query results for further processing by other systems.
- Bitmap export: users sometimes store a very large number of IDs (100 million+) in bitmap columns and need to get those IDs back. Doris currently does not support UDTFs, and returning such a result set through the MySQL protocol is impractical because it is too large.
Alternatives
Support SELECT ... FROM tbl INTO OUTFILE xxx ...
This is the most intuitive and user-friendly way. The grammar is taken from the MySQL grammar manual. In MySQL's implementation, the exported file is written to the local file system of the MySQL server. Since Doris is a distributed system, query results may be produced on any node, so there are two options for exporting:
1. Write the result to the local file system of a BE, and provide users with an HTTP link for downloading.
This option is relatively simple, but it involves some problems:
- File cleanup. The result file may be very large and take up too much BE disk space, and it cannot be kept forever, so a mechanism is needed to clean it up regularly.
- Connectivity. The user may not have permission to access a BE node, or may not be able to reach it directly, so the file cannot be downloaded.
The first problem can be addressed by limiting the size of a single file, monitoring disk space, and adding a file timeout with regular cleanup. The second problem cannot be fully solved, but this option can still serve as a temporary solution for some users.
2. Write the result to a remote storage system, such as HDFS.
This option can use the Broker to write files to the remote system, which solves both problems of Option 1.
I originally planned to implement Option 1 in the first phase so that users without remote storage could use this feature right away, and then support Option 2 later with only small changes by adding a different file-writing interface. However, I now plan to implement Option 2 first: for local files we would have to handle all kinds of messy details, such as authentication, GC, and disk space, whereas with remote storage all of this becomes simple. A possible file-writing abstraction is sketched below.
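For illustration, a minimal sketch of such a file-writing abstraction. All class and method names here are hypothetical placeholders, not the actual BE interface:

```cpp
#include <cstddef>
#include <string>

// Hypothetical abstraction: the result writer pushes bytes through this
// interface, so supporting Option 2 only means adding a Broker-backed
// implementation next to the local one.
class FileWriterInterface {
public:
    virtual ~FileWriterInterface() = default;
    virtual void open(const std::string& path) = 0;
    virtual void write(const char* data, size_t len) = 0;
    virtual void close() = 0;
};

// Option 1: write to the BE's local file system.
class LocalFileWriter : public FileWriterInterface {
public:
    void open(const std::string& path) override { /* open a local file */ }
    void write(const char* data, size_t len) override { /* append bytes */ }
    void close() override { /* flush and close the file */ }
};

// Option 2: forward the bytes to a Broker, which writes to HDFS etc.
class BrokerFileWriter : public FileWriterInterface {
public:
    void open(const std::string& path) override { /* broker open RPC */ }
    void write(const char* data, size_t len) override { /* broker write RPC */ }
    void close() override { /* broker close RPC */ }
};
```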
Other systems
AWS Redshift provides an UNLOAD command to satisfy this requirement, which acts much like SELECT INTO OUTFILE in MySQL. Vertica and Snowflake provide shell tools to export query results, and Snowflake also supports a COPY INTO command.
Export format
CSV is a common way to export query results. But some columns have binary types, such as BITMAP_UNION and HLL, so we need a way to store these kinds of types. Parquet, I think, is a good format for this: binary values can be stored using Parquet's BYTE_ARRAY type, as sketched below. Parquet does not support LARGEINT; I will think about that later, but Parquet works for most cases.
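As a sketch, assuming the parquet-cpp API that ships with Apache Arrow, a column holding serialized bitmap bytes could be declared as BYTE_ARRAY like this (the column names are made up):

```cpp
#include <memory>
#include <parquet/schema.h>

using parquet::Repetition;
using parquet::Type;
using parquet::schema::GroupNode;
using parquet::schema::NodeVector;
using parquet::schema::PrimitiveNode;

// Schema for an export with one key column and one binary (bitmap) column.
std::shared_ptr<GroupNode> make_export_schema() {
    NodeVector fields;
    fields.push_back(
        PrimitiveNode::Make("id", Repetition::REQUIRED, Type::INT64));
    // The serialized bytes of a BITMAP/HLL value go into a BYTE_ARRAY column.
    fields.push_back(
        PrimitiveNode::Make("user_bitmap", Repetition::OPTIONAL,
                            Type::BYTE_ARRAY));
    return std::static_pointer_cast<GroupNode>(
        GroupNode::Make("export_schema", Repetition::REQUIRED, fields));
}
```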
Implementation
Syntax
```sql
query_stmt
INTO OUTFILE "file:///path/to/file_prefix"
FORMAT AS CSV|PARQUET
WITH BROKER broker_name
(broker_properties)
PROPERTIES
(other_properties);
```
Example:
```sql
SELECT a, sum(b) FROM tbl1 GROUP BY a LIMIT 10
INTO OUTFILE "file:///path/to/file_prefix"
FORMAT AS CSV
WITH BROKER my_broker
("username" = "user", "password" = "pwd")
PROPERTIES
("column_separator" = ",");
```
In MySQL, the grammar is slightly different: it puts clauses such as FIELDS TERMINATED BY ... at the end of the statement. I think that style is not good for future extension, so I decided to use a property map instead.
The file_prefix is the prefix of the exported files. The final file names are assembled as prefix + sequence number + format suffix, like:
file_prefix_0.csv
file_prefix_1.csv
file_prefix_2.csv
file_prefix_0.parquet
file_prefix_1.parquet
file_prefix_2.parquet
Result Sink
The ResultSink class in the BE currently contains a ResultWriter and a Sender: the ResultWriter writes result batches to the Sender, and the Sender sends them back to the client through the FE. I will make ResultWriter an abstract class and implement two subclasses: MysqlResultWriter and FileResultWriter. MysqlResultWriter behaves the same as the old ResultWriter, while FileResultWriter is a new class that writes the results to files, as sketched below.
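A minimal sketch of the proposed hierarchy; the method names and the simplified RowBatch stand-in are placeholders, not the actual BE code:

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the BE's batch of result rows.
using RowBatch = std::vector<std::string>;

// Abstract writer: consumes result batches regardless of destination.
class ResultWriter {
public:
    virtual ~ResultWriter() = default;
    virtual void append_row_batch(const RowBatch& batch) = 0;
};

// Sends each batch back over the MySQL protocol, like the old ResultWriter.
class MysqlResultWriter : public ResultWriter {
public:
    void append_row_batch(const RowBatch& batch) override {
        // ... pack rows into MySQL protocol packets and hand them to the Sender
    }
};

// Encodes each batch as CSV or Parquet and appends it to the export file.
class FileResultWriter : public ResultWriter {
public:
    void append_row_batch(const RowBatch& batch) override {
        // ... format rows and write them through a file-writing interface
    }
};
```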
Query Plan
There is almost no need to change the query plan; we only add information to TResultSink telling the BE which ResultWriter to use, for example as in the sketch below.
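For illustration, the BE could select the writer from a type field carried in TResultSink. The enum and factory below are assumptions, reusing the writer classes sketched above:

```cpp
#include <memory>

// Hypothetical sink type carried in TResultSink.
enum class TResultSinkType { MYSQL_PROTOCOL, FILE };

// Pick the concrete writer based on the plan's sink type.
std::unique_ptr<ResultWriter> create_result_writer(TResultSinkType type) {
    if (type == TResultSinkType::FILE) {
        return std::make_unique<FileResultWriter>();
    }
    return std::make_unique<MysqlResultWriter>();
}
```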
Statement return value
In MySQL, SELECT INTO OUTFILE returns the number of rows exported:

```
mysql> select * from tbl1 into outfile "/tmp/file_prefix";
Query OK, 100 rows affected (0.06 sec)
```

We will keep consistent with this behavior.
Options
Some options can be specified in the PROPERTIES clause of the statement:
- column_separator: the column separator of the output file. CSV format only.
- line_delimiter: the line delimiter of the output file. CSV format only.
- max_file_size_bytes: if the exported result is larger than this size, it is split into multiple files, as sketched below.
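A sketch of how max_file_size_bytes could drive file rolling; the class name and the byte-counting strategy are assumptions:

```cpp
#include <cstdint>
#include <fstream>
#include <string>

// Rolls to a new output file whenever the current one would exceed the limit,
// producing file_prefix_0.csv, file_prefix_1.csv, and so on.
class SplittingFileWriter {
public:
    SplittingFileWriter(std::string prefix, std::string suffix,
                        int64_t max_file_size_bytes)
        : prefix_(std::move(prefix)),
          suffix_(std::move(suffix)),
          max_bytes_(max_file_size_bytes) {
        open_next();
    }

    void write_row(const std::string& row) {
        if (written_ + static_cast<int64_t>(row.size()) > max_bytes_) {
            open_next();  // close the current file and start the next one
        }
        out_ << row;
        written_ += row.size();
    }

private:
    void open_next() {
        if (out_.is_open()) out_.close();
        out_.open(prefix_ + "_" + std::to_string(seq_++) + suffix_);
        written_ = 0;
    }

    std::string prefix_;
    std::string suffix_;
    int64_t max_bytes_;
    int seq_ = 0;
    int64_t written_ = 0;
    std::ofstream out_;
};
```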
Remaining problems
- Whether to support asynchronous export operations.
Some queries may take a long time or produce a large result set, which may cause the request to occupy the connection for a long time. We could support asynchronous commands and use a session variable to control whether they are used.
Scheduling Plan
- Implement file export in CSV format
- Support PARQUET format
- Support splitting output files