[Feature][ConnectorV2]add file excel sink #2585
@TyrantLucifer Hi, PTAL
@Test
public void testFakeSourceToHdfsFileExcel() throws IOException, InterruptedException {
    Container.ExecResult execResult = executeSeaTunnelSparkJob("/file/fakesource_to_hdfs_excel.conf");
    Assertions.assertEquals(0, execResult.getExitCode());
Should the test also validate the sink data rows and data types?
name = "string"
age = "int"
Should this test cover all data types?
I tested string, boolean, tinyint, smallint, int, bigint, float, double, and null; all of them are OK.
@Test
public void testFakeSourceToHdfsFileExcel() throws IOException, InterruptedException {
    Container.ExecResult execResult = executeSeaTunnelFlinkJob("/file/fakesource_to_hdfs_excel.conf");
    Assertions.assertEquals(0, execResult.getExitCode());
Are `fakesource_to_local_excel.conf` and `fakesource_to_hdfs_excel.conf` uncommitted?
}

public void writeData(SeaTunnelRow seaTunnelRow) {
    Sheet st = wb.getSheet("Sheet1");
How about treating `st` as a class attribute, so that `createSheet` is invoked when a SeaTunnelRow needs to be consumed? Of course, I don't have a detailed understanding of the source code, so whether a sheet object needs to be obtained on every write needs a double check.
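A minimal sketch of the suggestion above, assuming stand-in types rather than the real POI `Workbook` and `Sheet` classes. The sheet is held as a field and created once, so each `writeData` call reuses it instead of looking it up by name. `CachedSheetWriter`, `FakeWorkbook`, and `FakeSheet` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a sheet; not the Apache POI type.
final class FakeSheet {
    final List<Object[]> rows = new ArrayList<>();
}

// Hypothetical stand-in for a workbook; counts how often a sheet is created.
final class FakeWorkbook {
    int createSheetCalls = 0;

    FakeSheet createSheet(String name) {
        createSheetCalls++;
        return new FakeSheet();
    }
}

final class CachedSheetWriter {
    private final FakeWorkbook wb = new FakeWorkbook();
    // Held as a class attribute so it is not fetched on every write.
    private FakeSheet st;

    void writeData(Object[] fields) {
        if (st == null) {
            // Created lazily, the first time a row needs to be consumed.
            st = wb.createSheet("Sheet1");
        }
        st.rows.add(fields);
    }

    int sheetCreations() {
        return wb.createSheetCalls;
    }

    int rowCount() {
        return st == null ? 0 : st.rows.size();
    }
}
```

Whether the real generator can safely keep one sheet for its whole lifetime depends on how files are rolled over, which this sketch does not model.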
Oh, let me modify that.
pom.xml
Outdated
@@ -758,6 +760,19 @@
        <artifactId>aliyun-sdk-datahub</artifactId>
        <version>${datahub.version}</version>
    </dependency>

    <dependency>
Hi, please move the connector dependency to the connector's pom.xml. Reference: #2630
OK, they have been moved to the connector's pom.xml.
    cell.setCellValue((int) value);
    cell.setCellStyle(wholeNumberCellStyle);
} else if (BasicType.LONG_TYPE.equals(type) || type.getSqlType().equals(SqlType.TIMESTAMP)) {
    cell.setCellValue((long) value);
Isn't the type of `value` LocalDateTime here?
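If the TIMESTAMP value does arrive as a `LocalDateTime`, as the question suggests, the `(long) value` cast would fail with a `ClassCastException`. A minimal sketch of an explicit conversion follows, using only `java.time`. `TimestampCellValues` is a hypothetical helper, and interpreting the value as UTC is an assumption of this sketch, not something the PR states.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class TimestampCellValues {
    private TimestampCellValues() {
    }

    // Converts a timestamp value to epoch milliseconds.
    // Assumption: the value is interpreted as UTC.
    static long toEpochMillis(Object value) {
        return asLocalDateTime(value).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    // Renders the timestamp as ISO-8601 text, e.g. for a string-styled cell.
    static String toText(Object value) {
        return DateTimeFormatter.ISO_LOCAL_DATE_TIME.format(asLocalDateTime(value));
    }

    private static LocalDateTime asLocalDateTime(Object value) {
        if (!(value instanceof LocalDateTime)) {
            throw new IllegalArgumentException("expected LocalDateTime, got: " + value);
        }
        return (LocalDateTime) value;
    }
}
```

Either form avoids the unchecked cast. Which one suits the sink depends on whether readers of the file expect a number or text.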
    cell.setCellValue(JsonUtils.toJsonString(value));
    cell.setCellStyle(stringCellStyle);
} else if (type.getSqlType().equals(SqlType.DATE) || type.getSqlType().equals(SqlType.TIME)) {
    cell.setCellValue((String) value);
Isn't the type of `value` LocalTime or LocalDate here?
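Assuming DATE and TIME values arrive as `LocalDate` and `LocalTime`, the `(String) value` cast would not work, and the value would need to be formatted instead. A small sketch with a hypothetical `DateTimeCellText` helper follows. The choice of ISO formats is an assumption of this sketch.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;

final class DateTimeCellText {
    private DateTimeCellText() {
    }

    // Formats DATE and TIME values as text instead of casting them to String.
    static String format(Object value) {
        if (value instanceof LocalDate) {
            return DateTimeFormatter.ISO_LOCAL_DATE.format((LocalDate) value);
        }
        if (value instanceof LocalTime) {
            return DateTimeFormatter.ISO_LOCAL_TIME.format((LocalTime) value);
        }
        throw new IllegalArgumentException("expected LocalDate or LocalTime, got: " + value);
    }
}
```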
} else if (BasicType.DOUBLE_TYPE.equals(type)) {
    cell.setCellValue((double) value);
    cell.setCellStyle(wholeNumberCellStyle);
} else if (type.getSqlType().equals(SqlType.BYTES) || type.getSqlType().equals(SqlType.ARRAY)) {
Should ARRAY be serialized as JSON?
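To illustrate what writing an array as JSON text could look like, here is a deliberately small sketch. The PR itself uses the project's `JsonUtils.toJsonString`. The hand-rolled `ArrayJson` class below is hypothetical and only handles numbers, booleans, nulls, and strings, without escaping control characters or handling non-finite numbers.

```java
final class ArrayJson {
    private ArrayJson() {
    }

    // Serializes a flat array into a JSON array string.
    static String toJson(Object[] values) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            Object v = values[i];
            if (v == null || v instanceof Number || v instanceof Boolean) {
                // null, numbers, and booleans are written without quotes.
                sb.append(v);
            } else {
                sb.append('"').append(escape(v.toString())).append('"');
            }
        }
        return sb.append(']').toString();
    }

    // Escapes only backslashes and double quotes (a simplification).
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```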
...l-e2e/seatunnel-flink-connector-v2-e2e/src/test/resources/file/fakesource_to_hdfs_excel.conf
...-e2e/seatunnel-flink-connector-v2-e2e/src/test/resources/file/fakesource_to_local_excel.conf
public ExcelGenerator(List<Integer> sinkColumnsIndexInRow, SeaTunnelRowType seaTunnelRowType) {
    this.sinkColumnsIndexInRow = sinkColumnsIndexInRow;
    this.seaTunnelRowType = seaTunnelRowType;
    wb = new XSSFWorkbook();
`XSSFWorkbook` stores all data in memory. Is this acceptable? Have you considered using `SXSSFWorkbook`? `SXSSFWorkbook` supports streaming data writes.
@TyrantLucifer @Hisoka-X @CalvinKirs @EricJoy2048
References:
https://github.com/apache/poi/blob/trunk/poi-ooxml/src/main/java/org/apache/poi/xssf/streaming/SXSSFWorkbook.java
https://github.com/apache/poi/blob/trunk/poi-ooxml/src/main/java/org/apache/poi/xssf/usermodel/XSSFWorkbook.java
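The difference can be pictured with a conceptual sketch of a row window: only the most recent rows stay in memory, and older ones are flushed out. This is not how `SXSSFWorkbook` is implemented. `RowWindowBuffer` is a hypothetical class, and the `flushed` list merely stands in for rows written to temporary storage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Conceptual illustration of a bounded row window; not POI code.
final class RowWindowBuffer {
    private final int windowSize;
    private final Deque<String> inMemory = new ArrayDeque<>();
    // Stands in for rows already written out to temporary storage.
    private final List<String> flushed = new ArrayList<>();

    RowWindowBuffer(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    void addRow(String row) {
        inMemory.addLast(row);
        // Once the window is full, the oldest rows leave memory.
        while (inMemory.size() > windowSize) {
            flushed.add(inMemory.removeFirst());
        }
    }

    int rowsInMemory() {
        return inMemory.size();
    }

    int rowsFlushed() {
        return flushed.size();
    }
}
```

With an unbounded in-memory workbook, the equivalent of `rowsInMemory()` grows with every row written, which is the concern raised in the comment.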
Agree with you.
When `SXSSFWorkbook` is used to create a sheet, it works normally when run locally, but the E2E test reports an error. With `openjdk`, it throws a null pointer exception because no fonts are installed.
Reference:
https://blog.csdn.net/progammer10086/article/details/107154814?spm=1001.2101.3001.6650.3&utm_medium=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ERate-3-107154814-blog-116048731.t0_edu_mlt&depth_1-utm_source=distribute.pc_relevant.none-task-blog-2%7Edefault%7ECTRLIST%7ERate-3-107154814-blog-116048731.t0_edu_mlt&utm_relevant_index=6
Don't worry, just resolve the conflicts and retry the CI. Thanks.
@@ -86,6 +87,10 @@ The separator between columns in a row of data. Only needed by `text` and `csv`

The separator between rows in a file. Only needed by `text` and `csv` file format.

### max_rows_in_memory [int]
if (textFileSinkConfig.getMaxRowsInMemory() > 0) {
    wb = new SXSSFWorkbook(textFileSinkConfig.getMaxRowsInMemory());
} else {
    wb = new XSSFWorkbook();
Is the default value of `MaxRowsInMemory` Long.MAX_VALUE?
You can choose one of the following:
- State in the document that if `MaxRowsInMemory` is not set, all data will be buffered in memory.
- Set a default value xxx for `MaxRowsInMemory`.
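The second option could be sketched as below. `MaxRowsInMemoryOption` is a hypothetical helper, and the default of 100 is an illustrative assumption. The PR does not state which value was chosen.

```java
import java.util.Map;

final class MaxRowsInMemoryOption {
    // Hypothetical default for this sketch; not taken from the PR.
    static final int DEFAULT_MAX_ROWS_IN_MEMORY = 100;

    private MaxRowsInMemoryOption() {
    }

    // Returns the configured value, or the default when the key is absent.
    static int resolve(Map<String, Integer> config) {
        Integer configured = config.get("max_rows_in_memory");
        return configured == null ? DEFAULT_MAX_ROWS_IN_MEMORY : configured;
    }
}
```

With a default in place, an unset option no longer silently means "buffer everything", and the documentation only needs to state the default.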
OK, I'm going to set a default.
name = "string"
age = "int"
Test writing all data types to the Excel file:
name = "string"
age = "int"
c_map = "map<string, string>"
c_array = "array<int>"
c_string = string
c_boolean = boolean
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
c_bytes = bytes
c_date = date
c_decimal = "decimal(38, 18)"
c_timestamp = timestamp
c_row = {
    c_map = "map<string, string>"
    c_array = "array<int>"
    c_string = string
    c_boolean = boolean
    c_tinyint = tinyint
    c_smallint = smallint
    c_int = int
    c_bigint = bigint
    c_float = float
    c_double = double
    c_bytes = bytes
    c_date = date
    c_decimal = "decimal(38, 18)"
    c_timestamp = timestamp
name = "string"
age = "int"
Test writing all data types to the Excel file:
name = "string"
age = "int"
c_map = "map<string, string>"
c_array = "array<int>"
c_string = string
c_boolean = boolean
c_tinyint = tinyint
c_smallint = smallint
c_int = int
c_bigint = bigint
c_float = float
c_double = double
c_bytes = bytes
c_date = date
c_decimal = "decimal(38, 18)"
c_timestamp = timestamp
c_row = {
    c_map = "map<string, string>"
    c_array = "array<int>"
    c_string = string
    c_boolean = boolean
    c_tinyint = tinyint
    c_smallint = smallint
    c_int = int
    c_bigint = bigint
    c_float = float
    c_double = double
    c_bytes = bytes
    c_date = date
    c_decimal = "decimal(38, 18)"
    c_timestamp = timestamp
is_partition_field_write_in_file=true
file_name_expression="${transactionId}_${now}"
file_format="excel"
sink_columns=["name","age"]
Remove:
sink_columns=["name","age"]
transform {
    sql {
        sql = "select name,age from fake"
    }

    # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
    # please go to https://seatunnel.apache.org/docs/flink/configuration/transform-plugins/Sql
}
Remove:
transform {
    sql {
        sql = "select name,age from fake"
    }
    # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
    # please go to https://seatunnel.apache.org/docs/flink/configuration/transform-plugins/Sql
}
is_partition_field_write_in_file=true
file_name_expression="${transactionId}_${now}"
file_format="excel"
sink_columns=["name","age"]
Remove:
sink_columns=["name","age"]
transform {
    sql {
        sql = "select name,age from fake"
    }

    # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
    # please go to https://seatunnel.apache.org/docs/transform/sql
}
Remove:
transform {
    sql {
        sql = "select name,age from fake"
    }
    # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
    # please go to https://seatunnel.apache.org/docs/transform/sql
}
Hi, I need to install fonts to use SXSSFWorkbook. How do I install them in the test container? For example, with the following command: `RUN apk add --update font-adobe-100dpi ttf-dejavu fontconfig`
Maybe you need to write a Dockerfile to do that. Add the required content on top of this image, then build your own image and push it to Docker Hub.
@Bingz2 Will this PR go ahead? If not, I would like to take over and finish it.
OK, thanks.
Purpose of this pull request
add [File]excel sink
#1946
Check list
New License Guide