
[FLINK-35167][cdc-connector] Introduce MaxCompute pipeline DataSink #3254

Open · wants to merge 27 commits into master
Conversation

@dingxin-tech (Author)

No description provided.

@github-actions bot added the docs (Improvements or additions to documentation) label on May 6, 2024
@loserwang1024 (Contributor)

I am wondering how a commercial database sink like MaxCompute can be e2e tested?

@dingxin-tech (Author)

> I am wondering how a commercial database sink like MaxCompute can be e2e tested?

I will soon be working on a Docker image for a MaxCompute Emulator that launches a mocked version of MaxCompute. End-to-end tests can then start this image before the regression suite runs.
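
For illustration, a minimal sketch of how such an emulator could be wired into tests with Testcontainers. The image tag matches the one mentioned later in this thread; the exposed port and the endpoint wiring are assumptions, not the PR's actual setup.

```java
import org.junit.jupiter.api.BeforeAll;
import org.testcontainers.containers.GenericContainer;
import org.testcontainers.utility.DockerImageName;

/** Sketch only: start the MaxCompute emulator once before the e2e suite. */
class MaxComputeEmulatorTestBase {

    // Image published for this PR; 8080 is an assumed emulator endpoint port.
    static final GenericContainer<?> EMULATOR =
            new GenericContainer<>(DockerImageName.parse("maxcompute/maxcompute-emulator:v0.0.3"))
                    .withExposedPorts(8080);

    @BeforeAll
    static void startEmulator() {
        EMULATOR.start();
        // Tests would point the connector's endpoint option at
        // "http://" + EMULATOR.getHost() + ":" + EMULATOR.getMappedPort(8080).
    }
}
```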

@dingxin-tech (Author)

Hi @loserwang1024, I recently completed the development and release of a Docker image for the MaxCompute Emulator, and I have also added some related regression cases. Could you help review the code in this pull request?

@lvyanquan (Contributor) left a comment

Thanks for your contribution, I've left some comments.

<td>optional</td>
<td style="word-wrap: break-word;">16</td>
<td>Integer</td>
<td>The number of buckets used when automatically creating a MaxCompute Transaction table. For usage, see <a href="ttps://help.aliyun.com/zh/maxcompute/user-guide/table-data-format">
Contributor

Invalid link.

Author

Fixed.

// "PostPartition",
// new EventTypeInfo(),
// new PostPartitionOperator(stream.getParallelism()))
// .name("PartitionByBucket");
Contributor

So PartitionOperator is actually unused?
I found your JIRA about this, and I think it describes a more versatile and scalable solution, so we can wait for that JIRA to be completed and build on it.

Author

Sorry, I commented out this part of the code for debugging but forgot to uncomment it.
I think this code will stay here until the runtime optimization you mentioned is merged; then I will re-implement this feature that way.

import org.junit.jupiter.api.Test;

/** e2e test of SchemaEvolutionUtils. */
public class SchemaEvolutionUtilsTest {
Contributor

This test class does not actually take effect. Can we use the maxcompute/maxcompute-emulator:v0.0.3 image to test it?

Author

Sure.

case CHAR:
case VARCHAR:
case TIME_WITHOUT_TIME_ZONE:
return TypeInfoFactory.STRING;
Contributor

DataType includes nullable/notNull information; do we lose that information here?

Author

Yes, you are correct.

Additionally, I discovered that the MaxCompute SDK has an issue when creating tables with primary keys: it ignores the user-specified primary-key order during table creation. I plan to fix this next week.
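
For illustration, a sketch of keeping nullability alongside the mapped type. The class and method names are hypothetical, and it assumes DataType#getTypeRoot and DataType#isNullable from flink-cdc-common:

```java
import com.aliyun.odps.type.TypeInfo;
import com.aliyun.odps.type.TypeInfoFactory;
import org.apache.flink.cdc.common.types.DataType;

/** Hypothetical helper: MaxCompute's TypeInfo carries no null constraint, so
 *  nullability must be read from the CDC DataType and tracked separately. */
final class TypeMappingSketch {

    static TypeInfo toTypeInfo(DataType type) {
        switch (type.getTypeRoot()) {
            case CHAR:
            case VARCHAR:
            case TIME_WITHOUT_TIME_ZONE:
                return TypeInfoFactory.STRING; // nullability is not representable here
            default:
                throw new UnsupportedOperationException("Unsupported type: " + type);
        }
    }

    /** Carry this flag onto the generated MaxCompute column definition. */
    static boolean isNullable(DataType type) {
        return type.isNullable();
    }
}
```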

# See the License for the specific language governing permissions and
# limitations under the License.

org.apache.flink.cdc.connectors.maxcompute.MaxComputeDataSinkFactory
Contributor

It's better to add a log4j2-test.properties file under resources for debugging or testing purposes, like other connectors.

Author

Done.
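
For reference, a minimal log4j2-test.properties in the style other Flink CDC connectors ship (contents assumed; raise the level when debugging locally):

```properties
# Sketch of a test logging config; OFF keeps CI output quiet.
rootLogger.level = OFF
rootLogger.appenderRef.test.ref = TestLogger

appender.testlogger.name = TestLogger
appender.testlogger.type = CONSOLE
appender.testlogger.target = SYSTEM_ERR
appender.testlogger.layout.type = PatternLayout
appender.testlogger.layout.pattern = %-4r [%t] %-5p %c %x - %m%n
```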


@Override
public void write(Event element, Context context) throws IOException {
LOG.info("Sink writer {} write {}.", this.context.getSubtaskId(), element);
Contributor

It's unnecessary to create so many logs.

Author

Sure, I forgot to remove this debug log.

* completion status of an executor, allowing the system to determine whether all relevant sessions
* have been processed.
*/
public class SessionCommitCoordinator {
Contributor

Is it better for us to rename it to Manager or Helper, to distinguish it from Flink's OperatorCoordinator?

Author

Sure. I couldn't come up with a suitable name when I first named it.

@Override
public void snapshotState(StateSnapshotContext context) throws Exception {
super.snapshotState(context);
sessionCache.clear();
@lvyanquan (Contributor), Jun 23, 2024

So, do we need to request a new session ID after each checkpoint, which may have a performance impact? It is expected that the checkpoint interval will need to be larger, right?

Author

Yes, we do.
Each session can no longer be used after being committed, so we must re-create a session and request a new session ID.
And indeed, the checkpoint interval will need to be larger.
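
For illustration, a sketch of the per-checkpoint session lifecycle described above (the names are illustrative, not the PR's actual API):

```java
import java.util.UUID;

/** Sketch: a MaxCompute upsert session is single-use, so every checkpoint
 *  cycle commits the current session and lazily opens a fresh one. */
final class SessionLifecycleSketch {

    private String sessionId; // null means no open session

    /** Called on the write path; opens a session on demand. */
    String currentSession() {
        if (sessionId == null) {
            sessionId = requestNewSessionId(); // one extra round-trip per checkpoint
        }
        return sessionId;
    }

    /** Called from snapshotState(): commit, then drop the now-unusable session. */
    void onCheckpoint() {
        if (sessionId != null) {
            commit(sessionId); // a committed session cannot be written to again
            sessionId = null;  // forces a new session (and ID) on the next write
        }
    }

    // Hypothetical stand-ins for the MaxCompute tunnel calls.
    private static String requestNewSessionId() { return UUID.randomUUID().toString(); }
    private static void commit(String id) { /* commit the upsert session */ }
}
```

A larger checkpoint interval amortizes that per-cycle session-creation cost, which is why the interval is expected to grow.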

@leonardBang (Contributor) left a comment

Thanks @dingxin-tech for the nice work. The CI wasn't triggered properly; we need to adjust the CI settings [1] when a new connector or module joins.

[1] https://github.com/apache/flink-cdc/blob/master/.github/workflows/flink_cdc.yml

@dingxin-tech (Author)

> Thanks @dingxin-tech for the nice work. The CI wasn't triggered properly; we need to adjust the CI settings [1] when a new connector or module joins.
>
> [1] https://github.com/apache/flink-cdc/blob/master/.github/workflows/flink_cdc.yml

Hi, I added tests for MaxCompute in the CI file. Additionally, I refactored the code to apply the newly released DataSink feature, which allows specifying a HashFunction, and upgraded the ODPS SDK. Could you please review this PR again?
@leonardBang @lvyanquan
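
For illustration, a sketch of the idea behind the HashFunction hook: the sink tells the framework how to hash change events so that rows belonging to the same MaxCompute bucket are routed to the same writer subtask. The interfaces below are defined locally for the sketch and are not the exact flink-cdc API.

```java
/** Locally defined stand-in for the framework's hash-function hook. */
interface EventHashFunction<T> {
    int hashcode(T event);
}

/** Sketch: route events by a key-derived bucket so that partitioning
 *  lines up with MaxCompute's transactional-table bucketing. */
final class BucketHashFunction implements EventHashFunction<String> {

    private final int numBuckets; // e.g. the bucket count from the sink options

    BucketHashFunction(int numBuckets) {
        this.numBuckets = numBuckets;
    }

    @Override
    public int hashcode(String primaryKey) {
        // Same key -> same bucket -> same writer subtask.
        return Math.floorMod(primaryKey.hashCode(), numBuckets);
    }
}
```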
