[FLINK-28897] [TABLE-SQL] Fail to use udf in added jar when enabling checkpoint #25656

ammu20-dev · 2024-11-14T10:59:28Z

What is the purpose of the change

This pull request fixes the class-loading failure that occurs when a UDF from a jar registered with ADD JAR is used while checkpointing is enabled.

Brief change log

Use the FlinkUserCodeClassLoader obtained from the resource manager when loading UDF jars.

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions for tests defined in our code quality guide.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

Added integration tests for end-to-end deployment with large payloads (100MB)
Extended integration test for recovery after master (JobManager) failure
Added test that validates that TaskInfo is transferred only once across recoveries
Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (don't know)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
The S3 file system connector: (no)

Documentation

Does this pull request introduce a new feature? (no)
If yes, how is the feature documented? (not applicable)

flinkbot · 2024-11-14T11:02:18Z

CI report:

98b5f15 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

davidradl · 2024-11-21T08:53:03Z

...k-table-api-java/src/main/java/org/apache/flink/table/api/internal/TableEnvironmentImpl.java

@@ -1029,6 +1029,8 @@ private TableResultInternal executeInternal(
                        defaultJobName,
                        jobStatusHookList);
        try {
+            ClassLoader userClassLoader = Thread.currentThread().getContextClassLoader();


please could you change the variable name to be something like originalContextClassLoader

Changed the variable name to contextClassLoader.

davidradl · 2024-11-21T08:53:49Z

...k-table-api-java/src/main/java/org/apache/flink/table/api/internal/TableEnvironmentImpl.java

@@ -1029,6 +1029,8 @@ private TableResultInternal executeInternal(
                        defaultJobName,
                        jobStatusHookList);
        try {
+            ClassLoader userClassLoader = Thread.currentThread().getContextClassLoader();


Please add comments that refer to the v2 implementation and note that the v2 refactor is not going to be backported to 1.20.

Added comments explaining why this change is needed. The latest FLIP, which introduces StreamGraph-based job submission, moved the StreamGraph module to flink-runtime and changed the job submission logic so that a StreamGraph is submitted directly to the JobManager.
Ref FLIP: https://cwiki.apache.org/confluence/display/FLINK/FLIP-468%3A+Introducing+StreamGraph-Based+Job+Submission
Related JIRA: https://issues.apache.org/jira/browse/FLINK-36065
As a result of these changes, this issue appears to be fixed in Flink v2: I was not able to reproduce it with the latest main. I am therefore limiting this change to the 1.20 versions.
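
For readers following the thread, the pattern under discussion can be sketched roughly as below. This is a minimal illustration and not the code in this PR: the class name `ContextClassLoaderSwap` and the helper `runWithContextClassLoader` are hypothetical, and an empty `URLClassLoader` stands in for the user-code class loader that would know about the added jars. It only shows the general idea of saving the thread's context class loader, switching to another loader for the duration of a call, and restoring the original afterwards.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.function.Supplier;

public class ContextClassLoaderSwap {

    /**
     * Runs the action with the given loader installed as the thread context
     * class loader, then restores whatever loader was set before.
     * (Hypothetical helper, not part of Flink.)
     */
    static <T> T runWithContextClassLoader(ClassLoader userCodeClassLoader, Supplier<T> action) {
        Thread current = Thread.currentThread();
        ClassLoader originalContextClassLoader = current.getContextClassLoader();
        try {
            current.setContextClassLoader(userCodeClassLoader);
            return action.get();
        } finally {
            // Always restore, even if the action throws.
            current.setContextClassLoader(originalContextClassLoader);
        }
    }

    public static void main(String[] args) {
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        // Stand-in for a loader that can see jars added at runtime (assumption).
        ClassLoader userLoader = new URLClassLoader(new URL[0], original);

        ClassLoader seenInside = runWithContextClassLoader(
                userLoader, () -> Thread.currentThread().getContextClassLoader());

        System.out.println(seenInside == userLoader);
        System.out.println(Thread.currentThread().getContextClassLoader() == original);
    }
}
```

Keeping the restore in a `finally` block means the calling thread does not leak the temporary loader into later, unrelated work.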

davidradl · 2024-11-21T08:54:04Z

...k-table-api-java/src/main/java/org/apache/flink/table/api/internal/TableEnvironmentImpl.java

@@ -1069,8 +1072,11 @@ private TableResultInternal executeQueryOperation(

        Pipeline pipeline = generatePipelineFromQueryOperation(operation, transformations);
        try {
+            ClassLoader userClassLoader = Thread.currentThread().getContextClassLoader();


please add unit tests

davidradl

Please change the variable name, add a comment, and add unit tests (or provide a reason why unit tests cannot be added).
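
As a rough idea of what a unit test for this kind of change might check, the sketch below verifies that the context class loader is restored even when the wrapped action fails. It is an assumption-laden illustration: `ContextClassLoaderRestoreCheck` and its methods are hypothetical names, it uses plain Java rather than Flink's test utilities, and the thrown exception only simulates a failing submission.

```java
public class ContextClassLoaderRestoreCheck {

    /** Hypothetical helper: swap the context class loader around an action. */
    static void runWithContextClassLoader(ClassLoader loader, Runnable action) {
        Thread current = Thread.currentThread();
        ClassLoader originalContextClassLoader = current.getContextClassLoader();
        try {
            current.setContextClassLoader(loader);
            action.run();
        } finally {
            current.setContextClassLoader(originalContextClassLoader);
        }
    }

    /** Returns true if the original loader is back in place after a failing action. */
    static boolean restoredAfterFailure() {
        ClassLoader before = Thread.currentThread().getContextClassLoader();
        // Anonymous subclass used only as a distinct temporary loader.
        ClassLoader temporary = new ClassLoader(before) {};
        try {
            runWithContextClassLoader(temporary, () -> {
                throw new IllegalStateException("simulated submission failure");
            });
        } catch (IllegalStateException expected) {
            // The failure is expected; only the restore behavior matters here.
        }
        return Thread.currentThread().getContextClassLoader() == before;
    }

    public static void main(String[] args) {
        System.out.println(restoredAfterFailure());
    }
}
```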

davidradl · 2024-11-21T08:55:29Z

Reviewed by Chi on 21/11/24. Asked submitter questions.

fix: class-loading-issue

eafd41f

flinkbot added the component=TableSQL/Runtime label Nov 14, 2024

Ammu Parvathy added 2 commits November 18, 2024 12:22

fix: class-loading-issue

2b43fc6

fix: class-loading-issue

da5e273

davidradl reviewed Nov 21, 2024

davidradl suggested changes Nov 21, 2024

fix: class-loading-issue

98b5f15
