Description
Task Description
What needs to be done:
Reduce unnecessary timeline loading on the Flink-TM side
Why this task is needed:
Currently, when the Flink-TM writes to a Hudi table, several code paths create HoodieFlinkTable objects.
such as:
However, the current implementation loads the active timeline immediately after creating this object.
The active timeline can hold a very large number of instants. In our scenario, for instance, there can be tens of thousands of instants in the active timeline.
At that scale, loading the timeline becomes extremely expensive.
Moreover, many of the subsequent code paths do not rely on any information from the active timeline.
In other words, loading the timeline immediately in these code paths is unnecessary work.
As the checkpoint interval of our real-time jobs gets shorter, this performance impact becomes more noticeable.
So I think we can optimize this by eliminating the unnecessary timeline loads.
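One possible direction, shown only as a minimal sketch, is to defer the timeline load until something actually asks for it. The classes below (`Timeline`, `Lazy`, `Table`, `expensiveLoad`) are hypothetical stand-ins written for illustration; they are not Hudi classes and do not reflect how HoodieFlinkTable is actually implemented. The sketch assumes the expensive scan can be wrapped in a supplier that runs at most once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyTimelineSketch {

  /** Hypothetical stand-in for an active timeline; not a Hudi class. */
  static final class Timeline {
    final List<String> instants;

    Timeline(List<String> instants) {
      this.instants = instants;
    }
  }

  /** Thread-safe memoizing supplier: the loader runs at most once, on first get(). */
  static final class Lazy<T> implements Supplier<T> {
    private Supplier<T> loader;
    private volatile boolean loaded;
    private T value;

    Lazy(Supplier<T> loader) {
      this.loader = loader;
    }

    @Override
    public T get() {
      if (!loaded) {
        synchronized (this) {
          if (!loaded) {
            value = loader.get();
            loader = null; // release the loader once the value is cached
            loaded = true;
          }
        }
      }
      return value;
    }
  }

  /** Hypothetical table handle that does not touch the timeline in its constructor. */
  static final class Table {
    private final String basePath;
    private final Lazy<Timeline> activeTimeline;

    Table(String basePath, Supplier<Timeline> timelineLoader) {
      this.basePath = basePath;
      this.activeTimeline = new Lazy<>(timelineLoader);
    }

    /** Cheap accessor: never triggers a timeline load. */
    String getBasePath() {
      return basePath;
    }

    /** Triggers the load on first use, then returns the cached timeline. */
    Timeline getActiveTimeline() {
      return activeTimeline.get();
    }
  }

  static final AtomicInteger LOAD_COUNT = new AtomicInteger();

  /** Simulates an expensive scan over tens of thousands of instants. */
  static Timeline expensiveLoad() {
    LOAD_COUNT.incrementAndGet();
    List<String> instants = new ArrayList<>();
    for (int i = 0; i < 20_000; i++) {
      instants.add("instant-" + i);
    }
    return new Timeline(instants);
  }

  public static void main(String[] args) {
    Table table = new Table("/tmp/hypothetical_table", LazyTimelineSketch::expensiveLoad);
    System.out.println("base path: " + table.getBasePath());
    System.out.println("loads after construction: " + LOAD_COUNT.get());
    table.getActiveTimeline();
    table.getActiveTimeline();
    System.out.println("loads after two reads: " + LOAD_COUNT.get());
  }
}
```

With this shape, code paths that only need table metadata pay nothing for the timeline, and code paths that do need it pay the cost once per table object. Whether a cached timeline stays fresh enough across checkpoints is a separate question that this sketch does not address.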
Task Type
Performance optimization
Related Issues
Parent feature issue: (if applicable )
Related issues: