Reduce unnecessary timeline loading on the Flink-TM side #17761

@TheR1sing3un

Task Description

What needs to be done:
Reduce unnecessary timeline loading on the Flink-TM side

Why this task is needed:
Currently, when writing to a Hudi table, the Flink-TM logic creates HoodieFlinkTable objects in several places,
such as:

Image

However, the current implementation loads the active timeline immediately after creating this object.

Image

When the active timeline holds a particularly large number of transactions (in our scenario, there can be tens of thousands of instants in the active timeline), loading the timeline becomes extremely heavy.
Moreover, much of the subsequent logic does not rely on any information from the active timeline,
so loading the timeline immediately in these code paths is unnecessary.
As the checkpoint interval of our real-time tasks becomes shorter, this performance impact becomes more obvious.
So I think it is possible to optimize this and reduce these unnecessary timeline loads.
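
One possible direction is to defer the timeline load until something actually asks for it. The sketch below only illustrates that idea and is not Hudi code: `TableHandle`, `Lazy`, and the string-based "timeline" are hypothetical stand-ins for HoodieFlinkTable and the active timeline, and the real change may look quite different.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class LazyTimelineSketch {

    /** Memoizing supplier: runs the loader at most once, on the first get(). */
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> loader;
        private volatile boolean loaded; // volatile write publishes 'value' safely
        private T value;

        Lazy(Supplier<T> loader) {
            this.loader = loader;
        }

        @Override
        public T get() {
            if (!loaded) {
                synchronized (this) {
                    if (!loaded) {
                        value = loader.get();
                        loaded = true;
                    }
                }
            }
            return value;
        }

        boolean isLoaded() {
            return loaded;
        }
    }

    /** Hypothetical table handle (assumption: NOT the real HoodieFlinkTable). */
    static final class TableHandle {
        private final String basePath;
        private final Lazy<List<String>> activeTimeline;

        TableHandle(String basePath, Supplier<List<String>> timelineLoader) {
            this.basePath = basePath;
            // Construction only stores the loader; nothing is listed or read yet.
            this.activeTimeline = new Lazy<>(timelineLoader);
        }

        /** Does not need the timeline, so it never triggers a load. */
        String getBasePath() {
            return basePath;
        }

        /** First call pays the loading cost; later calls reuse the result. */
        List<String> getActiveTimeline() {
            return activeTimeline.get();
        }

        boolean isTimelineLoaded() {
            return activeTimeline.isLoaded();
        }
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        // The loader stands in for an expensive scan of tens of thousands of instants.
        TableHandle table = new TableHandle("/tmp/hypothetical_table", () -> {
            loads.incrementAndGet();
            return List.of("001.commit", "002.commit");
        });

        System.out.println("after construction, loads = " + loads.get());
        System.out.println("base path = " + table.getBasePath() + ", loads = " + loads.get());
        System.out.println("instants = " + table.getActiveTimeline().size() + ", loads = " + loads.get());
        table.getActiveTimeline();
        System.out.println("after second access, loads = " + loads.get());
    }
}
```

With this shape, code paths that only need the table's configuration or base path never touch the timeline, while paths that do need it pay the cost once per table object. Whether a cached timeline is fresh enough for a given caller is a separate question that this sketch does not address.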

Task Type

Performance optimization


Labels: type:devtask (Development tasks and maintenance work)
