Skip to content

Conversation

@eldenmoon
Copy link
Member

@eldenmoon eldenmoon commented Jun 13, 2023

If this is a Two-Phase read query, and we need to delay the release of Rowset by row->update_delayed_expired_timestamp() to expand the lifespan of rowsets. This is necessary to avoid data loss during the second phase reading, where some stale rowsets may be swept and result in missing data.

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@eldenmoon
Copy link
Member Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@eldenmoon
Copy link
Member Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@eldenmoon
Copy link
Member Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

If this is a Two-Phase read query, and we need to delay the release of Rowset by row->update_delayed_expired_timestamp() to expand the lifespan of rowsets. This is necessary to avoid data loss during the second phase reading, where some stale rowsets may be swept and result in missing data.

For rowsets that have been moved to the unused rowsets, they are also needed in second phase reading.
@eldenmoon
Copy link
Member Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment
@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@eldenmoon
Copy link
Member Author

run buildall

Copy link
Contributor

@yiguolei yiguolei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 14, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Contributor

@qidaye qidaye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eldenmoon eldenmoon merged commit 0f470fe into apache:master Jun 14, 2023
eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request Jul 12, 2023
…his can result in data query misses in the second phase of a two-phase query.

related pr apache#20732

There are two reasons for moving the logic of delayed deletion from the Tablet to the StorageEngine. The first reason is to consolidate the logic and unify the delayed operations. The second reason is that delayed garbage collection during queries can cause rowsets to remain in the "stale rowsets" state, preventing the timely deletion of rowset metadata, It may cause rowset metadata too large.
eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request Jul 12, 2023
…his can result in data query misses in the second phase of a two-phase query.

related pr apache#20732

There are two reasons for moving the logic of delayed deletion from the Tablet to the StorageEngine. The first reason is to consolidate the logic and unify the delayed operations. The second reason is that delayed garbage collection during queries can cause rowsets to remain in the "stale rowsets" state, preventing the timely deletion of rowset metadata, It may cause rowset metadata too large.
eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request Jul 12, 2023
…his can result in data query misses in the second phase of a two-phase query.

related pr apache#20732

There are two reasons for moving the logic of delayed deletion from the Tablet to the StorageEngine. The first reason is to consolidate the logic and unify the delayed operations. The second reason is that delayed garbage collection during queries can cause rowsets to remain in the "stale rowsets" state, preventing the timely deletion of rowset metadata, It may cause rowset metadata too large.
eldenmoon added a commit to eldenmoon/incubator-doris that referenced this pull request Jul 12, 2023
…his can result in data query misses in the second phase of a two-phase query.

related pr apache#20732

There are two reasons for moving the logic of delayed deletion from the Tablet to the StorageEngine. The first reason is to consolidate the logic and unify the delayed operations. The second reason is that delayed garbage collection during queries can cause rowsets to remain in the "stale rowsets" state, preventing the timely deletion of rowset metadata, It may cause rowset metadata too large.
dataroaring pushed a commit that referenced this pull request Jul 13, 2023
…his can result in data query misses in the second phase of a two-phase query. (#21741)

* [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query.

related pr #20732

There are two reasons for moving the logic of delayed deletion from the Tablet to the StorageEngine. The first reason is to consolidate the logic and unify the delayed operations. The second reason is that delayed garbage collection during queries can cause rowsets to remain in the "stale rowsets" state, preventing the timely deletion of rowset metadata, It may cause rowset metadata too large.

* not use unused rowsets
xiaokang pushed a commit that referenced this pull request Jul 13, 2023
…his can result in data query misses in the second phase of a two-phase query. (#21741)

* [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query.

related pr #20732

There are two reasons for moving the logic of delayed deletion from the Tablet to the StorageEngine. The first reason is to consolidate the logic and unify the delayed operations. The second reason is that delayed garbage collection during queries can cause rowsets to remain in the "stale rowsets" state, preventing the timely deletion of rowset metadata, It may cause rowset metadata too large.

* not use unused rowsets
BiteTheDDDDt pushed a commit to BiteTheDDDDt/incubator-doris that referenced this pull request Jul 14, 2023
…his can result in data query misses in the second phase of a two-phase query. (apache#21741)

* [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query.

related pr apache#20732

There are two reasons for moving the logic of delayed deletion from the Tablet to the StorageEngine. The first reason is to consolidate the logic and unify the delayed operations. The second reason is that delayed garbage collection during queries can cause rowsets to remain in the "stale rowsets" state, preventing the timely deletion of rowset metadata, It may cause rowset metadata too large.

* not use unused rowsets
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants