[Bug]: When expiring historic data, the latest snapshot generated by Amoro should not be used #2555

Closed
2 tasks done
Tracked by #2176
rfyu opened this issue Feb 20, 2024 · 0 comments · Fixed by #2558 or #3058
Labels
type:bug Something isn't working

rfyu commented Feb 20, 2024

What happened?

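For tables configured as follows, data that should be retained is deleted after a long period without ingestion. The cause is that expiring historic data uses the latest snapshot, and that snapshot may be one generated by a previous data-expire operation rather than by ingestion, so the expiration point keeps moving forward even though no new data has arrived.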

'data-expire.enabled' = 'true',
'data-expire.field' = 'd',
'data-expire.datetime-string-pattern' = 'yyyy-MM-dd',
'data-expire.retention-time' = '3d',
'data-expire.base-on-rule' = 'LAST_COMMIT_TIME'
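
To make the effect of this configuration concrete, here is a minimal sketch of the cutoff arithmetic. It is not Amoro's implementation: the `Snapshot` class, the `generated_by_amoro` flag, and `expire_cutoff` are hypothetical names introduced for illustration, and the dates are made up. It only shows how a 3-day retention anchored on the latest commit shifts when that commit comes from a data-expire run.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Snapshot:
    committed_at: datetime
    # Hypothetical marker for snapshots written by maintenance tasks;
    # how Amoro actually identifies them is not covered here.
    generated_by_amoro: bool


def expire_cutoff(snapshots: List[Snapshot], retention: timedelta,
                  skip_amoro: bool) -> datetime:
    """Return the timestamp before which data is considered expired."""
    candidates = [s for s in snapshots
                  if not (skip_amoro and s.generated_by_amoro)]
    latest = max(candidates, key=lambda s: s.committed_at)
    return latest.committed_at - retention


ingest = Snapshot(datetime(2024, 2, 13), generated_by_amoro=False)
expire_run = Snapshot(datetime(2024, 2, 20), generated_by_amoro=True)
retention = timedelta(days=3)

# Reported behavior: the data-expire snapshot anchors the cutoff.
buggy = expire_cutoff([ingest, expire_run], retention, skip_amoro=False)
# Expected behavior: only the last ingestion commit anchors the cutoff.
fixed = expire_cutoff([ingest, expire_run], retention, skip_amoro=True)

print(buggy.date(), fixed.date())  # 2024-02-17 2024-02-10
```

With the data-expire snapshot included, the cutoff lands four days after the last ingestion, so every partition written by that ingestion falls outside the retention window.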

Affects Versions

master

What engines are you seeing the problem on?

No response

How to reproduce

  1. Prepare an Iceberg table whose last commit was a week ago.
  2. Add the properties: 'data-expire.enabled' = 'true', 'data-expire.field' = 'd', 'data-expire.datetime-string-pattern' = 'yyyy-MM-dd', 'data-expire.retention-time' = '3d'
  3. After the second data expiration run, all data in the table has been cleared.
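
The steps above can be walked through with a small simulation. This is a sketch under stated assumptions, not Amoro code: `run_expire` is a hypothetical helper, the table is modeled as a plain list of partition dates for field `d`, and it assumes that an expiration run commits a new snapshot whenever it removes data.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=3)  # mirrors 'data-expire.retention-time' = '3d'


def run_expire(partitions, commits, today):
    """Drop partitions older than (latest commit - retention)."""
    cutoff = max(commits) - RETENTION
    kept = [d for d in partitions if d >= cutoff]
    if len(kept) < len(partitions):
        # Assumption: removing data commits a new snapshot dated `today`.
        commits.append(today)
    return kept


# Daily partitions from Feb 1 to Feb 13; the last ingestion commit is Feb 13.
partitions = [date(2024, 2, day) for day in range(1, 14)]
commits = [date(2024, 2, 13)]

# First run, a week later: cutoff is Feb 10, so Feb 10..13 survive.
after_first = run_expire(partitions, commits, date(2024, 2, 20))
# Second run: the latest commit is now the expire snapshot from Feb 20,
# so the cutoff jumps to Feb 17 and nothing survives.
after_second = run_expire(after_first, commits, date(2024, 2, 21))

print(len(after_first), len(after_second))  # 4 0
```

The first run behaves as intended; the table is only emptied on the second run, once the first run's own snapshot has become the latest commit.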

Relevant log output

No response

Anything else

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct

  • I agree to follow this project's Code of Conduct