Prune checkpoints in Lambda #4777
A bug remains; only one source is created:
  "checkpoint": {
    "ingest-lambda-source-1711043830": {
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711043826.gz": "00000000000016044634",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711044126.gz": "00000000000016043012",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711044426.gz": "00000000000016038941",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711044726.gz": "00000000000016041053",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711045026.gz": "00000000000016041903",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711045326.gz": "00000000000016044080",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711045626.gz": "00000000000016041526",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711045925.gz": "00000000000016042481",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711046225.gz": "00000000000016044227",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711046526.gz": "00000000000016042532",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711046825.gz": "00000000000016043210",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711047126.gz": "00000000000016042689",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711047426.gz": "00000000000016042582",
      "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711047726.gz": "00000000000016042450"
    }
  },
  "create_timestamp": 1711039331,
  "sources": [
    {
      "version": "0.8",
      "source_id": "ingest-lambda-source-1711043830",
      "num_pipelines": 1,
      "enabled": true,
      "source_type": "file",
      "params": {
        "filepath": "s3://mockdatastack-sourcemockdata26422bfc-mpua0jb4rrh1/mock-sales/1711043826.gz"
      },
      "input_format": "json"
    }
  ]
},
I'd rather we always keep the last X checkpoints, so that we stay resilient to redundant notifications, but I'm not sure how that can be done (embed a timestamp in the partition ID, after a `#` maybe?). Anyway, that's definitely an improvement.
Yes, couldn't agree more. In #4613 I logged a few things I tried in order to achieve that, and why they failed. We could definitely come up with a solution, but it would require a larger rewrite of the metastore or the source.
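For illustration only, here is a minimal Python sketch of the "keep the last X checkpoints" idea floated above. It assumes a partition ID is built as `<uri>#<timestamp>`. The helpers `make_partition_id`, `partition_timestamp` and `prune_checkpoints` are hypothetical names, not part of the Quickwit codebase, and the sketch ignores the metastore constraints described in #4613.

```python
from typing import Dict

# Hypothetical delimiter between the file URI and the embedded timestamp,
# following the "after a #" suggestion in the review comment.
SEPARATOR = "#"


def make_partition_id(uri: str, timestamp: int) -> str:
    """Build a partition ID that carries its own timestamp (assumed format)."""
    return f"{uri}{SEPARATOR}{timestamp}"


def partition_timestamp(partition_id: str) -> int:
    """Extract the embedded timestamp; IDs without one sort as oldest (0)."""
    _, sep, suffix = partition_id.rpartition(SEPARATOR)
    if not sep or not suffix.isdigit():
        return 0
    return int(suffix)


def prune_checkpoints(checkpoint: Dict[str, str], keep_last: int) -> Dict[str, str]:
    """Return a copy of `checkpoint` holding only the `keep_last` newest partitions."""
    if keep_last <= 0:
        return {}
    newest = sorted(checkpoint, key=partition_timestamp, reverse=True)[:keep_last]
    kept = set(newest)
    # Preserve the original insertion order of the surviving entries.
    return {pid: pos for pid, pos in checkpoint.items() if pid in kept}
```

With entries kept this way, a redundant notification for a recently ingested file would still find its checkpoint and be skipped, while older entries stop accumulating. Whether partition IDs can actually carry a timestamp without breaking existing checkpoints is an open question this sketch does not answer.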
Description
Closes #4613
Avoid accumulating file sources when running the Lambda indexer.