
Refactor away from the .update file #174

Open
joshgarde opened this issue Jun 21, 2024 · 2 comments

Comments

@joshgarde
Member

joshgarde commented Jun 21, 2024

Issue
The current solution for maintaining the latest timestamp within a directory is the hidden .update file. While this works, the solution is not portable or self-evident to users.

Solution
Refactor data-subscriber to instead use file metadata within the directory to determine the next start datetime to fetch from. This removes the need to maintain a .update file, which may be lost if the user copies the granules from one directory to another without noticing the hidden .update file. Potential issues may arise if the user is using the directory for other work and adds files after the subscriber runs, or if the user subscribes to multiple datasets in the same directory.
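A minimal sketch of the metadata approach, assuming the modification time (`st_mtime`) of the newest non-hidden file in the directory is used as a proxy for the last run. The function name `next_start_from_metadata` is hypothetical and not part of data-subscriber's current code. It skips hidden files (such as a leftover .update) and subdirectories.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def next_start_from_metadata(directory: str) -> Optional[datetime]:
    """Hypothetical helper: return the newest mtime among regular, non-hidden
    files in `directory` as a UTC datetime, or None if there are none
    (i.e. no previous run to resume from)."""
    latest: Optional[float] = None
    for entry in os.scandir(directory):
        # Ignore hidden files (e.g. a legacy .update file) and subdirectories.
        if entry.name.startswith(".") or not entry.is_file():
            continue
        mtime = entry.stat().st_mtime
        if latest is None or mtime > latest:
            latest = mtime
    if latest is None:
        return None
    return datetime.fromtimestamp(latest, tz=timezone.utc)
```

Because every regular file counts, a file the user drops into the directory after a run moves the returned timestamp forward, which is exactly the first risk described in the proposal.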

An alternative solution may be to download granules in descending timestamp order: any granule not already found in the directory is downloaded, but once the subscriber reaches a granule that does exist (implying it was the last stop point), it ends its execution. This avoids relying on file metadata, which may change without the user's knowledge and may be inconsistent across filesystems. It would also enable subscribing to multiple datasets within the same directory.
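The alternative could look roughly like the sketch below, assuming each granule's eventual local filename and a sortable timestamp are known before downloading. `Granule`, `download_new_granules`, and the injected `download` callback are hypothetical placeholders, not existing data-subscriber APIs.

```python
import os
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Granule:
    # Hypothetical record type, not part of data-subscriber: the file name the
    # granule is saved under and a sortable timestamp (e.g. ISO 8601 strings
    # in one consistent format).
    filename: str
    timestamp: str


def download_new_granules(
    granules: Iterable[Granule],
    directory: str,
    download: Callable[[Granule, str], None],
) -> List[Granule]:
    """Download granules newest-first, stopping at the first one already on disk."""
    downloaded: List[Granule] = []
    for granule in sorted(granules, key=lambda g: g.timestamp, reverse=True):
        target = os.path.join(directory, granule.filename)
        if os.path.exists(target):
            # An existing file marks where the previous run stopped; everything
            # older is assumed to have been downloaded already.
            break
        download(granule, target)
        downloaded.append(granule)
    return downloaded
```

Since the loop only inspects the granules of the dataset being fetched, files from other datasets in the same directory do not affect where it stops. One caveat: if a previous run was interrupted partway through, the newest granules will be on disk while some older ones are missing, and this loop will stop before reaching them.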

@mike-gangl
Contributor

It's been a while since I worked on this, but I wanted to confirm: is this change only for the "downloader" tool, or is it for the subscriber tool as well? I'd be wary of changing the subscription feature because it's very purpose-built. It's not meant to get data from the past, only data that are newly ingested (which could be "in the past" but have been recently updated). If you want to download data from arbitrary time ranges, can't we just use the "data downloader" tool?

@joshgarde changed the title from "Refactor timestamp mechanism in to better support mixed data in existing folders" to "Refactor away from the .update file" on Jun 21, 2024
@joshgarde
Member Author

Reworked the ticket into something I think is more workable for the subscriber specifically. Let me know your thoughts.
