Recover from stream failure with backfill #477

Open
edsu opened this issue Jun 12, 2021 · 2 comments

@edsu
Member

edsu commented Jun 12, 2021

This came up when addressing #462 and is worth recording as an enhancement idea. To guarantee that no data is lost during streaming, it ought to be possible to keep track of the last tweet ID received and then use it to fetch whatever was missed:

  1. Record the ID of the last tweet received before the interruption.
  2. Record the ID of the first tweet received after the interruption.
  3. Spawn a new thread that uses search to read in tweets with IDs between the “before” and “after” IDs.
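
The three steps above could be sketched roughly as follows. This is only an illustration, not twarc's implementation: `stream_with_backfill`, `connect` and `backfill` are hypothetical names, tweets are assumed to be dicts with a string `id`, and a dropped stream is assumed to surface as `ConnectionError`. A real `backfill` callable might pass the two IDs to a search endpoint as lower and upper bounds (for example the `since_id` and `until_id` parameters of the Twitter v2 search API).

```python
import threading
from typing import Callable, Iterable, Iterator, List, Optional


def stream_with_backfill(
    connect: Callable[[], Iterable[dict]],   # hypothetical: opens the stream
    backfill: Callable[[str, str], None],    # hypothetical: searches the gap
    max_reconnects: int = 3,
) -> Iterator[dict]:
    """Yield streamed tweets, backfilling the gap after each reconnect."""
    last_id: Optional[str] = None   # step 1: last tweet seen so far
    gap_pending = False             # True between a failure and the next tweet
    workers: List[threading.Thread] = []
    attempts = 0

    while attempts <= max_reconnects:
        try:
            for tweet in connect():
                if gap_pending:
                    if last_id is not None:
                        # step 2: tweet["id"] is the first ID after the gap
                        # step 3: fetch the tweets in between on another thread
                        worker = threading.Thread(
                            target=backfill, args=(last_id, tweet["id"])
                        )
                        worker.start()
                        workers.append(worker)
                    gap_pending = False
                last_id = tweet["id"]
                yield tweet
            break  # the stream ended without an error
        except ConnectionError:
            gap_pending = True
            attempts += 1

    # Wait for outstanding backfills before finishing.
    for worker in workers:
        worker.join()
```

The backfill runs on its own thread so that the live stream keeps being consumed while the gap is searched; where the backfilled tweets get written is left to the `backfill` callable.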

One little wrinkle here is that the search query will need to be constructed on the fly from the current stream rules. There is also a question of where to write the additional data.
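
As a rough sketch of building that query, assuming each stream rule is a dict with a `value` key (as in the Twitter v2 rules payload), the rule values could be OR-ed together. `rules_to_query` is a hypothetical helper, and it glosses over two things a real version would have to handle: search query length limits, and stream-only operators that search may not accept.

```python
from typing import Iterable


def rules_to_query(rules: Iterable[dict]) -> str:
    """Combine stream rule values into a single search query string."""
    # Keep only non-empty rule values, trimmed of surrounding whitespace.
    values = [r["value"].strip() for r in rules if r.get("value", "").strip()]
    if not values:
        raise ValueError("no stream rules to build a query from")
    if len(values) == 1:
        return values[0]
    # Parenthesize each rule so its internal operators bind before the OR.
    return " OR ".join(f"({value})" for value in values)
```

For example, rules with the values `cat has:images` and `dog` would become `(cat has:images) OR (dog)`.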

@igorbrigadir
Contributor

igorbrigadir commented Jun 29, 2021

@igorbrigadir changed the title from "Recover from stream failure with a complementary search" to "Recover from stream failure with backfill" Jun 30, 2021
@igorbrigadir
Contributor

#549 already added the extra parameter to the client2, so what we still need is the part that takes a command line parameter and/or checks the output file and resumes. I'm inclined to say we should always grab the max replay duration; everything else can deal with duplicates downstream. But something smarter might be good.
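
For the "checks the output file and resumes" part and the downstream duplicate handling, something along these lines might do. It is a sketch under assumptions: the output is taken to be line-delimited JSON where each line is either a single tweet or a response page with a `data` list, and `iter_tweets`, `last_tweet_id` and `dedupe` are hypothetical helpers, not existing twarc functions.

```python
import json
from typing import Iterable, Iterator, Optional


def iter_tweets(lines: Iterable[str]) -> Iterator[dict]:
    """Yield tweet dicts from JSONL lines, skipping unreadable ones."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line cut short when the stream died
        if not isinstance(record, dict):
            continue
        data = record.get("data", record)
        for tweet in data if isinstance(data, list) else [data]:
            if isinstance(tweet, dict) and "id" in tweet:
                yield tweet


def last_tweet_id(lines: Iterable[str]) -> Optional[int]:
    """Return the largest (most recent) tweet ID found, or None."""
    # Tweet IDs grow over time, so the maximum ID is the newest tweet.
    return max((int(t["id"]) for t in iter_tweets(lines)), default=None)


def dedupe(tweets: Iterable[dict]) -> Iterator[dict]:
    """Drop tweets whose ID has already been seen, keeping the first copy."""
    seen = set()
    for tweet in tweets:
        if tweet["id"] not in seen:
            seen.add(tweet["id"])
            yield tweet
```

Usage would be something like `with open(path) as f: resume_from = last_tweet_id(f)`, with `dedupe` applied downstream to whatever overlap the replay produces.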

@igorbrigadir added the "good first issue" label Mar 21, 2022