
Resuming collection after interruption #589

Open
igorbrigadir opened this issue Jan 27, 2022 · 8 comments
Comments

@igorbrigadir
Contributor

It should be possible to pick up where you left off for a long-running search, if you read the last request and use the pagination token. As far as I can tell, these pagination tokens don't expire. Will try and see how long they last (in case they do). This could also be used to resume searches across months - if you run out of your monthly quota or something.

Ideally it will be a --resume option - with a @resumable decorator on the command line options that can read an existing file, and continue appending and paginating given an existing pagination token. The client code should already support this, so these changes are mostly in the command line tool.

@Michael-Gauthier

Hello @igorbrigadir, and everyone else! This is pretty funny, because I came here to post the exact same question, due to disconnection problems I randomly have in my office... So I do not really understand: is it a feature you are planning to add, or is it already possible to resume collections after interruptions? Thanks a lot in advance, as usual! : )

@edsu
Member

edsu commented Jan 28, 2022

I agree it would be a nice feature to have (it doesn't exist yet), and shouldn't be too tricky given next_token is already persisted to the output?

@Michael-Gauthier

Gods, when it is implemented, I'll send you guys chocolates or whatever, that will be so useful! ^^

Thanks again for your hard work and efforts to keep improving the tool by the way! : )

@igorbrigadir
Contributor Author

is it a feature you are planning to add, or is it already possible to resume collections after interruptions?

Kinda both: it doesn't exist in the command line twarc yet, but it's possible to do this with the library, https://twittercommunity.com/t/pulling-a-large-data-set-with-twarc2-client/165685/6?u=igorbrigadir - all you need to do is extract a next_token and pass it as a parameter to the function, and it should continue. But I need to double check that, and make sure that using old next_token values actually works like this.
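The resume loop itself is just ordinary pagination that starts from a saved token rather than from scratch. A minimal sketch of that principle, assuming a `search_page` callable that stands in for one request to the API (a hypothetical stand-in; the real client methods would play this role once they accept the token as a keyword argument, which is what this issue is about):

```python
def paginate(search_page, query, next_token=None):
    """Yield response pages for a query, starting from next_token
    if one is supplied (i.e. resuming an earlier collection)."""
    while True:
        page = search_page(query, next_token=next_token)
        yield page
        # Follow the pagination chain until a page has no next_token.
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:
            break
```

Starting with `next_token=None` is a fresh search; starting with a token recovered from an earlier run continues where that run stopped.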

@Michael-Gauthier

Michael-Gauthier commented Jan 28, 2022

Thanks for the feedback, and for the link to the thread! So, if I understand correctly, I would need to find the next_token, add this to my "normal" twarc2 query program, and it should resume where it stopped?

@igorbrigadir
Contributor Author

Yes, except there's no way to pass the next_token to twarc2 yet - I'll have to add that part in.

@Michael-Gauthier

Ok, thanks a lot for your reactivity and your clarification! : )

@SamHames
Contributor

Thinking about this because of #656 - I think there are two layers to this:

  1. The client methods need to consistently take the pagination token as a keyword argument everywhere (this is the easy bit).
  2. For the command line, it's a bit harder when we consider the bulk commands, which are actually the biggest targets for resuming an interrupted operation.
    • At a minimum, to resume in these cases we'd have to read the last page of results, confirm whether the last collected search needs to be resumed based on the presence/absence of a next_token, then find the appropriate point in the input file to resume from.
    • We'd also have to think about the actual file writing workflow: do we resume from and continue appending to the same file, or do something else? My suggestion is that if the --resume argument is passed, it makes sense to read from the output file to sync the state, then continue appending to that file.
