[QUESTION] A question about 'Resume' #677
import twint

config = twint.Config()
config.Limit = 100000
config.Store_csv = True
config.Search = 'China'
config.Since = '2019-12-1'
config.Until = '2020-1-1'
config.Lang = 'en'
config.Output = '/root/datasets/unprocessed/China12.csv'
config.Resume = "my_search_id_.txt"  # file that holds the scroll id
# config.Min_likes = 20
twint.run.Search(config)

To resume the scrape, you have to provide the scroll id. This ID is stored in the requests made, so either you use …
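The reply above is cut off in this capture, so the following is only a hedged sketch of the behaviour it describes: assuming twint appends the current scroll id to the file named in config.Resume while it scrapes, re-running the exact same search with the same Resume path should continue from the saved position instead of starting over. The run_search helper is mine, not from the thread.

import twint

def run_search():
    # Identical configuration on every run; only the contents of the
    # Resume file change between runs.
    config = twint.Config()
    config.Search = 'China'
    config.Since = '2019-12-1'
    config.Until = '2020-1-1'
    config.Lang = 'en'
    config.Store_csv = True
    config.Output = '/root/datasets/unprocessed/China12.csv'
    config.Resume = 'my_search_id_.txt'  # scroll ids are read from / written to this file
    twint.run.Search(config)

run_search()  # first run: creates my_search_id_.txt and starts filling the CSV
run_search()  # later run: picks up the last scroll id from the file and resumes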
May I request that the "Resume" entry on the Configuration wiki page be edited to reflect the information above? Currently it reads:
This made me think Resume was looking for a file full of tweet IDs, not scroll IDs. It wasn't until I read this question that I realized what the issue was. This is an incredibly important feature of which more Twint users should be aware! Thanks for all your work.
My bad, that piece of wiki is outdated. 🙏
Updated.
Hi,
I wanted to get tweets from different regions here in Brazil. Would geo be the parameter I could use for this search? Thanks.
You might check the near argument too.
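Not from the thread itself, but since location filtering came up: a minimal sketch of the two options mentioned, assuming twint's Near (place name) and Geo ("latitude,longitude,radius") config attributes; the query, coordinates, and file name are made-up examples.

import twint

config = twint.Config()
config.Search = 'eleição'  # example query, not from the thread
config.Lang = 'pt'
# Near takes a place name and lets Twitter resolve the location
config.Near = 'Sao Paulo'
# Geo takes "latitude,longitude,radius" for a circular area around a point
# config.Geo = '-23.5505,-46.6333,50km'
config.Store_csv = True
config.Output = 'sao_paulo_tweets.csv'
twint.run.Search(config)

Near is the simpler option when a city name is enough; Geo gives finer control over the search radius.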
Issue Template
Please use this template!
Initial Check
pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
Command Ran
I got this message:
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
Description of Issue
I'm having the same problem as #670, so I won't repeat the details. I guess it is because Twitter has updated its anti-crawler system. I'm doing research in data science and looking for a way to retrieve a large number of tweets efficiently. When I set Limit to 100,000, the retrieval process always stops at around 20,000.
So I am considering using 'Resume' as an alternative way to deal with this situation, but I can hardly find any information on your page about how to use it. The docs only mention providing the path of a file containing the scroll id. But what is that file? Is it the CSV file created by twint in the last run? And what is that scroll id? How could I get the so-called scroll id from the last run and store it in a file? I suggest you provide a more specific explanation of it in the docs.
I really appreciate your work. Thank you.
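Not part of the thread, but since the question is precisely about runs that stall at around 20,000 tweets: a hedged sketch that re-runs the same search in a loop, relying on the Resume behaviour discussed earlier in the thread to continue from the last scroll id. The row-counting helper, target, back-off, and attempt cap are invented for illustration, and twint is not guaranteed to de-duplicate across runs, so dropping duplicate tweet ids afterwards may still be needed.

import csv
import time
import twint

CSV_PATH = '/root/datasets/unprocessed/China12.csv'
TARGET_ROWS = 100000
MAX_ATTEMPTS = 20

def rows_written(path):
    # Hypothetical helper: count data rows written so far
    # (assumes Store_csv writes a single header row).
    try:
        with open(path, newline='') as f:
            return max(sum(1 for _ in csv.reader(f)) - 1, 0)
    except FileNotFoundError:
        return 0

attempt = 0
while rows_written(CSV_PATH) < TARGET_ROWS and attempt < MAX_ATTEMPTS:
    attempt += 1
    config = twint.Config()
    config.Search = 'China'
    config.Since = '2019-12-1'
    config.Until = '2020-1-1'
    config.Lang = 'en'
    config.Store_csv = True
    config.Output = CSV_PATH
    config.Resume = 'my_search_id_.txt'  # same file every attempt, so each run resumes
    twint.run.Search(config)
    time.sleep(60)  # back off a little before retrying after a stall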
Environment Details
CentOS