This repository has been archived by the owner on Mar 30, 2023. It is now read-only.

[QUESTION] A question about 'Resume' #677

Closed
AnthonyChouGit opened this issue Mar 5, 2020 · 7 comments

Comments

@AnthonyChouGit

AnthonyChouGit commented Mar 5, 2020

Initial Check

Make sure you've checked the following:

  • [ ] Python version is 3.6;
  • [ ] Updated Twint with pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint;
  • [ ] I have searched the issues and there are no duplicates of this issue/question/request.

Command Ran

import twint

config = twint.Config()
config.Limit = 100000
config.Store_csv = True
config.Search = 'China'
config.Since = '2019-12-1'
config.Until = '2020-1-1'
config.Lang = 'en'
config.Output = '/root/datasets/unprocessed/China12.csv'
# config.Min_likes = 20
twint.run.Search(config)

I got this message:
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)

Description of Issue

I'm having the same problem as #670, so I won't repeat the details. I guess it is because Twitter has updated its anti-crawler system. I'm doing research in data science and looking for a way to retrieve a large number of tweets efficiently. When I set Limit to 100,000, the retrieval process always stops at around 20,000.
So I am considering 'Resume' as a way to deal with this situation, but I can hardly find any information on your page about how to use it. The docs only mention providing the path of a file containing the scroll id. But what is that file? Is it the csv file created by twint in the last run? And what is that scroll id? How can I get this so-called scroll id from the last run and store it in a file? I suggest providing a more specific explanation of it in the docs.
I really appreciate your work. Thank you.

Environment Details

CentOS

@pielco11
Member

pielco11 commented Mar 5, 2020

import twint

config = twint.Config()
config.Limit = 100000
config.Store_csv = True
config.Search = 'China'
config.Since = '2019-12-1'
config.Until = '2020-1-1'
config.Lang = 'en'
config.Output = '/root/datasets/unprocessed/China12.csv'

config.Resume = "my_search_id_.txt"  # file that stores the scroll id

# config.Min_likes = 20
twint.run.Search(config)

To resume the scrape, you have to provide the scroll id. This ID is contained in the requests twint makes, so either you use config.Debug and extract the latest scroll id from twint-request_urls.log, or you specify a custom file (in the example, my_search_id_.txt) and let twint handle it for you.
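
If you go the config.Debug route, here is a minimal sketch of pulling the newest scroll id out of the log, assuming the log holds one request URL per line and the scroll id shows up as a query parameter (the "cursor" parameter name below is an assumption and may vary between twint versions):

from urllib.parse import urlparse, parse_qs

# Scan twint-request_urls.log (written when config.Debug is enabled) and
# keep the last scroll id seen. "cursor" is an assumed parameter name.
latest_scroll_id = None
with open("twint-request_urls.log") as log:
    for line in log:
        params = parse_qs(urlparse(line.strip()).query)
        if "cursor" in params:
            latest_scroll_id = params["cursor"][0]

# Save it to the file passed to config.Resume so the next run continues there.
if latest_scroll_id:
    with open("my_search_id_.txt", "w") as f:
        f.write(latest_scroll_id)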

config.Limit is enforced only on twint's side; Twitter is not aware of such a parameter.
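
Since Limit can't prevent Twitter from cutting a run short, one way to work toward a large total is to chain several runs through the Resume file. A sketch, assuming twint keeps the scroll id in that file up to date while scraping so each run picks up where the previous one stopped (the pass count and file names are illustrative):

import twint

RESUME_FILE = 'my_search_id_.txt'  # scroll-id file, the name is arbitrary

def run_chunk():
    # Each call resumes from the scroll id saved by the previous call.
    config = twint.Config()
    config.Search = 'China'
    config.Lang = 'en'
    config.Store_csv = True
    config.Output = 'China12.csv'
    config.Resume = RESUME_FILE
    twint.run.Search(config)

# Illustrative: a few passes appending to the same CSV.
for _ in range(5):
    run_chunk()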

#604

@dfreelon

May I request that the "Resume" entry on the Configuration wiki page be edited to reflect the information above? It currently reads:

Resume (string) - Resume from the latest scraped tweet ID, specify the filename that contains the ID.

This made me think Resume was looking for a file full of tweet IDs, not scroll IDs. It wasn't until I read this question that I realized what the issue was. This is an incredibly important feature of which more Twint users should be aware! Thanks for all your work.

@pielco11
Member

My bad, that part of the wiki is outdated. 🙏

@pielco11
Member

Updated

@vidyap-xgboost

vidyap-xgboost commented Jul 7, 2020

Updated

@pielco11

Hi,
Can you link the wiki page to this issue? That'd be really helpful! TIA.

@jonatapaulino

I wanted to get tweets from different regions here in Brazil. Would geo be the parameter I could use to do this search? Thanks.

@leul12

leul12 commented Oct 11, 2021

I wanted to get tweets from different regions here in Brazil. Would geo be the parameter I could use to do this search? Thanks.

You might check the near argument too.
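
For illustration, a minimal sketch using both options, under the assumption that config.Geo takes a "latitude,longitude,radius" string and config.Near takes a place name; the query, coordinates, and output path are made-up example values:

import twint

config = twint.Config()
config.Search = 'eleições'  # example query
# Restrict results to a radius around a point (illustrative Sao Paulo coordinates):
config.Geo = '-23.5505,-46.6333,50km'
# Or search near a named place instead:
# config.Near = 'Sao Paulo'
config.Store_csv = True
config.Output = 'brazil_tweets.csv'
twint.run.Search(config)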
