503 Error with search/all #449

Closed
edsu opened this issue Apr 29, 2021 · 18 comments

@edsu
Member

edsu commented Apr 29, 2021

When searching with --archive, I occasionally see this in the log:

2021-04-29 10:14:07,737 WARNING 503 from Twitter API, sleeping 60

I think this is related?

https://twittercommunity.com/t/perpetual-503-v2-full-archive-search-default-vs-all-fields/151441

Hopefully it's a bug that will be fixed on Twitter's end. But if not, we may need a way to dial down max_results.

@edsu
Member Author

edsu commented Apr 29, 2021

I've confirmed that others (Jeffrey Sauer) are seeing this behavior too. I'm going to test lowering to 100 to see if that helps.

edsu added a commit that referenced this issue Apr 29, 2021
This commit adds the --max-results option to search which can control
the max_results parameter that is sent to the search/recent and
search/all endpoints. Ordinarily everyone would want to request the
maximum number of tweets per request, so as to maximize the 15 minute
request quota limit. But some occasional 503 errors have been observed
coming from the search/all endpoint and a forum post has suggested that
dialing max_results down to 100 can help.

Refs #449
@edsu
Member Author

edsu commented Apr 29, 2021

I'm going to leave this open to track what's going on at Twitter. But there is now a --max-results option for search in v2.0.10. Dialing that back to 100 when using --archive does seem to work.

@edsu
Member Author

edsu commented May 26, 2021

I haven't been seeing this recently, so I'm gonna close it and hopefully it won't pop up again.

@edsu edsu closed this as completed May 26, 2021
@krzysztoffiok

krzysztoffiok commented Jun 22, 2021

Hi,

With twarc below 2.2.0, no matter what max results value I used, I got:
WARNING 503 from Twitter API, sleeping 60
followed by progressively longer sleeps.

Today I tried again with twarc 2.2.0. Max results 500 failed; max results 100 seemed to work at first, but after some time it failed as well.

This is all so funny with Twitter: a month and a half ago it worked great with max results 500...

Each time I try, it downloads some tweets (50 MB, for instance), and then this happens:

2021-06-22 05:35:22,025 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'media.fields': 'duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'max_results': 100, 'query': '"xxx" lang:en', 'start_time': '2019-01-01T00:00:00+00:00', 'end_time': '2020-04-21T00:00:00+00:00', 'next_token': 'b26v89c19zqg8o3fo7adz05wkk8462ooyc4m3o6pqvkot'}}
2021-06-22 05:35:26,545 WARNING 503 from Twitter API, sleeping 60
2021-06-22 05:36:26,605 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'media.fields': 'duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'max_results': 100, 'query': '"xxx" lang:en', 'start_time': '2019-01-01T00:00:00+00:00', 'end_time': '2020-04-21T00:00:00+00:00', 'next_token': 'b26v89c19zqg8o3fo7adz05wkk8462ooyc4m3o6pqvkot'}}
2021-06-22 05:36:31,618 WARNING 503 from Twitter API, sleeping 120
2021-06-22 05:38:31,718 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'media.fields': 'duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'max_results': 100, 'query': '"xxx" lang:en', 'start_time': '2019-01-01T00:00:00+00:00', 'end_time': '2020-04-21T00:00:00+00:00', 'next_token': 'b26v89c19zqg8o3fo7adz05wkk8462ooyc4m3o6pqvkot'}}
2021-06-22 05:38:36,729 WARNING 503 from Twitter API, sleeping 180
2021-06-22 05:41:36,829 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'media.fields': 'duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'max_results': 100, 'query': '"xxx" lang:en', 'start_time': '2019-01-01T00:00:00+00:00', 'end_time': '2020-04-21T00:00:00+00:00', 'next_token': 'b26v89c19zqg8o3fo7adz05wkk8462ooyc4m3o6pqvkot'}}
2021-06-22 05:41:41,836 WARNING 503 from Twitter API, sleeping 240
2021-06-22 05:45:41,937 INFO getting ('https://api.twitter.com/2/tweets/search/all',) {'params': {'expansions': 'author_id,in_reply_to_user_id,referenced_tweets.id,referenced_tweets.id.author_id,entities.mentions.username,attachments.poll_ids,attachments.media_keys,geo.place_id', 'user.fields': 'created_at,description,entities,id,location,name,pinned_tweet_id,profile_image_url,protected,public_metrics,url,username,verified,withheld', 'tweet.fields': 'attachments,author_id,context_annotations,conversation_id,created_at,entities,geo,id,in_reply_to_user_id,lang,public_metrics,text,possibly_sensitive,referenced_tweets,reply_settings,source,withheld', 'media.fields': 'duration_ms,height,media_key,preview_image_url,type,url,width,public_metrics', 'poll.fields': 'duration_minutes,end_datetime,id,options,voting_status', 'place.fields': 'contained_within,country,country_code,full_name,geo,id,name,place_type', 'max_results': 100, 'query': '"xxx" lang:en', 'start_time': '2019-01-01T00:00:00+00:00', 'end_time': '2020-04-21T00:00:00+00:00', 'next_token': 'b26v89c19zqg8o3fo7adz05wkk8462ooyc4m3o6pqvkot'}}
2021-06-22 05:45:46,945 WARNING 503 from Twitter API, sleeping 300
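
The sleep values in this log (60, 120, 180, 240, 300) look like a linear backoff that adds 60 seconds per consecutive 503. Here is a minimal sketch of that pattern, inferred only from the log and not taken from twarc's source. `backoff_delay`, `fetch_with_backoff`, `fetch`, and `max_attempts` are hypothetical names used for illustration:

```python
import time


def backoff_delay(attempt, step=60):
    """Seconds to wait after the given consecutive 503 (attempt is 1-based)."""
    return attempt * step


def fetch_with_backoff(fetch, max_attempts=5, step=60, sleep=time.sleep):
    """Call fetch() until it returns a status other than 503.

    fetch is a hypothetical callable returning (status_code, body).
    Raises RuntimeError after max_attempts consecutive 503 responses.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status != 503:
            return status, body
        # No point sleeping after the final failed attempt.
        if attempt < max_attempts:
            sleep(backoff_delay(attempt, step))
    raise RuntimeError(f"still getting 503 after {max_attempts} attempts")
```

Passing `sleep` as a parameter makes it possible to try the loop without actually waiting, for example by passing `list.append` on an empty list to record the delays.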

@edsu
Member Author

edsu commented Jun 22, 2021

Yes, I think this has been observed over in the Twitter Forum:

https://twittercommunity.com/t/full-archive-search-returns-503-error-randomly-with-different-sets-of-parameters/154764

I think it might be worth commenting there so that Twitter improves their service. For the moment you could lower your requested records to 100 and see if that fares any better.

twarc2 search politics --archive --max-results 100 

I also wonder if it might make sense for twarc to sleep for less time when this error occurs.

@krzysztoffiok

@edsu thanks for commenting. I've already tried various max-results values; neither 100 nor 50 worked.

@edsu
Member Author

edsu commented Jun 22, 2021

I'm trying right now with a query of xxx and it seems to be working with --max-results 100?

@krzysztoffiok

I didn't actually search for xxx, I just removed my original query from the log (sorry). Also, my query worked for about 10 minutes and then failed.

@edsu
Member Author

edsu commented Jun 22, 2021

ok, i'll let this run for a bit. xxx does return things of course :-)

Was your query fairly simple or did it involve different filters and logic?

Of course, Twitter's API should work as advertised and this sort of noodling around is kind of ridiculous.

@krzysztoffiok

it was just a single word.

@edsu
Member Author

edsu commented Jun 22, 2021

Does it help at all if you define a --start-time?

@krzysztoffiok

I do define start time, end time, and max results. Give me a sec, I'll copy the code:

    filename = f'{word}_{start_time[:10]}_{end_time[:10]}.json'

    bashCommand = f"twarc2 search --archive --start-time {start_time} --end-time {end_time} --max-results 100"
    bashCommand = bashCommand.split()
    bashCommand.append(query)
    bashCommand.append(filename)
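
Building the argument list directly avoids the `split()` step, which would break if a time value ever contained a space. The following is only a sketch, assuming the list is later handed to `subprocess.run`. `build_search_command` and `run_search` are hypothetical helpers; the flags are the ones used in the snippet above:

```python
import subprocess


def build_search_command(query, filename, start_time, end_time, max_results=100):
    """Return the twarc2 search invocation as an argument list."""
    return [
        "twarc2", "search", "--archive",
        "--start-time", start_time,
        "--end-time", end_time,
        "--max-results", str(max_results),
        query,
        filename,
    ]


def run_search(cmd):
    """Run the command; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)
```

Because the query is its own list element, a multi-word query is passed to twarc2 as a single argument without any shell quoting.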

@edsu
Member Author

edsu commented Jun 22, 2021

Ahh, my --max-results 100 query just got a 503 error too, but then picked back up again after 60 seconds. That initial sleep could be shorter I suppose...
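
One way to shorten that initial sleep, as a sketch only: start with a small delay and double it up to a cap. The 5-second start and the 300-second cap are illustrative assumptions, not values from twarc, and `backoff_delays` is a hypothetical name:

```python
def backoff_delays(initial=5, factor=2, cap=300, attempts=8):
    """Return the wait in seconds before each retry, growing by factor up to cap."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(min(delay, cap))
        delay *= factor
    return delays
```

With these defaults the first retry would happen after 5 seconds instead of 60, while repeated failures still back off to a 5-minute wait.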

@krzysztoffiok

In my case it sometimes picked back up, but later got stuck again, and I tried many times. Well, I'll just hope they fix it.

P.S. How can I limit the tweet fields I download with twarc? I don't need all the info, and maybe that would help?
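
The logged requests earlier in this thread show the fields being sent as `tweet.fields`, `user.fields`, `expansions`, and so on. As a rough sketch of what a slimmer request would look like at the API level, the parameters could be reduced as below. This is not a twarc option, and whether twarc exposes such a setting is not established in this thread. `minimal_search_params` is a hypothetical helper, and the chosen field list is only an example:

```python
def minimal_search_params(query, start_time, end_time, max_results=100):
    """Build query parameters for the search/all endpoint with few tweet fields.

    Leaves out expansions and the user/media/poll/place field lists
    that appear in the logged request.
    """
    return {
        "query": query,
        "start_time": start_time,
        "end_time": end_time,
        "max_results": max_results,
        # Field names are taken from the 'tweet.fields' value in the log.
        "tweet.fields": "id,text,created_at,author_id,lang",
    }
```

Whether a smaller field list actually reduces the 503 errors is untested here.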

@edsu
Member Author

edsu commented Jun 23, 2021

@krzysztoffiok is it working better for you today? The search/all endpoint seems a bit more stable for me. Admittedly this feels a bit like asking what the weather is going to be like today...

@krzysztoffiok

@edsu it worked. Thank you for your interest!
I had to wait quite a while, but in the end I got what I wanted. So --max-results 100 probably increases the chance of a successful run, but owing to Twitter that chance is only close to 1, not equal to it, and comes down to luck. Don't get discouraged after an initial failure; just keep trying for a couple of days.

@edsu
Member Author

edsu commented Jun 28, 2021

Thanks for the follow-up information. Just out of curiosity, how many tweets did you end up collecting?

@krzysztoffiok

@edsu in this particular search it was 5,273,637 tweets over a two-year period.
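
For a sense of scale, here is a back-of-the-envelope estimate of how many requests a collection of that size takes at different --max-results values. The figure of 300 requests per 15-minute window is an assumption about the search/all quota, not something stated in this thread, and the estimate ignores 503 retries and pages that return fewer tweets than requested. `estimate_requests_and_hours` is a hypothetical name:

```python
import math


def estimate_requests_and_hours(total_tweets, max_results,
                                requests_per_window=300, window_minutes=15):
    """Return (request count, hours of quota) for a full collection.

    requests_per_window is an assumed quota, not a value from this thread.
    """
    requests = math.ceil(total_tweets / max_results)
    windows = requests / requests_per_window
    hours = windows * window_minutes / 60
    return requests, hours


for size in (100, 500):
    requests, hours = estimate_requests_and_hours(5273637, size)
    print(f"max_results={size}: {requests} requests, about {hours:.1f} hours")
```

Under that assumed quota, dropping from 500 to 100 results per request multiplies the minimum collection time by roughly five, which is the trade-off the --max-results commit message describes.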
