[twitter] 403 Forbidden during pagination caused by expired guest token #3445

Closed
ClosedPort22 opened this issue Dec 22, 2022 · 4 comments

@ClosedPort22
Contributor

ClosedPort22 commented Dec 22, 2022

I recently noticed that download jobs sometimes terminate with a 403 Forbidden error during pagination. Over the past few months I scraped around 10 large accounts (more than 3200 tweets each) and got 7 errors. Only one of them came from the UserTweetsAndReplies endpoint; the rest came from the search endpoint. Interestingly, instead of the generic "Forbidden." message, the UserTweetsAndReplies endpoint reported "Bad guest token". Also, the search can be "resumed" manually by using the same max_id as the failed request, so I don't think the errors are caused by API limits.
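Because resuming from the failed request works, one way to handle this is to treat a 403 in the middle of pagination as a possibly expired guest token: activate a new token and retry the same page. The following is a hypothetical sketch, not gallery-dl's actual code. `fetch_page` and `activate_token` are placeholder callables standing in for the search request and the POST to /1.1/guest/activate.json, `ForbiddenError` is an assumed exception type, and `max_token_refreshes` is an illustrative parameter.

```python
from typing import Callable, Iterator, List, Optional, Tuple


class ForbiddenError(Exception):
    """Hypothetical error raised by fetch_page on an HTTP 403 response."""


# A page is (items, next_cursor); next_cursor is None on the last page.
Page = Tuple[List[str], Optional[str]]


def paginate(
    fetch_page: Callable[[Optional[str], str], Page],
    activate_token: Callable[[], str],
    max_token_refreshes: int = 3,
) -> Iterator[str]:
    """Yield items from every page, re-activating the guest token on 403.

    After a 403 the *same* cursor is retried with a fresh token, so no
    page is skipped, mirroring the manual resume described above.
    """
    token = activate_token()
    cursor: Optional[str] = None
    refreshes = 0  # consecutive refreshes without a successful page
    while True:
        try:
            items, next_cursor = fetch_page(cursor, token)
        except ForbiddenError:
            if refreshes >= max_token_refreshes:
                raise  # fresh tokens did not help; probably not a token issue
            refreshes += 1
            token = activate_token()  # assume the old token expired
            continue  # retry the same cursor
        # Reset after any successful page, so a long job can refresh
        # many times as long as each new token works.
        refreshes = 0
        yield from items
        if next_cursor is None:
            return
        cursor = next_cursor
```

Capping consecutive refreshes keeps a genuine permission error from looping forever, while still letting a job that runs for many hours recover each time its token expires.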

@ClosedPort22
Contributor Author

Oh, I forgot to mention an important detail: I use long delays between downloads. I'll try reducing the sleep time and see if the problem persists.

@Hrxn
Contributor

Hrxn commented Dec 23, 2022

Not sure...
Anyway, some indication as to how to potentially reproduce this would be nice, I guess? 😃

@ClosedPort22
Contributor Author

ClosedPort22 commented Dec 23, 2022

> Not sure... Anyway, some indication as to how to potentially reproduce this would be nice, I guess? 😃

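In my past jobs, the errors occurred around 3 hours after the guest token was activated. To test this, I'm running the command below, which waits 11400 seconds (3 hours and 10 minutes) before each download. The separate cache file tempcache.sqlite3 makes gallery-dl activate a new guest token instead of reusing the cached one. I'll post an update once the sleep is over.

gallery-dl --ignore-config --verbose -o sleep=11400 -o cache.file=tempcache.sqlite3 --no-download https://twitter.com/search?q=from:elonmusk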

@ClosedPort22
Contributor Author

ClosedPort22 commented Dec 23, 2022

I have successfully reproduced the issue:

> gallery-dl --ignore-config --verbose -o sleep=11400 -o cache.file=tempcache.sqlite3 --no-download https://twitter.com/search?q=from:elonmusk
[gallery-dl][debug] Version 1.24.2 - Git HEAD: 2d7d80d3
[gallery-dl][debug] Python 3.10.0 - Windows-10-10.0.19044-SP0
[gallery-dl][debug] requests 2.26.0 - urllib3 1.26.7
[gallery-dl][debug] Configuration Files []
[gallery-dl][debug] Starting DownloadJob for 'https://twitter.com/search?q=from:elonmusk'
[twitter][debug] Using TwitterSearchExtractor for 'https://twitter.com/search?q=from:elonmusk'
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): api.twitter.com:443
[urllib3.connectionpool][debug] https://api.twitter.com:443 "POST /1.1/guest/activate.json HTTP/1.1" 200 62
[urllib3.connectionpool][debug] Starting new HTTPS connection (1): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/graphql/7mjxD3-C6BxitPMVQ6w0-Q/UserByScreenName?variables=%7B%22screen_name%22%3A%22elonmusk%22%2C%22withSafetyModeUserFields%22%3Atrue%2C%22withSuperFollowsUserFields%22%3Atrue%7D HTTP/1.1" 200 800
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&send_error_codes=true&simple_quoted_tweet=true&count=100&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2CsuperFollowMetadata&q=from%3Aelonmusk&tweet_search_mode=live&query_source=typed_query&pc=1&spelling_corrections=1 HTTP/1.1" 200 7617
[twitter][debug] Skipping 1605995788613476355 (quoted tweet)
[twitter][debug] Skipping 1605948247012306944 (quoted tweet)
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&send_error_codes=true&simple_quoted_tweet=true&count=100&cursor=scroll%3AthGAVUV0VFVBaAwKih0a7AySwWgMCi7a6tmcosEnEV8IV6FYCJehgHREVGQVVMVDUBFQAVAAA%3D&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2CsuperFollowMetadata&q=from%3Aelonmusk&tweet_search_mode=live&query_source=typed_query&pc=1&spelling_corrections=1 HTTP/1.1" 200 8999
[twitter][debug] Skipping 1605651865818914816 (quoted tweet)
[twitter][debug] Skipping 1605349852199936000 (quoted tweet)
[twitter][debug] Skipping 1605590368841388033 (quoted tweet)
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&send_error_codes=true&simple_quoted_tweet=true&count=100&cursor=scroll%3AthGAVUV0VFVBaAgKr1tY3mxywWgMCi7a6tmcosEnEV4IJ6FYCJehgHREVGQVVMVDUBFQIVAAA%3D&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2CsuperFollowMetadata&q=from%3Aelonmusk&tweet_search_mode=live&query_source=typed_query&pc=1&spelling_corrections=1 HTTP/1.1" 200 5947
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&send_error_codes=true&simple_quoted_tweet=true&count=100&cursor=scroll%3AthGAVUV0VFVBaAwKDdtty0xywWgMCi7a6tmcosEnEV0P95FYCJehgHREVGQVVMVDUBFQQVAAA%3D&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2CsuperFollowMetadata&q=from%3Aelonmusk&tweet_search_mode=live&query_source=typed_query&pc=1&spelling_corrections=1 HTTP/1.1" 200 6951
[twitter][debug] Sleeping 11400.00 seconds (download)
* .\gallery-dl\twitter\elonmusk\1605366000022564864_1.jpg
[urllib3.connectionpool][debug] Resetting dropped connection: twitter.com
[urllib3.connectionpool][debug] Starting new HTTPS connection (2): twitter.com:443
[urllib3.connectionpool][debug] https://twitter.com:443 "GET /i/api/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&send_error_codes=true&simple_quoted_tweet=true&count=100&cursor=scroll%3AthGAVUV0VFVBaEgLPxq9CWxywWgMCi7a6tmcosEnEVwPx5FYCJehgHREVGQVVMVDUBFQYVAAA%3D&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2CsuperFollowMetadata&q=from%3Aelonmusk&tweet_search_mode=live&query_source=typed_query&pc=1&spelling_corrections=1 HTTP/1.1" 403 73
[twitter][error] 403 Forbidden (Forbidden.)

Judging from the logs of past download jobs, API calls made in between do not reset this timer: the guest token seems to expire a fixed time after activation, even if it is used in the meantime.
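If the guest token really expires a fixed time after activation regardless of use, a client could also record the activation time and refresh the token proactively before a request, instead of waiting for a 403. The sketch below is hypothetical: the `GuestToken` class, its `lifetime` and `margin` parameters, and the 3-hour value are assumptions drawn only from the observations in this thread, not a documented Twitter limit or gallery-dl's behavior. The clock is injectable so the logic can be tested without waiting.

```python
import time
from typing import Callable, Optional


class GuestToken:
    """Hypothetical guest-token holder that refreshes based on token age.

    The default lifetime (~3 h) is an assumption from observed failures,
    not a documented value.
    """

    def __init__(
        self,
        activate: Callable[[], str],
        lifetime: float = 3 * 3600,  # assumed lifetime in seconds
        margin: float = 10 * 60,     # refresh this many seconds early
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._activate = activate
        self._lifetime = lifetime
        self._margin = margin
        self._clock = clock
        self._value: Optional[str] = None
        self._activated_at = 0.0

    def get(self) -> str:
        """Return a token, activating a new one if the current one is too old."""
        now = self._clock()
        age = now - self._activated_at
        if self._value is None or age >= self._lifetime - self._margin:
            self._value = self._activate()
            # Only activation sets the timestamp; later use does not extend it.
            self._activated_at = now
        return self._value
```

Calling `get()` before every page request would catch the case in the log above, where an 11400-second sleep separates two requests made with the same token. A proactive check like this complements, rather than replaces, retrying on 403, since the real lifetime may differ from the assumed one.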

@ClosedPort22 ClosedPort22 changed the title [twitter] 403 Forbidden during pagination: guest token expired? [twitter] 403 Forbidden during pagination caused by expired guest token Dec 23, 2022
ClosedPort22 added a commit to ClosedPort22/gallery-dl that referenced this issue Dec 24, 2022
@mikf mikf added the bug label Jan 1, 2023
@mikf mikf closed this as completed Jan 20, 2023
3 participants