ScraperException #841

Closed
topecole opened this issue Apr 20, 2023 · 2 comments
Labels
duplicate This issue or pull request already exists

Comments

topecole commented Apr 20, 2023

Describe the bug

I got the error below when I attempted to scrape using this query: query = f'{topic}) near:"{location}" within:10km lang:en since:{start_date} until:{end_date} -filter:links -filter:retweet'

ERROR:snscrape.base:Error retrieving https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&include_ext_is_blue_verified=1&include_ext_verified_type=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_ext_limited_action_results=false&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_ext_collab_control=true&include_ext_views=true&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&include_ext_trusted_friends_metadata=true&send_error_codes=true&simple_quoted_tweet=true&q=+%22+LP+%22+OR+%22+Peter+Obi+%22+OR+%22+PO+%22%29+near%3A%22Lagos%22+within%3A10km+lang%3Aen+since%3A2023-01-01+until%3A2023-04-14+-filter%3Alinks+-filter%3Aretweet&tweet_search_mode=live&count=20&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&include_ext_edit_control=true&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2Cenrichments%2CsuperFollowMetadata%2CunmentionInfo%2CeditControl%2Ccollab_control%2Cvibe: non-200 status code (401)
CRITICAL:snscrape.base:4 requests to https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&include_ext_is_blue_verified=1&include_ext_verified_type=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_ext_limited_action_results=false&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_ext_collab_control=true&include_ext_views=true&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&include_ext_trusted_friends_metadata=true&send_error_codes=true&simple_quoted_tweet=true&q=+%22+LP+%22+OR+%22+Peter+Obi+%22+OR+%22+PO+%22%29+near%3A%22Lagos%22+within%3A10km+lang%3Aen+since%3A2023-01-01+until%3A2023-04-14+-filter%3Alinks+-filter%3Aretweet&tweet_search_mode=live&count=20&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&include_ext_edit_control=true&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2Cenrichments%2CsuperFollowMetadata%2CunmentionInfo%2CeditControl%2Ccollab_control%2Cvibe failed, giving up.
CRITICAL:snscrape.base:Errors: non-200 status code (401), non-200 status code (401), non-200 status code (401), non-200 status code (401)
---------------------------------------------------------------------------
ScraperException                          Traceback (most recent call last)
<ipython-input-8-e1606628dfba> in <cell line: 8>()
      6 
      7     # Use snscrape to scrape tweets
----> 8 for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
      9       if i > max_tweets:
     10           break

4 frames
/usr/local/lib/python3.9/dist-packages/snscrape/modules/twitter.py in get_items(self)
   1659                 del params['cursor']
   1660 
-> 1661                 for obj in self._iter_api_data('https://api.twitter.com/2/search/adaptive.json', _TwitterAPIType.V2, params, paginationParams, cursor = self._cursor):
   1662                         yield from self._v2_timeline_instructions_to_tweets_or_users(obj)
   1663 

/usr/local/lib/python3.9/dist-packages/snscrape/modules/twitter.py in _iter_api_data(self, endpoint, apiType, params, paginationParams, cursor, direction)
    759                 while True:
    760                         _logger.info(f'Retrieving scroll page {cursor}')
--> 761                         obj = self._get_api_data(endpoint, apiType, reqParams)
    762                         yield obj
    763 

/usr/local/lib/python3.9/dist-packages/snscrape/modules/twitter.py in _get_api_data(self, endpoint, apiType, params)
    725                 if apiType is _TwitterAPIType.GRAPHQL:
    726                         params = urllib.parse.urlencode({k: json.dumps(v, separators = (',', ':')) for k, v in params.items()}, quote_via = urllib.parse.quote)
--> 727                 r = self._get(endpoint, params = params, headers = self._apiHeaders, responseOkCallback = self._check_api_response)
    728                 try:
    729                         obj = r.json()

/usr/local/lib/python3.9/dist-packages/snscrape/base.py in _get(self, *args, **kwargs)
    249 
    250         def _get(self, *args, **kwargs):
--> 251                 return self._request('GET', *args, **kwargs)
    252 
    253         def _post(self, *args, **kwargs):

/usr/local/lib/python3.9/dist-packages/snscrape/base.py in _request(self, method, url, params, data, headers, timeout, responseOkCallback, allowRedirects, proxies)
    245                         _logger.fatal(msg)
    246                         _logger.fatal(f'Errors: {", ".join(errors)}')
--> 247                         raise ScraperException(msg)
    248                 raise RuntimeError('Reached unreachable code')
    249 

ScraperException: 4 requests to https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&include_ext_is_blue_verified=1&include_ext_verified_type=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_ext_limited_action_results=false&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_ext_collab_control=true&include_ext_views=true&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&include_ext_trusted_friends_metadata=true&send_error_codes=true&simple_quoted_tweet=true&q=+%22+LP+%22+OR+%22+Peter+Obi+%22+OR+%22+PO+%22%29+near%3A%22Lagos%22+within%3A10km+lang%3Aen+since%3A2023-01-01+until%3A2023-04-14+-filter%3Alinks+-filter%3Aretweet&tweet_search_mode=live&count=20&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&include_ext_edit_control=true&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2Cenrichments%2CsuperFollowMetadata%2CunmentionInfo%2CeditControl%2Ccollab_control%2Cvibe failed, giving up.

How to reproduce

import pandas as pd
import snscrape.modules.twitter as sntwitter

location = 'Lagos'
start_date = '2023-01-01'
end_date = '2023-04-14'
max_tweets = 1000
topic = 'LP OR Peter Obi OR PO'

# Create query for snscrape.
# Note: the ")" after {topic} has no matching "("; it is kept as reported because it also appears in the logged URL.
query = f'{topic}) near:"{location}" within:10km lang:en since:{start_date} until:{end_date} -filter:links -filter:retweet'

# Create empty list to store tweets
tweets_list = []

# Use snscrape to scrape tweets; stop once max_tweets have been collected
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i >= max_tweets:
        break
    tweets_list.append([tweet.date, tweet.rawContent, tweet.user.username, tweet.viewCount])

# Create a Pandas DataFrame from the list of tweets
tweets_df = pd.DataFrame(tweets_list, columns=['Date', 'Text', 'Username', 'Views'])
tweets_df['Date'] = tweets_df['Date'].dt.date
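
As a side note on the query itself: it closes a parenthesis after {topic} that is never opened, and the phrase Peter Obi is unquoted. Below is a minimal sketch of building the same query with a balanced OR group, assuming Twitter's usual search operators (parenthesised groups, quoted exact phrases). build_query is a hypothetical helper and not part of snscrape, and this is not expected to change the 401 responses shown in the log.

```python
def build_query(terms, location, radius_km, start_date, end_date, lang="en"):
    """Build a search query string with a balanced OR group (hypothetical helper)."""
    # Quote multi-word terms so they are treated as exact phrases.
    quoted = [f'"{term}"' if " " in term else term for term in terms]
    # Wrap the alternatives in one balanced pair of parentheses.
    group = "(" + " OR ".join(quoted) + ")"
    return (
        f'{group} near:"{location}" within:{radius_km}km lang:{lang} '
        f"since:{start_date} until:{end_date} -filter:links -filter:retweet"
    )


query = build_query(["LP", "Peter Obi", "PO"], "Lagos", 10, "2023-01-01", "2023-04-14")
print(query)
```

The filters are copied unchanged from the original query; only the grouping and quoting of the topic terms differ.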

Expected behaviour

I expect a data frame named tweets_df containing 1000 tweets that match the keywords 'LP OR Peter Obi OR PO', posted within 10 km of 'Lagos' between '2023-01-01' and '2023-04-14'.

Screenshots and recordings

No response

Operating system

Google Colab

Python version: output of python3 --version

Python 3.9.16

snscrape version: output of snscrape --version

snscrape 0.6.2.20230320

Scraper

TwitterSearchScraper

How are you using snscrape?

Module (import snscrape.modules.something in Python code)

Backtrace

No response

Log output

snscrape.base.ScraperException: 4 requests to https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&include_ext_is_blue_verified=1&include_ext_verified_type=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_ext_limited_action_results=false&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_ext_collab_control=true&include_ext_views=true&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&include_ext_trusted_friends_metadata=true&send_error_codes=true&simple_quoted_tweet=true&q=NHS%29+near%3A%22London%22+within%3A10km+lang%3Aen+since%3A2023-01-21+until%3A2023-01-30+-filter%3Alinks+-filter%3Aretweet&tweet_search_mode=live&count=20&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&include_ext_edit_control=true&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2Cenrichments%2CsuperFollowMetadata%2CunmentionInfo%2CeditControl%2Ccollab_control%2Cvibe failed, giving up.

Dump of locals

No response

Additional context

No response

topecole added the bug label Apr 20, 2023

TheTechRobo (Contributor) commented

#834 ?

JustAnotherArchivist added the duplicate label and removed the bug label Apr 20, 2023
JustAnotherArchivist closed this as not planned Apr 20, 2023

feusagittaire commented

I think that's a new problem indeed. I faced the same issue today, and only today, because yesterday snscrape was working normally. To work around it, I wrapped the call in a try/except with a time.sleep(0.5) pause, after which a new request is made. Apparently Twitter is limiting the number of requests within a certain timeframe.
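
The try/except-with-pause workaround described in this comment could look roughly like the sketch below. retry_with_delay is a hypothetical helper, not part of snscrape, and the retry count is an assumption; only the 0.5 second pause comes from the comment. Whether retrying helps with the 401 responses in this issue is the commenter's observation and is not confirmed in the thread.

```python
import time


def retry_with_delay(fetch, retries=3, delay=0.5, exceptions=(Exception,)):
    """Call fetch() until it succeeds or the attempts run out (hypothetical helper)."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except exceptions as error:
            last_error = error
            # Pause before the next attempt, but not after the final one.
            if attempt < retries - 1:
                time.sleep(delay)
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

With snscrape, fetch would need to consume the generator itself (for example, a function that collects the items from get_items() into a list), since the exception is raised while iterating. The exceptions argument could then be narrowed to snscrape.base.ScraperException, the exception type shown in the log output.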
