I have been trying to scrape Twitter; it worked fine until yesterday. My version of snscrape is up to date, but it still gives me this error after about 8 seconds.
I know this has been raised before, but I could not find a solution in the earlier issues.
import snscrape.modules.twitter as sntwitter
import pandas as pd
import re
import string
import time

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords

# define the search query
query = "(plastic OR environment OR pollution OR packaging OR waste OR climate OR sustainability) (@Unilever) until:2019-10-07 since:2019-09-29"

# define the set of stopwords
stop_words = set(stopwords.words('english'))

# collect the cleaned tweets here
tweets = []
limit = 500

# loop through the search results and clean the text of each tweet
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i == limit:
        break
    # clean the text of the tweet
    text = tweet.content.lower()
    text = re.sub(r'http\S+', '', text)    # remove URLs
    text = re.sub(r'@\w+', '', text)       # remove mentions
    text = re.sub(r'#(\w+)', r'\1', text)  # keep the hashtag words, drop the '#'
    text = text.translate(str.maketrans('', '', string.punctuation))  # remove punctuation
    words = [word for word in text.split() if word not in stop_words]  # remove stop words
    cleaned_text = ' '.join(words)
    # skip the tweet if nothing is left after cleaning
    if not cleaned_text:
        continue
    # add the cleaned text and other tweet data to the list
    tweets.append([tweet.date, tweet.username, cleaned_text])
    # pause for 3 seconds before processing the next tweet
    time.sleep(3)

# create a dataframe from the list of tweets and save to CSV
df = pd.DataFrame(tweets, columns=['Date', 'User', 'Tweet'])
df.to_csv('unilever_tweets.csv', index=False)
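The text-cleaning steps in the loop can be checked in isolation, independent of the scraper. Below is a minimal, dependency-free sketch of that pipeline; `STOP_WORDS` is a small hand-picked stand-in for nltk's `stopwords.words('english')`, and the input tweet is an invented example:

```python
import re
import string

# Small stand-in for nltk's English stopword list (assumption, not the full set).
STOP_WORDS = {"the", "a", "an", "and", "or", "is", "are", "to", "of", "in", "on"}

def clean_tweet(text: str) -> str:
    """Apply the same cleaning steps as the scraping loop above."""
    text = text.lower()
    text = re.sub(r'http\S+', '', text)    # remove URLs
    text = re.sub(r'@\w+', '', text)       # remove mentions
    text = re.sub(r'#(\w+)', r'\1', text)  # keep hashtag words, drop the '#'
    text = text.translate(str.maketrans('', '', string.punctuation))  # strip punctuation
    words = [w for w in text.split() if w not in STOP_WORDS]          # drop stop words
    return ' '.join(words)

print(clean_tweet("Cutting #plastic waste is the goal @Unilever! https://t.co/x"))
# → cutting plastic waste goal
```

Running the cleaner on a handful of sample strings like this makes it easy to separate cleaning bugs from scraper errors.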
ScraperException: 4 requests to https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&include_ext_is_blue_verified=1&include_ext_verified_type=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_ext_limited_action_results=false&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_ext_collab_control=true&include_ext_views=true&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&include_ext_trusted_friends_metadata=true&send_error_codes=true&simple_quoted_tweet=true&q=%28plastic+OR+environment+OR+pollution+OR+packaging+OR+waste+OR+climate+OR+sustainability%29+%28%40Unilever%29+until%3A2019-10-07+since%3A2019-09-29&tweet_search_mode=live&count=20&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&include_ext_edit_control=true&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2Cenrichments%2CsuperFollowMetadata%2CunmentionInfo%2CeditControl%2Ccollab_control%2Cvibe failed, giving up.
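Since the exception reports that snscrape gave up after a few failed requests, one thing worth trying while debugging is wrapping the scrape in a retry loop with exponential backoff. This is only a sketch: `TransientError` and `flaky` are hypothetical stand-ins for the snscrape call that fails (it will not help if Twitter is rejecting the requests outright):

```python
import time

class TransientError(Exception):
    """Stand-in for a transient scraper failure (e.g. rate limiting)."""

def with_backoff(fn, retries=4, base_delay=1.0):
    """Call fn(), retrying transient failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except TransientError:
            if attempt == retries - 1:
                raise  # out of retries, re-raise the last error
            time.sleep(base_delay * (2 ** attempt))

# Demo with a fake scrape that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("rate limited")
    return "ok"

print(with_backoff(flaky, retries=5, base_delay=0.01))
# → ok
```

In the real script, `fn` would be a function that runs the `TwitterSearchScraper` loop and returns the collected tweets.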