All Twitter scrapes are failing: `blocked (404)` #996

JustAnotherArchivist · 2023-06-30T19:12:17Z

With the exception of twitter-trends, all Twitter scrapes have been failing since sometime in the past hour. This is likely connected to Twitter as a whole getting locked behind a login wall earlier today. There is no known workaround at this time, and it's not known whether this will be fixable.

yeahjack · 2023-06-30T19:26:54Z

So sad :-(
My research project relies heavily on this library, and I want to pay tribute to your effort in maintaining it.

viktorzen · 2023-06-30T21:05:01Z

Twitter disabled its public website today (2023-06-30) and now requires users to log in; Twitter was public prior to this date. Would it be possible to automate the login as well by providing a username and password to snscrape, i.e. logging in to Twitter before calling the GraphQL API, to simulate a logged-in session?

yeahjack · 2023-06-30T21:13:17Z

I do not think the developer would do this, as he said that authentication would never be added as a feature: see #270.
Let's see what solution our great developers come up with; I hope it will not take long.

midnightmagic · 2023-06-30T21:45:06Z

Please consider deleting my prior off-topic comment.

Don't nuke this one as off-topic: A Twitter employee says it's temporary:

https://twitter.com/AqueelMiq/status/1674843555486134272
"this is a temporary restriction, we will re-enable logged out twitter access in the near future"

Wouze · 2023-07-01T01:25:13Z

Elon talked about it too 💀
https://twitter.com/elonmusk/status/1674942336583757825

khorg0sh · 2023-07-01T05:40:16Z

Musk referred to EXTREME scraping, which suggests that scrapers may no longer work after these changes. Let's see how it plays out.

Benniepie · 2023-07-01T09:39:16Z

Hello,

This may or may not help. Here's a route to access Tweets without logging in (contains further iframe to platform.twitter.com):
https://cdn.embedly.com/widgets/media.html?type=text%2Fhtml&key=a19fcc184b9711e1b4764040d3dc5c07&schema=twitter&url=https://twitter.com/elonmusk/status/1674865731136020505

Would combining this with a pre-existing list of Tweets allow data scraping to continue? Alternatively users could build the tweet list using google search, e.g. for Tesla tweets: "site:twitter.com/tesla/status" or via another cached list (e.g. Waybackmachine - https://web.archive.org/web/*/https://twitter.com/tesla/status*)

If I'm off the mark, I apologise but thought I'd pass this on, on the off chance it may help at least as a temporary measure.
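
A minimal sketch of the list-building idea above, assuming tweet URLs follow the twitter.com/<user>/status/<id> pattern seen in this thread. The helper names (`cdx_query_url`, `extract_tweet_ids`) are hypothetical, and the Wayback Machine CDX query parameters are an assumption about how one might list archived status URLs; none of this is part of snscrape:

```python
import re
from urllib.parse import urlencode

# Matches URLs of the form twitter.com/<user>/status/<numeric id>.
STATUS_RE = re.compile(r"twitter\.com/(\w+)/status/(\d+)")


def cdx_query_url(screen_name: str) -> str:
    """Build a Wayback Machine CDX query for archived status URLs (assumed parameters)."""
    params = {
        "url": f"twitter.com/{screen_name}/status*",  # trailing * asks for a prefix match
        "output": "json",
        "fl": "original",      # only return the originally captured URL
        "collapse": "urlkey",  # one row per distinct URL
    }
    # Keep "/" and "*" unescaped so the query stays readable.
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params, safe="/*")


def extract_tweet_ids(urls):
    """Return the unique tweet IDs found in an iterable of URLs, in first-seen order."""
    seen = {}
    for url in urls:
        match = STATUS_RE.search(url)
        if match:
            seen.setdefault(match.group(2), None)
    return list(seen)
```

The resulting IDs could then be fed one at a time to whatever per-tweet endpoint still answers without a login.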

Just a note to @JustAnotherArchivist - thank you for the hard work you have put into this library - it is very much appreciated

Ben

arfathyahiya · 2023-07-01T14:18:40Z

URL: https://cdn.syndication.twimg.com/tweet-result

CODE:

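import requests

# Twitter's syndication (embed) endpoint; it returns data for a single tweet.
url = "https://cdn.syndication.twimg.com/tweet-result"

# "id" is the numeric ID of the tweet to fetch.
querystring = {"id": "1652193613223436289", "lang": "en"}

# Browser-like headers, mimicking a request from the embed widget on platform.twitter.com.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    # "br" is left out: requests only decodes Brotli when the optional brotli package is installed.
    "Accept-Encoding": "gzip, deflate",
    "Origin": "https://platform.twitter.com",
    "Connection": "keep-alive",
    "Referer": "https://platform.twitter.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
    "TE": "trailers"
}

# A GET request needs no body, so the empty payload is dropped.
response = requests.get(url, headers=headers, params=querystring, timeout=30)
response.raise_for_status()

print(response.text)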

Generated by Insomnia

Write · 2023-07-01T17:22:41Z

https://twitter.com/elonmusk/status/1675187969420828672

😂

@ElonMusk
To address extreme levels of data scraping & system manipulation, we’ve applied the following temporary limits:

Verified accounts are limited to reading 6000 posts/day

Unverified accounts to 600 posts/day

New unverified accounts to 300/day

Fa5g · 2023-07-02T00:04:30Z

Scraping seems to be still possible, check this:

https://rss-bridge.org/bridge01/?action=display&bridge=TwitterBridge&context=By+username&u=elonmusk&format=html

https://rss-bridge.org/bridge01/?action=display&bridge=TwitterBridge&context=By+username&u=elonmusk&format=json

By https://github.com/RSS-Bridge/rss-bridge
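
A rough sketch of consuming the JSON variant of that feed from Python. It assumes the `format=json` output is a JSON Feed-style document with a top-level `items` list whose entries carry `url` and `title` keys; that layout, and the helper names `bridge_url`, `parse_items`, and `fetch_items`, are assumptions for illustration, and the public instance may rate-limit or change at any time:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public RSS-Bridge instance used in the links above.
BRIDGE_BASE = "https://rss-bridge.org/bridge01/"


def bridge_url(username: str, fmt: str = "json") -> str:
    """Build the TwitterBridge URL for a given username and output format."""
    params = {
        "action": "display",
        "bridge": "TwitterBridge",
        "context": "By username",
        "u": username,
        "format": fmt,
    }
    return BRIDGE_BASE + "?" + urlencode(params)


def parse_items(feed_text: str):
    """Return (url, title) pairs from a JSON Feed-style document (assumed layout)."""
    feed = json.loads(feed_text)
    return [(item.get("url"), item.get("title")) for item in feed.get("items", [])]


def fetch_items(username: str):
    """Download the feed for one user and parse it; performs a network request."""
    with urlopen(bridge_url(username), timeout=30) as resp:
        return parse_items(resp.read().decode("utf-8"))
```

Calling `fetch_items("elonmusk")` would request the second link above.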

Write · 2023-07-02T06:36:57Z

While the RSS-Bridge approach is cool, it uses API v1, so you can't get long tweets.

Nik-Kras · 2023-09-07T15:05:21Z

Hi @arfathyahiya, with this script #996 (comment) and a working token, how far back can the tweets go?

I used Script A to get the guest token and then applied this token in Script B to get tweets, but it doesn't work. (If Script A fails with "Failed to fetch guest account, is your IP rate limited or so?", turning on or switching a VPN helped me.) I assume this used to work, so I'm leaving this comment to update you on the situation.

If it changes again, please mention it.

Script A:

#!/usr/bin/env python3
import sys
import json
import textwrap
import requests

with requests.Session() as session:
    guest_token = session.post("https://api.twitter.com/1.1/guest/activate.json", headers={
        "Authorization": "Bearer AAAAAAAAAAAAAAAAAAAAAFXzAwAAAAAAMHCxpeSDG1gLNLghVe8d74hl6k4%3DRUMF4xAQLsbeBhTSRrCiQpJtxoGWeyHrDb5te2jpGskWDFW82F",
    }).json()["guest_token"]

    flow_token_resp = session.post("https://api.twitter.com/1.1/onboarding/task.json?flow_name=welcome&api_version=1&known_device_token=&sim_country_code=us", headers={
        "Authorization": "Bearer AAAAAAAAAAAAAAAAAAAAAFXzAwAAAAAAMHCxpeSDG1gLNLghVe8d74hl6k4%3DRUMF4xAQLsbeBhTSRrCiQpJtxoGWeyHrDb5te2jpGskWDFW82F",
        "Content-Type": "application/json",
        "User-Agent": "TwitterAndroid/9.95.0-release.0 (29950000-r-0) ONEPLUS+A3010/9 (OnePlus;ONEPLUS+A3010;OnePlus;OnePlus3;0;;1;2016)",
        "X-Twitter-API-Version": "5",
        "X-Twitter-Client": "TwitterAndroid",
        "X-Twitter-Client-Version": "9.95.0-release.0",
        "OS-Version": "28",
        "System-User-Agent": "Dalvik/2.1.0 (Linux; U; Android 9; ONEPLUS A3010 Build/PKQ1.181203.001)",
        "X-Twitter-Active-User": "yes",
        "X-Guest-Token": guest_token,
    }, data=textwrap.dedent(
        """{
            "flow_token": null,
            "input_flow_data": {
                "country_code": null,
                "flow_context": {
                    "start_location": {
                        "location": "splash_screen"
                    }
                },
                "requested_variant": null,
                "target_user_id": 0
            },
            "subtask_versions": {
                "generic_urt": 3,
                "standard": 1,
                "open_home_timeline": 1,
                "app_locale_update": 1,
                "enter_date": 1,
                "email_verification": 3,
                "enter_password": 5,
                "enter_text": 5,
                "one_tap": 2,
                "cta": 7,
                "single_sign_on": 1,
                "fetch_persisted_data": 1,
                "enter_username": 3,
                "web_modal": 2,
                "fetch_temporary_password": 1,
                "menu_dialog": 1,
                "sign_up_review": 5,
                "interest_picker": 4,
                "user_recommendations_urt": 3,
                "in_app_notification": 1,
                "sign_up": 2,
                "typeahead_search": 1,
                "user_recommendations_list": 4,
                "cta_inline": 1,
                "contacts_live_sync_permission_prompt": 3,
                "choice_selection": 5,
                "js_instrumentation": 1,
                "alert_dialog_suppress_client_events": 1,
                "privacy_options": 1,
                "topics_selector": 1,
                "wait_spinner": 3,
                "tweet_selection_urt": 1,
                "end_flow": 1,
                "settings_list": 7,
                "open_external_link": 1,
                "phone_verification": 5,
                "security_key": 3,
                "select_banner": 2,
                "upload_media": 1,
                "web": 2,
                "alert_dialog": 1,
                "open_account": 2,
                "action_list": 2,
                "enter_phone": 2,
                "open_link": 1,
                "show_code": 1,
                "update_users": 1,
                "check_logged_in_account": 1,
                "enter_email": 2,
                "select_avatar": 4,
                "location_permission_prompt": 2,
                "notifications_permission_prompt": 4
            }
        }"""
    ))

    flow_token = flow_token_resp.json()["flow_token"]

    resp = session.post("https://api.twitter.com/1.1/onboarding/task.json", headers={
        "Authorization": "Bearer AAAAAAAAAAAAAAAAAAAAAFXzAwAAAAAAMHCxpeSDG1gLNLghVe8d74hl6k4%3DRUMF4xAQLsbeBhTSRrCiQpJtxoGWeyHrDb5te2jpGskWDFW82F",
        "Content-Type": "application/json",
        "User-Agent": "TwitterAndroid/9.95.0-release.0 (29950000-r-0) ONEPLUS+A3010/9 (OnePlus;ONEPLUS+A3010;OnePlus;OnePlus3;0;;1;2016)",
        "X-Twitter-API-Version": "5",
        "X-Twitter-Client": "TwitterAndroid",
        "X-Twitter-Client-Version": "9.95.0-release.0",
        "OS-Version": "28",
        "System-User-Agent": "Dalvik/2.1.0 (Linux; U; Android 9; ONEPLUS A3010 Build/PKQ1.181203.001)",
        "X-Twitter-Active-User": "yes",
        "X-Guest-Token": guest_token,
    }, data=json.dumps({
        "flow_token": flow_token,
        "subtask_inputs": [
            {
                "open_link": {
                    "link": "next_link",
                },
                "subtask_id": "NextTaskOpenLink",
            }
        ],
        "subtask_versions": {
            "generic_urt": 3,
            "standard": 1,
            "open_home_timeline": 1,
            "app_locale_update": 1,
            "enter_date": 1,
            "email_verification": 3,
            "enter_password": 5,
            "enter_text": 5,
            "one_tap": 2,
            "cta": 7,
            "single_sign_on": 1,
            "fetch_persisted_data": 1,
            "enter_username": 3,
            "web_modal": 2,
            "fetch_temporary_password": 1,
            "menu_dialog": 1,
            "sign_up_review": 5,
            "interest_picker": 4,
            "user_recommendations_urt": 3,
            "in_app_notification": 1,
            "sign_up": 2,
            "typeahead_search": 1,
            "user_recommendations_list": 4,
            "cta_inline": 1,
            "contacts_live_sync_permission_prompt": 3,
            "choice_selection": 5,
            "js_instrumentation": 1,
            "alert_dialog_suppress_client_events": 1,
            "privacy_options": 1,
            "topics_selector": 1,
            "wait_spinner": 3,
            "tweet_selection_urt": 1,
            "end_flow": 1,
            "settings_list": 7,
            "open_external_link": 1,
            "phone_verification": 5,
            "security_key": 3,
            "select_banner": 2,
            "upload_media": 1,
            "web": 2,
            "alert_dialog": 1,
            "open_account": 2,
            "action_list": 2,
            "enter_phone": 2,
            "open_link": 1,
            "show_code": 1,
            "update_users": 1,
            "check_logged_in_account": 1,
            "enter_email": 2,
            "select_avatar": 4,
            "location_permission_prompt": 2,
            "notifications_permission_prompt": 4,
        }
    }))

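    try:
        # Parse the response once instead of calling resp.json() repeatedly.
        subtasks = resp.json()["subtasks"]
        # Collect the guest account user IDs (Script B refers to these as "tokens").
        tokens = [str(subtask["open_account"]["user"]["id"]) for subtask in subtasks]
        # Print the first guest account record as JSON.
        print(json.dumps(subtasks[0]["open_account"]))
    except (KeyError, IndexError):
        print("Failed to fetch guest account, is your IP rate limited or so?", file=sys.stderr)
        sys.exit(1)

print("Tokens: ", tokens)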

Script B:

import requests

url = "https://cdn.syndication.twimg.com/tweet-result"
select_token = 0

search_keywords = "How much is the fish?"
params = {
    # `tokens` is the list printed at the end of Script A.
    "id": tokens[select_token],
    "lang": "en",
    "keywords": search_keywords
}

payload = ""
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate, br",
    "Origin": "https://platform.twitter.com",
    "Connection": "keep-alive",
    "Referer": "https://platform.twitter.com/",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "cross-site",
    "Pragma": "no-cache",
    "Cache-Control": "no-cache",
    "TE": "trailers"
}

response = requests.request("GET", url, data=payload, headers=headers, params=params)

print(response.text)

doveppp · 2023-12-16T06:19:55Z

Now, individual tweets can be viewed without logging in, but I tried TwitterTweetScraper and it still doesn't work.

JFVefour · 2023-12-16T06:23:56Z

Hi, how did you get this information?

doveppp · 2023-12-16T06:27:55Z

No specific notification, I just opened a tweet while not logged in.

yeahjack · 2023-12-16T18:47:57Z

Confirmed: viewing both tweets and user profiles without logging in now works. Maybe it is a good start.

Demmenie · 2024-07-05T20:02:38Z

Vercel's react-tweet now has a bit of a workaround. They figured out that you can use the Twitter embed API to get data from any tweet. Usually, you'd need a special token to get any data but they reverse engineered the token and you can generate it yourself using the tweet id.

The API is at this URL: 'https://cdn.syndication.twimg.com/tweet-result'

and the token generator looks like this:

function getToken(id: string) {
  return ((Number(id) / 1e15) * Math.PI)
    .toString(6 ** 2)
    .replace(/(0+|\.)/g, '')
}

Source: https://github.com/vercel/react-tweet/blob/main/packages/react-tweet/src/api/fetch-tweet.ts
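
For Python users, a hedged sketch of the same token computation. Python has no built-in equivalent of JavaScript's `Number.prototype.toString(36)` for fractional values, so the helper `to_base36` below is an approximation modelled on how V8 formats such numbers; its trailing digits may differ from what a given JavaScript engine produces. The `fetch_tweet` helper and its `id`, `lang`, and `token` query parameters are assumptions based on this thread, not a documented API:

```python
import json
import math
import re
from urllib.parse import urlencode
from urllib.request import urlopen

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_base36(value: float) -> str:
    """Approximate JS Number.toString(36) for a finite, non-negative float.

    Requires Python 3.9+ for math.nextafter.
    """
    integer = math.floor(value)
    fraction = value - integer
    # Emit fractional digits only down to half the spacing between adjacent floats.
    delta = max(0.5 * (math.nextafter(value, math.inf) - value), 5e-324)
    frac_digits = []
    if fraction >= delta:
        while True:
            fraction *= 36
            delta *= 36
            digit = int(fraction)
            frac_digits.append(digit)
            fraction -= digit
            # Round the last digit up when the remainder is closer to the next digit.
            if fraction > 0.5 or (fraction == 0.5 and digit & 1):
                if fraction + delta > 1:
                    while frac_digits:
                        last = frac_digits.pop()
                        if last + 1 < 36:
                            frac_digits.append(last + 1)
                            break
                    else:
                        integer += 1  # the carry ran through every fractional digit
                    break
            if fraction < delta:
                break
    int_part = ""
    while True:
        integer, remainder = divmod(integer, 36)
        int_part = DIGITS[remainder] + int_part
        if integer == 0:
            break
    if not frac_digits:
        return int_part
    return int_part + "." + "".join(DIGITS[d] for d in frac_digits)


def get_token(tweet_id: str) -> str:
    """Mirror getToken above: (id / 1e15 * pi) in base 36, with zeros and the dot removed."""
    value = (float(int(tweet_id)) / 1e15) * math.pi
    return re.sub(r"(0+|\.)", "", to_base36(value))


def fetch_tweet(tweet_id: str, lang: str = "en") -> dict:
    """Request one tweet from the syndication endpoint (assumed parameters; network call)."""
    query = urlencode({"id": tweet_id, "lang": lang, "token": get_token(tweet_id)})
    url = "https://cdn.syndication.twimg.com/tweet-result?" + query
    with urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Whether the endpoint checks the token character by character is not established in this thread, so a mismatch in the last digit may or may not matter.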

MathiasExorde · 2024-07-16T13:20:20Z

Hi everyone, I know this will sound like an ad.

I used this library for a while and waited to see whether the community would manage to find a workaround.
Apparently it is now impossible for individuals to get tweets on their own.
I represent Exorde network (exordelabs . com), and we are collecting 6 million tweets a day, out of 10 million posts a day.
That's billions a year. We do it in real time, at large scale, over 8000+ sources: 300k articles daily, forums, blogs, etc.

We have an Insight API for aggregated metrics and a Fullstream API that outputs the entire annotated feed. Just reach out for trial & access. We are willing to support researchers and OSINT efforts; we have an API and can provide raw archives.
As far as we know, we're the only option for humble researchers / OSINT experts.

Just reach out on hello@exordelabs.com or visit developers.exorde.io

JustAnotherArchivist added the bug, module:twitter, and upstream labels Jun 30, 2023

JustAnotherArchivist pinned this issue Jun 30, 2023

JustAnotherArchivist mentioned this issue Jun 30, 2023

Twitter: UserByScreenName api fails #995

Closed

JustAnotherArchivist mentioned this issue Jul 1, 2023

Yesterday i can, but today 404 #997

Closed

JustAnotherArchivist changed the title ~~All Twitter scrapes are failing~~ All Twitter scrapes are failing: blocked (404) Jul 1, 2023

SpiderVice mentioned this issue Jul 1, 2023

Errors: blocked (404), blocked (404), blocked (404), blocked (404) #998

Closed
