instagram scraping for likes broke #840

Closed
snarfed opened this issue Aug 19, 2018 · 4 comments

Owner

snarfed commented Aug 19, 2018

first reported by @thorkon yesterday. (thank you!) started sometime in the last few weeks. looks like instagram likes now need an extra HTTP fetch to scrape.

eg if you fetch https://www.instagram.com/p/BmbU2qSFXrR/ , the embedded JSON media object has:

  "edge_media_preview_like": {
    "count": 9,
    "edges": []
  }

that edges field used to have the individual likes, but now it's empty. if you click on the "9 likes" link in the instagram UI, it GETs this URL:

https://www.instagram.com/graphql/query/?query_hash=e0f59e4a1c8d78d0161873bc2ee7ec44&variables={"shortcode":"BmmWVV9lHjI","include_reel":false,"first":24}

which returns:

{
  "status": "ok",
  "data": {
    "shortcode_media": {
      "id": "1848262920796141768",
      "shortcode": "BmmWVV9lHjI",
      "edge_liked_by": {
        "count": 14,
        "page_info": {"..."},
        "edges": [
          {
            "node": {
              "id": "1072653878",
              "username": "kaydeedubya",
              "full_name": "",
              "profile_pic_url": "https://instagram.fsnc1-1.fna.fbcdn.net/vp/2cd0c658b9123a8f67d05301aa875598/5C15E91D/t51.2885-19/s150x150/13712803_750357865105707_625900552_a.jpg",
              "is_private": false,
              "is_verified": false,
              "followed_by_viewer": false,
              "requested_by_viewer": false
            }
          },
          {
            "node": {
              "id": "185218713",
              "username": "smawson",
              "full_name": "Sven",
              "profile_pic_url": "https://instagram.fbkk1-1.fna.fbcdn.net/vp/eaf4663b993b22ab3c90681222cba10e/5C0B007A/t51.2885-19/11906329_960233084022564_1448528159_a.jpg",
              "is_private": true,
              "is_verified": false,
              "followed_by_viewer": false,
              "requested_by_viewer": false
            }
          },
          "..."
        ]
      }
    }
  }
}

i can work with that.
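
a minimal sketch of how that query could be built and its response read, assuming the query_hash and the response shape stay as shown above (instagram may change either at any time). `likes_query_url` and `extract_likers` are hypothetical helper names, not granary's actual code:

```python
import json
from urllib.parse import urlencode

# Query hash copied from the URL above; treat it as subject to change.
LIKES_QUERY_HASH = "e0f59e4a1c8d78d0161873bc2ee7ec44"


def likes_query_url(shortcode, first=24):
    """Build the GraphQL likes URL for one post (hypothetical helper)."""
    variables = json.dumps(
        {"shortcode": shortcode, "include_reel": False, "first": first},
        separators=(",", ":"),
    )
    # urlencode percent-escapes the JSON so the URL is safe to send as-is.
    return "https://www.instagram.com/graphql/query/?" + urlencode(
        {"query_hash": LIKES_QUERY_HASH, "variables": variables}
    )


def extract_likers(response):
    """Return (count, list of liker nodes) from a decoded response dict."""
    liked_by = response["data"]["shortcode_media"]["edge_liked_by"]
    likers = [
        edge["node"]
        for edge in liked_by.get("edges", [])
        if isinstance(edge, dict) and "node" in edge
    ]
    return liked_by["count"], likers
```

note that `count` can be larger than the number of nodes returned, since the query only asks for the first 24; paging via `page_info` is not covered here.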

also note that the daily instagram_live_test.py run didn't catch this because it wasn't checking for likes. sigh. i'll fix that.
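
a sketch of the kind of check the live test could add, assuming it has the embedded media object as a dict. `missing_likes` is a hypothetical name; the real instagram_live_test.py isn't shown in this thread:

```python
def missing_likes(media):
    """Return True if a media object reports likes but includes none of them."""
    likes = media.get("edge_media_preview_like") or {}
    # count > 0 with an empty edges list is the breakage described above.
    return (likes.get("count") or 0) > 0 and not likes.get("edges")
```

a live test could fail whenever this returns True for a post that is known to have likes.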

Owner Author

snarfed commented Aug 20, 2018

shit. this graphql query url requires a login cookie. it 403s if you don't include one.

this may spell the death of backfeeding instagram likes. shit.

Owner Author

snarfed commented Aug 20, 2018

example from the one other project i've found that implemented this: ping/instagram_private_api@78294a8 (thank you @ping!)

Owner Author

snarfed commented Aug 20, 2018

current tentative plan: do all scraping with a login cookie from a throwaway account. IG does already rate limit our scraping, but i believe that's automatic, not aimed at us specifically. i expect bridgy is still small enough (5k users, <1k instagram) that IG employees haven't noticed us in particular yet.
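
a rough sketch of what sending that cookie could look like with the standard library. the cookie name `sessionid` is an assumption (the thread doesn't say which cookie instagram checks), and `build_request` and `fetch` are hypothetical helpers, not the code that was committed:

```python
import urllib.error
import urllib.request


def build_request(url, session_cookie):
    """Attach a login cookie to a scraping request.

    Assumption: the cookie is named `sessionid`.
    """
    return urllib.request.Request(
        url, headers={"Cookie": f"sessionid={session_cookie}"}
    )


def fetch(url, session_cookie, timeout=10):
    """Fetch url with the cookie and return (status code, body text)."""
    request = build_request(url, session_cookie)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as e:
        # eg the 403 the GraphQL query returns without a login cookie.
        return e.code, ""
```

returning the status code instead of raising lets the caller tell a 403 (cookie missing or rejected) apart from other failures.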

snarfed added 3 commits to snarfed/granary that referenced this issue Aug 20, 2018
snarfed added a commit that referenced this issue Aug 20, 2018
Owner Author

snarfed commented Aug 20, 2018

it's alive. god help me, it's alive.

now we wait to see how long it takes to get blocked. 😕

snarfed closed this as completed Aug 21, 2018