[Twitter] "429 Too Many Requests" when providing a large list - How do I wait between urls? #788

Closed
espressoelf opened this issue May 26, 2020 · 7 comments

Comments

@espressoelf

Hello everyone,

When updating my favorite Twitter users, I often run into the HttpError "429 Too Many Requests". This makes sense: I'm using a list with many users/URLs, and each of them only takes a short time to process since most of the content was already downloaded.

[twitter][error] HttpError: '429 Too Many Requests' for 'https://twitter.com/i/profiles/show/[SOME_USERNAME]/timeline/tweets'

Reading the doc, I found "sleep" https://github.com/mikf/gallery-dl/blob/master/docs/configuration.rst#extractorsleep, but as far as I understood, it sleeps after every image.

How do I "sleep" between the different urls provided via textfile using the -i switch to prevent error 429?

@espressoelf
Author

Since I already had all the urls neatly in a textfile, I looked up a way to process them individually. Finding https://www.cyberciti.biz/faq/unix-howto-read-line-by-line-from-file/ I now have a temporary workaround:

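#!/bin/bash
# Activate the virtualenv that provides gallery-dl
source /home/finn/py_gallery-dl/bin/activate

input="/home/finn/gallery-dl_twitter.txt"
# Read the URL list line by line, one gallery-dl run per URL
while IFS= read -r line
do
  # Quote the URL so the shell does not split or expand it
  gallery-dl -c gallery-dl.conf "$line"
  # Pause between URLs to stay below the rate limit
  sleep 10
done < "$input"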

I'd prefer a solution inside gallery-dl, though. If there is any, of course.
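
For anyone who would rather keep this workaround in Python, the same loop could look roughly like the sketch below. It is not part of gallery-dl; the helper names `read_urls` and `download_all` are made up for illustration, and skipping lines that start with `#` is an assumption about how the list file is written. Only the `gallery-dl -c <config> <url>` invocation is taken from the script above.

```python
import subprocess
import time
from pathlib import Path


def read_urls(path):
    """Return the non-empty lines of a URL list file.

    Assumption: lines starting with '#' are treated as comments and skipped.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def download_all(urls, delay=10.0, config="gallery-dl.conf",
                 run=subprocess.run, sleep=time.sleep):
    """Run gallery-dl once per URL, pausing `delay` seconds between runs.

    `run` and `sleep` can be swapped out, which makes the loop easy to try
    without starting any download. Returns the exit code of every run.
    """
    exit_codes = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # wait between URLs, but not before the first one
        # Passing a list avoids any shell quoting issues with the URL
        result = run(["gallery-dl", "-c", config, url])
        exit_codes.append(result.returncode)
    return exit_codes


# Example call (paths taken from the shell script above):
# download_all(read_urls("/home/finn/gallery-dl_twitter.txt"), delay=10)
```

The delay of 10 seconds is simply the value from the shell script; whether it is long enough for Twitter's limits is not something this thread establishes.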

@mikf
Owner

mikf commented May 28, 2020

I'm afraid there isn't a way to wait in between input URLs. The only options related to waiting are sleep and the occasional wait-min/wait-max for sites that really need it.

I did plan to add an option to specify a wait time between "regular" HTTP requests, and I might as well add one to wait between input URLs.

@aenriii

aenriii commented Jun 2, 2020

One thing I found while coding a project of my own: if you can detect HTTP error 429, you can run a command on any system to reset all DHCP IPs. It works for me.

import os
import platform
import subprocess
import sys

def handle_http_error(e):  # wrapper name is illustrative, not from the original
    if int(e.code) == 429:
        print("Detected error code 429. Program will automatically fix.")
        system = platform.system()
        if system == "Windows":
            # Windows IP reset (note: the class is subprocess.Popen, not popen)
            subprocess.Popen("ipconfig /renew", shell=True, stdout=subprocess.DEVNULL)
        elif system == "Linux":
            # Linux IP reset (-r releases the current lease)
            os.system("sudo dhclient -r")
        elif system == "Darwin":
            # Mac IP reset
            os.system("sudo SystemStarter restart Network")
        else:
            print("Cannot detect OS, exiting.")
            sys.exit(0)

@espressoelf
Author

Why would Twitter care about LAN IPs that it doesn't even see? Isn't the time the renewal takes the important factor here?
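
If the elapsed time really is what matters, simply waiting after a 429 should have the same effect as the DHCP renew. A minimal sketch of that idea in Python, using only the standard library: it retries a request with exponentially growing pauses and uses a numeric `Retry-After` header when one is present. The function names, the 30 second base delay and the 15 minute cap are assumptions made for illustration, and nothing in this thread confirms that Twitter sends `Retry-After`. This is not how gallery-dl handles the error.

```python
import time
import urllib.error
import urllib.request


def backoff_delay(attempt, retry_after=None, base=30.0, cap=900.0):
    """Seconds to wait before retry number `attempt` (0-based).

    A numeric Retry-After value wins; otherwise the delay doubles with
    every attempt, starting at `base` and never exceeding `cap`.
    """
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    return min(cap, base * (2 ** attempt))


def fetch_with_backoff(url, max_retries=5,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch `url`, sleeping and retrying whenever the server answers 429."""
    for attempt in range(max_retries + 1):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as error:
            # Give up on other errors, or once the retries are used up
            if error.code != 429 or attempt == max_retries:
                raise
            headers = error.headers
            retry_after = headers.get("Retry-After") if headers else None
            sleep(backoff_delay(attempt, retry_after))
```

With the default values the pauses would be 30, 60, 120, 240 and 480 seconds before the request is abandoned.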

@aenriii

aenriii commented Jun 3, 2020 via email

@AlexCSDev

AlexCSDev commented Jun 6, 2020

@mikf Is it possible to implement the same thing you did for pixiv in a27f43d, but for Twitter? Right now Twitter support is basically useless because of their new, very aggressive throttling rules. It doesn't take long to trigger a 429 even when downloading a single user.

@github-account1111

@mikf sorry for the necro but does 3afd362 set a default wait interval for Twitter akin to Instagram or does it need to be set manually?
Asking because since the Instagram change I don't think I've gotten a single 429 for Instagram whereas I routinely get them for Twitter.

5 participants