
Add a control for maximum download speed #35

Open
ZinRicky opened this issue May 5, 2022 · 3 comments

Comments

@ZinRicky

ZinRicky commented May 5, 2022

At the moment, the script monopolises the bandwidth. It would be nice to have an optional input argument to limit how much of the connection is used.

@onioneffect
Contributor

I'm working on an argument to disable multiprocessing (downloading one thread at a time) and to sleep between every image downloaded.
Both of these options are really slow and inefficient. The best way to rate-limit the downloads would be to stream the images/videos, which is not possible with the version of urllib that the program currently uses. So I'll make a branch using urllib3 or requests, and it will be up to @Exceen to decide whether he wants to add it to the program or not, because that could bring big changes to the codebase.
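(For illustration only: a rate-limited streaming download along these lines might look like the sketch below. It assumes the requests library; the function name, chunk size and default limit are made up for the example and are not part of the current script.)

```python
import time
import requests  # assumption: not currently a dependency of the script

def download_rate_limited(url, dest_path, max_bytes_per_sec=512 * 1024,
                          chunk_size=64 * 1024):
    """Stream `url` to `dest_path`, sleeping so the average download
    rate stays at or below `max_bytes_per_sec`."""
    start = time.monotonic()
    written = 0
    with requests.get(url, stream=True, timeout=30) as resp:
        resp.raise_for_status()
        with open(dest_path, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
                # If we are ahead of the allowed rate, sleep until we are not.
                expected = written / max_bytes_per_sec
                elapsed = time.monotonic() - start
                if expected > elapsed:
                    time.sleep(expected - elapsed)
```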

onioneffect added a commit to onioneffect/4chan-downloader that referenced this issue Aug 17, 2022
Adds option to download one thread at a time. Related to Exceen#35.
@Exceen
Owner

Exceen commented Aug 18, 2022

I don't have a problem with using urllib3 or requests, but I'm not sure it makes sense to disable multiprocessing. As you say yourself, it's heavily inefficient. I think I even missed out on some threads when this script was still single-threaded years ago because it was downloading everything too slowly. Is there any reason to make it single-threaded besides limiting the bandwidth?
What about having one process per (4chan) thread which just checks for new images and, instead of downloading them, puts them on a queue? Then have a separate process running which just works through the queue image after image. Maybe that way it would be more efficient than your current approach while still keeping open the possibility of limiting the bandwidth. You would go a bit over the specified bandwidth limit because the page loads are not within this scope, but that shouldn't be a problem, I guess.
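(For illustration only: the layout described above could be sketched roughly as follows. It assumes the multiprocessing module; find_image_urls and save_image are hypothetical placeholders for the script's existing parsing/downloading code, and all names, URLs and values are made up for the example.)

```python
import time
from multiprocessing import Process, JoinableQueue

def find_image_urls(thread_url):
    """Hypothetical placeholder: return the image URLs currently in the thread."""
    return []

def save_image(image_url):
    """Hypothetical placeholder: download a single image, e.g. with a
    rate-limited streaming function like the one sketched earlier."""
    pass

def watch_thread(thread_url, queue, poll_interval=60):
    """One process per (4chan) thread: only checks for new images and
    puts them on the queue instead of downloading them."""
    seen = set()
    while True:
        for image_url in find_image_urls(thread_url):
            if image_url not in seen:
                seen.add(image_url)
                queue.put(image_url)
        time.sleep(poll_interval)

def download_worker(queue):
    """A single separate process works through the queue image after image,
    which keeps the actual downloading easy to rate-limit."""
    while True:
        image_url = queue.get()
        save_image(image_url)
        queue.task_done()

if __name__ == "__main__":
    queue = JoinableQueue()
    Process(target=download_worker, args=(queue,), daemon=True).start()
    watchers = [Process(target=watch_thread, args=(url, queue))
                for url in ["https://boards.4channel.org/g/thread/12345"]]  # example URL
    for w in watchers:
        w.start()
    for w in watchers:
        w.join()
```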

@onioneffect
Contributor

I like your idea a lot more. My plan is to do that and to prioritize downloading threads that are about to 404 or be archived. Also, the part about needing urllib3 or requests was my mistake: it's possible to do it without those libraries, I just forgot.
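(For illustration only: the prioritization could be sketched with a priority queue, where the "about to 404 or archive" signal is left as a hypothetical flag; none of this is in the current script.)

```python
from queue import PriorityQueue

image_queue = PriorityQueue()

def enqueue_images(image_urls, thread_is_dying):
    # Lower numbers are served first, so threads that are about to
    # 404 or be archived jump ahead of healthy ones.
    priority = 0 if thread_is_dying else 1
    for url in image_urls:
        image_queue.put((priority, url))
```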
