Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Features: multithread and tqdm progress bar #162

Merged
merged 9 commits into from
Aug 24, 2018

Conversation

darodi
Copy link
Contributor

@darodi darodi commented Aug 5, 2018

tqdm changes based on #159
fixes #123
fixes #153

@dsanchezseco
Copy link
Collaborator

dsanchezseco commented Aug 9, 2018

@darodi same as I said in #153 but this is kinda more psychedelic when printing the page progress. I don't know if it's just on linux, on wide monitors or what.

Edit: the missing link to the output

@darodi
Copy link
Contributor Author

darodi commented Aug 9, 2018

Thanks for the feedback.
Did you install colorama?

pip install colorama

I had this kind of behavior when I didn't use it with Windows.
I'll try some tests using Linux this evening.

@dsanchezseco
Copy link
Collaborator

@darodi i've installed and there's no difference. I'll check this afternoon on Mac too

@darodi
Copy link
Contributor Author

darodi commented Aug 13, 2018

@dsanchezseco

Here is the result using ubuntu 18.04 wsl
with my last commit

https://asciinema.org/a/KKPjATHIAbCOpVb8PITz76itW

trying you example I get

https://asciinema.org/a/dBP34HkoxHvxIq5vw2Na8v1KC

It seems better.

However, doing some tests I noticed a few new problems with the multithread introduced in PR #161

I ran 2 multithread downloads on the same host for 2 different mangas.
Both downloads where running with 4 threads. The host could not handle 8 threads.
Each manga was getting 503 errors on the host.
Threads where continuing instead of stopping. Eventually, I got 2 mangas with missing pages.
I think the error handling should be changed. The chapter download should be stopped.

Another consequence of the multi thread is the fact that the abort "ctrl-c" isn't working anymore.

Regards,

darodi

@dsanchezseco
Copy link
Collaborator

dsanchezseco commented Aug 13, 2018

@darodi Hi, nice.

Maybe leaving only the overall chapter bar instead of showing too the page bar could look cleaner and avoid the strange printing.

About the run you did, did you run two python main.py -.... ? If so, yeah, for now the amount of threads that it uses is fixed to 4, I was waiting to merge to add it as a param for the different computers that don't support so many threads or can support more.

Keep in mind that having more threads can potentially trigger the bot countermeasures the pages could have(as cloudfare for example) and give you the 503. Is open on issue #134 .

Regarding the ctrl+C i'll try to enable to cut the execution when that happens.

Thanks for the work!

@darodi
Copy link
Contributor Author

darodi commented Aug 13, 2018

@dsanchezseco

new commit. I disabled the progress bars for each file.

The issue I was pointing out is the error handling.

Let's say I'm using comic-dl in auto download mode for 10 comics.
Each comic has 5 chapters to download with 10 images each.

Currently, if there is any error downloading the 3rd image of the 4th chapter of the 5th comic,
other threads will continue downloading the remaining 7 images.
The parent process, for the 4th chapter, isn't notified of the error.
comic_dl.globalFunctions.GlobalFunctions#addOne will be called
and the 5th chapter will be processed, even though there is an error and a missing image in the 4th chapter.

In that case, it would be better not to call comic_dl.globalFunctions.GlobalFunctions#addOne, stop the download before the 5th chapter and proceed with the 6th comic.

@dsanchezseco
Copy link
Collaborator

Awesome!!! https://asciinema.org/a/3L8vAgrYGKdl8Pcw86MYen4tN

I'll try to work on the variable thread number and the error handling ASAP

@dsanchezseco dsanchezseco self-requested a review August 14, 2018 06:33
Copy link
Collaborator

@dsanchezseco dsanchezseco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice refactor of the multithread, I did it on a hurry changing all the files d'oh!

I would remove the commented code though

@Xonshiz Xonshiz merged commit 1685aa6 into Xonshiz:Development Aug 24, 2018
@darodi darodi deleted the feature_tqdm branch August 24, 2018 20:14
@darodi
Copy link
Contributor Author

darodi commented Aug 25, 2018

@dsanchezseco , @Xonshiz

For the moment, for each chapter, 4 threads are created to download the queue containing images.
For each chapter, there is a small overhead for creating those threads.
I was wondering if we couldn't improve that.

i.e.

  • Create the 4 threads on the first chapter to download.
  • feed a new queue on each chapter.
  • wait for the end of the queue on the main process
  • don't destroy the threads in the end of the chapter but continue with the next chapter and feed a new queue to the existing threads.
  • the threads will be destroyed in the end, after the last chapter of the last comic

I must admit I haven't checked the time taken to create the threads.
That proposal would also prevent to specify a different number of threads for each host/site.
I don't know if it's worth it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants