-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add custom progress bar that supports parallel downloads #12404
Conversation
This prevents wasted rendering and makes sorting quicker
This allows control over where to place newline
This prevents race conditions when trying to acquire the lock again to sort the tasks
1 for sequential downloads and 1 for parallel downloads.
The latest commit (390fd51) addresses the previously mentioned potential issues. The goal was to maintain the existing behavior for sequential downloads while updating the behavior of the progress bar specifically for parallel downloads. To achieve this in a clear and maintainable manner, I've broken up the new progress bar class into Modularity:
Separation of Concerns:
Isolation of Issues:With this class structure, issues or updates related to the progress bar for sequential downloads can be traced to |
FYI when I do |
Thanks for catching that. Thanks to that I also noticed I hadn't handled the -q flag yet. Most of this issue is related to changes on the rough merge of the integration branch. Just pushed a fix for that now. Also pushing a fix here to handle disabling progress in parallel downloads. |
I wanted to gently nudge this PR for a bit more attention :) , and put forward a plan for merging this feature that hopefully makes sense. This PR adds the necessary classes to add a UI for parallel downloads, and #12388 adds the user facing interface for it as well as enables parallel downloads on the
|
|
Yes a rebase is required now. |
Shouldn't be too difficult, it doesn't seem like #12586 adds any conflicting changes. I think a merge with main could work here? |
Taking a closer look at it, I realize what you mean. I'll look into how to support the flow for simple progress bars with the changes suggested by this PR. |
It wont be difficult, it just needs some identifier on the lines it writes: |
It'll also require a some type of mutex like Also the way I was combining this PR and #12388 removed the support for other types of progress bars. This is because supporting parallel progress bars requires access to the underlying rich progress bar to add downloads. So now |
I would have a thread specifically for displaying the progress bars (maybe even the main thread?), then that thread can have a mutex guarded queue that the download threads can write progress events into (via callbacks or whatever). And queue.Queue is already guarded iirc. |
Could you elaborate? |
So each thread is downloading and displaying a progress bar? Im saying its easier if you have all the display logic in a single thread which is fed progress events by the download threads. But anyway, just wanted to make sure |
Can I urge some caution here? We seem to be adding a lot of complexity (progress bar subclasses, an now locking mechanisms) for getting "prettier" output, without looking at the bigger picture of what exactly we need here. How many users really look at the output of Maybe we should take a lead from
That feels perfectly adequate to me, and well within the capabilities of the basic rich progress bar. |
I concur with @pfmoore here -- rich does support parallel progress bars already and it's unclear why we need this complexity here. See https://github.com/Textualize/rich/blob/v13.7.1/examples/dynamic_progress.py for an example. We don't need additional threads and locks etc -- that's a detail handled by rich's progress bar already (https://github.com/Textualize/rich/blob/v13.7.1/rich/progress.py#L968) and using a wrapper around that instead of subclassing it would make it easir to ensure we don't break internal expectations of that class. |
Ah I see.
That is how it is currently working with this PR.
Yep will do. I'll look into how to support this with the raw progress mode. |
Point taken. The complexity here has gotten away from me a bit
I'll have to take a closer look at uv to be sure, but if realtime progress per package download isn't a priority this can be simplified greatly |
The reason why I felt rich's default parallel progress bar was not suitable in this case was because it only supports single line progress bars, i.e. the progress bar for a task can only take a single line. When pip downloads a package its progress consists of two lines the progress bar and a log message of "Downloading .." which mentions the package name.
The subclasses introduced here rely on those, it doesn't add additional threads or locks. I had suggested additional locks for supporting the raw progress bar added by #12586 since it doesn't use a rich progress bar. But I agree with your point about complexity. I'll try to take @pfmoore 's suggestion about not needing as much detail in the progress display. If I come up with a simpler solution I'll close this PR and open a new one. |
Ah, that should be possible to achieve with https://rich.readthedocs.io/en/latest/live.html instead of using just progress directly. See https://github.com/Textualize/rich/blob/master/examples/live_progress.py which presents things via a parent renderable -- we should be able to use https://rich.readthedocs.io/en/latest/group.html to provide such a composite renderable for this purpose. |
Can someone show (even if it's just a mockup somehow) what the end goal of the display is here? I still feel like we'd be better off with just a 1-line display, showing N/M files downloaded, with maybe a list of the packages currently being downloaded (note that I said "packages", not "files" deliberately here) - for serial downloads that would just be one package, but for parallel it would be up to one per thread. If we do want to keep the "one progress bar per file downloaded" scheme that we currently have, then I think it makes more sense to simplify it to a one-line format, something like:
That could be done with just an extra column (you can link the package name (and version, if you like!) to the task, using custom field arguments when you create the task. You can then reference the fields as @pradyunsg what's your view on simplifying/compacting the download progress display like this? Is it something we should do as a standalone PR, before trying to add parallel download support? My gut instinct is that we should do that.
I'd keep the raw progress bar entirely separate here. Again, we should focus on what's actually needed, and that means getting input from maintainers of tools that use the raw progress bar, to find out what progress information they actually use. There's no point reporting per-file statistics if they summarise that into "X% complete", after all... I suggest having a separate issue/PR for the raw progress bar, and defer working on it until someone complains that parallel downloads (will) break their existing usage. |
FYI, this discussions came up before in a previous PR, but I think the OP was hoping to achieve the same display behavior as what conda has now when it downloads multiple files at once, but with rich (whereas I believe conda uses tqdm). I can make a gif later today to demonstrate this if someone doesn't easily have access to a recentish version of conda. |
Personally, I think based on that example the efforts are worth it! This is similar to what Docker CLI does for pulling and decompressing image layers. |
I agree. I have a long list of changes I'd like to make to the UI (tweaking how stuff is presented during the resolve and download process is one of them). |
I made a test integration of this PR and #12388, it looks like this
I think simplifying the package name to just the name or just the name and the version instead of the full name (like |
OK, this is purely personal preference, but I love the one-line form as shown in the conda example, and I hate the 2-line pip form. Specifically, the one-line form ends up being a tabular display of |
Nobody is using it, its not live. But i made the PR, need per-file progress information (same info as sequential). I can write parallel support if needed, integrating this PR too broadly with rich could cause issues with that (Ideally there's a nice bottleneck somewhere we can divert the progress events from rich). |
I’m sorry, but I still think we should use the features Rich has. I expressed my reservations about raw progress reporting here and the rich project seemed receptive to adding raw progress functionality in Textualize/rich#3000. I still think getting rich support first and using that is the better call here. As noted here, my support of the original PR was because the code looked simple and clean. I am a strong -1 on changing that so the raw progress support has to handle complexities like threading support, or limit how we use rich in the normal situation. If you want parallel download support for the raw progress bar, please create a PR for it. It can stand on its own merits, and not distract from this discussion. This PR won’t “break” the raw progress reporting in the sense that if tests fail, they will have to be fixed. That should be enough for now. |
I haven't had a chance to work on this in a while. I see someone else has opened a PR that solves this issue in a cleaner way, so I'm closing this PR in favour of #12925 |
This PR introduces 2 new classes
PipProgress
andPipParallelProgress
supporting parallel downloads as presented in PR #12388. These are subclasses ofrich.progress.Progress
and are designed as a replacement for the current progress bar.In the interest of not creating a single PR that does too many things, this PR only adds the above mentioned classes and does not integrate them with the
Downloader
classes. For those interested in testing the integrated version, a rough merge of the integration is available on my fork. Feedback is highly appreciated :)The need for a new subclass:
To expand on this further:
Keeping log message and progress bar together:
When you add multiple tasks to the same progress bar(as would be done for parallel download), any logged text gets pushed to the start of all the progress bars. So the log message, i.e. "Downloading xxxx.whl (yykB)" or "Using cached" needs to be part of the progress bar.
But rich.progress doesn't natively support multi-line progress bars. As per the documentation, to customize this behaviour you would need to override the
get_renderable
method, which is the approach this PR takes.Dynamically switching columns
Progress
doesn't natively support switching between columns when a progress bar has multiple tasks. pip requires this since if it can't get the total length of one the links provided it is supposed to use a different set of columns. In the case of parallel downloads, all downloads use the same progress bar object, hence it needs to be able to dynamically switch columns for a task if the total length for that task is not known. Using the subclass ofProgress
, this behaviour is implemented in this PRSorting of completed downloads
This is more of a UX thing. By default for parallel downloads the progress bars are rendered in the order that they are added. But this makes it difficult for an end user to see how many downloads are currently active. In this PR every time a download is completed, the progress bars are sorted by which ones are completed, to keep the active ones together at the bottom of the screen.
Possible Issues
If the user doesn't provide the --parallel-downloads flag (as per PR #12388 ) I've tried to keep it so that there isn't a change in behaviour or appearance. In the terminal, I think I've succeeded. But there are a few issues I was not able to address and would appreciate help with.
Rendering issues in Jupyter(Solved)Using parallel progress bars seems to cause rendering issues in Jupyter.(By rendering issues i mean the progress bar gets redrawn on a newline for each refresh) as can be see in the rough integrated version of this PR. I'm working on solving this currently and will update when I learn more :)
Current Solution:
The issue here was that in Jupyter you can update the current line, but as soon as a newline character is printed, you can no longer go back and update the previous line. In my previous solution, to allow the download description and its related progress bar to stay adjacent to each other when downloading multiple files in parallel, the description was added to the progress bar object and the progress bar was made into a 2 line progress bar. This means that the description was the first line and the progress meter was being reprinted for every update. In the current solution, to fix this for sequential downloads, the description is logged, and not made a part of the progress bar object. But parallel downloads still have a 2 line progress bar and remain broken in Jupyter. I think this is an acceptable solution as it maintains legacy behaviour and only breaks UI, when the --parallel-downloads flag is passed in jupyter.
Logging(Solved)In the case of parallel downloads in BatchDownloader, if any messages are logged while the download is active, it may get overwritten by the progress bars. I did a quick check and it doesn't appear as though there are any log messages during a call to
BatchDownloader
but I'm not sure about this.Current Solution:
To solve this, the logger is passed into PipProgress(for sequential downloads) and the description is logged when a task is added. To enable keeping description and progress bar together, this logging is disabled in PipParallelProgress and the description is added to the progress bar making a multi-line progress bar