-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Revamp the progress reporting API #6556
Conversation
Codecov Report
@@ Coverage Diff @@
## develop #6556 +/- ##
===========================================
+ Coverage 81.07% 81.10% +0.02%
===========================================
Files 363 363
Lines 39632 39644 +12
Branches 3563 3563
===========================================
+ Hits 32132 32153 +21
+ Misses 7500 7491 -9
|
f078d7c
to
e1fa4e5
Compare
@staticmethod | ||
def _uploading_task(pbar: ProgressReporter, total_size: int) -> ContextManager[None]: | ||
return pbar.task( | ||
total=total_size, desc="Uploading data", unit_scale=True, unit="B", unit_divisor=1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see several problems with this change:
- The implementation supposes that the progress reporter is tqdm
- The tqdm has quite a broad list of parameters, it will be hard to implement a custom reporter correctly
- The suffix
2
is quite cryptic and tells nothing about the differences, probably it can be replaced withLazy
orDeferred
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- The implementation supposes that the progress reporter is tqdm
- The tqdm has quite a broad list of parameters, it will be hard to implement a custom reporter correctly
That's not quite true, because a reporter doesn't have to support every parameter that tqdm supports. It can simply use whatever parameters it recognizes, and ignore the rest. A barebones reporter doesn't have to use any information other than total
(maybe not even total
...).
That said, I think I can improve this by explicitly listing the parameters that we actually use right now, and adding a note saying that any kwargs should be ignored. That way, users have more guidance on which parameters they should pay attention to, and there's a way for us to introduce non-tqdm parameters in the future if we need them.
The suffix
2
is quite cryptic and tells nothing about the differences, probably it can be replaced with Lazy or Deferred
The only thing that the 2
is supposed to communicate is that it's better than 1. Which, to me, seems like the main thing that users need to know in this case. If you really want to, I can use Deferred
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I think I can improve this by explicitly listing the parameters that we actually use right now, and adding a note saying that any kwargs should be ignored. That way, users have more guidance on which parameters they should pay attention to, and there's a way for us to introduce non-tqdm parameters in the future if we need them.
Sounds good. I think it's the right direction.
If you really want to, I can use Deferred instead.
I won't claim that deferred
is the best variant, but 2
feels like "TODO" for me - like if we couldn't find a good name and just left it for future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good. I think it's the right direction.
Okay, I did it.
I won't claim that deferred is the best variant, but 2 feels like "TODO" for me - like if we couldn't find a good name and just left it for future.
Mind you, the good name is TqdmProgressReporter
- I just couldn't use it because it was already taken. 🙂
Well, I renamed it to DeferredTqdmProgressReporter
anyway.
There are several problems with how progress reporting is handled in the SDK, both on the interface and implementation level: * The user is supposed to know, for a given function, which units it will report progress in. This is unnecessary coupling and prevents us from switching to different units, or to have several progress bars using different units. * To create a TqdmProgressReporter, you have to create a tqdm instance, which immediately draws a progress bar. This works poorly if the function prints any log messages before the progress actually starts. * There's no easy way to automatically call `finish` on a progress bar, so some functions don't (for example, `Downloader.download_file`). This can cause unexpected output, since tqdm will refresh the progress bar in a background thread (possibly after we've already printed something else). * `iter` is basically broken, because it divides by `period`, which is 0 in all current implementations. * Even ignoring that, it's hard to use correctly, because you need to manually call `finish` in case of an exception. * `split` is not implemented and not used. * `StreamWithProgress.seek` assumes that the progress bar is at 0 at the start, and so does `ProgressReporter.iter`. The former also works incorrectly if the second argument is not `SEEK_SET`. Fix these problems by doing the following: * Add a new `start2` method which accepts arbitrary tqdm parameters. The default implementation calls `start`, so that if a user has implemented the original interface, it'll keep working. * Add a `TqdmProgressReporter2` that accepts tqdm parameters instead of a tqdm instance, and only creates an instance after `start2` is called. Use it where `TqdmProgressReporter` was used before. The old `TqdmProgressReporter` is kept for compatibility, but it doesn't support any `start2` arguments other than those supported by the original `start`. * Add a `task` context manager, which automatically calls `start2` and `finish`. Use it everywhere instead of explicit `start`/`finish` calls. Remove `start`/`finish` calls from `StreamWithProgress` and `iter`. * Implement basic assertions to ensure that `start2` and `finish` are used correctly. * Remove `period` and `split`. * Rewrite `StreamWithProgress.seek` and `ProgressReporter.iter` to use relative progress reports. These changes should be backwards compatible for users who pass predefined or custom progress reporters into SDK functions. They are not backwards compatible for users who try to use progress reporters directly (e.g. calling `start`/`finish`). I don't consider that a significant issue, since the purpose of the `ProgressReporter` interface is for the user to get progress information from the SDK, not for them to use it in their own code.
3c8c1ff
to
dfe7a94
Compare
## \[2.6.0\] - 2023-08-11 ### Added - \[SDK\] Introduced the `DeferredTqdmProgressReporter` class, which avoids the glitchy output seen with the `TqdmProgressReporter` under certain circumstances (<#6556>) - \[SDK, CLI\] Added the `cvat_sdk.auto_annotation` module, providing functionality to automatically annotate tasks by executing a user-provided function on the local machine. A corresponding CLI command (`auto-annotate`) is also available. Some predefined functions using torchvision are also available. (<#6483>, <#6649>) - Included an indication for cached frames in the interface (<#6586>) ### Changed - Raised the default guide assets limitations to 30 assets, with a maximum size of 10MB each (<#6575>) - \[SDK\] Custom `ProgressReporter` implementations should now override `start2` instead of `start` The old implementation is still supported. (<#6556>) - Improved memory optimization and code in the decoding module (<#6585>) ### Removed - Removed the YOLOv5 serverless function (<#6618>) ### Fixed - Corrected an issue where the prebuilt FFmpeg bundled in PyAV was being used instead of the custom build. - Fixed the filename for labels in the CamVid format (<#6600>)
There are several problems with how progress reporting is handled in the SDK, both on the interface and implementation level: * The user is supposed to know, for a given function, which units it will report progress in. This is unnecessary coupling and prevents us from switching to different units, or to have several progress bars using different units. * To create a TqdmProgressReporter, you have to create a tqdm instance, which immediately draws a progress bar. This works poorly if the function prints any log messages before the progress actually starts. * There's no easy way to automatically call `finish` on a progress bar, so some functions don't (for example, `Downloader.download_file`). This can cause unexpected output, since tqdm will refresh the progress bar in a background thread (possibly after we've already printed something else). * `iter` is basically broken, because it divides by `period`, which is 0 in all current implementations. * Even ignoring that, it's hard to use correctly, because you need to manually call `finish` in case of an exception. * `split` is not implemented and not used. * `StreamWithProgress.seek` assumes that the progress bar is at 0 at the start, and so does `ProgressReporter.iter`. The former also works incorrectly if the second argument is not `SEEK_SET`. Fix these problems by doing the following: * Add a new `start2` method which accepts more parameters. The default implementation calls `start`, so that if a user has implemented the original interface, it'll keep working. * Add a `DeferredTqdmProgressReporter` that accepts tqdm parameters instead of a tqdm instance, and only creates an instance after `start2` is called. Use it where `TqdmProgressReporter` was used before. The old `TqdmProgressReporter` is kept for compatibility, but it doesn't support any `start2` arguments other than those supported by the original `start`. * Add a `task` context manager, which automatically calls `start2` and `finish`. Use it everywhere instead of explicit `start`/`finish` calls. Remove `start`/`finish` calls from `StreamWithProgress` and `iter`. * Implement basic assertions to ensure that `start2` and `finish` are used correctly. * Remove `period` and `split`. * Rewrite `StreamWithProgress.seek` and `ProgressReporter.iter` to use relative progress reports. These changes should be backwards compatible for users who pass predefined or custom progress reporters into SDK functions. They are not backwards compatible for users who try to use progress reporters directly (e.g. calling `start`/`finish`). I don't consider that a significant issue, since the purpose of the `ProgressReporter` interface is for the user to get progress information from the SDK, not for them to use it in their own code. Originally developed for cvat-ai#6483.
Motivation and context
There are several problems with how progress reporting is handled in the SDK, both on the interface and implementation level:
The user is supposed to know, for a given function, which units it will report progress in. This is unnecessary coupling and prevents us from switching to different units, or to have several progress bars using different units.
To create a TqdmProgressReporter, you have to create a tqdm instance, which immediately draws a progress bar. This works poorly if the function prints any log messages before the progress actually starts.
There's no easy way to automatically call
finish
on a progress bar, so some functions don't (for example,Downloader.download_file
). This can cause unexpected output, since tqdm will refresh the progress bar in a background thread (possibly after we've already printed something else).iter
is basically broken, because it divides byperiod
, which is 0 in all current implementations.Even ignoring that, it's hard to use correctly, because you need to manually call
finish
in case of an exception.split
is not implemented and not used.StreamWithProgress.seek
assumes that the progress bar is at 0 at the start, and so doesProgressReporter.iter
. The former also works incorrectly if the second argument is notSEEK_SET
.Fix these problems by doing the following:
Add a new
start2
method which accepts arbitrary tqdm parameters. The default implementation callsstart
, so that if a user has implemented the original interface, it'll keep working.Add a
TqdmProgressReporter2
that accepts tqdm parameters instead of a tqdm instance, and only creates an instance afterstart2
is called. Use it whereTqdmProgressReporter
was used before. The oldTqdmProgressReporter
is kept for compatibility, but it doesn't support anystart2
arguments other than those supported by the originalstart
.Add a
task
context manager, which automatically callsstart2
andfinish
. Use it everywhere instead of explicitstart
/finish
calls. Removestart
/finish
calls fromStreamWithProgress
anditer
.Implement basic assertions to ensure that
start2
andfinish
are used correctly.Remove
period
andsplit
.Rewrite
StreamWithProgress.seek
andProgressReporter.iter
to use relative progress reports.These changes should be backwards compatible for users who pass predefined or custom progress reporters into SDK functions. They are not backwards compatible for users who try to use progress reporters directly (e.g. calling
start
/finish
). I don't consider that a significant issue, since the purpose of theProgressReporter
interface is for the user to get progress information from the SDK, not for them to use it in their own code.Originally developed for #6483.
How has this been tested?
Checklist
develop
branch[ ] I have updated the documentation accordingly[ ] I have increased versions of npm packages if it is necessary(cvat-canvas,
cvat-core,
cvat-data and
cvat-ui)
License
Feel free to contact the maintainers if that's a concern.