Optimize GPU components #489
Conversation
r, g, b = tuple(avg_color)
draw.rectangle(((x1, y1), (x2, y2)), fill=(int(r), int(g), int(b)))

if cropped_image.any():
This is not really related to this PR, but it is needed to tackle edge cases.
Thanks @PhilippeMoussalli! We might want to create a model inference component in the future which packages the batching functionality, so users only need to implement their code per batch.
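As a rough illustration of what such a component could package, here is a minimal sketch of a batching helper, assuming a pandas-style partition and a user-supplied `process_batch` function (both names are hypothetical, not part of the current codebase):

```python
import pandas as pd


def run_batched(dataframe: pd.DataFrame, process_batch, batch_size: int = 32) -> pd.DataFrame:
    """Apply `process_batch` to fixed-size slices of the dataframe and concatenate the results.

    Illustrative only: a real inference component would also handle device
    placement, empty partitions, and error handling.
    """
    results = []
    for start in range(0, len(dataframe), batch_size):
        batch = dataframe.iloc[start:start + batch_size]
        results.append(process_batch(batch))
    return pd.concat(results) if results else dataframe


# Hypothetical usage: the component author only implements the per-batch logic.
# processed = run_batched(partition, my_inference_fn, batch_size=16)
```

With something like this packaged in a shared component, users would only need to implement their code per batch, as suggested above.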
@@ -12,75 +13,97 @@

logger = logging.getLogger(__name__)

os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
We no longer need this, right? It was just for debugging I think.
Same in the other components.
Kept it in case we run into other issues later on and need further debugging. We can remove it once we've tested enough GPU components at scale and are sure that everything runs fine.
I don't think it helped us, since we are already running single-threaded. It also didn't change the stack trace; the first one was already correct. And it seems like it really should not be used when not debugging.
oh ok, I'll remove it then
Thanks!
afa779d to 0a255da
When using the default (threaded) Dask scheduler, it is important to batch all GPU-related processing (preprocessing, inference) to avoid running into OOM issues. To scale the model efficiently, it can be loaded on multiple GPUs for inference using PyTorch data parallelism (this does not work on every model) in order to parallelize the batches across the GPUs. One important consideration there is to either use a single-threaded scheduler (not recommended) or to limit the number of Dask workers to the number of GPUs.

To test and diagnose GPU components, both nvtop and htop can be used to monitor GPU and CPU usage. This can help identify bottlenecks and pinpoint whether a GPU component is compute or memory bound.

Further things that still need to be clarified: whether to run a model using the processes or the threaded scheduler (so far, the threaded scheduler has shown to be faster, and most resources seem to recommend threads (link)).
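A minimal sketch of the setup described above, assuming a model that supports `nn.DataParallel` and dataframe partitions with purely numeric columns; the model, function names, and batch size are placeholders, not the actual component code:

```python
import dask
import pandas as pd
import torch
from torch import nn


def load_model() -> nn.Module:
    """Load a model once and, where supported, spread each batch across all visible GPUs."""
    model = nn.Linear(512, 128)  # placeholder; stands in for the real model
    if torch.cuda.is_available():
        model = model.cuda()
        if torch.cuda.device_count() > 1:
            # Data parallelism splits each batch across GPUs; it does not work on every model.
            model = nn.DataParallel(model)
    return model.eval()


@torch.no_grad()
def run_inference(model: nn.Module, partition: pd.DataFrame, batch_size: int = 32) -> torch.Tensor:
    """Run inference in fixed-size batches so a single partition never exhausts GPU memory."""
    outputs = []
    for start in range(0, len(partition), batch_size):
        chunk = partition.iloc[start:start + batch_size]
        inputs = torch.tensor(chunk.values, dtype=torch.float32)
        if torch.cuda.is_available():
            inputs = inputs.cuda()
        outputs.append(model(inputs).cpu())
    return torch.cat(outputs) if outputs else torch.empty(0)


# With the threaded scheduler, cap the number of workers at the number of GPUs so that
# concurrently processed partitions do not oversubscribe GPU memory.
num_gpus = max(torch.cuda.device_count(), 1)
dask.config.set(scheduler="threads", num_workers=num_gpus)
```

Whether the worker count should really be tied to the number of GPUs, and whether the processes scheduler is ever preferable, is exactly the open question mentioned above.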
PR that modifies all current GPU components by: