@zhreshold Do you have any other comments? For the record, I think two items are still outstanding before the GPU implementation of NMS can be fully marked as "done".
That said, I will not be able to finish those two items before the 1.6 code freeze. I believe the performance improvements are enough to merge this PR and to address those two points in a future PR.
template <typename DType>
__global__ void FilterAndPrepareAuxDataKernel(const DType* data, DType* out, DType* scores,
index_t num_elements_per_batch,
nit: indentation alignment?
template <bool check_topk, bool check_score, typename DType>
__global__ void CompactDataKernel(const index_t* indices, const DType* source,
DType* destination, const index_t topk,
nit: indentation alignment?
Sorry for the delay, I have been traveling.
The changes LGTM!
* Adding second NMS op
* NMS kernel
* Removing second sort
* Optimization
* Adding out-of-place ability to SortByKey
* Optimization pt2
* Optimizations pt3
* Do not recompute other boxes area every time
* Sort only topk results during second sorting
* Cleaning
* Fixes from rebase
* Fix lint and more fixes from rebase
* Fix typo
* Early exit in Triangle kernel
* Fixes
* Fix sort
* Fix from rebase
* Fix for the mixed naming convention
* Fix the index_t with int comparison
Description
This PR significantly improves the performance of mx.sym.contrib.box_nms on GPU.
@zhreshold @Jerryzcn FYI
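For readers unfamiliar with the operator, the following is a minimal host-side C++ sketch of greedy non-maximum suppression. It is not the CUDA code from this PR. The names `Box`, `IoU`, and `GreedyNms`, the corner box layout, and the parameter names `overlap_thresh` and `topk` are assumptions made for illustration only. It shows two ideas that also appear in the commit list: restricting the work to the top-k scored boxes, and computing each box area once rather than inside the pairwise loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical box layout for this sketch: corner coordinates plus a score.
struct Box {
  float x1, y1, x2, y2, score;
};

// Intersection-over-union of two boxes whose areas are already known.
inline float IoU(const Box& a, float area_a, const Box& b, float area_b) {
  const float w = std::min(a.x2, b.x2) - std::max(a.x1, b.x1);
  const float h = std::min(a.y2, b.y2) - std::max(a.y1, b.y1);
  if (w <= 0.f || h <= 0.f) return 0.f;  // no overlap
  const float inter = w * h;
  return inter / (area_a + area_b - inter);
}

// Greedy NMS: returns indices of the kept boxes, highest score first.
// A negative topk means "consider all boxes".
std::vector<std::size_t> GreedyNms(const std::vector<Box>& boxes,
                                   float overlap_thresh, int topk = -1) {
  // Sort indices by descending score.
  std::vector<std::size_t> order(boxes.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t i, std::size_t j) {
                     return boxes[i].score > boxes[j].score;
                   });
  // Keep only the topk best-scored candidates, if requested.
  if (topk >= 0 && static_cast<std::size_t>(topk) < order.size()) {
    order.resize(static_cast<std::size_t>(topk));
  }

  // Compute every area once instead of recomputing it per pair.
  std::vector<float> area(boxes.size());
  for (std::size_t i = 0; i < boxes.size(); ++i) {
    area[i] = (boxes[i].x2 - boxes[i].x1) * (boxes[i].y2 - boxes[i].y1);
  }

  std::vector<bool> suppressed(boxes.size(), false);
  std::vector<std::size_t> keep;
  for (std::size_t a = 0; a < order.size(); ++a) {
    const std::size_t i = order[a];
    if (suppressed[i]) continue;
    keep.push_back(i);
    // Suppress every lower-scored box that overlaps box i too much.
    for (std::size_t b = a + 1; b < order.size(); ++b) {
      const std::size_t j = order[b];
      if (!suppressed[j] &&
          IoU(boxes[i], area[i], boxes[j], area[j]) > overlap_thresh) {
        suppressed[j] = true;
      }
    }
  }
  return keep;
}
```

The inner loop only visits lower-scored boxes, so the pairwise comparisons form a triangle. A GPU version would presumably parallelize over those pairs, but how this PR does so is not shown here.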
Test cases: