[Frontend][Tensorflow] Add unique operator #7441
Conversation
Thanks, I was planning to work on unique next week, happy to collaborate. I can add the TIR unique impl for both CPU and GPU later. We can add the relay boilerplate, a temporary impl in C++, and tests in this PR.
That would be great!
@ymwangg For a general op like this, which NumPy and PyTorch also support, we can implement it in Relay so that other frontends can reuse it.
@masahi Thanks for your comment. Here is my plan in pseudocode:

```python
# topi
def unique(data, data_sorted, data_argsorted):
    output = [0] * len(data)
    count = [0] * len(data)
    first_occurrence = [len(data)] * len(data)
    inverse_indices = [0] * len(data)
    num_unique = 0
    # ir_builder
    for i in range(len(data)):
        if i == 0 or data_sorted[i] != data_sorted[i - 1]:
            num_unique += 1
            output[num_unique - 1] = data_sorted[i]
        first_occurrence[num_unique - 1] = min(first_occurrence[num_unique - 1], data_argsorted[i])
        count[num_unique - 1] += 1
        inverse_indices[data_argsorted[i]] = num_unique - 1
    return output, count, first_occurrence, inverse_indices, num_unique

# tf front end
def tf_unique(data):
    output, count, first_occurrence, inverse_indices, num_unique = unique(
        data, np.sort(data), np.argsort(data)
    )
    sorted_occurence_indices = np.argsort(first_occurrence)  # relay.argsort
    new_output = [output[sorted_occurence_indices[i]] for i in range(num_unique)]  # relay.take
    index_converter = np.argsort(sorted_occurence_indices)  # relay.argsort
    new_inverse_indices = [index_converter[i] for i in inverse_indices]  # relay.take
    return new_output, new_inverse_indices
```

It defines a topi function that computes the unique values in sorted order, and the tf frontend then converts them to first-occurrence order. Does this look good to you?
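The pseudocode above is close to runnable Python already. Here is a hedged, self-contained NumPy version for sanity checking (the function name and the use of `np.minimum.at` are my own, purely for illustration; this is not the patch's code) that computes the same two outputs:

```python
import numpy as np

def unique_first_occurrence(data):
    # Sorted pass (what the topi kernel sketch does), then reordering of the
    # unique values into first-occurrence order (the tf frontend step).
    order = np.argsort(data, kind="stable")
    sorted_data = data[order]
    # 1 where a new value starts in the sorted sequence.
    flags = np.concatenate(([1], (sorted_data[1:] != sorted_data[:-1]).astype(np.int64)))
    ids = np.cumsum(flags) - 1                 # unique-value id per sorted element
    num_unique = int(ids[-1]) + 1
    uniques = sorted_data[flags.astype(bool)]  # unique values, sorted order
    # First original index at which each unique value appears.
    first_occurrence = np.full(num_unique, len(data), dtype=np.int64)
    np.minimum.at(first_occurrence, ids, order)
    inverse = np.empty(len(data), dtype=np.int64)
    inverse[order] = ids
    # Reorder by first occurrence, as the tf frontend step does.
    perm = np.argsort(first_occurrence)
    converter = np.argsort(perm)
    return uniques[perm], converter[inverse]

y, idx = unique_first_occurrence(np.array([9, 11, 9, 7, 11]))
# y   -> [9, 11, 7]        (first-occurrence order)
# idx -> [0, 1, 0, 2, 1]   (index of each element's unique value)
```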
It can be a lot simpler than that. Unique is basically sort + adjacent difference + exclusive scan. If you don't understand that statement, the following example should help. We have exclusive scan for both CPU and GPU (`cumsum`), so if we implement unique this way, the same code runs on both CPU and GPU.
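The example that originally accompanied this comment was lost. A minimal NumPy reconstruction of the sort + adjacent difference + scan idea might look like the following (a sketch under my own naming; it uses an inclusive `cumsum` of the difference flags, which is the scan variant that makes the ids come out right):

```python
import numpy as np

def unique_by_scan(data):
    # Sort, flag positions where adjacent sorted values differ,
    # then scan the flags so each element gets the id of its unique value.
    order = np.argsort(data, kind="stable")
    sorted_data = data[order]
    adj_diff = np.concatenate(([0], (sorted_data[1:] != sorted_data[:-1]).astype(np.int64)))
    ids = np.cumsum(adj_diff)                 # scan of the difference flags
    num_unique = int(ids[-1]) + 1
    uniques = np.empty(num_unique, dtype=data.dtype)
    uniques[ids] = sorted_data                # scatter: equal values land in one slot
    inverse = np.empty(len(data), dtype=np.int64)
    inverse[order] = ids                      # scatter ids back to input order
    return uniques, inverse, num_unique

uniques, inverse, n = unique_by_scan(np.array([4, 5, 4, 1, 5]))
# uniques -> [1, 4, 5], inverse -> [1, 2, 1, 0, 2], n -> 3
```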
Hey @masahi, can your example be extended to provide the other outputs as well?
Yes, it's possible but a bit complicated. PyTorch also has options for returning inverse indices and counts. I think for the first PR, not all options need to be implemented; we can follow up later. I'm using the PyTorch GPU impl as a reference; see the example below for how they support counts.
I see, I was interested in that.
@masahi Thanks for the explanation, it is very helpful! Here is my understanding, written as a combination of relay ops:

```python
sorted_data = relay.sort(data)
argsort_indices = relay.argsort(data)
adj_diff = relay.adjacent_difference(sorted_data, "not_equal", first_value=0)
inc_scan = relay.cumsum(adj_diff)  # inclusive scan of the difference flags
inverse_indices = relay.scatter(data, argsort_indices, inc_scan)
unique = relay.scatter(data, inc_scan, sorted_data)
num_unique = relay.take(inc_scan, [-1]) + 1
unique_sliced = relay.strided_slice(unique, [0], num_unique, slice_mode="size")
return unique_sliced, inverse_indices
```

I saw PyTorch uses a similar combination. To support counting, it looks like we would need something like an atomic-add based op.
For your first implementation, a combination-based approach is ok, but it will likely be slower than a fused ir_builder implementation. So use the ir builder if you are comfortable with it; otherwise the combination of relay ops is fine. Performance + support for options can be done later (by me). Don't worry about it for now.
Add unit tests for unique operator
Looks good 👍 GPU is not supported, right?
Can you also add the pytorch frontend? Not all options need to be supported. It's likely the same as the tf conversion.
@masahi Yeah, I only added the CPU version in this PR. I'm not very familiar with the GPU IR yet, but I can do it later. If the overall structure looks good, I can add the remaining options. I'll add the pytorch frontend in this PR.
I can do the GPU version. It will likely require the ir builder. But let me know if you want to do GPU as well; you can certainly do it. The idea is identical to the CPU version, just using different parallelization.
@masahi I added the `return_counts` option. I'll work on the GPU version of `unique` next.
@masahi I added the GPU version and it's ready for review.
@ymwangg @codeislife99 I found a neat trick PyTorch uses for counts. Basically, after you get the exclusive scan, instead of copying from the original input, you copy from an array [0, 1, 2, ...]. This will give you something like [0, 2, 5], and doing adjacent difference on it directly gives the counts. Does this make sense? It should be much faster than atomics.
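A small NumPy sketch of the trick described above (the variable names and example values are mine, assuming sorted data and the difference flags from the earlier steps): gather the start position of each run of equal values from an arange, append the total length, and adjacent differences of that array are the counts.

```python
import numpy as np

sorted_data = np.array([1, 4, 4, 5, 5, 5])  # already sorted, as after the sort step
n = len(sorted_data)
# Flag the start of every run of equal values (the first element always starts one).
starts_mask = np.concatenate(([True], sorted_data[1:] != sorted_data[:-1]))
# Copy from [0, 1, 2, ...] at the flagged positions instead of from the data.
starts = np.arange(n)[starts_mask]          # start index of each unique value
# Adjacent differences of the start positions (with n appended) are the counts.
counts = np.diff(np.append(starts, n))
# starts -> [0, 1, 3], counts -> [1, 2, 3]
```

No atomics are needed: every count comes from one subtraction between consecutive start positions.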
@masahi thanks. I'll try that approach.
LGTM 👍
Thanks @ymwangg @codeislife99, this is really great work!
@masahi thanks for making this such an interesting project!
* Initial commit of the unique operator
* Add unit tests for unique operator
* Add tensorflow unique op
* Refactor unique to use sort-based algorithm
* Change relay.unique test to run only on cpu
* Change topi.unique test to run only on cpu
* Change range to parallel for parallelizable loops
* Add return_counts option for relay.unique and topi.unique, add pytorch frontend
* Fix pylint
* Patch pytorch frontend
* Initial support of topi.cuda.unique
* Refactor to use ir_builder directly
* Modularize adjacent difference
* Refactor to simplify
* Fix typo
* Combine _unique and _unique_with_counts
* Reuse indices_ptr to remove arange_ptr

Co-authored-by: Yanming Wang <yanmwang@amazon.com>
This PR adds the tensorflow `unique` operator as described in https://www.tensorflow.org/api_docs/python/tf/unique. I'm not sure I follow the best practices; comments and suggestions are welcome. @yongwww @kevinthesun @codeislife99
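For reference, `tf.unique` returns the unique values in first-occurrence order together with, for each input element, the index of its value in that output. A rough NumPy emulation of those semantics (a sketch, not the PR's code; `np.unique` alone returns values sorted, so a reorder step is needed, and the example input is the one from the TensorFlow docs):

```python
import numpy as np

x = np.array([1, 1, 2, 4, 4, 4, 7, 8, 8])
uniq, first_idx, inverse = np.unique(x, return_index=True, return_inverse=True)
perm = np.argsort(first_idx)      # reorder sorted uniques by first occurrence
y = uniq[perm]                    # unique values, tf.unique-style order
idx = np.argsort(perm)[inverse]   # per-element index into y
# y -> [1, 2, 4, 7, 8], idx -> [0, 0, 1, 2, 2, 2, 3, 4, 4]
```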