
Uses torch.einsum for quat_rotate and quat_rotate_inverse operations #900

Merged
17 commits merged into isaac-sim:main on Sep 9, 2024

Conversation

dxyy1
Contributor

@dxyy1 dxyy1 commented Aug 29, 2024

Description

Extended the two functions so they can now take multidimensional tensors and are no longer limited to 2D tensors of shape (B, 4) and (B, 3).
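
For illustration, a rough usage sketch of the extended behavior (assuming `math_utils` refers to `omni.isaac.lab.utils.math`, as in the benchmarks below):

```python
import torch
import omni.isaac.lab.utils.math as math_utils

# Arbitrary leading dimensions are now accepted; previously q had to be
# of shape (B, 4) and v of shape (B, 3).
q = torch.randn(1024, 2, 5, 4)
q = q / q.norm(dim=-1, keepdim=True)  # normalize to unit quaternions
v = torch.randn(1024, 2, 5, 3)
out = math_utils.quat_rotate(q, v)  # shape: (1024, 2, 5, 3)
```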

Type of change

  • New feature (non-breaking change which adds functionality)

Screenshots

Before/after screenshots of the change (images not reproduced here).

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

…now can operate on multidimensional tensors and are no longer limited to 2D inputs of shape (B, 4) or (B, 3)
@dxyy1
Contributor Author

dxyy1 commented Aug 29, 2024

Hi there,

I have done the testing for this, but I was not sure where the test scripts should go in the repo, so I left them out. Please let me know if I need to add the tests and whether my changes have any issues.

Thanks :)

@Mayankm96
Contributor

Thanks a lot for looking at this! Based on past experience, einsum should also be faster than batch matrix multiplication, right? Would be great to get some numbers on that if possible :)

Here is where we usually put all the tests related to the math module: https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab/test/utils/test_math.py
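
For reference, a minimal sketch of the kind of consistency test that could go there (test and class names are hypothetical; it checks the new multidimensional path against the flattened 2D path):

```python
import unittest
import torch
import omni.isaac.lab.utils.math as math_utils

class TestQuatRotate(unittest.TestCase):
    def test_quat_rotate_multidim(self):
        # random unit quaternions and vectors with several leading dims
        q = torch.randn(32, 2, 5, 4)
        q = q / q.norm(dim=-1, keepdim=True)
        v = torch.randn(32, 2, 5, 3)
        out = math_utils.quat_rotate(q, v)
        # flattening the leading dims must reproduce the 2D behavior
        out_flat = math_utils.quat_rotate(q.reshape(-1, 4), v.reshape(-1, 3))
        torch.testing.assert_close(out.reshape(-1, 3), out_flat)

if __name__ == "__main__":
    unittest.main()
```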

@dxyy1 dxyy1 requested a review from kellyguo11 as a code owner August 30, 2024 02:26
@dxyy1
Contributor Author

dxyy1 commented Aug 30, 2024

I implemented the test in test_math.py and everything seems OK. I also timed the two implementations. The following table shows the time (in seconds) it took to operate on ~10,000 quaternions and vectors, averaged over 100 runs.

| Implementation | CPU (s) | CUDA (s) |
| --- | --- | --- |
| original | 0.001194 | 0.004874 |
| new | 0.000426 | 0.000113 |

0.001194 / 0.000426 ≈ 2.8 (CPU) and 0.004874 / 0.000113 ≈ 43 (CUDA), so on GPU the new implementation is about 43 times faster than the original at ~10,000 operations.
Please take a look at the test to make sure I didn't implement it incorrectly, since this is my first time contributing to IsaacLab.

@Mayankm96
Contributor

Mayankm96 commented Aug 30, 2024

Hi @dxyy1

Thank you for writing the test. I checked it locally and it works fine :)

I made some modifications to it to make sure the numbers are correct. The time module doesn't allow a fair comparison between different torch operations because of threading; the torch.utils.benchmark module is recommended instead. Also, the old implementations in your test weren't JIT-compiled; I saw a small performance improvement after enabling JIT compilation on them.

I made those changes in the test below:

https://github.com/isaac-sim/IsaacLab/blob/feature/improve-quat-rotate/source/extensions/omni.isaac.lab/test/utils/test_math.py#L232
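
For anyone reproducing the numbers, a minimal sketch of the torch.utils.benchmark pattern (shapes taken from the table below; the test in the linked branch is more thorough):

```python
import torch
import torch.utils.benchmark as benchmark

device = "cuda:0" if torch.cuda.is_available() else "cpu"
q = torch.randn(1024, 2, 5, 4, device=device)
q = q / q.norm(dim=-1, keepdim=True)
v = torch.randn(1024, 2, 5, 3, device=device)

timer = benchmark.Timer(
    stmt="math_utils.quat_rotate(q, v)",
    setup="import omni.isaac.lab.utils.math as math_utils",
    globals={"q": q, "v": v},
    num_threads=1,
)
# Timer handles warm-up and CUDA synchronization, unlike the time module
print(timer.timeit(1000))
```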

Based on the above, I get the following timings for quaternions of shape (1024, 2, 5, 4):

| Device | Function | Time (us) | Runs | Threads |
| --- | --- | --- | --- | --- |
| CPU | math_utils.quat_rotate | 213.12 | 1000 | 1 |
| CPU | iter_quat_rotate | 558.01 | 1000 | 1 |
| CPU | iter_old_quat_rotate | 517.66 | 1000 | 1 |
| CUDA:0 | math_utils.quat_rotate | 49.76 | 1000 | 1 |
| CUDA:0 | iter_quat_rotate | 616.85 | 1000 | 1 |
| CUDA:0 | iter_old_quat_rotate | 557.65 | 1000 | 1 |
| CPU | math_utils.quat_rotate_inverse | 203.43 | 1000 | 1 |
| CPU | iter_quat_rotate_inverse | 563.18 | 1000 | 1 |
| CPU | iter_old_quat_rotate_inverse | 524.17 | 1000 | 1 |
| CUDA:0 | math_utils.quat_rotate_inverse | 48.54 | 1000 | 1 |
| CUDA:0 | iter_quat_rotate_inverse | 608.14 | 1000 | 1 |
| CUDA:0 | iter_old_quat_rotate_inverse | 561.63 | 1000 | 1 |

Could you pull the modified test into your branch (I can't seem to push there)? It would also be great to fill in the rest of the checklist in the MR description.

@Mayankm96 Mayankm96 changed the title Extend the capability of quat_rotate() and quat_rotate_inverse() in math.py Uses torch.einsum for quat_rotate and quat_rotate_inverse operations Aug 30, 2024
@Mayankm96
Contributor

Mayankm96 commented Aug 30, 2024

Interesting. I tried with a single batch dimension, i.e., quaternions of shape (4096, 4). There is a small drop in performance with the new implementation. It isn't majorly concerning, but it is something to consider.

@kellyguo11 @jsmith-bdai thoughts?

| Device | Function | Time (us) | Runs | Threads |
| --- | --- | --- | --- | --- |
| CPU | math_utils.quat_rotate | 90.34 | 1000 | 1 |
| CPU | old_quat_rotate | 86.25 | 1000 | 1 |
| CUDA:0 | math_utils.quat_rotate | 47.49 | 1000 | 1 |
| CUDA:0 | old_quat_rotate | 42.14 | 1000 | 1 |
| CPU | math_utils.quat_rotate_inverse | 89.38 | 1000 | 1 |
| CPU | old_quat_rotate_inverse | 92.81 | 1000 | 1 |
| CUDA:0 | math_utils.quat_rotate_inverse | 47.00 | 1000 | 1 |
| CUDA:0 | old_quat_rotate_inverse | 42.17 | 1000 | 1 |

@Mayankm96 Mayankm96 added the enhancement New feature or request label Aug 30, 2024
@dxyy1
Contributor Author

dxyy1 commented Aug 30, 2024

Thank you for writing the test case, @Mayankm96. It is honestly much neater than mine, and it made me realize how much I still have to learn in that regard. So thank you :)

My understanding is that einsum can be faster if we are doing an operation that would otherwise require multiple bmm steps. For a direct matrix multiplication (e.g., [batch_size, n, m] by [batch_size, m, p]), bmm is probably faster because it is optimized for exactly that case and does not need to parse the subscript expression.
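
To make that concrete, a small illustrative check that the einsum batched dot product matches the bmm formulation on 2D inputs (einsum additionally generalizes to arbitrary leading dims):

```python
import torch

q_vec = torch.randn(4096, 3)
v = torch.randn(4096, 3)

# bmm path: (B, 1, 3) @ (B, 3, 1) -> (B, 1, 1), then flatten to (B,)
dot_bmm = torch.bmm(q_vec.unsqueeze(1), v.unsqueeze(2)).reshape(-1)
# einsum path: contracts the last index over any number of leading dims
dot_einsum = torch.einsum("...i,...i->...", q_vec, v)

torch.testing.assert_close(dot_bmm, dot_einsum)
```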

@dxyy1
Contributor Author

dxyy1 commented Sep 4, 2024

Hi there!

Given that einsum is slower than bmm when the input tensors are 2D, do you think a hybrid implementation would be better? For example:

if q_vec.dim() == 2:
    c = q_vec * torch.bmm(q_vec.view(q.shape[0], 1, 3), v.view(q.shape[0], 3, 1)).squeeze(-1) * 2.0
else:
    c = q_vec * torch.einsum("...i,...i->...", q_vec, v).unsqueeze(-1) * 2.0
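
For context, a self-contained sketch of the full hybrid function this would produce (w-first quaternion convention; an illustration of the idea, not necessarily the exact merged code):

```python
import torch

def quat_rotate(q: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Rotate vectors v (..., 3) by unit quaternions q (..., 4), w-first."""
    q_w = q[..., 0]
    q_vec = q[..., 1:]
    # v' = v * (2 w^2 - 1) + 2 w (q_vec x v) + 2 q_vec (q_vec . v)
    a = v * (2.0 * q_w**2 - 1.0).unsqueeze(-1)
    b = torch.cross(q_vec, v, dim=-1) * q_w.unsqueeze(-1) * 2.0
    if q_vec.dim() == 2:
        # 2D fast path: bmm is slightly faster for a single batch dimension
        dot = torch.bmm(q_vec.view(q.shape[0], 1, 3), v.view(q.shape[0], 3, 1)).squeeze(-1)
    else:
        # general path: einsum handles arbitrary leading dimensions
        dot = torch.einsum("...i,...i->...", q_vec, v).unsqueeze(-1)
    return a + b + q_vec * dot * 2.0
```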

I ran the tests again and got the following results.

Test A is the latest implementation in the commit, timed on my machine.
Tests B and B2 use the changes mentioned above; I ran the test twice to see whether the timings differ across runs.
Tests A, B, and B2 each consist of 1000 runs of the function with 1 thread.

CPU

| Function | Test A time (us) | Test B time (us) | Test B2 time (us) |
| --- | --- | --- | --- |
| math_utils.quat_rotate | 190.48 | 190.49 | 190.34 |
| iter_quat_rotate | 481.26 | 423.56 | 424.19 |
| iter_old_quat_rotate | 427.87 | 424.55 | 424.26 |
| math_utils.quat_rotate_inverse | 189.72 | 183.39 | 190.13 |
| iter_quat_rotate_inverse | 472.20 | 418.78 | 416.41 |
| iter_old_quat_rotate_inverse | 423.31 | 422.03 | 423.90 |

CUDA

| Function | Test A time (us) | Test B time (us) | Test B2 time (us) |
| --- | --- | --- | --- |
| math_utils.quat_rotate | 41.43 | 41.63 | 42.71 |
| iter_quat_rotate | 524.79 | 465.26 | 466.28 |
| iter_old_quat_rotate | 464.82 | 464.21 | 478.76 |
| math_utils.quat_rotate_inverse | 42.20 | 42.96 | 41.73 |
| iter_quat_rotate_inverse | 534.17 | 459.73 | 476.05 |
| iter_old_quat_rotate_inverse | 460.25 | 476.94 | 467.14 |

Previously, @Mayankm96 noted that einsum runs slower than bmm for 2D tensor inputs. With the if-statement change implemented, it now runs comparably to the old implementation on 2D inputs, without sacrificing much in the >2D case.

Do you think implementing this if-statement is a good idea, or does it make the code ugly?

@Mayankm96
Contributor

Yes, that makes sense. Five microseconds isn't a lot, but it would be good to keep the speedup, and a small if-else like this isn't that messy.

Can you please add that, update the extension.toml and CHANGELOG.rst, and run the pre-commit formatter? Thanks!

@dxyy1
Contributor Author

dxyy1 commented Sep 6, 2024

I made the changes. I am not too sure whether I modified the extension.toml and changelog correctly, so please check them :). I also ran the pre-commit formatter.

dxyy1 and others added 2 commits September 7, 2024 10:39
@dxyy1
Contributor Author

dxyy1 commented Sep 7, 2024

Hi there,
My apologies for misunderstanding the request. I have pushed new commits to update the extension.toml: removed the torch dependency and bumped the version to 0.22.8.

Please let me know if I missed anything else!

Thank you :)

@Mayankm96
Contributor

@jsmith-bdai @Dhoeller19 Can we also get your approval so that we can merge this MR?

@Mayankm96 Mayankm96 merged commit 0c3fb1e into isaac-sim:main Sep 9, 2024
1 of 2 checks passed
iamdrfly pushed a commit to iamdrfly/IsaacLab that referenced this pull request Nov 21, 2024