
Uses torch.einsum for quat_rotate and quat_rotate_inverse operations #900

Merged
17 commits merged into isaac-sim:main on Sep 9, 2024

Conversation

dxyy1
Contributor

@dxyy1 dxyy1 commented Aug 29, 2024

Description

Extended the two functions so they can now take multidimensional tensors and are no longer limited to 2D tensors of shape (B, 4) and (B, 3).
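
For illustration, a rough usage sketch of the extended behavior (assuming `math_utils` refers to `omni.isaac.lab.utils.math`, as in the benchmarks below):

```python
import torch
import omni.isaac.lab.utils.math as math_utils

# Arbitrary leading dimensions are now accepted; previously q had to be
# of shape (B, 4) and v of shape (B, 3).
q = torch.randn(1024, 2, 5, 4)
q = q / q.norm(dim=-1, keepdim=True)  # normalize to unit quaternions
v = torch.randn(1024, 2, 5, 3)
out = math_utils.quat_rotate(q, v)  # shape: (1024, 2, 5, 3)
```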

Type of change

  • New feature (non-breaking change which adds functionality)

Screenshots

Before/after screenshots of the change (images not reproduced here).

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

…now can operate on multidimensional tensors and are no longer limited to 2D inputs of shape (B, 4) or (B, 3)
@dxyy1
Contributor Author

dxyy1 commented Aug 29, 2024

Hi there,

I have done the testing for this, but I was not sure where the test scripts should go in the repo, so I left them out. Please let me know if I need to add the tests and whether my changes have any issues.

Thanks :)

@Mayankm96
Contributor

Thanks a lot for looking at this! Based on past experience, einsum should also be faster than batch matrix multiplication, right? Would be great to get some numbers on that if possible :)

Here is where we usually put all the tests related to the math module: https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab/test/utils/test_math.py
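
For reference, a minimal sketch of the kind of consistency test that could go there (test and class names are hypothetical; it checks the new multidimensional path against the flattened 2D path):

```python
import unittest
import torch
import omni.isaac.lab.utils.math as math_utils

class TestQuatRotate(unittest.TestCase):
    def test_quat_rotate_multidim(self):
        # random unit quaternions and vectors with several leading dims
        q = torch.randn(32, 2, 5, 4)
        q = q / q.norm(dim=-1, keepdim=True)
        v = torch.randn(32, 2, 5, 3)
        out = math_utils.quat_rotate(q, v)
        # flattening the leading dims must reproduce the 2D behavior
        out_flat = math_utils.quat_rotate(q.reshape(-1, 4), v.reshape(-1, 3))
        torch.testing.assert_close(out.reshape(-1, 3), out_flat)

if __name__ == "__main__":
    unittest.main()
```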

@dxyy1 dxyy1 requested a review from kellyguo11 as a code owner August 30, 2024 02:26
@dxyy1
Contributor Author

dxyy1 commented Aug 30, 2024

I implemented the test in test_math.py and everything seems OK. I also timed the two implementations. The following table shows the time (in seconds) it took to operate on ~10,000 quaternions and vectors, averaged over 100 runs.

| Implementation | CPU (s) | CUDA (s) |
| --- | --- | --- |
| original | 0.001194 | 0.004874 |
| new | 0.000426 | 0.000113 |

0.001194 / 0.000426 ≈ 2.8 (CPU) and 0.004874 / 0.000113 ≈ 43 (CUDA), so on GPU the new implementation is about 43 times faster than the original at ~10,000 operations.
Please take a look at the test to make sure I didn't implement it incorrectly, since this is my first time contributing to IsaacLab.

@Mayankm96
Contributor

Mayankm96 commented Aug 30, 2024

Hi @dxyy1

Thank you for writing the test. I checked it locally and it works fine :)

I made some modifications to it to make sure the numbers are correct. The time module doesn't allow a fair comparison between different torch operations because of threading; the torch.utils.benchmark module is recommended instead. Also, the old implementations in your test weren't JIT-compiled; I saw a small performance improvement after enabling JIT compilation on them.

I made those changes in the test below:

https://github.com/isaac-sim/IsaacLab/blob/feature/improve-quat-rotate/source/extensions/omni.isaac.lab/test/utils/test_math.py#L232
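
For anyone reproducing the numbers, a minimal sketch of the torch.utils.benchmark pattern (shapes taken from the table below; the test in the linked branch is more thorough):

```python
import torch
import torch.utils.benchmark as benchmark

device = "cuda:0" if torch.cuda.is_available() else "cpu"
q = torch.randn(1024, 2, 5, 4, device=device)
q = q / q.norm(dim=-1, keepdim=True)
v = torch.randn(1024, 2, 5, 3, device=device)

timer = benchmark.Timer(
    stmt="math_utils.quat_rotate(q, v)",
    setup="import omni.isaac.lab.utils.math as math_utils",
    globals={"q": q, "v": v},
    num_threads=1,
)
# Timer handles warm-up and CUDA synchronization, unlike the time module
print(timer.timeit(1000))
```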

Based on the above, I get the following timings for quaternions of shape (1024, 2, 5, 4):

| Device | Function | Time (us) | Runs | Threads |
| --- | --- | --- | --- | --- |
| CPU | math_utils.quat_rotate | 213.12 | 1000 | 1 |
| CPU | iter_quat_rotate | 558.01 | 1000 | 1 |
| CPU | iter_old_quat_rotate | 517.66 | 1000 | 1 |
| CUDA:0 | math_utils.quat_rotate | 49.76 | 1000 | 1 |
| CUDA:0 | iter_quat_rotate | 616.85 | 1000 | 1 |
| CUDA:0 | iter_old_quat_rotate | 557.65 | 1000 | 1 |
| CPU | math_utils.quat_rotate_inverse | 203.43 | 1000 | 1 |
| CPU | iter_quat_rotate_inverse | 563.18 | 1000 | 1 |
| CPU | iter_old_quat_rotate_inverse | 524.17 | 1000 | 1 |
| CUDA:0 | math_utils.quat_rotate_inverse | 48.54 | 1000 | 1 |
| CUDA:0 | iter_quat_rotate_inverse | 608.14 | 1000 | 1 |
| CUDA:0 | iter_old_quat_rotate_inverse | 561.63 | 1000 | 1 |

Could you pull the modified test into your branch (I can't seem to push there)? It would also be great to fill in the rest of the checklist in the MR description.

@Mayankm96 Mayankm96 changed the title Extend the capability of quat_rotate() and quat_rotate_inverse() in math.py Uses torch.einsum for quat_rotate and quat_rotate_inverse operations Aug 30, 2024
@Mayankm96
Contributor

Mayankm96 commented Aug 30, 2024

Interesting. I tried with a single batch dimension, i.e., quaternions of shape (4096, 4). There is a small drop in performance with the new implementation. It isn't majorly concerning, but it is something to consider.

@kellyguo11 @jsmith-bdai thoughts?

| Device | Function | Time (us) | Runs | Threads |
| --- | --- | --- | --- | --- |
| CPU | math_utils.quat_rotate | 90.34 | 1000 | 1 |
| CPU | old_quat_rotate | 86.25 | 1000 | 1 |
| CUDA:0 | math_utils.quat_rotate | 47.49 | 1000 | 1 |
| CUDA:0 | old_quat_rotate | 42.14 | 1000 | 1 |
| CPU | math_utils.quat_rotate_inverse | 89.38 | 1000 | 1 |
| CPU | old_quat_rotate_inverse | 92.81 | 1000 | 1 |
| CUDA:0 | math_utils.quat_rotate_inverse | 47.00 | 1000 | 1 |
| CUDA:0 | old_quat_rotate_inverse | 42.17 | 1000 | 1 |

@Mayankm96 Mayankm96 added the enhancement New feature or request label Aug 30, 2024
@dxyy1
Contributor Author

dxyy1 commented Aug 30, 2024

Thank you for writing the test case, @Mayankm96. It is honestly much neater than mine, and it made me realize how much I still have to learn in that regard. So thank you :)

My understanding is that einsum can be faster if we are doing an operation that would otherwise require multiple bmm steps. For a direct matrix multiplication (e.g., [batch_size, n, m] by [batch_size, m, p]), bmm is probably faster because it is optimized for exactly that case and does not need to parse the subscript expression.
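
To make that concrete, a small illustrative check that the einsum batched dot product matches the bmm formulation on 2D inputs (einsum additionally generalizes to arbitrary leading dims):

```python
import torch

q_vec = torch.randn(4096, 3)
v = torch.randn(4096, 3)

# bmm path: (B, 1, 3) @ (B, 3, 1) -> (B, 1, 1), then flatten to (B,)
dot_bmm = torch.bmm(q_vec.unsqueeze(1), v.unsqueeze(2)).reshape(-1)
# einsum path: contracts the last index over any number of leading dims
dot_einsum = torch.einsum("...i,...i->...", q_vec, v)

torch.testing.assert_close(dot_bmm, dot_einsum)
```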

@dxyy1
Contributor Author

dxyy1 commented Sep 4, 2024

Hi there!

Given that einsum is slower than bmm when the input tensors are 2D, do you think a hybrid implementation would be better? For example:

if q_vec.dim() == 2:
    c = q_vec * torch.bmm(q_vec.view(q.shape[0], 1, 3), v.view(q.shape[0], 3, 1)).squeeze(-1) * 2.0
else:
    c = q_vec * torch.einsum("...i,...i->...", q_vec, v).unsqueeze(-1) * 2.0
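
For context, a self-contained sketch of the full hybrid function this would produce (w-first quaternion convention; an illustration of the idea, not necessarily the exact merged code):

```python
import torch

def quat_rotate(q: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Rotate vectors v (..., 3) by unit quaternions q (..., 4), w-first."""
    q_w = q[..., 0]
    q_vec = q[..., 1:]
    # v' = v * (2 w^2 - 1) + 2 w (q_vec x v) + 2 q_vec (q_vec . v)
    a = v * (2.0 * q_w**2 - 1.0).unsqueeze(-1)
    b = torch.cross(q_vec, v, dim=-1) * q_w.unsqueeze(-1) * 2.0
    if q_vec.dim() == 2:
        # 2D fast path: bmm is slightly faster for a single batch dimension
        dot = torch.bmm(q_vec.view(q.shape[0], 1, 3), v.view(q.shape[0], 3, 1)).squeeze(-1)
    else:
        # general path: einsum handles arbitrary leading dimensions
        dot = torch.einsum("...i,...i->...", q_vec, v).unsqueeze(-1)
    return a + b + q_vec * dot * 2.0
```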

I ran the tests again and got the following results.

Test A is the latest implementation in the commit, timed on my machine.
Tests B and B2 use the changes mentioned above; I ran the test twice to see whether the timings differ across runs.
Tests A, B, and B2 each consist of 1000 runs of the function with 1 thread.

CPU

| Function | Test A time (us) | Test B time (us) | Test B2 time (us) |
| --- | --- | --- | --- |
| math_utils.quat_rotate | 190.48 | 190.49 | 190.34 |
| iter_quat_rotate | 481.26 | 423.56 | 424.19 |
| iter_old_quat_rotate | 427.87 | 424.55 | 424.26 |
| math_utils.quat_rotate_inverse | 189.72 | 183.39 | 190.13 |
| iter_quat_rotate_inverse | 472.20 | 418.78 | 416.41 |
| iter_old_quat_rotate_inverse | 423.31 | 422.03 | 423.90 |

CUDA

| Function | Test A time (us) | Test B time (us) | Test B2 time (us) |
| --- | --- | --- | --- |
| math_utils.quat_rotate | 41.43 | 41.63 | 42.71 |
| iter_quat_rotate | 524.79 | 465.26 | 466.28 |
| iter_old_quat_rotate | 464.82 | 464.21 | 478.76 |
| math_utils.quat_rotate_inverse | 42.20 | 42.96 | 41.73 |
| iter_quat_rotate_inverse | 534.17 | 459.73 | 476.05 |
| iter_old_quat_rotate_inverse | 460.25 | 476.94 | 467.14 |

Previously, @Mayankm96 noted that einsum runs slower than bmm for 2D tensor inputs. With the if-statement change implemented, it now runs comparably to the old implementation on 2D inputs, without sacrificing much in the >2D case.

Do you think implementing this if-statement is a good idea, or does it make the code ugly?

@Mayankm96
Contributor

Yes, that makes sense. Five microseconds isn't a lot, but it would be good to keep the speedup, and a small if-else like this isn't that messy.

Can you please add that, update the extension.toml and CHANGELOG.rst, and run the pre-commit formatter? Thanks!

@dxyy1
Contributor Author

dxyy1 commented Sep 6, 2024

I made the changes. I am not too sure whether I modified the extension.toml and changelog correctly, so please check them :). I also ran the pre-commit formatter.

dxyy1 and others added 2 commits September 7, 2024 10:39
@dxyy1
Contributor Author

dxyy1 commented Sep 7, 2024

Hi there,
My apologies for misunderstanding the request. I have pushed new commits to update the extension.toml: removed the torch dependency and bumped the version to 0.22.8.

Please let me know if I missed anything else!

Thank you :)

@Mayankm96
Contributor

@jsmith-bdai @Dhoeller19 Can we also get your approval so that we can merge this MR?

@Mayankm96 Mayankm96 merged commit 0c3fb1e into isaac-sim:main Sep 9, 2024
1 of 2 checks passed
iamdrfly pushed a commit to iamdrfly/IsaacLab that referenced this pull request Nov 21, 2024