
[Benchmark] Benchmark to(device) #385

Merged: 10 commits, merged May 18, 2023

@vmoens (Contributor) commented May 18, 2023

No description provided.

@facebook-github-bot added the CLA Signed label May 18, 2023
@vmoens requested a review from @tcbegley May 18, 2023 08:00

@github-actions bot commented May 18, 2023

⚠ Warning: Result of CPU Benchmark Tests

Total Benchmarks: 47. Improved: 1. Worsened: 3.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_common_ops | 1.2604ms | 1.2182ms | 820.9071 Ops/s | 813.8796 Ops/s | +0.86% |
| test_creation | 4.3791μs | 4.1891μs | 238.7167 KOps/s | 242.9321 KOps/s | -1.74% |
| test_creation_empty | 17.0342μs | 16.3979μs | 60.9834 KOps/s | 60.6388 KOps/s | +0.57% |
| test_creation_nested_1 | 30.7423μs | 28.5510μs | 35.0251 KOps/s | 34.0413 KOps/s | +2.89% |
| test_creation_nested_2 | 29.8333μs | 28.8961μs | 34.6068 KOps/s | 34.0831 KOps/s | +1.54% |
| test_clone | 28.2143μs | 26.7031μs | 37.4488 KOps/s | 38.0033 KOps/s | -1.46% |
| test_getitem[int] | 38.7461μs | 32.3966μs | 30.8674 KOps/s | 31.0041 KOps/s | -0.44% |
| test_getitem[slice_int] | 73.9413μs | 66.6663μs | 15.0001 KOps/s | 14.9836 KOps/s | +0.11% |
| test_getitem[range] | 74.2921μs | 69.1876μs | 14.4535 KOps/s | 14.6059 KOps/s | -1.04% |
| test_getitem[tuple] | 65.0411μs | 62.0465μs | 16.1169 KOps/s | 16.1197 KOps/s | -0.02% |
| test_getitem[list] | 65.8684μs | 60.9788μs | 16.3991 KOps/s | 16.4881 KOps/s | -0.54% |
| test_setitem_dim[int] | 71.3000μs | 45.6449μs | 21.9083 KOps/s | 21.9449 KOps/s | -0.17% |
| test_setitem_dim[slice_int] | 0.1758ms | 81.1112μs | 12.3287 KOps/s | 12.1614 KOps/s | +1.38% |
| test_setitem_dim[range] | 0.1788ms | 77.0089μs | 12.9855 KOps/s | 12.9054 KOps/s | +0.62% |
| test_setitem_dim[tuple] | 0.1193ms | 74.4281μs | 13.4358 KOps/s | 13.4092 KOps/s | +0.20% |
| test_setitem | 40.0194μs | 38.8999μs | 25.7070 KOps/s | 26.4821 KOps/s | -2.93% |
| test_set | 38.7954μs | 37.9036μs | 26.3827 KOps/s | 26.9395 KOps/s | -2.07% |
| test_set_shared | 0.1761ms | 0.1725ms | 5.7976 KOps/s | 5.5898 KOps/s | +3.72% |
| test_update | 48.4875μs | 47.6425μs | 20.9897 KOps/s | 20.9379 KOps/s | +0.25% |
| test_update_nested | 68.6918μs | 67.8326μs | 14.7422 KOps/s | 14.6867 KOps/s | +0.38% |
| test_set_nested | 48.4155μs | 47.2545μs | 21.1620 KOps/s | 21.2628 KOps/s | -0.47% |
| test_set_nested_new | 67.1287μs | 65.3483μs | 15.3026 KOps/s | 15.2878 KOps/s | +0.10% |
| test_select | 0.1048ms | 0.1030ms | 9.7084 KOps/s | 9.7137 KOps/s | -0.05% |
| test_creation[device0] | 1.3230ms | 0.4980ms | 2.0082 KOps/s | 2.0212 KOps/s | -0.64% |
| test_creation_from_tensor | 0.5925ms | 0.4588ms | 2.1796 KOps/s | 2.1421 KOps/s | +1.75% |
| test_add_one[memmap_tensor0] | 38.4354μs | 31.0937μs | 32.1609 KOps/s | 32.5054 KOps/s | -1.06% |
| test_contiguous[memmap_tensor0] | 8.7811μs | 8.3946μs | 119.1236 KOps/s | 125.4655 KOps/s | **-5.05%** |
| test_stack[memmap_tensor0] | 0.1725ms | 42.4503μs | 23.5570 KOps/s | 22.6395 KOps/s | +4.05% |
| test_reshape_pytree | 38.4604μs | 35.5565μs | 28.1242 KOps/s | 27.6355 KOps/s | +1.77% |
| test_reshape_td | 51.3526μs | 48.5635μs | 20.5916 KOps/s | 20.2104 KOps/s | +1.89% |
| test_view_pytree | 34.1914μs | 32.7556μs | 30.5291 KOps/s | 30.0088 KOps/s | +1.73% |
| test_view_td | 10.0471μs | 9.0904μs | 110.0063 KOps/s | 113.2958 KOps/s | -2.90% |
| test_unbind_pytree | 38.0654μs | 36.7597μs | 27.2037 KOps/s | 27.6739 KOps/s | -1.70% |
| test_unbind_td | 0.1856ms | 0.1829ms | 5.4665 KOps/s | 5.4929 KOps/s | -0.48% |
| test_split_pytree | 43.0505μs | 41.3203μs | 24.2012 KOps/s | 24.3498 KOps/s | -0.61% |
| test_split_td | 0.1185ms | 0.1151ms | 8.6918 KOps/s | 8.6866 KOps/s | +0.06% |
| test_add_pytree | 47.7655μs | 45.8325μs | 21.8186 KOps/s | 22.0234 KOps/s | -0.93% |
| test_add_td | 79.9829μs | 77.7171μs | 12.8672 KOps/s | 13.3213 KOps/s | -3.41% |
| test_distributed | 77.6010μs | 77.6010μs | 12.8864 KOps/s | 11.6008 KOps/s | **+11.08%** |
| test_tdmodule | 63.6010μs | 27.8449μs | 35.9132 KOps/s | 34.9557 KOps/s | +2.74% |
| test_tdmodule_dispatch | 63.8316ms | 66.5463μs | 15.0271 KOps/s | 16.0410 KOps/s | **-6.32%** |
| test_tdseq | 0.1906ms | 37.8101μs | 26.4480 KOps/s | 25.6436 KOps/s | +3.14% |
| test_tdseq_dispatch | 0.1280ms | 69.5180μs | 14.3848 KOps/s | 14.0158 KOps/s | +2.63% |
| test_instantiation_functorch | 1.7595ms | 1.6069ms | 622.3001 Ops/s | 640.2802 Ops/s | -2.81% |
| test_instantiation_td | 8.5516ms | 1.2837ms | 779.0093 Ops/s | 836.0473 Ops/s | **-6.82%** |
| test_exec_functorch | 0.1877ms | 0.1831ms | 5.4628 KOps/s | 5.6044 KOps/s | -2.53% |
| test_exec_td | 0.3305ms | 0.3253ms | 3.0738 KOps/s | 3.0600 KOps/s | +0.45% |
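
The Change column appears to be the relative difference in throughput (Ops) between this branch and repo HEAD. A minimal Python sketch, assuming that definition (the helper `pct_change` is hypothetical and not part of the benchmark tooling), reproduces the reported values for a few CPU rows:

```python
def pct_change(ops: float, ops_head: float) -> float:
    """Relative throughput change vs. repo HEAD, in percent.

    Assumption: this is how the "Change" column is derived. Both values
    must be in the same unit (Ops/s or KOps/s) for the ratio to be valid.
    """
    return (ops - ops_head) / ops_head * 100.0


# Selected rows from the CPU table: (name, Ops, Ops on Repo HEAD)
rows = [
    ("test_common_ops", 820.9071, 813.8796),
    ("test_contiguous[memmap_tensor0]", 119.1236, 125.4655),
    ("test_distributed", 12.8864, 11.6008),
    ("test_instantiation_td", 779.0093, 836.0473),
]

for name, ops, head in rows:
    # Signed, two-decimal format to match the table's "+0.86%" style
    print(f"{name}: {pct_change(ops, head):+.2f}%")
```

Because Ops is the reciprocal of the mean time, a positive Change means the benchmark ran faster on this branch.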

@tcbegley (Contributor) left a comment

LGTM!

@vmoens merged commit a84add9 into main May 18, 2023
@vmoens deleted the to_benchmark branch May 18, 2023 09:27
@github-actions bot commented

✔ OK: Result of GPU Benchmark Tests

Total Benchmarks: 47. Improved: 1. Worsened: 0.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_common_ops | 1.1232ms | 1.0766ms | 928.8860 Ops/s | 921.1330 Ops/s | +0.84% |
| test_creation | 3.8350μs | 3.2666μs | 306.1258 KOps/s | 307.7201 KOps/s | -0.52% |
| test_creation_empty | 14.4871μs | 13.4890μs | 74.1343 KOps/s | 74.2203 KOps/s | -0.12% |
| test_creation_nested_1 | 24.0982μs | 22.1941μs | 45.0570 KOps/s | 44.3319 KOps/s | +1.64% |
| test_creation_nested_2 | 24.5262μs | 23.6078μs | 42.3589 KOps/s | 42.1579 KOps/s | +0.48% |
| test_clone | 23.5502μs | 21.8500μs | 45.7665 KOps/s | 45.7861 KOps/s | -0.04% |
| test_getitem[int] | 27.8318μs | 26.4922μs | 37.7470 KOps/s | 38.6608 KOps/s | -2.36% |
| test_getitem[slice_int] | 56.3528μs | 53.0438μs | 18.8523 KOps/s | 18.9903 KOps/s | -0.73% |
| test_getitem[range] | 64.3704μs | 58.8155μs | 17.0023 KOps/s | 17.0098 KOps/s | -0.04% |
| test_getitem[tuple] | 54.6059μs | 49.9416μs | 20.0234 KOps/s | 20.6643 KOps/s | -3.10% |
| test_getitem[list] | 58.7554μs | 52.3576μs | 19.0994 KOps/s | 19.2817 KOps/s | -0.95% |
| test_setitem_dim[int] | 67.7010μs | 37.9697μs | 26.3368 KOps/s | 26.0684 KOps/s | +1.03% |
| test_setitem_dim[slice_int] | 0.1961ms | 67.8758μs | 14.7328 KOps/s | 14.7313 KOps/s | +0.01% |
| test_setitem_dim[range] | 0.1982ms | 67.7301μs | 14.7645 KOps/s | 14.6643 KOps/s | +0.68% |
| test_setitem_dim[tuple] | 0.1213ms | 61.2643μs | 16.3227 KOps/s | 16.4124 KOps/s | -0.55% |
| test_setitem | 31.9281μs | 30.3152μs | 32.9867 KOps/s | 32.4005 KOps/s | +1.81% |
| test_set | 31.2731μs | 29.5788μs | 33.8080 KOps/s | 33.4311 KOps/s | +1.13% |
| test_set_shared | 0.1717ms | 0.1674ms | 5.9726 KOps/s | 5.8525 KOps/s | +2.05% |
| test_update | 38.9602μs | 37.6532μs | 26.5581 KOps/s | 26.5426 KOps/s | +0.06% |
| test_update_nested | 57.7193μs | 54.4123μs | 18.3782 KOps/s | 18.5271 KOps/s | -0.80% |
| test_set_nested | 38.9432μs | 37.2366μs | 26.8553 KOps/s | 26.6323 KOps/s | +0.84% |
| test_set_nested_new | 53.2383μs | 51.5554μs | 19.3966 KOps/s | 19.1714 KOps/s | +1.17% |
| test_select | 86.6335μs | 81.5595μs | 12.2610 KOps/s | 12.0047 KOps/s | +2.13% |
| test_creation[device0] | 1.3380ms | 0.5091ms | 1.9642 KOps/s | 1.7195 KOps/s | **+14.23%** |
| test_creation_from_tensor | 0.6008ms | 0.4716ms | 2.1202 KOps/s | 2.0980 KOps/s | +1.06% |
| test_add_one[memmap_tensor0] | 52.4973μs | 29.3338μs | 34.0903 KOps/s | 34.0615 KOps/s | +0.08% |
| test_contiguous[memmap_tensor0] | 8.5661μs | 8.0354μs | 124.4489 KOps/s | 125.4883 KOps/s | -0.83% |
| test_stack[memmap_tensor0] | 0.2116ms | 46.1703μs | 21.6589 KOps/s | 21.5702 KOps/s | +0.41% |
| test_reshape_pytree | 31.7102μs | 28.9053μs | 34.5957 KOps/s | 35.1331 KOps/s | -1.53% |
| test_reshape_td | 42.5352μs | 39.7015μs | 25.1880 KOps/s | 25.1002 KOps/s | +0.35% |
| test_view_pytree | 27.2591μs | 26.1016μs | 38.3118 KOps/s | 38.7916 KOps/s | -1.24% |
| test_view_td | 8.2710μs | 7.1184μs | 140.4804 KOps/s | 142.9036 KOps/s | -1.70% |
| test_unbind_pytree | 32.5042μs | 30.4162μs | 32.8772 KOps/s | 32.7653 KOps/s | +0.34% |
| test_unbind_td | 0.1539ms | 0.1509ms | 6.6270 KOps/s | 6.9651 KOps/s | -4.85% |
| test_split_pytree | 35.9802μs | 33.6553μs | 29.7130 KOps/s | 29.7239 KOps/s | -0.04% |
| test_split_td | 98.9505μs | 94.6739μs | 10.5626 KOps/s | 10.8533 KOps/s | -2.68% |
| test_add_pytree | 40.0102μs | 37.6641μs | 26.5505 KOps/s | 27.0816 KOps/s | -1.96% |
| test_add_td | 64.7363μs | 61.2600μs | 16.3239 KOps/s | 16.4400 KOps/s | -0.71% |
| test_distributed | 0.1083ms | 0.1083ms | 9.2335 KOps/s | 9.1323 KOps/s | +1.11% |
| test_tdmodule | 0.1168ms | 24.1332μs | 41.4366 KOps/s | 42.8568 KOps/s | -3.31% |
| test_tdmodule_dispatch | 0.2237ms | 53.0911μs | 18.8356 KOps/s | 19.3372 KOps/s | -2.59% |
| test_tdseq | 0.1155ms | 32.0818μs | 31.1703 KOps/s | 31.7417 KOps/s | -1.80% |
| test_tdseq_dispatch | 0.1507ms | 62.7798μs | 15.9287 KOps/s | 16.1876 KOps/s | -1.60% |
| test_instantiation_functorch | 1.3400ms | 1.2662ms | 789.7704 Ops/s | 779.4269 Ops/s | +1.33% |
| test_instantiation_td | 1.0497ms | 0.9934ms | 1.0067 KOps/s | 1.0011 KOps/s | +0.56% |
| test_exec_functorch | 0.1937ms | 0.1586ms | 6.3049 KOps/s | 6.4094 KOps/s | -1.63% |
| test_exec_td | 0.2839ms | 0.2761ms | 3.6214 KOps/s | 3.6649 KOps/s | -1.18% |
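
The bot's summary counts (CPU: Improved 1, Worsened 3; GPU: Improved 1, Worsened 0) are consistent with counting only changes of at least 5% in magnitude, which are also exactly the bolded entries. The bot's actual threshold is not stated in this PR, so the 5% value below is an inference. A sketch with a hypothetical `summarize` helper, fed the largest-magnitude changes from each table:

```python
from __future__ import annotations

# Assumption: a benchmark counts as "improved"/"worsened" only when its
# Change reaches a fixed threshold. 5% is inferred from the bolded rows,
# not taken from the bot's configuration.
THRESHOLD_PCT = 5.0


def summarize(changes: dict[str, float], threshold: float = THRESHOLD_PCT) -> tuple[int, int]:
    """Return (improved, worsened) counts for a mapping name -> Change (%)."""
    improved = sum(1 for c in changes.values() if c >= threshold)
    worsened = sum(1 for c in changes.values() if c <= -threshold)
    return improved, worsened


# Largest-magnitude changes per table; every other row is smaller in magnitude.
cpu_changes = {
    "test_distributed": 11.08,
    "test_instantiation_td": -6.82,
    "test_tdmodule_dispatch": -6.32,
    "test_contiguous[memmap_tensor0]": -5.05,
    "test_stack[memmap_tensor0]": 4.05,
}
gpu_changes = {
    "test_creation[device0]": 14.23,
    "test_unbind_td": -4.85,
    "test_tdmodule": -3.31,
    "test_getitem[tuple]": -3.10,
}

print("CPU (improved, worsened):", summarize(cpu_changes))
print("GPU (improved, worsened):", summarize(gpu_changes))
```

Under this reading, GPU `test_unbind_td` at -4.85% falls just inside the threshold, which is why the GPU run reports no regressions.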

Labels: Benchmarks, CLA Signed

3 participants