[Benchmarks] using setup for functional benchmarks #351

Open
wants to merge 1 commit into
base: main

vmoens (Contributor) commented Apr 20, 2023

No description provided.

facebook-github-bot added the CLA Signed label Apr 20, 2023
@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 46. Improved: $\large\color{#35bf28}3$. Worsened: $\large\color{#d91a1a}6$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_common_ops | 1.1394ms | 1.0597ms | 943.6666 Ops/s | 937.6902 Ops/s | $\color{#35bf28}+0.64\%$ |
| test_creation | 4.3181μs | 3.9514μs | 253.0759 KOps/s | 255.7809 KOps/s | $\color{#d91a1a}-1.06\%$ |
| test_creation_empty | 11.8713μs | 11.1167μs | 89.9550 KOps/s | 90.4665 KOps/s | $\color{#d91a1a}-0.57\%$ |
| test_creation_nested_1 | 22.6755μs | 21.7376μs | 46.0033 KOps/s | 45.9475 KOps/s | $\color{#35bf28}+0.12\%$ |
| test_creation_nested_2 | 37.3038μs | 21.2111μs | 47.1451 KOps/s | 47.6921 KOps/s | $\color{#d91a1a}-1.15\%$ |
| test_clone | 26.4036μs | 24.7453μs | 40.4117 KOps/s | 38.1965 KOps/s | $\textbf{\color{#35bf28}+5.80\%}$ |
| test_getitem[int] | 43.4359μs | 38.7709μs | 25.7926 KOps/s | 25.9153 KOps/s | $\color{#d91a1a}-0.47\%$ |
| test_getitem[slice_int] | 78.0487μs | 73.1682μs | 13.6671 KOps/s | 13.5015 KOps/s | $\color{#35bf28}+1.23\%$ |
| test_getitem[range] | 80.3957μs | 75.5117μs | 13.2430 KOps/s | 13.2134 KOps/s | $\color{#35bf28}+0.22\%$ |
| test_getitem[tuple] | 68.6225μs | 67.3846μs | 14.8402 KOps/s | 14.8375 KOps/s | $\color{#35bf28}+0.02\%$ |
| test_setitem_dim[int] | 96.4020μs | 44.7799μs | 22.3315 KOps/s | 21.6913 KOps/s | $\color{#35bf28}+2.95\%$ |
| test_setitem_dim[slice_int] | 0.1511ms | 78.5428μs | 12.7319 KOps/s | 12.2080 KOps/s | $\color{#35bf28}+4.29\%$ |
| test_setitem_dim[range] | 0.2102ms | 77.1912μs | 12.9549 KOps/s | 12.7673 KOps/s | $\color{#35bf28}+1.47\%$ |
| test_setitem_dim[tuple] | 0.2007ms | 72.1702μs | 13.8561 KOps/s | 13.5884 KOps/s | $\color{#35bf28}+1.97\%$ |
| test_setitem | 32.8387μs | 31.7816μs | 31.4647 KOps/s | 30.2809 KOps/s | $\color{#35bf28}+3.91\%$ |
| test_set | 32.0847μs | 30.7420μs | 32.5288 KOps/s | 30.7139 KOps/s | $\textbf{\color{#35bf28}+5.91\%}$ |
| test_set_shared | 0.2132ms | 0.1773ms | 5.6414 KOps/s | 5.6174 KOps/s | $\color{#35bf28}+0.43\%$ |
| test_update | 41.9778μs | 40.6892μs | 24.5765 KOps/s | 23.7814 KOps/s | $\color{#35bf28}+3.34\%$ |
| test_update_nested | 60.2192μs | 59.3211μs | 16.8574 KOps/s | 16.2332 KOps/s | $\color{#35bf28}+3.84\%$ |
| test_set_nested | 51.2230μs | 40.9687μs | 24.4089 KOps/s | 23.7264 KOps/s | $\color{#35bf28}+2.88\%$ |
| test_set_nested_new | 58.5142μs | 56.8142μs | 17.6012 KOps/s | 17.0734 KOps/s | $\color{#35bf28}+3.09\%$ |
| test_select | 97.0860μs | 94.4645μs | 10.5860 KOps/s | 10.5234 KOps/s | $\color{#35bf28}+0.60\%$ |
| test_creation[device0] | 1.3389ms | 0.5029ms | 1.9885 KOps/s | 2.0083 KOps/s | $\color{#d91a1a}-0.99\%$ |
| test_creation_from_tensor | 0.6127ms | 0.4672ms | 2.1404 KOps/s | 2.1142 KOps/s | $\color{#35bf28}+1.24\%$ |
| test_add_one[memmap_tensor0] | 38.2058μs | 31.1422μs | 32.1108 KOps/s | 31.4381 KOps/s | $\color{#35bf28}+2.14\%$ |
| test_contiguous[memmap_tensor0] | 8.6402μs | 8.1938μs | 122.0439 KOps/s | 121.7867 KOps/s | $\color{#35bf28}+0.21\%$ |
| test_stack[memmap_tensor0] | 0.2029ms | 45.0409μs | 22.2020 KOps/s | 23.0999 KOps/s | $\color{#d91a1a}-3.89\%$ |
| test_reshape_pytree | 38.0698μs | 35.0258μs | 28.5504 KOps/s | 28.5072 KOps/s | $\color{#35bf28}+0.15\%$ |
| test_reshape_td | 51.0360μs | 48.6920μs | 20.5373 KOps/s | 20.5991 KOps/s | $\color{#d91a1a}-0.30\%$ |
| test_view_pytree | 34.1857μs | 32.8790μs | 30.4145 KOps/s | 30.9997 KOps/s | $\color{#d91a1a}-1.89\%$ |
| test_view_td | 10.0722μs | 9.0166μs | 110.9068 KOps/s | 112.0939 KOps/s | $\color{#d91a1a}-1.06\%$ |
| test_unbind_pytree | 37.8578μs | 36.4630μs | 27.4251 KOps/s | 27.2997 KOps/s | $\color{#35bf28}+0.46\%$ |
| test_unbind_td | 0.1918ms | 0.1811ms | 5.5214 KOps/s | 4.5182 KOps/s | $\textbf{\color{#35bf28}+22.20\%}$ |
| test_split_pytree | 45.9839μs | 42.0305μs | 23.7922 KOps/s | 23.6866 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_split_td | 0.1145ms | 0.1116ms | 8.9610 KOps/s | 8.7965 KOps/s | $\color{#35bf28}+1.87\%$ |
| test_add_pytree | 48.1739μs | 45.5654μs | 21.9465 KOps/s | 21.9440 KOps/s | $\color{#35bf28}+0.01\%$ |
| test_add_td | 76.1044μs | 74.1517μs | 13.4859 KOps/s | 12.9505 KOps/s | $\color{#35bf28}+4.13\%$ |
| test_distributed | 77.2010μs | 77.2010μs | 12.9532 KOps/s | 14.8586 KOps/s | $\textbf{\color{#d91a1a}-12.82\%}$ |
| test_tdmodule | 62.0010μs | 23.3195μs | 42.8825 KOps/s | 45.9615 KOps/s | $\textbf{\color{#d91a1a}-6.70\%}$ |
| test_tdmodule_dispatch | 66.0149ms | 58.5273μs | 17.0860 KOps/s | 20.0830 KOps/s | $\textbf{\color{#d91a1a}-14.92\%}$ |
| test_tdseq | 0.6275ms | 34.8722μs | 28.6762 KOps/s | 30.3844 KOps/s | $\textbf{\color{#d91a1a}-5.62\%}$ |
| test_tdseq_dispatch | 0.1119ms | 61.0545μs | 16.3788 KOps/s | 16.6126 KOps/s | $\color{#d91a1a}-1.41\%$ |
| test_instantiation_functorch | 1.6672ms | 1.5915ms | 628.3532 Ops/s | 1.0400 KOps/s | $\textbf{\color{#d91a1a}-39.58\%}$ |
| test_instantiation_td | 9.2456ms | 1.2879ms | 776.4402 Ops/s | 1.9086 KOps/s | $\textbf{\color{#d91a1a}-59.32\%}$ |
| test_exec_functorch | 0.1862ms | 0.1813ms | 5.5157 KOps/s | 5.4539 KOps/s | $\color{#35bf28}+1.13\%$ |
| test_exec_td | 0.2938ms | 0.2884ms | 3.4671 KOps/s | 3.4663 KOps/s | $\color{#35bf28}+0.02\%$ |

@github-actions

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 46. Improved: $\large\color{#35bf28}6$. Worsened: $\large\color{#d91a1a}5$.

| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
| --- | --- | --- | --- | --- | --- |
| test_common_ops | 1.0720ms | 1.0156ms | 984.6771 Ops/s | 971.2296 Ops/s | $\color{#35bf28}+1.38\%$ |
| test_creation | 3.9861μs | 3.2033μs | 312.1825 KOps/s | 295.2580 KOps/s | $\textbf{\color{#35bf28}+5.73\%}$ |
| test_creation_empty | 10.6341μs | 9.5939μs | 104.2331 KOps/s | 98.2121 KOps/s | $\textbf{\color{#35bf28}+6.13\%}$ |
| test_creation_nested_1 | 19.8733μs | 17.5807μs | 56.8806 KOps/s | 52.7232 KOps/s | $\textbf{\color{#35bf28}+7.89\%}$ |
| test_creation_nested_2 | 20.1503μs | 18.3102μs | 54.6144 KOps/s | 51.6881 KOps/s | $\textbf{\color{#35bf28}+5.66\%}$ |
| test_clone | 25.0434μs | 23.8462μs | 41.9353 KOps/s | 42.8225 KOps/s | $\color{#d91a1a}-2.07\%$ |
| test_getitem[int] | 38.7565μs | 33.0344μs | 30.2715 KOps/s | 30.6224 KOps/s | $\color{#d91a1a}-1.15\%$ |
| test_getitem[slice_int] | 69.5990μs | 62.6335μs | 15.9659 KOps/s | 15.5051 KOps/s | $\color{#35bf28}+2.97\%$ |
| test_getitem[range] | 80.1551μs | 69.1484μs | 14.4617 KOps/s | 14.0893 KOps/s | $\color{#35bf28}+2.64\%$ |
| test_getitem[tuple] | 67.5899μs | 59.9505μs | 16.6804 KOps/s | 16.5768 KOps/s | $\color{#35bf28}+0.63\%$ |
| test_setitem_dim[int] | 0.1209ms | 42.6516μs | 23.4458 KOps/s | 23.1822 KOps/s | $\color{#35bf28}+1.14\%$ |
| test_setitem_dim[slice_int] | 0.1920ms | 74.8314μs | 13.3634 KOps/s | 12.9087 KOps/s | $\color{#35bf28}+3.52\%$ |
| test_setitem_dim[range] | 0.2253ms | 77.4941μs | 12.9042 KOps/s | 12.8744 KOps/s | $\color{#35bf28}+0.23\%$ |
| test_setitem_dim[tuple] | 0.1663ms | 69.1689μs | 14.4574 KOps/s | 14.7257 KOps/s | $\color{#d91a1a}-1.82\%$ |
| test_setitem | 31.2094μs | 27.9288μs | 35.8053 KOps/s | 35.5153 KOps/s | $\color{#35bf28}+0.82\%$ |
| test_set | 30.8814μs | 28.4150μs | 35.1926 KOps/s | 35.8693 KOps/s | $\color{#d91a1a}-1.89\%$ |
| test_set_shared | 0.1964ms | 0.1868ms | 5.3541 KOps/s | 5.4083 KOps/s | $\color{#d91a1a}-1.00\%$ |
| test_update | 39.1626μs | 36.4289μs | 27.4507 KOps/s | 27.2729 KOps/s | $\color{#35bf28}+0.65\%$ |
| test_update_nested | 56.0568μs | 52.6581μs | 18.9904 KOps/s | 18.3082 KOps/s | $\color{#35bf28}+3.73\%$ |
| test_set_nested | 41.7326μs | 34.7238μs | 28.7987 KOps/s | 28.4314 KOps/s | $\color{#35bf28}+1.29\%$ |
| test_set_nested_new | 53.8558μs | 50.0905μs | 19.9639 KOps/s | 19.5943 KOps/s | $\color{#35bf28}+1.89\%$ |
| test_select | 88.0553μs | 82.2268μs | 12.1615 KOps/s | 11.8351 KOps/s | $\color{#35bf28}+2.76\%$ |
| test_creation[device0] | 1.3631ms | 0.5282ms | 1.8931 KOps/s | 1.8629 KOps/s | $\color{#35bf28}+1.62\%$ |
| test_creation_from_tensor | 0.6711ms | 0.5459ms | 1.8317 KOps/s | 2.0266 KOps/s | $\textbf{\color{#d91a1a}-9.62\%}$ |
| test_add_one[memmap_tensor0] | 39.4746μs | 31.9512μs | 31.2977 KOps/s | 30.7200 KOps/s | $\color{#35bf28}+1.88\%$ |
| test_contiguous[memmap_tensor0] | 9.5531μs | 8.4816μs | 117.9018 KOps/s | 110.2838 KOps/s | $\textbf{\color{#35bf28}+6.91\%}$ |
| test_stack[memmap_tensor0] | 0.2007ms | 45.9206μs | 21.7767 KOps/s | 20.6481 KOps/s | $\textbf{\color{#35bf28}+5.47\%}$ |
| test_reshape_pytree | 32.9395μs | 30.5613μs | 32.7211 KOps/s | 31.9252 KOps/s | $\color{#35bf28}+2.49\%$ |
| test_reshape_td | 46.0817μs | 42.6579μs | 23.4423 KOps/s | 22.6730 KOps/s | $\color{#35bf28}+3.39\%$ |
| test_view_pytree | 31.2994μs | 28.3227μs | 35.3073 KOps/s | 34.7916 KOps/s | $\color{#35bf28}+1.48\%$ |
| test_view_td | 9.0171μs | 7.9592μs | 125.6401 KOps/s | 124.3028 KOps/s | $\color{#35bf28}+1.08\%$ |
| test_unbind_pytree | 36.3415μs | 33.0091μs | 30.2947 KOps/s | 30.7358 KOps/s | $\color{#d91a1a}-1.44\%$ |
| test_unbind_td | 0.1636ms | 0.1550ms | 6.4526 KOps/s | 6.4404 KOps/s | $\color{#35bf28}+0.19\%$ |
| test_split_pytree | 40.3176μs | 38.1257μs | 26.2290 KOps/s | 26.6848 KOps/s | $\color{#d91a1a}-1.71\%$ |
| test_split_td | 0.1092ms | 0.1033ms | 9.6825 KOps/s | 9.7033 KOps/s | $\color{#d91a1a}-0.21\%$ |
| test_add_pytree | 44.4616μs | 40.3014μs | 24.8130 KOps/s | 24.3175 KOps/s | $\color{#35bf28}+2.04\%$ |
| test_add_td | 70.5390μs | 66.3583μs | 15.0697 KOps/s | 14.7497 KOps/s | $\color{#35bf28}+2.17\%$ |
| test_distributed | 74.3010μs | 74.3010μs | 13.4588 KOps/s | 13.0376 KOps/s | $\color{#35bf28}+3.23\%$ |
| test_tdmodule | 0.1346ms | 22.4597μs | 44.5243 KOps/s | 49.4735 KOps/s | $\textbf{\color{#d91a1a}-10.00\%}$ |
| test_tdmodule_dispatch | 0.2464ms | 50.9523μs | 19.6262 KOps/s | 20.6805 KOps/s | $\textbf{\color{#d91a1a}-5.10\%}$ |
| test_tdseq | 0.1011ms | 30.3894μs | 32.9062 KOps/s | 33.7559 KOps/s | $\color{#d91a1a}-2.52\%$ |
| test_tdseq_dispatch | 0.1256ms | 60.8964μs | 16.4213 KOps/s | 16.9103 KOps/s | $\color{#d91a1a}-2.89\%$ |
| test_instantiation_functorch | 1.5415ms | 1.4615ms | 684.2505 Ops/s | 1.1023 KOps/s | $\textbf{\color{#d91a1a}-37.93\%}$ |
| test_instantiation_td | 1.2104ms | 1.1464ms | 872.3312 Ops/s | 1.9152 KOps/s | $\textbf{\color{#d91a1a}-54.45\%}$ |
| test_exec_functorch | 0.2451ms | 0.1784ms | 5.6052 KOps/s | 5.5143 KOps/s | $\color{#35bf28}+1.65\%$ |
| test_exec_td | 0.2867ms | 0.2718ms | 3.6787 KOps/s | 3.7808 KOps/s | $\color{#d91a1a}-2.70\%$ |

vmoens (Contributor, Author) commented Apr 20, 2023

@apbard @tcbegley I was hoping this would speed up the functional tests, but they are slower now. Any idea why?
I thought `setup` was excluded from the timed measurement, so that we would measure only the calls to `make_functional` and not those to `deepcopy`.
Thoughts?
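To make the question concrete, here is a minimal stdlib-only sketch of the two timing patterns being compared: one where the per-round preparation (the copy) runs outside the timed region, and one where it is timed together with the call. It is not the benchmark code from this PR and not pytest-benchmark's implementation. `time_with_setup`, `time_without_setup`, `template`, `consume` and `fresh_input` are hypothetical names, and `consume` is only a stand-in for a call like `make_functional` that modifies its input.

```python
import copy
import time


def time_with_setup(target, setup, rounds=100):
    """Time only target(*args, **kwargs); setup() runs outside the timed region."""
    durations = []
    for _ in range(rounds):
        args, kwargs = setup()  # untimed: e.g. the deepcopy
        start = time.perf_counter()
        target(*args, **kwargs)  # timed: exactly one call per round
        durations.append(time.perf_counter() - start)
    return durations


def time_without_setup(target, make_input, rounds=100):
    """Time the input preparation together with the target call."""
    durations = []
    for _ in range(rounds):
        start = time.perf_counter()
        target(make_input())  # timed: copy + call
        durations.append(time.perf_counter() - start)
    return durations


# Hypothetical stand-in for a module's parameters.
template = {"weight": list(range(10_000)), "bias": list(range(100))}


def consume(params):
    # Mutates its input, which is why every round needs a fresh copy.
    return params.pop("weight")


def fresh_input():
    return copy.deepcopy(template)


with_setup = time_with_setup(consume, lambda: ((fresh_input(),), {}), rounds=50)
without_setup = time_without_setup(consume, fresh_input, rounds=50)

print(f"setup excluded: mean {sum(with_setup) / len(with_setup):.2e} s")
print(f"setup included: mean {sum(without_setup) / len(without_setup):.2e} s")
```

In pytest-benchmark, the corresponding mechanism is `benchmark.pedantic(target, setup=..., rounds=...)`, where `setup` returns an `(args, kwargs)` pair. As far as I understand its documentation, a `setup` function cannot be combined with more than one iteration per round, so each sample times a single call on a freshly prepared input. That may make the numbers not directly comparable with the default calibrated mode used on the repo HEAD, which could be one factor behind the apparent slowdown. This is a guess, not a diagnosis.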

Labels: Benchmarks, CLA Signed