[Feature] TensorDict.numpy() #787

Merged · 4 commits · May 23, 2024

@vmoens (Contributor) commented May 23, 2024

@facebook-github-bot added the CLA Signed label May 23, 2024
@vmoens added the enhancement label May 23, 2024
@shagunsodhani (Contributor) left a comment


wohoo <3

Co-authored-by: Shagun Sodhani <1321193+shagunsodhani@users.noreply.github.com>

⚠ Warning: Result of GPU Benchmark Tests

Total benchmarks: 135. Improved: 11. Worsened: 7. (Significant changes are shown in bold in the table.)

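Columns: Name, Max (latency), Mean (latency), Ops (throughput on this PR), Ops on Repo HEAD (throughput on main), Change (throughput relative to HEAD)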
test_plain_set_nested 90.6250μs 12.6458μs 79.0779 KOps/s 80.9587 KOps/s $\color{#d91a1a}-2.32\%$
test_plain_set_stack_nested 30.0620μs 12.8210μs 77.9971 KOps/s 80.2178 KOps/s $\color{#d91a1a}-2.77\%$
test_plain_set_nested_inplace 45.9520μs 14.0818μs 71.0138 KOps/s 73.4019 KOps/s $\color{#d91a1a}-3.25\%$
test_plain_set_stack_nested_inplace 31.5220μs 14.2598μs 70.1273 KOps/s 72.4788 KOps/s $\color{#d91a1a}-3.24\%$
test_items 19.1810μs 4.6648μs 214.3718 KOps/s 209.5040 KOps/s $\color{#35bf28}+2.32\%$
test_items_nested 0.3918ms 0.3360ms 2.9766 KOps/s 2.9177 KOps/s $\color{#35bf28}+2.02\%$
test_items_nested_locked 0.3936ms 0.3373ms 2.9650 KOps/s 2.9429 KOps/s $\color{#35bf28}+0.75\%$
test_items_nested_leaf 0.1106ms 82.6091μs 12.1052 KOps/s 12.2989 KOps/s $\color{#d91a1a}-1.57\%$
test_items_stack_nested 0.4684ms 0.3370ms 2.9673 KOps/s 2.9371 KOps/s $\color{#35bf28}+1.03\%$
test_items_stack_nested_leaf 0.1181ms 83.7818μs 11.9358 KOps/s 11.8546 KOps/s $\color{#35bf28}+0.68\%$
test_items_stack_nested_locked 0.4105ms 0.3405ms 2.9366 KOps/s 2.8862 KOps/s $\color{#35bf28}+1.75\%$
test_keys 20.3510μs 4.3392μs 230.4573 KOps/s 230.1424 KOps/s $\color{#35bf28}+0.14\%$
test_keys_nested 96.3650μs 67.4018μs 14.8364 KOps/s 15.0080 KOps/s $\color{#d91a1a}-1.14\%$
test_keys_nested_locked 0.7543ms 72.0979μs 13.8700 KOps/s 13.8361 KOps/s $\color{#35bf28}+0.25\%$
test_keys_nested_leaf 0.2112ms 57.5596μs 17.3733 KOps/s 17.4659 KOps/s $\color{#d91a1a}-0.53\%$
test_keys_stack_nested 92.5050μs 67.4027μs 14.8362 KOps/s 14.8469 KOps/s $\color{#d91a1a}-0.07\%$
test_keys_stack_nested_leaf 0.1017ms 57.2305μs 17.4732 KOps/s 17.2942 KOps/s $\color{#35bf28}+1.04\%$
test_keys_stack_nested_locked 94.2450μs 72.4875μs 13.7955 KOps/s 13.9558 KOps/s $\color{#d91a1a}-1.15\%$
test_values 8.9037μs 1.8147μs 551.0501 KOps/s 551.0431 KOps/s $+0.00\%$
test_values_nested 58.2630μs 35.0557μs 28.5260 KOps/s 28.4896 KOps/s $\color{#35bf28}+0.13\%$
test_values_nested_locked 54.7330μs 36.8361μs 27.1473 KOps/s 27.0134 KOps/s $\color{#35bf28}+0.50\%$
test_values_nested_leaf 51.3230μs 31.3975μs 31.8497 KOps/s 31.6929 KOps/s $\color{#35bf28}+0.49\%$
test_values_stack_nested 56.4630μs 35.4400μs 28.2167 KOps/s 27.6670 KOps/s $\color{#35bf28}+1.99\%$
test_values_stack_nested_leaf 0.1497ms 31.7678μs 31.4785 KOps/s 31.1200 KOps/s $\color{#35bf28}+1.15\%$
test_values_stack_nested_locked 52.9920μs 37.5763μs 26.6125 KOps/s 26.4987 KOps/s $\color{#35bf28}+0.43\%$
test_membership 3.1030μs 0.7182μs 1.3923 MOps/s 1.3958 MOps/s $\color{#d91a1a}-0.25\%$
test_membership_nested 36.1320μs 2.6000μs 384.6089 KOps/s 393.4191 KOps/s $\color{#d91a1a}-2.24\%$
test_membership_nested_leaf 39.2430μs 2.5808μs 387.4764 KOps/s 390.4729 KOps/s $\color{#d91a1a}-0.77\%$
test_membership_stacked_nested 22.9710μs 2.5897μs 386.1420 KOps/s 386.0624 KOps/s $\color{#35bf28}+0.02\%$
test_membership_stacked_nested_leaf 27.3610μs 2.6060μs 383.7299 KOps/s 388.8129 KOps/s $\color{#d91a1a}-1.31\%$
test_membership_nested_last 31.9020μs 3.1246μs 320.0374 KOps/s 325.3299 KOps/s $\color{#d91a1a}-1.63\%$
test_membership_nested_leaf_last 29.0020μs 3.1502μs 317.4449 KOps/s 324.1846 KOps/s $\color{#d91a1a}-2.08\%$
test_membership_stacked_nested_last 24.2110μs 3.6292μs 275.5426 KOps/s 322.8036 KOps/s $\textbf{\color{#d91a1a}-14.64\%}$
test_membership_stacked_nested_leaf_last 19.7110μs 3.6417μs 274.5944 KOps/s 322.7769 KOps/s $\textbf{\color{#d91a1a}-14.93\%}$
test_nested_getleaf 24.8510μs 8.4406μs 118.4748 KOps/s 119.5795 KOps/s $\color{#d91a1a}-0.92\%$
test_nested_get 23.0210μs 7.9282μs 126.1316 KOps/s 126.6048 KOps/s $\color{#d91a1a}-0.37\%$
test_stacked_getleaf 32.4820μs 8.4189μs 118.7801 KOps/s 119.2445 KOps/s $\color{#d91a1a}-0.39\%$
test_stacked_get 33.6220μs 7.9214μs 126.2400 KOps/s 126.6212 KOps/s $\color{#d91a1a}-0.30\%$
test_nested_getitemleaf 32.2620μs 8.6500μs 115.6069 KOps/s 116.5013 KOps/s $\color{#d91a1a}-0.77\%$
test_nested_getitem 37.1420μs 8.0544μs 124.1561 KOps/s 124.3747 KOps/s $\color{#d91a1a}-0.18\%$
test_stacked_getitemleaf 26.7620μs 8.6449μs 115.6755 KOps/s 116.7667 KOps/s $\color{#d91a1a}-0.93\%$
test_stacked_getitem 49.0930μs 8.0993μs 123.4674 KOps/s 123.6642 KOps/s $\color{#d91a1a}-0.16\%$
test_lock_nested 58.8024ms 0.4130ms 2.4216 KOps/s 2.3481 KOps/s $\color{#35bf28}+3.13\%$
test_lock_stack_nested 0.3573ms 0.3089ms 3.2368 KOps/s 3.1880 KOps/s $\color{#35bf28}+1.53\%$
test_unlock_nested 0.7541ms 0.3576ms 2.7966 KOps/s 2.7819 KOps/s $\color{#35bf28}+0.53\%$
test_unlock_stack_nested 0.3792ms 0.3188ms 3.1369 KOps/s 3.1194 KOps/s $\color{#35bf28}+0.56\%$
test_flatten_speed 0.1868ms 0.1027ms 9.7359 KOps/s 9.8383 KOps/s $\color{#d91a1a}-1.04\%$
test_unflatten_speed 0.3489ms 0.2879ms 3.4736 KOps/s 3.4434 KOps/s $\color{#35bf28}+0.88\%$
test_common_ops 1.0610ms 0.5805ms 1.7225 KOps/s 1.7235 KOps/s $\color{#d91a1a}-0.06\%$
test_creation 35.5920μs 1.7400μs 574.7248 KOps/s 598.4863 KOps/s $\color{#d91a1a}-3.97\%$
test_creation_empty 32.3620μs 8.3992μs 119.0585 KOps/s 130.6856 KOps/s $\textbf{\color{#d91a1a}-8.90\%}$
test_creation_nested_1 38.7720μs 10.1483μs 98.5382 KOps/s 106.1727 KOps/s $\textbf{\color{#d91a1a}-7.19\%}$
test_creation_nested_2 38.1320μs 12.5533μs 79.6606 KOps/s 85.3070 KOps/s $\textbf{\color{#d91a1a}-6.62\%}$
test_clone 65.5440μs 11.7873μs 84.8371 KOps/s 83.3355 KOps/s $\color{#35bf28}+1.80\%$
test_getitem[int] 36.2520μs 10.7812μs 92.7539 KOps/s 86.8228 KOps/s $\textbf{\color{#35bf28}+6.83\%}$
test_getitem[slice_int] 42.8020μs 20.2684μs 49.3378 KOps/s 46.6529 KOps/s $\textbf{\color{#35bf28}+5.75\%}$
test_getitem[range] 64.9530μs 47.0846μs 21.2384 KOps/s 20.4454 KOps/s $\color{#35bf28}+3.88\%$
test_getitem[tuple] 42.3030μs 18.6300μs 53.6768 KOps/s 50.6199 KOps/s $\textbf{\color{#35bf28}+6.04\%}$
test_getitem[list] 0.1220ms 33.2842μs 30.0443 KOps/s 28.4505 KOps/s $\textbf{\color{#35bf28}+5.60\%}$
test_setitem_dim[int] 53.9530μs 29.7482μs 33.6155 KOps/s 35.0935 KOps/s $\color{#d91a1a}-4.21\%$
test_setitem_dim[slice_int] 71.7140μs 49.5361μs 20.1873 KOps/s 20.5022 KOps/s $\color{#d91a1a}-1.54\%$
test_setitem_dim[range] 92.2250μs 68.2911μs 14.6432 KOps/s 15.1010 KOps/s $\color{#d91a1a}-3.03\%$
test_setitem_dim[tuple] 75.4140μs 44.1063μs 22.6725 KOps/s 23.2700 KOps/s $\color{#d91a1a}-2.57\%$
test_setitem 39.3320μs 16.2518μs 61.5317 KOps/s 58.4118 KOps/s $\textbf{\color{#35bf28}+5.34\%}$
test_set 56.3530μs 15.7983μs 63.2979 KOps/s 62.2734 KOps/s $\color{#35bf28}+1.65\%$
test_set_shared 1.1214ms 97.2592μs 10.2818 KOps/s 10.0719 KOps/s $\color{#35bf28}+2.08\%$
test_update 0.1535ms 17.9930μs 55.5773 KOps/s 56.3522 KOps/s $\color{#d91a1a}-1.38\%$
test_update_nested 66.2040μs 23.3970μs 42.7406 KOps/s 43.0699 KOps/s $\color{#d91a1a}-0.76\%$
test_update__nested 48.3120μs 22.8622μs 43.7404 KOps/s 42.8020 KOps/s $\color{#35bf28}+2.19\%$
test_set_nested 47.2820μs 17.0728μs 58.5726 KOps/s 58.4377 KOps/s $\color{#35bf28}+0.23\%$
test_set_nested_new 0.1439ms 19.5003μs 51.2814 KOps/s 50.0969 KOps/s $\color{#35bf28}+2.36\%$
test_select 81.8540μs 33.6967μs 29.6765 KOps/s 29.5202 KOps/s $\color{#35bf28}+0.53\%$
test_select_nested 0.7758ms 54.5192μs 18.3422 KOps/s 18.3000 KOps/s $\color{#35bf28}+0.23\%$
test_exclude_nested 0.1552ms 0.1122ms 8.9136 KOps/s 8.9946 KOps/s $\color{#d91a1a}-0.90\%$
test_empty[True] 0.4011ms 0.3472ms 2.8805 KOps/s 2.8675 KOps/s $\color{#35bf28}+0.45\%$
test_empty[False] 3.9132μs 0.8768μs 1.1405 MOps/s 1.1633 MOps/s $\color{#d91a1a}-1.96\%$
test_to 0.1022ms 76.5000μs 13.0719 KOps/s 12.4277 KOps/s $\textbf{\color{#35bf28}+5.18\%}$
test_to_nonblocking 0.2331ms 62.1370μs 16.0935 KOps/s 15.9339 KOps/s $\color{#35bf28}+1.00\%$
test_unbind_speed 0.3107ms 0.2715ms 3.6836 KOps/s 3.5857 KOps/s $\color{#35bf28}+2.73\%$
test_unbind_speed_stack0 0.3347ms 0.2727ms 3.6673 KOps/s 3.6315 KOps/s $\color{#35bf28}+0.99\%$
test_unbind_speed_stack1 76.2536ms 0.8208ms 1.2184 KOps/s 1.2241 KOps/s $\color{#d91a1a}-0.47\%$
test_split 76.2664ms 1.6533ms 604.8564 Ops/s 593.4977 Ops/s $\color{#35bf28}+1.91\%$
test_chunk 77.0309ms 1.6484ms 606.6605 Ops/s 593.9403 Ops/s $\color{#35bf28}+2.14\%$
test_creation[device0] 0.1269ms 57.3743μs 17.4294 KOps/s 16.9193 KOps/s $\color{#35bf28}+3.02\%$
test_creation_from_tensor 0.1293ms 53.8953μs 18.5545 KOps/s 16.9018 KOps/s $\textbf{\color{#35bf28}+9.78\%}$
test_add_one[memmap_tensor0] 73.4840μs 6.9947μs 142.9648 KOps/s 138.6508 KOps/s $\color{#35bf28}+3.11\%$
test_contiguous[memmap_tensor0] 13.9100μs 0.7362μs 1.3582 MOps/s 1.4351 MOps/s $\textbf{\color{#d91a1a}-5.35\%}$
test_stack[memmap_tensor0] 30.2910μs 4.6840μs 213.4915 KOps/s 198.0917 KOps/s $\textbf{\color{#35bf28}+7.77\%}$
test_memmaptd_index 1.0189ms 0.2825ms 3.5396 KOps/s 3.3476 KOps/s $\textbf{\color{#35bf28}+5.73\%}$
test_memmaptd_index_astensor 0.6952ms 0.3582ms 2.7918 KOps/s 2.7134 KOps/s $\color{#35bf28}+2.89\%$
test_memmaptd_index_op 0.9376ms 0.6461ms 1.5477 KOps/s 1.5192 KOps/s $\color{#35bf28}+1.87\%$
test_serialize_model 0.1848s 0.1118s 8.9431 Ops/s 8.5377 Ops/s $\color{#35bf28}+4.75\%$
test_serialize_model_pickle 1.4624s 1.2396s 0.8067 Ops/s 0.8081 Ops/s $\color{#d91a1a}-0.17\%$
test_serialize_weights 0.1799s 0.1110s 9.0086 Ops/s 8.3784 Ops/s $\textbf{\color{#35bf28}+7.52\%}$
test_serialize_weights_returnearly 0.2348s 97.0101ms 10.3082 Ops/s 10.4096 Ops/s $\color{#d91a1a}-0.97\%$
test_serialize_weights_pickle 1.3500s 1.2481s 0.8012 Ops/s 0.7972 Ops/s $\color{#35bf28}+0.50\%$
test_reshape_pytree 56.9830μs 26.1591μs 38.2276 KOps/s 36.9563 KOps/s $\color{#35bf28}+3.44\%$
test_reshape_td 59.9730μs 31.5324μs 31.7135 KOps/s 31.1242 KOps/s $\color{#35bf28}+1.89\%$
test_view_pytree 0.1633ms 26.1177μs 38.2882 KOps/s 37.6722 KOps/s $\color{#35bf28}+1.64\%$
test_view_td 61.5130μs 35.3429μs 28.2942 KOps/s 27.8517 KOps/s $\color{#35bf28}+1.59\%$
test_unbind_pytree 58.7530μs 31.9446μs 31.3042 KOps/s 30.6328 KOps/s $\color{#35bf28}+2.19\%$
test_unbind_td 0.4113ms 41.8178μs 23.9133 KOps/s 23.1989 KOps/s $\color{#35bf28}+3.08\%$
test_split_pytree 0.1796ms 35.7259μs 27.9909 KOps/s 28.4002 KOps/s $\color{#d91a1a}-1.44\%$
test_split_td 0.1987ms 41.3591μs 24.1785 KOps/s 24.5023 KOps/s $\color{#d91a1a}-1.32\%$
test_add_pytree 0.1768ms 37.9533μs 26.3481 KOps/s 26.0945 KOps/s $\color{#35bf28}+0.97\%$
test_add_td 90.1340μs 53.3235μs 18.7535 KOps/s 20.4569 KOps/s $\textbf{\color{#d91a1a}-8.33\%}$
test_distributed 0.2028ms 67.3397μs 14.8501 KOps/s 14.8611 KOps/s $\color{#d91a1a}-0.07\%$
test_tdmodule 90.8850μs 14.8040μs 67.5495 KOps/s 67.3376 KOps/s $\color{#35bf28}+0.31\%$
test_tdmodule_dispatch 51.1530μs 29.1134μs 34.3485 KOps/s 34.9408 KOps/s $\color{#d91a1a}-1.70\%$
test_tdseq 40.7020μs 16.7164μs 59.8216 KOps/s 60.3652 KOps/s $\color{#d91a1a}-0.90\%$
test_tdseq_dispatch 57.9230μs 32.5325μs 30.7385 KOps/s 31.3560 KOps/s $\color{#d91a1a}-1.97\%$
test_instantiation_functorch 1.6784ms 1.5584ms 641.6967 Ops/s 643.4282 Ops/s $\color{#d91a1a}-0.27\%$
test_instantiation_td 78.9163ms 1.1657ms 857.8435 Ops/s 848.7154 Ops/s $\color{#35bf28}+1.08\%$
test_exec_functorch 0.1815ms 0.1474ms 6.7844 KOps/s 6.5883 KOps/s $\color{#35bf28}+2.98\%$
test_exec_functional_call 0.1665ms 0.1339ms 7.4673 KOps/s 7.2690 KOps/s $\color{#35bf28}+2.73\%$
test_exec_td 0.2625ms 0.1303ms 7.6719 KOps/s 7.2411 KOps/s $\textbf{\color{#35bf28}+5.95\%}$
test_exec_td_decorator 0.4432ms 0.2057ms 4.8624 KOps/s 4.7936 KOps/s $\color{#35bf28}+1.44\%$
test_vmap_mlp_speed[True-True] 0.7858ms 0.5872ms 1.7029 KOps/s 1.7057 KOps/s $\color{#d91a1a}-0.17\%$
test_vmap_mlp_speed[True-False] 0.6516ms 0.5805ms 1.7226 KOps/s 1.7198 KOps/s $\color{#35bf28}+0.16\%$
test_vmap_mlp_speed[False-True] 0.6922ms 0.5126ms 1.9509 KOps/s 1.9459 KOps/s $\color{#35bf28}+0.26\%$
test_vmap_mlp_speed[False-False] 0.6143ms 0.5139ms 1.9459 KOps/s 1.9486 KOps/s $\color{#d91a1a}-0.14\%$
test_vmap_mlp_speed_decorator[True-True] 0.8213ms 0.6486ms 1.5417 KOps/s 1.5417 KOps/s $+0.01\%$
test_vmap_mlp_speed_decorator[True-False] 0.9223ms 0.6606ms 1.5138 KOps/s 1.5507 KOps/s $\color{#d91a1a}-2.38\%$
test_vmap_mlp_speed_decorator[False-True] 0.7550ms 0.5718ms 1.7490 KOps/s 1.7547 KOps/s $\color{#d91a1a}-0.33\%$
test_vmap_mlp_speed_decorator[False-False] 0.7341ms 0.5687ms 1.7585 KOps/s 1.7479 KOps/s $\color{#35bf28}+0.61\%$
test_vmap_transformer_speed[True-True] 8.0121ms 7.7358ms 129.2691 Ops/s 128.7457 Ops/s $\color{#35bf28}+0.41\%$
test_vmap_transformer_speed[True-False] 8.0299ms 7.7395ms 129.2067 Ops/s 128.8128 Ops/s $\color{#35bf28}+0.31\%$
test_vmap_transformer_speed[False-True] 8.5189ms 7.8991ms 126.5963 Ops/s 129.8270 Ops/s $\color{#d91a1a}-2.49\%$
test_vmap_transformer_speed[False-False] 8.4427ms 7.9300ms 126.1032 Ops/s 130.1567 Ops/s $\color{#d91a1a}-3.11\%$
test_vmap_transformer_speed_decorator[True-True] 19.9777ms 19.2632ms 51.9126 Ops/s 53.0848 Ops/s $\color{#d91a1a}-2.21\%$
test_vmap_transformer_speed_decorator[True-False] 20.2042ms 19.2406ms 51.9734 Ops/s 52.8448 Ops/s $\color{#d91a1a}-1.65\%$
test_vmap_transformer_speed_decorator[False-True] 19.9257ms 19.1362ms 52.2569 Ops/s 53.2776 Ops/s $\color{#d91a1a}-1.92\%$
test_vmap_transformer_speed_decorator[False-False] 19.9491ms 19.0680ms 52.4439 Ops/s 53.1644 Ops/s $\color{#d91a1a}-1.36\%$
test_to_module_speed[True] 2.8140ms 1.5584ms 641.6799 Ops/s 640.8238 Ops/s $\color{#35bf28}+0.13\%$
test_to_module_speed[False] 2.0652ms 1.5470ms 646.4208 Ops/s 649.5224 Ops/s $\color{#d91a1a}-0.48\%$
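The Ops and Change columns appear to be derived from the other columns: Ops matches the reciprocal of the mean latency, and Change matches the relative throughput difference against HEAD. The sketch below reproduces three rows of the GPU table under that assumption; `ops_per_second` and `percent_change` are hypothetical helper names, not part of the benchmark tooling used in this PR.

```python
# Hypothetical helpers: the benchmark bot's actual implementation is not shown in this PR.

def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean per-call latency (Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of this PR versus the repository HEAD, in percent."""
    return (ops / ops_head - 1.0) * 100.0


# Rows copied from the GPU table: (name, mean latency in seconds, Ops on HEAD in ops/s)
rows = [
    ("test_plain_set_nested", 12.6458e-6, 80.9587e3),
    ("test_membership_stacked_nested_last", 3.6292e-6, 322.8036e3),
    ("test_creation_from_tensor", 53.8953e-6, 16.9018e3),
]

for name, mean_s, ops_head in rows:
    ops = ops_per_second(mean_s)
    print(f"{name}: {ops / 1e3:.4f} KOps/s, {percent_change(ops, ops_head):+.2f}%")
```

The last printed digit of Ops can differ slightly from the table because the Mean column is itself rounded.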

⚠ Warning: Result of CPU Benchmark Tests

Total benchmarks: 127. Improved: 9. Worsened: 3. (Significant changes are shown in bold in the table.)

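Columns: Name, Max (latency), Mean (latency), Ops (throughput on this PR), Ops on Repo HEAD (throughput on main), Change (throughput relative to HEAD)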
test_plain_set_nested 40.7570μs 17.6144μs 56.7717 KOps/s 55.2492 KOps/s $\color{#35bf28}+2.76\%$
test_plain_set_stack_nested 46.6470μs 17.5740μs 56.9023 KOps/s 54.6138 KOps/s $\color{#35bf28}+4.19\%$
test_plain_set_nested_inplace 46.1660μs 19.4692μs 51.3633 KOps/s 49.6941 KOps/s $\color{#35bf28}+3.36\%$
test_plain_set_stack_nested_inplace 68.0380μs 19.3709μs 51.6239 KOps/s 49.5614 KOps/s $\color{#35bf28}+4.16\%$
test_items 22.1820μs 2.5276μs 395.6257 KOps/s 385.9964 KOps/s $\color{#35bf28}+2.49\%$
test_items_nested 0.4205ms 0.2641ms 3.7862 KOps/s 3.6904 KOps/s $\color{#35bf28}+2.60\%$
test_items_nested_locked 0.9817ms 0.2685ms 3.7237 KOps/s 3.7013 KOps/s $\color{#35bf28}+0.61\%$
test_items_nested_leaf 0.1297ms 78.7093μs 12.7050 KOps/s 12.9357 KOps/s $\color{#d91a1a}-1.78\%$
test_items_stack_nested 0.4669ms 0.2698ms 3.7063 KOps/s 3.6314 KOps/s $\color{#35bf28}+2.06\%$
test_items_stack_nested_leaf 0.1407ms 79.1644μs 12.6319 KOps/s 12.7787 KOps/s $\color{#d91a1a}-1.15\%$
test_items_stack_nested_locked 0.6079ms 0.2748ms 3.6388 KOps/s 3.6707 KOps/s $\color{#d91a1a}-0.87\%$
test_keys 28.1930μs 4.1424μs 241.4075 KOps/s 237.2266 KOps/s $\color{#35bf28}+1.76\%$
test_keys_nested 0.2542ms 0.1400ms 7.1433 KOps/s 7.1234 KOps/s $\color{#35bf28}+0.28\%$
test_keys_nested_locked 0.7394ms 0.1460ms 6.8490 KOps/s 6.9026 KOps/s $\color{#d91a1a}-0.78\%$
test_keys_nested_leaf 0.2026ms 0.1194ms 8.3764 KOps/s 8.3720 KOps/s $\color{#35bf28}+0.05\%$
test_keys_stack_nested 0.3120ms 0.1411ms 7.0863 KOps/s 7.0031 KOps/s $\color{#35bf28}+1.19\%$
test_keys_stack_nested_leaf 0.2363ms 0.1177ms 8.4967 KOps/s 8.3937 KOps/s $\color{#35bf28}+1.23\%$
test_keys_stack_nested_locked 0.2628ms 0.1444ms 6.9260 KOps/s 6.8821 KOps/s $\color{#35bf28}+0.64\%$
test_values 6.9505μs 1.1504μs 869.2830 KOps/s 858.5141 KOps/s $\color{#35bf28}+1.25\%$
test_values_nested 93.8950μs 51.8716μs 19.2784 KOps/s 19.2577 KOps/s $\color{#35bf28}+0.11\%$
test_values_nested_locked 98.7240μs 52.1633μs 19.1706 KOps/s 19.3582 KOps/s $\color{#d91a1a}-0.97\%$
test_values_nested_leaf 94.4260μs 46.9326μs 21.3071 KOps/s 21.2936 KOps/s $\color{#35bf28}+0.06\%$
test_values_stack_nested 0.1027ms 52.2137μs 19.1521 KOps/s 19.0854 KOps/s $\color{#35bf28}+0.35\%$
test_values_stack_nested_leaf 97.5630μs 46.9216μs 21.3121 KOps/s 20.9247 KOps/s $\color{#35bf28}+1.85\%$
test_values_stack_nested_locked 0.1004ms 51.9733μs 19.2406 KOps/s 19.1278 KOps/s $\color{#35bf28}+0.59\%$
test_membership 12.0030μs 1.3819μs 723.6603 KOps/s 740.6662 KOps/s $\color{#d91a1a}-2.30\%$
test_membership_nested 26.0190μs 3.5848μs 278.9518 KOps/s 286.9615 KOps/s $\color{#d91a1a}-2.79\%$
test_membership_nested_leaf 27.1300μs 3.5661μs 280.4197 KOps/s 284.6097 KOps/s $\color{#d91a1a}-1.47\%$
test_membership_stacked_nested 23.9350μs 3.5867μs 278.8083 KOps/s 271.7626 KOps/s $\color{#35bf28}+2.59\%$
test_membership_stacked_nested_leaf 23.3730μs 3.5807μs 279.2753 KOps/s 286.7945 KOps/s $\color{#d91a1a}-2.62\%$
test_membership_nested_last 23.1130μs 4.3607μs 229.3201 KOps/s 235.1485 KOps/s $\color{#d91a1a}-2.48\%$
test_membership_nested_leaf_last 28.5940μs 4.4203μs 226.2302 KOps/s 232.8138 KOps/s $\color{#d91a1a}-2.83\%$
test_membership_stacked_nested_last 27.5110μs 4.3741μs 228.6184 KOps/s 236.2154 KOps/s $\color{#d91a1a}-3.22\%$
test_membership_stacked_nested_leaf_last 28.9540μs 4.3190μs 231.5368 KOps/s 233.7828 KOps/s $\color{#d91a1a}-0.96\%$
test_nested_getleaf 51.3960μs 10.7692μs 92.8576 KOps/s 94.5485 KOps/s $\color{#d91a1a}-1.79\%$
test_nested_get 52.4880μs 10.1258μs 98.7578 KOps/s 100.0199 KOps/s $\color{#d91a1a}-1.26\%$
test_stacked_getleaf 50.0140μs 10.7108μs 93.3633 KOps/s 94.5474 KOps/s $\color{#d91a1a}-1.25\%$
test_stacked_get 31.8690μs 10.1081μs 98.9310 KOps/s 99.4937 KOps/s $\color{#d91a1a}-0.57\%$
test_nested_getitemleaf 49.4530μs 11.3088μs 88.4265 KOps/s 89.3644 KOps/s $\color{#d91a1a}-1.05\%$
test_nested_getitem 28.4830μs 10.4102μs 96.0600 KOps/s 97.0232 KOps/s $\color{#d91a1a}-0.99\%$
test_stacked_getitemleaf 54.0450μs 11.1544μs 89.6509 KOps/s 90.5297 KOps/s $\color{#d91a1a}-0.97\%$
test_stacked_getitem 32.6210μs 10.3261μs 96.8418 KOps/s 97.8475 KOps/s $\color{#d91a1a}-1.03\%$
test_lock_nested 47.8485ms 0.4019ms 2.4880 KOps/s 2.8040 KOps/s $\textbf{\color{#d91a1a}-11.27\%}$
test_lock_stack_nested 0.4357ms 0.3126ms 3.1992 KOps/s 3.1140 KOps/s $\color{#35bf28}+2.74\%$
test_unlock_nested 0.7529ms 0.3522ms 2.8392 KOps/s 2.4880 KOps/s $\textbf{\color{#35bf28}+14.12\%}$
test_unlock_stack_nested 0.5554ms 0.3213ms 3.1120 KOps/s 3.0582 KOps/s $\color{#35bf28}+1.76\%$
test_flatten_speed 0.2005ms 96.5569μs 10.3566 KOps/s 10.4071 KOps/s $\color{#d91a1a}-0.49\%$
test_unflatten_speed 0.8687ms 0.4102ms 2.4377 KOps/s 2.4452 KOps/s $\color{#d91a1a}-0.31\%$
test_common_ops 1.5799ms 0.7143ms 1.3999 KOps/s 1.3581 KOps/s $\color{#35bf28}+3.08\%$
test_creation 18.2740μs 1.9263μs 519.1314 KOps/s 524.3297 KOps/s $\color{#d91a1a}-0.99\%$
test_creation_empty 31.5990μs 11.1565μs 89.6338 KOps/s 85.8093 KOps/s $\color{#35bf28}+4.46\%$
test_creation_nested_1 34.8860μs 13.6709μs 73.1480 KOps/s 69.6843 KOps/s $\color{#35bf28}+4.97\%$
test_creation_nested_2 41.3980μs 16.9957μs 58.8385 KOps/s 56.5922 KOps/s $\color{#35bf28}+3.97\%$
test_clone 77.0140μs 13.3370μs 74.9796 KOps/s 73.2346 KOps/s $\color{#35bf28}+2.38\%$
test_getitem[int] 29.4460μs 11.6684μs 85.7012 KOps/s 85.4479 KOps/s $\color{#35bf28}+0.30\%$
test_getitem[slice_int] 56.7060μs 22.7267μs 44.0011 KOps/s 43.3675 KOps/s $\color{#35bf28}+1.46\%$
test_getitem[range] 78.7070μs 58.8502μs 16.9923 KOps/s 17.7641 KOps/s $\color{#d91a1a}-4.34\%$
test_getitem[tuple] 45.8850μs 19.0597μs 52.4666 KOps/s 51.4552 KOps/s $\color{#35bf28}+1.97\%$
test_getitem[list] 0.1038ms 41.4370μs 24.1330 KOps/s 24.8604 KOps/s $\color{#d91a1a}-2.93\%$
test_setitem_dim[int] 54.9330μs 36.4302μs 27.4498 KOps/s 27.1119 KOps/s $\color{#35bf28}+1.25\%$
test_setitem_dim[slice_int] 97.7330μs 61.9981μs 16.1295 KOps/s 15.8865 KOps/s $\color{#35bf28}+1.53\%$
test_setitem_dim[range] 0.1343ms 83.7496μs 11.9403 KOps/s 11.7780 KOps/s $\color{#35bf28}+1.38\%$
test_setitem_dim[tuple] 96.7910μs 51.5749μs 19.3893 KOps/s 19.4810 KOps/s $\color{#d91a1a}-0.47\%$
test_setitem 50.0940μs 20.5190μs 48.7352 KOps/s 49.1079 KOps/s $\color{#d91a1a}-0.76\%$
test_set 0.1443ms 20.6788μs 48.3586 KOps/s 49.4045 KOps/s $\color{#d91a1a}-2.12\%$
test_set_shared 3.4793ms 0.1405ms 7.1198 KOps/s 7.2290 KOps/s $\color{#d91a1a}-1.51\%$
test_update 89.8880μs 22.3915μs 44.6599 KOps/s 44.3903 KOps/s $\color{#35bf28}+0.61\%$
test_update_nested 69.4800μs 30.5837μs 32.6972 KOps/s 31.5775 KOps/s $\color{#35bf28}+3.55\%$
test_update__nested 75.7520μs 25.4079μs 39.3579 KOps/s 38.8525 KOps/s $\color{#35bf28}+1.30\%$
test_set_nested 60.2130μs 21.7411μs 45.9958 KOps/s 45.1930 KOps/s $\color{#35bf28}+1.78\%$
test_set_nested_new 82.6350μs 25.9139μs 38.5893 KOps/s 35.3713 KOps/s $\textbf{\color{#35bf28}+9.10\%}$
test_select 0.9608ms 40.5344μs 24.6704 KOps/s 24.2642 KOps/s $\color{#35bf28}+1.67\%$
test_select_nested 0.1226ms 61.2646μs 16.3226 KOps/s 16.2432 KOps/s $\color{#35bf28}+0.49\%$
test_exclude_nested 0.2647ms 0.1216ms 8.2244 KOps/s 8.2262 KOps/s $\color{#d91a1a}-0.02\%$
test_empty[True] 0.7835ms 0.4012ms 2.4928 KOps/s 2.4820 KOps/s $\color{#35bf28}+0.43\%$
test_empty[False] 13.3122μs 1.1117μs 899.5341 KOps/s 918.8820 KOps/s $\color{#d91a1a}-2.11\%$
test_unbind_speed 1.7315ms 0.2641ms 3.7861 KOps/s 3.6965 KOps/s $\color{#35bf28}+2.42\%$
test_unbind_speed_stack0 0.3885ms 0.2609ms 3.8325 KOps/s 3.7579 KOps/s $\color{#35bf28}+1.99\%$
test_unbind_speed_stack1 61.4301ms 0.7326ms 1.3650 KOps/s 1.2637 KOps/s $\textbf{\color{#35bf28}+8.02\%}$
test_split 62.9947ms 1.5842ms 631.2518 Ops/s 619.5291 Ops/s $\color{#35bf28}+1.89\%$
test_chunk 62.2583ms 1.5833ms 631.5899 Ops/s 619.5115 Ops/s $\color{#35bf28}+1.95\%$
test_creation[device0] 0.1590ms 83.2525μs 12.0117 KOps/s 11.9133 KOps/s $\color{#35bf28}+0.83\%$
test_creation_from_tensor 3.7029ms 87.4454μs 11.4357 KOps/s 11.5653 KOps/s $\color{#d91a1a}-1.12\%$
test_add_one[memmap_tensor0] 60.3530μs 5.3184μs 188.0258 KOps/s 181.8450 KOps/s $\color{#35bf28}+3.40\%$
test_contiguous[memmap_tensor0] 17.7330μs 0.6381μs 1.5671 MOps/s 1.5866 MOps/s $\color{#d91a1a}-1.23\%$
test_stack[memmap_tensor0] 17.5230μs 3.5990μs 277.8587 KOps/s 276.3699 KOps/s $\color{#35bf28}+0.54\%$
test_memmaptd_index 1.0386ms 0.2532ms 3.9497 KOps/s 3.8438 KOps/s $\color{#35bf28}+2.75\%$
test_memmaptd_index_astensor 0.7371ms 0.3283ms 3.0460 KOps/s 2.9743 KOps/s $\color{#35bf28}+2.41\%$
test_memmaptd_index_op 1.1354ms 0.6120ms 1.6340 KOps/s 1.5424 KOps/s $\textbf{\color{#35bf28}+5.94\%}$
test_serialize_model 0.1074s 0.1025s 9.7565 Ops/s 8.7324 Ops/s $\textbf{\color{#35bf28}+11.73\%}$
test_serialize_model_pickle 0.4498s 0.3795s 2.6349 Ops/s 2.6107 Ops/s $\color{#35bf28}+0.93\%$
test_serialize_weights 0.1078s 0.1005s 9.9537 Ops/s 8.9353 Ops/s $\textbf{\color{#35bf28}+11.40\%}$
test_serialize_weights_returnearly 0.2647s 0.1471s 6.7973 Ops/s 7.9567 Ops/s $\textbf{\color{#d91a1a}-14.57\%}$
test_serialize_weights_pickle 0.9447s 0.5911s 1.6916 Ops/s 2.3373 Ops/s $\textbf{\color{#d91a1a}-27.62\%}$
test_serialize_weights_filesystem 0.1627s 97.4914ms 10.2573 Ops/s 10.2990 Ops/s $\color{#d91a1a}-0.40\%$
test_serialize_model_filesystem 0.1565s 98.3030ms 10.1726 Ops/s 10.5304 Ops/s $\color{#d91a1a}-3.40\%$
test_reshape_pytree 64.7510μs 25.7492μs 38.8362 KOps/s 39.3156 KOps/s $\color{#d91a1a}-1.22\%$
test_reshape_td 67.0250μs 33.1159μs 30.1970 KOps/s 28.3400 KOps/s $\textbf{\color{#35bf28}+6.55\%}$
test_view_pytree 78.4040μs 25.5943μs 39.0712 KOps/s 38.8949 KOps/s $\color{#35bf28}+0.45\%$
test_view_td 78.9580μs 37.1900μs 26.8889 KOps/s 25.7404 KOps/s $\color{#35bf28}+4.46\%$
test_unbind_pytree 73.1470μs 29.4654μs 33.9381 KOps/s 33.9008 KOps/s $\color{#35bf28}+0.11\%$
test_unbind_td 0.3688ms 38.2415μs 26.1496 KOps/s 25.2956 KOps/s $\color{#35bf28}+3.38\%$
test_split_pytree 77.1440μs 29.1121μs 34.3500 KOps/s 33.9988 KOps/s $\color{#35bf28}+1.03\%$
test_split_td 0.1220ms 41.0242μs 24.3758 KOps/s 24.0893 KOps/s $\color{#35bf28}+1.19\%$
test_add_pytree 0.1075ms 34.4655μs 29.0145 KOps/s 28.2483 KOps/s $\color{#35bf28}+2.71\%$
test_add_td 0.1683ms 55.5077μs 18.0155 KOps/s 17.1154 KOps/s $\textbf{\color{#35bf28}+5.26\%}$
test_distributed 0.1815ms 0.1003ms 9.9746 KOps/s 9.8357 KOps/s $\color{#35bf28}+1.41\%$
test_tdmodule 65.7930μs 18.1606μs 55.0642 KOps/s 54.4050 KOps/s $\color{#35bf28}+1.21\%$
test_tdmodule_dispatch 64.1100μs 36.0916μs 27.7073 KOps/s 26.7742 KOps/s $\color{#35bf28}+3.49\%$
test_tdseq 48.7810μs 21.3517μs 46.8346 KOps/s 44.7508 KOps/s $\color{#35bf28}+4.66\%$
test_tdseq_dispatch 71.5840μs 41.6480μs 24.0107 KOps/s 23.3391 KOps/s $\color{#35bf28}+2.88\%$
test_instantiation_functorch 7.4331ms 1.3315ms 751.0452 Ops/s 749.4356 Ops/s $\color{#35bf28}+0.21\%$
test_instantiation_td 2.0213ms 0.9992ms 1.0008 KOps/s 908.8059 Ops/s $\textbf{\color{#35bf28}+10.13\%}$
test_exec_functorch 0.2541ms 0.1589ms 6.2931 KOps/s 6.3262 KOps/s $\color{#d91a1a}-0.52\%$
test_exec_functional_call 0.2261ms 0.1431ms 6.9859 KOps/s 6.8308 KOps/s $\color{#35bf28}+2.27\%$
test_exec_td 0.2884ms 0.1410ms 7.0940 KOps/s 7.0792 KOps/s $\color{#35bf28}+0.21\%$
test_exec_td_decorator 0.4725ms 0.2169ms 4.6101 KOps/s 4.5398 KOps/s $\color{#35bf28}+1.55\%$
test_vmap_mlp_speed[True-True] 0.7377ms 0.4875ms 2.0513 KOps/s 2.0369 KOps/s $\color{#35bf28}+0.71\%$
test_vmap_mlp_speed[True-False] 0.7617ms 0.4814ms 2.0773 KOps/s 2.0764 KOps/s $\color{#35bf28}+0.04\%$
test_vmap_mlp_speed[False-True] 0.5993ms 0.3901ms 2.5634 KOps/s 2.5948 KOps/s $\color{#d91a1a}-1.21\%$
test_vmap_mlp_speed[False-False] 0.6224ms 0.3915ms 2.5544 KOps/s 2.5869 KOps/s $\color{#d91a1a}-1.26\%$
test_vmap_mlp_speed_decorator[True-True] 1.1422ms 0.5517ms 1.8126 KOps/s 1.8065 KOps/s $\color{#35bf28}+0.34\%$
test_vmap_mlp_speed_decorator[True-False] 0.9833ms 0.5522ms 1.8109 KOps/s 1.8158 KOps/s $\color{#d91a1a}-0.27\%$
test_vmap_mlp_speed_decorator[False-True] 0.6122ms 0.4518ms 2.2135 KOps/s 2.2031 KOps/s $\color{#35bf28}+0.47\%$
test_vmap_mlp_speed_decorator[False-False] 0.6376ms 0.4527ms 2.2091 KOps/s 2.2099 KOps/s $\color{#d91a1a}-0.04\%$
test_to_module_speed[True] 2.2718ms 1.6851ms 593.4265 Ops/s 578.4029 Ops/s $\color{#35bf28}+2.60\%$
test_to_module_speed[False] 2.2118ms 1.6567ms 603.6238 Ops/s 592.6238 Ops/s $\color{#35bf28}+1.86\%$
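The Improved and Worsened counts in the CPU summary correspond to the bolded rows of the CPU table. The bot's exact rule is not stated in the PR; a ±5% cutoff on Change is an assumption that happens to be consistent with which rows are bolded in both the GPU and CPU tables. A minimal sketch of that tally, with `tally` and `SIGNIFICANCE_THRESHOLD` as hypothetical names:

```python
# Hypothetical reconstruction: the bot's real significance rule is not documented here.
SIGNIFICANCE_THRESHOLD = 5.0  # percent; assumed, not stated in the PR


def tally(changes: list[float], threshold: float = SIGNIFICANCE_THRESHOLD) -> tuple[int, int]:
    """Count changes at or above +threshold (improved) and at or below -threshold (worsened)."""
    improved = sum(1 for c in changes if c >= threshold)
    worsened = sum(1 for c in changes if c <= -threshold)
    return improved, worsened


# Change values copied from the CPU table: every bolded row, plus the largest unbolded ones.
cpu_changes = [
    -11.27, 14.12, 9.10, 8.02, 5.94, 11.73, 11.40, -14.57, -27.62, 6.55, 5.26, 10.13,  # bolded
    4.97, -4.34, 4.66, -3.40,  # largest unbolded changes
]

print(tally(cpu_changes))  # (9, 3)
```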

@vmoens merged commit 8c30fbc into main May 23, 2024
35 of 38 checks passed
@vmoens deleted the numpy branch May 23, 2024, 21:54
Labels: CLA Signed, enhancement
3 participants