[Feature] Better detection of non-tensor data #685

Merged
merged 19 commits into from
Feb 26, 2024

Conversation

@vmoens (Contributor) commented Feb 21, 2024

No description provided.

@facebook-github-bot added the CLA Signed label Feb 21, 2024

github-actions bot commented Feb 21, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 126. Improved: $\large\color{#35bf28}18$. Worsened: $\large\color{#d91a1a}25$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 37.3300μs 17.6113μs 56.7816 KOps/s 59.9580 KOps/s $\textbf{\color{#d91a1a}-5.30\%}$
test_plain_set_stack_nested 41.7580μs 17.9131μs 55.8250 KOps/s 59.8555 KOps/s $\textbf{\color{#d91a1a}-6.73\%}$
test_plain_set_nested_inplace 59.5510μs 20.3795μs 49.0689 KOps/s 53.3248 KOps/s $\textbf{\color{#d91a1a}-7.98\%}$
test_plain_set_stack_nested_inplace 70.4420μs 20.3671μs 49.0988 KOps/s 52.9238 KOps/s $\textbf{\color{#d91a1a}-7.23\%}$
test_items 19.6460μs 2.4195μs 413.3085 KOps/s 412.0162 KOps/s $\color{#35bf28}+0.31\%$
test_items_nested 0.4727ms 0.2708ms 3.6932 KOps/s 3.6546 KOps/s $\color{#35bf28}+1.06\%$
test_items_nested_locked 0.8595ms 0.2726ms 3.6677 KOps/s 3.6553 KOps/s $\color{#35bf28}+0.34\%$
test_items_nested_leaf 0.3052ms 0.1680ms 5.9529 KOps/s 5.9219 KOps/s $\color{#35bf28}+0.52\%$
test_items_stack_nested 0.4244ms 0.2706ms 3.6950 KOps/s 3.6239 KOps/s $\color{#35bf28}+1.96\%$
test_items_stack_nested_leaf 0.5004ms 0.1657ms 6.0343 KOps/s 5.8522 KOps/s $\color{#35bf28}+3.11\%$
test_items_stack_nested_locked 0.4264ms 0.2739ms 3.6508 KOps/s 3.5586 KOps/s $\color{#35bf28}+2.59\%$
test_keys 27.5920μs 3.8503μs 259.7211 KOps/s 262.5149 KOps/s $\color{#d91a1a}-1.06\%$
test_keys_nested 2.0815ms 0.1474ms 6.7840 KOps/s 6.4860 KOps/s $\color{#35bf28}+4.59\%$
test_keys_nested_locked 0.2715ms 0.1507ms 6.6343 KOps/s 6.3209 KOps/s $\color{#35bf28}+4.96\%$
test_keys_nested_leaf 39.0613ms 0.1352ms 7.3974 KOps/s 7.4630 KOps/s $\color{#d91a1a}-0.88\%$
test_keys_stack_nested 0.2567ms 0.1502ms 6.6595 KOps/s 6.4471 KOps/s $\color{#35bf28}+3.30\%$
test_keys_stack_nested_leaf 0.2620ms 0.1325ms 7.5452 KOps/s 7.3566 KOps/s $\color{#35bf28}+2.56\%$
test_keys_stack_nested_locked 0.3579ms 0.1549ms 6.4541 KOps/s 6.2306 KOps/s $\color{#35bf28}+3.59\%$
test_values 10.4927μs 1.1798μs 847.6050 KOps/s 843.8706 KOps/s $\color{#35bf28}+0.44\%$
test_values_nested 0.1010ms 52.3050μs 19.1186 KOps/s 19.1634 KOps/s $\color{#d91a1a}-0.23\%$
test_values_nested_locked 94.2960μs 51.8601μs 19.2827 KOps/s 19.2592 KOps/s $\color{#35bf28}+0.12\%$
test_values_nested_leaf 86.6720μs 46.8786μs 21.3317 KOps/s 21.1588 KOps/s $\color{#35bf28}+0.82\%$
test_values_stack_nested 0.1027ms 52.6553μs 18.9914 KOps/s 19.0851 KOps/s $\color{#d91a1a}-0.49\%$
test_values_stack_nested_leaf 92.1920μs 46.5143μs 21.4988 KOps/s 21.7319 KOps/s $\color{#d91a1a}-1.07\%$
test_values_stack_nested_locked 96.7610μs 52.3983μs 19.0846 KOps/s 19.1665 KOps/s $\color{#d91a1a}-0.43\%$
test_membership 13.9960μs 1.3279μs 753.0551 KOps/s 743.2057 KOps/s $\color{#35bf28}+1.33\%$
test_membership_nested 40.4050μs 3.4274μs 291.7694 KOps/s 284.7988 KOps/s $\color{#35bf28}+2.45\%$
test_membership_nested_leaf 18.5650μs 3.4908μs 286.4701 KOps/s 280.8854 KOps/s $\color{#35bf28}+1.99\%$
test_membership_stacked_nested 25.1170μs 3.4046μs 293.7209 KOps/s 281.4346 KOps/s $\color{#35bf28}+4.37\%$
test_membership_stacked_nested_leaf 27.5020μs 3.4024μs 293.9103 KOps/s 281.7147 KOps/s $\color{#35bf28}+4.33\%$
test_membership_nested_last 24.9760μs 4.3500μs 229.8851 KOps/s 146.4782 KOps/s $\textbf{\color{#35bf28}+56.94\%}$
test_membership_nested_leaf_last 20.6580μs 4.3299μs 230.9529 KOps/s 146.2749 KOps/s $\textbf{\color{#35bf28}+57.89\%}$
test_membership_stacked_nested_last 42.4400μs 5.0400μs 198.4146 KOps/s 148.7213 KOps/s $\textbf{\color{#35bf28}+33.41\%}$
test_membership_stacked_nested_leaf_last 32.1900μs 4.9901μs 200.3958 KOps/s 148.8793 KOps/s $\textbf{\color{#35bf28}+34.60\%}$
test_nested_getleaf 32.1400μs 10.6998μs 93.4596 KOps/s 95.9924 KOps/s $\color{#d91a1a}-2.64\%$
test_nested_get 45.7430μs 10.1213μs 98.8017 KOps/s 100.7326 KOps/s $\color{#d91a1a}-1.92\%$
test_stacked_getleaf 38.9130μs 10.6429μs 93.9595 KOps/s 95.8487 KOps/s $\color{#d91a1a}-1.97\%$
test_stacked_get 50.5740μs 10.0730μs 99.2756 KOps/s 101.2146 KOps/s $\color{#d91a1a}-1.92\%$
test_nested_getitemleaf 48.7310μs 11.1896μs 89.3684 KOps/s 83.8759 KOps/s $\textbf{\color{#35bf28}+6.55\%}$
test_nested_getitem 35.6560μs 10.5435μs 94.8447 KOps/s 87.0827 KOps/s $\textbf{\color{#35bf28}+8.91\%}$
test_stacked_getitemleaf 45.9360μs 11.1238μs 89.8973 KOps/s 83.4397 KOps/s $\textbf{\color{#35bf28}+7.74\%}$
test_stacked_getitem 49.7130μs 10.5313μs 94.9549 KOps/s 86.9335 KOps/s $\textbf{\color{#35bf28}+9.23\%}$
test_lock_nested 0.7056ms 0.3324ms 3.0083 KOps/s 2.9433 KOps/s $\color{#35bf28}+2.21\%$
test_lock_stack_nested 0.3865ms 0.2985ms 3.3504 KOps/s 3.3139 KOps/s $\color{#35bf28}+1.10\%$
test_unlock_nested 82.1423ms 0.4174ms 2.3960 KOps/s 2.3608 KOps/s $\color{#35bf28}+1.49\%$
test_unlock_stack_nested 0.4255ms 0.3068ms 3.2593 KOps/s 3.2229 KOps/s $\color{#35bf28}+1.13\%$
test_flatten_speed 5.4903ms 0.2846ms 3.5140 KOps/s 2.6924 KOps/s $\textbf{\color{#35bf28}+30.51\%}$
test_unflatten_speed 0.4761ms 0.4108ms 2.4341 KOps/s 2.1907 KOps/s $\textbf{\color{#35bf28}+11.11\%}$
test_common_ops 1.1842ms 0.7293ms 1.3712 KOps/s 1.4541 KOps/s $\textbf{\color{#d91a1a}-5.70\%}$
test_creation 52.1970μs 1.9214μs 520.4514 KOps/s 548.4730 KOps/s $\textbf{\color{#d91a1a}-5.11\%}$
test_creation_empty 38.0710μs 12.0390μs 83.0636 KOps/s 112.5383 KOps/s $\textbf{\color{#d91a1a}-26.19\%}$
test_creation_nested_1 75.8620μs 14.5655μs 68.6554 KOps/s 85.9136 KOps/s $\textbf{\color{#d91a1a}-20.09\%}$
test_creation_nested_2 70.2110μs 17.8527μs 56.0140 KOps/s 67.3024 KOps/s $\textbf{\color{#d91a1a}-16.77\%}$
test_clone 55.5530μs 13.5783μs 73.6471 KOps/s 76.1399 KOps/s $\color{#d91a1a}-3.27\%$
test_getitem[int] 29.9960μs 11.4197μs 87.5682 KOps/s 90.5112 KOps/s $\color{#d91a1a}-3.25\%$
test_getitem[slice_int] 64.4900μs 22.7372μs 43.9808 KOps/s 44.2609 KOps/s $\color{#d91a1a}-0.63\%$
test_getitem[range] 0.1373ms 42.7971μs 23.3661 KOps/s 24.3712 KOps/s $\color{#d91a1a}-4.12\%$
test_getitem[tuple] 54.2720μs 18.8454μs 53.0632 KOps/s 53.0275 KOps/s $\color{#35bf28}+0.07\%$
test_getitem[list] 0.1348ms 36.4152μs 27.4611 KOps/s 27.3702 KOps/s $\color{#35bf28}+0.33\%$
test_setitem_dim[int] 73.0460μs 36.2806μs 27.5629 KOps/s 33.9253 KOps/s $\textbf{\color{#d91a1a}-18.75\%}$
test_setitem_dim[slice_int] 98.6640μs 62.3842μs 16.0297 KOps/s 18.2677 KOps/s $\textbf{\color{#d91a1a}-12.25\%}$
test_setitem_dim[range] 0.1513ms 81.6312μs 12.2502 KOps/s 14.0118 KOps/s $\textbf{\color{#d91a1a}-12.57\%}$
test_setitem_dim[tuple] 84.5480μs 51.3695μs 19.4668 KOps/s 23.0135 KOps/s $\textbf{\color{#d91a1a}-15.41\%}$
test_setitem 67.5250μs 21.6573μs 46.1738 KOps/s 51.3645 KOps/s $\textbf{\color{#d91a1a}-10.11\%}$
test_set 77.3550μs 21.0073μs 47.6026 KOps/s 53.6355 KOps/s $\textbf{\color{#d91a1a}-11.25\%}$
test_set_shared 1.7517ms 0.1387ms 7.2100 KOps/s 7.1378 KOps/s $\color{#35bf28}+1.01\%$
test_update 0.1288ms 24.1645μs 41.3831 KOps/s 47.6977 KOps/s $\textbf{\color{#d91a1a}-13.24\%}$
test_update_nested 86.1510μs 31.7304μs 31.5155 KOps/s 34.7694 KOps/s $\textbf{\color{#d91a1a}-9.36\%}$
test_set_nested 86.0500μs 23.7337μs 42.1342 KOps/s 47.1370 KOps/s $\textbf{\color{#d91a1a}-10.61\%}$
test_set_nested_new 72.6260μs 27.5263μs 36.3289 KOps/s 39.4405 KOps/s $\textbf{\color{#d91a1a}-7.89\%}$
test_select 0.1132ms 41.0314μs 24.3716 KOps/s 25.9220 KOps/s $\textbf{\color{#d91a1a}-5.98\%}$
test_select_nested 0.1248ms 59.7801μs 16.7280 KOps/s 16.8193 KOps/s $\color{#d91a1a}-0.54\%$
test_exclude_nested 0.2165ms 0.1186ms 8.4345 KOps/s 8.2930 KOps/s $\color{#35bf28}+1.71\%$
test_empty[True] 1.2258ms 0.4143ms 2.4135 KOps/s 2.3840 KOps/s $\color{#35bf28}+1.24\%$
test_empty[False] 9.7802μs 1.0247μs 975.8827 KOps/s 966.0438 KOps/s $\color{#35bf28}+1.02\%$
test_unbind_speed 0.3599ms 0.2444ms 4.0924 KOps/s 4.0549 KOps/s $\color{#35bf28}+0.92\%$
test_unbind_speed_stack0 0.4209ms 0.2405ms 4.1583 KOps/s 4.1116 KOps/s $\color{#35bf28}+1.14\%$
test_unbind_speed_stack1 0.1186s 0.6653ms 1.5032 KOps/s 1.4449 KOps/s $\color{#35bf28}+4.04\%$
test_split 0.1148s 1.6416ms 609.1461 Ops/s 598.8942 Ops/s $\color{#35bf28}+1.71\%$
test_chunk 1.6857ms 1.4617ms 684.1561 Ops/s 685.5729 Ops/s $\color{#d91a1a}-0.21\%$
test_creation[device0] 4.0725ms 0.1052ms 9.5073 KOps/s 9.6515 KOps/s $\color{#d91a1a}-1.49\%$
test_creation_from_tensor 0.1694ms 82.6869μs 12.0938 KOps/s 11.8067 KOps/s $\color{#35bf28}+2.43\%$
test_add_one[memmap_tensor0] 70.9220μs 5.4410μs 183.7902 KOps/s 188.8241 KOps/s $\color{#d91a1a}-2.67\%$
test_contiguous[memmap_tensor0] 16.7520μs 0.6398μs 1.5631 MOps/s 1.5232 MOps/s $\color{#35bf28}+2.62\%$
test_stack[memmap_tensor0] 28.9940μs 3.6504μs 273.9456 KOps/s 284.5864 KOps/s $\color{#d91a1a}-3.74\%$
test_memmaptd_index 0.9439ms 0.2353ms 4.2508 KOps/s 4.2409 KOps/s $\color{#35bf28}+0.23\%$
test_memmaptd_index_astensor 0.6207ms 0.2993ms 3.3416 KOps/s 3.3553 KOps/s $\color{#d91a1a}-0.41\%$
test_memmaptd_index_op 1.1466ms 0.6297ms 1.5881 KOps/s 1.7666 KOps/s $\textbf{\color{#d91a1a}-10.10\%}$
test_serialize_model 0.2298s 0.1156s 8.6498 Ops/s 8.3219 Ops/s $\color{#35bf28}+3.94\%$
test_serialize_model_pickle 0.4512s 0.3752s 2.6649 Ops/s 2.6370 Ops/s $\color{#35bf28}+1.06\%$
test_serialize_weights 0.1043s 97.6729ms 10.2383 Ops/s 9.6030 Ops/s $\textbf{\color{#35bf28}+6.62\%}$
test_serialize_weights_returnearly 0.2483s 0.1356s 7.3759 Ops/s 8.0021 Ops/s $\textbf{\color{#d91a1a}-7.82\%}$
test_serialize_weights_pickle 1.0609s 0.6963s 1.4362 Ops/s 2.3369 Ops/s $\textbf{\color{#d91a1a}-38.54\%}$
test_serialize_weights_filesystem 94.0838ms 89.4027ms 11.1853 Ops/s 9.3647 Ops/s $\textbf{\color{#35bf28}+19.44\%}$
test_serialize_model_filesystem 94.6582ms 91.0549ms 10.9824 Ops/s 10.7828 Ops/s $\color{#35bf28}+1.85\%$
test_reshape_pytree 43.4510μs 20.8956μs 47.8569 KOps/s 47.1432 KOps/s $\color{#35bf28}+1.51\%$
test_reshape_td 86.4040μs 31.4667μs 31.7796 KOps/s 31.3823 KOps/s $\color{#35bf28}+1.27\%$
test_view_pytree 55.4030μs 20.6061μs 48.5294 KOps/s 48.2591 KOps/s $\color{#35bf28}+0.56\%$
test_view_td 0.1198s 60.3750μs 16.5631 KOps/s 16.0321 KOps/s $\color{#35bf28}+3.31\%$
test_unbind_pytree 57.3570μs 24.4006μs 40.9826 KOps/s 41.7201 KOps/s $\color{#d91a1a}-1.77\%$
test_unbind_td 0.1285ms 35.5232μs 28.1506 KOps/s 27.1121 KOps/s $\color{#35bf28}+3.83\%$
test_split_pytree 63.2570μs 23.9229μs 41.8010 KOps/s 41.7813 KOps/s $\color{#35bf28}+0.05\%$
test_split_td 0.1229ms 39.8666μs 25.0836 KOps/s 24.9111 KOps/s $\color{#35bf28}+0.69\%$
test_add_pytree 94.5870μs 30.7218μs 32.5502 KOps/s 33.8994 KOps/s $\color{#d91a1a}-3.98\%$
test_add_td 0.1066ms 57.9440μs 17.2581 KOps/s 19.2828 KOps/s $\textbf{\color{#d91a1a}-10.50\%}$
test_distributed 0.2015ms 0.1002ms 9.9827 KOps/s 9.8043 KOps/s $\color{#35bf28}+1.82\%$
test_tdmodule 78.7170μs 17.9968μs 55.5655 KOps/s 45.2748 KOps/s $\textbf{\color{#35bf28}+22.73\%}$
test_tdmodule_dispatch 54.0310μs 34.1504μs 29.2822 KOps/s 23.4871 KOps/s $\textbf{\color{#35bf28}+24.67\%}$
test_tdseq 33.6130μs 20.3464μs 49.1488 KOps/s 40.8961 KOps/s $\textbf{\color{#35bf28}+20.18\%}$
test_tdseq_dispatch 70.2310μs 40.1382μs 24.9139 KOps/s 21.9386 KOps/s $\textbf{\color{#35bf28}+13.56\%}$
test_instantiation_functorch 1.4522ms 1.3149ms 760.4980 Ops/s 748.3866 Ops/s $\color{#35bf28}+1.62\%$
test_instantiation_td 1.4848ms 1.0133ms 986.9031 Ops/s 986.7125 Ops/s $\color{#35bf28}+0.02\%$
test_exec_functorch 0.3228ms 0.1623ms 6.1599 KOps/s 6.3050 KOps/s $\color{#d91a1a}-2.30\%$
test_exec_functional_call 0.2893ms 0.1505ms 6.6445 KOps/s 6.7781 KOps/s $\color{#d91a1a}-1.97\%$
test_exec_td 0.2953ms 0.1452ms 6.8852 KOps/s 7.0735 KOps/s $\color{#d91a1a}-2.66\%$
test_exec_td_decorator 0.5960ms 0.1977ms 5.0582 KOps/s 5.2320 KOps/s $\color{#d91a1a}-3.32\%$
test_vmap_mlp_speed[True-True] 0.6881ms 0.4682ms 2.1357 KOps/s 2.1523 KOps/s $\color{#d91a1a}-0.77\%$
test_vmap_mlp_speed[True-False] 0.6742ms 0.4667ms 2.1429 KOps/s 2.1553 KOps/s $\color{#d91a1a}-0.57\%$
test_vmap_mlp_speed[False-True] 0.6211ms 0.3801ms 2.6306 KOps/s 2.6159 KOps/s $\color{#35bf28}+0.56\%$
test_vmap_mlp_speed[False-False] 0.6287ms 0.3802ms 2.6299 KOps/s 2.6136 KOps/s $\color{#35bf28}+0.63\%$
test_vmap_mlp_speed_decorator[True-True] 0.9811ms 0.4861ms 2.0572 KOps/s 1.9498 KOps/s $\textbf{\color{#35bf28}+5.50\%}$
test_vmap_mlp_speed_decorator[True-False] 0.7393ms 0.4864ms 2.0557 KOps/s 1.9499 KOps/s $\textbf{\color{#35bf28}+5.43\%}$
test_vmap_mlp_speed_decorator[False-True] 1.1149ms 0.4304ms 2.3235 KOps/s 2.5129 KOps/s $\textbf{\color{#d91a1a}-7.54\%}$
test_vmap_mlp_speed_decorator[False-False] 0.5310ms 0.3947ms 2.5335 KOps/s 2.5205 KOps/s $\color{#35bf28}+0.52\%$
test_to_module_speed[True] 2.1526ms 1.3750ms 727.2662 Ops/s 724.7738 Ops/s $\color{#35bf28}+0.34\%$
test_to_module_speed[False] 1.4462ms 1.3647ms 732.7674 Ops/s 725.8685 Ops/s $\color{#35bf28}+0.95\%$
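
The Change column and the Improved/Worsened totals appear to follow from the two Ops columns. Below is a minimal sketch, assuming that Change is the relative difference of Ops against Ops on Repo HEAD, and that a row only counts as improved or worsened beyond a 5% threshold. The threshold is inferred from which rows are bolded in the table; the report does not state it. The names `relative_change` and `classify` are hypothetical and are not taken from the benchmark tooling.

```python
def relative_change(ops_pr: float, ops_head: float) -> float:
    """Percentage change in throughput of the PR relative to repo HEAD."""
    return (ops_pr / ops_head - 1.0) * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a row; threshold_pct is an assumption inferred from the bolded rows."""
    if change_pct > threshold_pct:
        return "improved"
    if change_pct < -threshold_pct:
        return "worsened"
    return "unchanged"


# (Ops, Ops on Repo HEAD) in KOps/s, copied from the CPU table above.
rows = {
    "test_plain_set_nested": (56.7816, 59.9580),          # table reports -5.30%
    "test_keys_nested_locked": (6.6343, 6.3209),          # table reports +4.96%
    "test_membership_nested_last": (229.8851, 146.4782),  # table reports +56.94%
}

for name, (ops_pr, ops_head) in rows.items():
    pct = relative_change(ops_pr, ops_head)
    print(f"{name}: {pct:+.2f}% ({classify(pct)})")
```

Because the table shows rounded throughputs, recomputing a percentage from the printed values can differ from the reported Change in the last digit for some rows.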


github-actions bot commented Feb 21, 2024

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 134. Improved: $\large\color{#35bf28}40$. Worsened: $\large\color{#d91a1a}4$.

Name Max Mean Ops Ops on Repo HEAD Change
test_plain_set_nested 27.7600μs 12.6648μs 78.9590 KOps/s 68.4920 KOps/s $\textbf{\color{#35bf28}+15.28\%}$
test_plain_set_stack_nested 31.2010μs 12.7279μs 78.5676 KOps/s 68.5138 KOps/s $\textbf{\color{#35bf28}+14.67\%}$
test_plain_set_nested_inplace 37.5600μs 14.0014μs 71.4213 KOps/s 62.8839 KOps/s $\textbf{\color{#35bf28}+13.58\%}$
test_plain_set_stack_nested_inplace 34.0810μs 14.0334μs 71.2585 KOps/s 62.7434 KOps/s $\textbf{\color{#35bf28}+13.57\%}$
test_items 31.2010μs 4.7489μs 210.5758 KOps/s 211.4235 KOps/s $\color{#d91a1a}-0.40\%$
test_items_nested 0.3610ms 0.3387ms 2.9522 KOps/s 2.9510 KOps/s $\color{#35bf28}+0.04\%$
test_items_nested_locked 0.3699ms 0.3447ms 2.9011 KOps/s 2.9128 KOps/s $\color{#d91a1a}-0.40\%$
test_items_nested_leaf 0.2276ms 0.2022ms 4.9463 KOps/s 4.9774 KOps/s $\color{#d91a1a}-0.63\%$
test_items_stack_nested 0.3792ms 0.3409ms 2.9330 KOps/s 2.9333 KOps/s $\color{#d91a1a}-0.01\%$
test_items_stack_nested_leaf 0.2433ms 0.1997ms 5.0077 KOps/s 4.9656 KOps/s $\color{#35bf28}+0.85\%$
test_items_stack_nested_locked 0.3732ms 0.3415ms 2.9280 KOps/s 2.9163 KOps/s $\color{#35bf28}+0.40\%$
test_keys 31.8900μs 4.6238μs 216.2710 KOps/s 219.5368 KOps/s $\color{#d91a1a}-1.49\%$
test_keys_nested 43.5992ms 0.1003ms 9.9740 KOps/s 10.6154 KOps/s $\textbf{\color{#d91a1a}-6.04\%}$
test_keys_nested_locked 0.1419ms 98.0907μs 10.1946 KOps/s 10.1833 KOps/s $\color{#35bf28}+0.11\%$
test_keys_nested_leaf 0.1142ms 77.3199μs 12.9333 KOps/s 12.8275 KOps/s $\color{#35bf28}+0.83\%$
test_keys_stack_nested 0.1145ms 93.2222μs 10.7271 KOps/s 10.6804 KOps/s $\color{#35bf28}+0.44\%$
test_keys_stack_nested_leaf 0.1011ms 77.1760μs 12.9574 KOps/s 12.8812 KOps/s $\color{#35bf28}+0.59\%$
test_keys_stack_nested_locked 0.8106ms 98.5108μs 10.1512 KOps/s 10.1759 KOps/s $\color{#d91a1a}-0.24\%$
test_values 11.4233μs 1.9269μs 518.9554 KOps/s 532.1783 KOps/s $\color{#d91a1a}-2.48\%$
test_values_nested 68.3110μs 45.5806μs 21.9392 KOps/s 22.0464 KOps/s $\color{#d91a1a}-0.49\%$
test_values_nested_locked 69.7410μs 47.8503μs 20.8985 KOps/s 21.0284 KOps/s $\color{#d91a1a}-0.62\%$
test_values_nested_leaf 86.4200μs 39.9869μs 25.0082 KOps/s 25.2465 KOps/s $\color{#d91a1a}-0.94\%$
test_values_stack_nested 75.9500μs 45.8096μs 21.8295 KOps/s 21.6821 KOps/s $\color{#35bf28}+0.68\%$
test_values_stack_nested_leaf 67.5900μs 39.8468μs 25.0961 KOps/s 25.0864 KOps/s $\color{#35bf28}+0.04\%$
test_values_stack_nested_locked 68.8300μs 47.9566μs 20.8522 KOps/s 20.6230 KOps/s $\color{#35bf28}+1.11\%$
test_membership 5.0200μs 0.9802μs 1.0202 MOps/s 1.0679 MOps/s $\color{#d91a1a}-4.47\%$
test_membership_nested 30.3500μs 2.9378μs 340.3903 KOps/s 343.4579 KOps/s $\color{#d91a1a}-0.89\%$
test_membership_nested_leaf 27.1100μs 2.9453μs 339.5192 KOps/s 343.2100 KOps/s $\color{#d91a1a}-1.08\%$
test_membership_stacked_nested 37.6900μs 2.9266μs 341.6927 KOps/s 343.6553 KOps/s $\color{#d91a1a}-0.57\%$
test_membership_stacked_nested_leaf 47.1510μs 2.9192μs 342.5571 KOps/s 341.3597 KOps/s $\color{#35bf28}+0.35\%$
test_membership_nested_last 35.2900μs 3.6113μs 276.9104 KOps/s 186.0517 KOps/s $\textbf{\color{#35bf28}+48.84\%}$
test_membership_nested_leaf_last 21.9500μs 3.6071μs 277.2310 KOps/s 186.5397 KOps/s $\textbf{\color{#35bf28}+48.62\%}$
test_membership_stacked_nested_last 48.2800μs 3.6156μs 276.5807 KOps/s 175.9311 KOps/s $\textbf{\color{#35bf28}+57.21\%}$
test_membership_stacked_nested_leaf_last 39.2200μs 3.6073μs 277.2156 KOps/s 175.5003 KOps/s $\textbf{\color{#35bf28}+57.96\%}$
test_nested_getleaf 36.3900μs 8.4193μs 118.7747 KOps/s 118.7982 KOps/s $\color{#d91a1a}-0.02\%$
test_nested_get 37.3800μs 7.9402μs 125.9407 KOps/s 125.9043 KOps/s $\color{#35bf28}+0.03\%$
test_stacked_getleaf 41.4710μs 8.3499μs 119.7618 KOps/s 119.0777 KOps/s $\color{#35bf28}+0.57\%$
test_stacked_get 50.9600μs 7.8800μs 126.9036 KOps/s 126.2588 KOps/s $\color{#35bf28}+0.51\%$
test_nested_getitemleaf 26.3300μs 8.6766μs 115.2529 KOps/s 102.1614 KOps/s $\textbf{\color{#35bf28}+12.81\%}$
test_nested_getitem 32.3600μs 8.2185μs 121.6764 KOps/s 106.8548 KOps/s $\textbf{\color{#35bf28}+13.87\%}$
test_stacked_getitemleaf 39.9800μs 8.6718μs 115.3168 KOps/s 102.3183 KOps/s $\textbf{\color{#35bf28}+12.70\%}$
test_stacked_getitem 37.4900μs 8.2184μs 121.6785 KOps/s 107.3088 KOps/s $\textbf{\color{#35bf28}+13.39\%}$
test_lock_nested 2.3515ms 0.3613ms 2.7677 KOps/s 2.7933 KOps/s $\color{#d91a1a}-0.92\%$
test_lock_stack_nested 0.3387ms 0.3140ms 3.1847 KOps/s 3.2313 KOps/s $\color{#d91a1a}-1.44\%$
test_unlock_nested 0.7886ms 0.3559ms 2.8096 KOps/s 2.8406 KOps/s $\color{#d91a1a}-1.09\%$
test_unlock_stack_nested 0.3659ms 0.3238ms 3.0883 KOps/s 3.1399 KOps/s $\color{#d91a1a}-1.64\%$
test_flatten_speed 0.3040ms 0.1966ms 5.0864 KOps/s 3.8028 KOps/s $\textbf{\color{#35bf28}+33.75\%}$
test_unflatten_speed 0.3470ms 0.3244ms 3.0829 KOps/s 2.7976 KOps/s $\textbf{\color{#35bf28}+10.20\%}$
test_common_ops 1.0047ms 0.5709ms 1.7516 KOps/s 1.5488 KOps/s $\textbf{\color{#35bf28}+13.10\%}$
test_creation 33.5600μs 1.5867μs 630.2489 KOps/s 637.5470 KOps/s $\color{#d91a1a}-1.14\%$
test_creation_empty 36.5400μs 6.4584μs 154.8368 KOps/s 100.5559 KOps/s $\textbf{\color{#35bf28}+53.98\%}$
test_creation_nested_1 49.8400μs 8.1202μs 123.1491 KOps/s 84.4166 KOps/s $\textbf{\color{#35bf28}+45.88\%}$
test_creation_nested_2 34.4410μs 10.6903μs 93.5430 KOps/s 68.9741 KOps/s $\textbf{\color{#35bf28}+35.62\%}$
test_clone 65.7610μs 14.0933μs 70.9557 KOps/s 71.1952 KOps/s $\color{#d91a1a}-0.34\%$
test_getitem[int] 25.6300μs 10.8948μs 91.7871 KOps/s 91.4349 KOps/s $\color{#35bf28}+0.39\%$
test_getitem[slice_int] 46.6300μs 22.1320μs 45.1833 KOps/s 46.0830 KOps/s $\color{#d91a1a}-1.95\%$
test_getitem[range] 68.8510μs 50.7664μs 19.6981 KOps/s 18.8479 KOps/s $\color{#35bf28}+4.51\%$
test_getitem[tuple] 69.9710μs 18.9337μs 52.8159 KOps/s 52.3383 KOps/s $\color{#35bf28}+0.91\%$
test_getitem[list] 0.1501ms 37.0962μs 26.9569 KOps/s 26.0273 KOps/s $\color{#35bf28}+3.57\%$
test_setitem_dim[int] 45.9500μs 29.3410μs 34.0821 KOps/s 32.5050 KOps/s $\color{#35bf28}+4.85\%$
test_setitem_dim[slice_int] 69.8600μs 51.2065μs 19.5288 KOps/s 19.0842 KOps/s $\color{#35bf28}+2.33\%$
test_setitem_dim[range] 0.1028ms 68.6127μs 14.5746 KOps/s 13.7751 KOps/s $\textbf{\color{#35bf28}+5.80\%}$
test_setitem_dim[tuple] 60.4610μs 42.9162μs 23.3012 KOps/s 22.0281 KOps/s $\textbf{\color{#35bf28}+5.78\%}$
test_setitem 49.7510μs 18.4024μs 54.3407 KOps/s 49.5803 KOps/s $\textbf{\color{#35bf28}+9.60\%}$
test_set 56.9400μs 17.7877μs 56.2186 KOps/s 50.8744 KOps/s $\textbf{\color{#35bf28}+10.50\%}$
test_set_shared 0.1267s 0.1285ms 7.7798 KOps/s 9.5916 KOps/s $\textbf{\color{#d91a1a}-18.89\%}$
test_update 83.1500μs 18.5510μs 53.9056 KOps/s 43.6708 KOps/s $\textbf{\color{#35bf28}+23.44\%}$
test_update_nested 81.3000μs 25.5785μs 39.0953 KOps/s 33.5450 KOps/s $\textbf{\color{#35bf28}+16.55\%}$
test_set_nested 65.3400μs 19.3367μs 51.7150 KOps/s 48.0388 KOps/s $\textbf{\color{#35bf28}+7.65\%}$
test_set_nested_new 69.8610μs 21.7117μs 46.0581 KOps/s 42.6118 KOps/s $\textbf{\color{#35bf28}+8.09\%}$
test_select 0.9304ms 34.5462μs 28.9467 KOps/s 27.4176 KOps/s $\textbf{\color{#35bf28}+5.58\%}$
test_select_nested 79.9510μs 53.7787μs 18.5947 KOps/s 18.9159 KOps/s $\color{#d91a1a}-1.70\%$
test_exclude_nested 0.2055ms 0.1153ms 8.6719 KOps/s 8.7428 KOps/s $\color{#d91a1a}-0.81\%$
test_empty[True] 1.0075ms 0.3915ms 2.5541 KOps/s 2.5610 KOps/s $\color{#d91a1a}-0.27\%$
test_empty[False] 3.6671μs 0.8446μs 1.1839 MOps/s 1.1747 MOps/s $\color{#35bf28}+0.78\%$
test_to 78.1210μs 57.0566μs 17.5265 KOps/s 15.9087 KOps/s $\textbf{\color{#35bf28}+10.17\%}$
test_to_nonblocking 66.5700μs 36.3990μs 27.4732 KOps/s 26.6268 KOps/s $\color{#35bf28}+3.18\%$
test_unbind_speed 0.3942ms 0.2690ms 3.7178 KOps/s 3.6929 KOps/s $\color{#35bf28}+0.67\%$
test_unbind_speed_stack0 0.3106ms 0.2688ms 3.7209 KOps/s 3.7537 KOps/s $\color{#d91a1a}-0.87\%$
test_unbind_speed_stack1 0.1259s 0.7808ms 1.2807 KOps/s 1.2939 KOps/s $\color{#d91a1a}-1.02\%$
test_split 1.6171ms 1.5491ms 645.5369 Ops/s 648.9359 Ops/s $\color{#d91a1a}-0.52\%$
test_chunk 1.5927ms 1.5433ms 647.9705 Ops/s 653.2277 Ops/s $\color{#d91a1a}-0.80\%$
test_creation[device0] 0.1385ms 74.1879μs 13.4793 KOps/s 13.3807 KOps/s $\color{#35bf28}+0.74\%$
test_creation_from_tensor 0.1083ms 54.3230μs 18.4084 KOps/s 18.3949 KOps/s $\color{#35bf28}+0.07\%$
test_add_one[memmap_tensor0] 92.3810μs 7.1653μs 139.5612 KOps/s 140.2878 KOps/s $\color{#d91a1a}-0.52\%$
test_contiguous[memmap_tensor0] 27.1800μs 0.6526μs 1.5323 MOps/s 1.5501 MOps/s $\color{#d91a1a}-1.15\%$
test_stack[memmap_tensor0] 27.3900μs 4.4695μs 223.7363 KOps/s 215.9596 KOps/s $\color{#35bf28}+3.60\%$
test_memmaptd_index 1.0891ms 0.2641ms 3.7861 KOps/s 3.8195 KOps/s $\color{#d91a1a}-0.87\%$
test_memmaptd_index_astensor 0.5937ms 0.3242ms 3.0848 KOps/s 3.1322 KOps/s $\color{#d91a1a}-1.51\%$
test_memmaptd_index_op 0.8950ms 0.5897ms 1.6957 KOps/s 1.5363 KOps/s $\textbf{\color{#35bf28}+10.38\%}$
test_serialize_model 0.2194s 0.1028s 9.7262 Ops/s 9.1524 Ops/s $\textbf{\color{#35bf28}+6.27\%}$
test_serialize_model_pickle 1.3509s 1.2353s 0.8095 Ops/s 0.8085 Ops/s $\color{#35bf28}+0.12\%$
test_serialize_weights 91.6983ms 86.9980ms 11.4945 Ops/s 10.8313 Ops/s $\textbf{\color{#35bf28}+6.12\%}$
test_serialize_weights_returnearly 62.6786ms 57.4676ms 17.4011 Ops/s 11.0770 Ops/s $\textbf{\color{#35bf28}+57.09\%}$
test_serialize_weights_pickle 1.3588s 1.2486s 0.8009 Ops/s 0.7965 Ops/s $\color{#35bf28}+0.55\%$
test_reshape_pytree 54.6400μs 25.3091μs 39.5115 KOps/s 39.4532 KOps/s $\color{#35bf28}+0.15\%$
test_reshape_td 56.1110μs 30.5902μs 32.6903 KOps/s 30.9690 KOps/s $\textbf{\color{#35bf28}+5.56\%}$
test_view_pytree 41.0700μs 24.9633μs 40.0588 KOps/s 40.2298 KOps/s $\color{#d91a1a}-0.43\%$
test_view_td 0.1368s 58.9276μs 16.9700 KOps/s 20.1020 KOps/s $\textbf{\color{#d91a1a}-15.58\%}$
test_unbind_pytree 58.4500μs 30.5911μs 32.6893 KOps/s 32.1449 KOps/s $\color{#35bf28}+1.69\%$
test_unbind_td 0.1141ms 40.1708μs 24.8937 KOps/s 24.0819 KOps/s $\color{#35bf28}+3.37\%$
test_split_pytree 46.1300μs 29.0046μs 34.4772 KOps/s 32.5985 KOps/s $\textbf{\color{#35bf28}+5.76\%}$
test_split_td 0.1185ms 38.9739μs 25.6582 KOps/s 25.2232 KOps/s $\color{#35bf28}+1.72\%$
test_add_pytree 55.4500μs 36.8897μs 27.1078 KOps/s 27.3237 KOps/s $\color{#d91a1a}-0.79\%$
test_add_td 79.8610μs 48.3158μs 20.6972 KOps/s 18.6080 KOps/s $\textbf{\color{#35bf28}+11.23\%}$
test_distributed 2.0816ms 90.0128μs 11.1095 KOps/s 14.1738 KOps/s $\textbf{\color{#d91a1a}-21.62\%}$
test_tdmodule 59.0700μs 13.1465μs 76.0659 KOps/s 52.9586 KOps/s $\textbf{\color{#35bf28}+43.63\%}$
test_tdmodule_dispatch 41.6810μs 25.4978μs 39.2191 KOps/s 25.7324 KOps/s $\textbf{\color{#35bf28}+52.41\%}$
test_tdseq 31.0500μs 16.3597μs 61.1257 KOps/s 45.3977 KOps/s $\textbf{\color{#35bf28}+34.64\%}$
test_tdseq_dispatch 49.6200μs 29.8312μs 33.5219 KOps/s 23.9935 KOps/s $\textbf{\color{#35bf28}+39.71\%}$
test_instantiation_functorch 1.7472ms 1.6801ms 595.2096 Ops/s 588.3446 Ops/s $\color{#35bf28}+1.17\%$
test_instantiation_td 1.7074ms 1.1708ms 854.1179 Ops/s 849.5044 Ops/s $\color{#35bf28}+0.54\%$
test_exec_functorch 0.2024ms 0.1614ms 6.1966 KOps/s 6.2334 KOps/s $\color{#d91a1a}-0.59\%$
test_exec_functional_call 0.2158ms 0.1591ms 6.2836 KOps/s 6.3222 KOps/s $\color{#d91a1a}-0.61\%$
test_exec_td 0.1821ms 0.1525ms 6.5563 KOps/s 6.5965 KOps/s $\color{#d91a1a}-0.61\%$
test_exec_td_decorator 0.6819ms 0.2007ms 4.9827 KOps/s 5.0836 KOps/s $\color{#d91a1a}-1.99\%$
test_vmap_mlp_speed[True-True] 0.6734ms 0.6047ms 1.6537 KOps/s 1.6356 KOps/s $\color{#35bf28}+1.11\%$
test_vmap_mlp_speed[True-False] 0.6422ms 0.6034ms 1.6574 KOps/s 1.6432 KOps/s $\color{#35bf28}+0.87\%$
test_vmap_mlp_speed[False-True] 0.5768ms 0.5341ms 1.8722 KOps/s 1.8675 KOps/s $\color{#35bf28}+0.25\%$
test_vmap_mlp_speed[False-False] 0.5873ms 0.5338ms 1.8733 KOps/s 1.8717 KOps/s $\color{#35bf28}+0.08\%$
test_vmap_mlp_speed_decorator[True-True] 1.0713ms 0.6249ms 1.6002 KOps/s 1.5397 KOps/s $\color{#35bf28}+3.93\%$
test_vmap_mlp_speed_decorator[True-False] 0.7477ms 0.6182ms 1.6176 KOps/s 1.5349 KOps/s $\textbf{\color{#35bf28}+5.39\%}$
test_vmap_mlp_speed_decorator[False-True] 0.6697ms 0.5487ms 1.8223 KOps/s 1.8191 KOps/s $\color{#35bf28}+0.18\%$
test_vmap_mlp_speed_decorator[False-False] 0.8023ms 0.5484ms 1.8235 KOps/s 1.8102 KOps/s $\color{#35bf28}+0.73\%$
test_vmap_transformer_speed[True-True] 8.2153ms 8.1160ms 123.2129 Ops/s 122.9375 Ops/s $\color{#35bf28}+0.22\%$
test_vmap_transformer_speed[True-False] 8.1726ms 8.1099ms 123.3053 Ops/s 123.5025 Ops/s $\color{#d91a1a}-0.16\%$
test_vmap_transformer_speed[False-True] 8.1114ms 8.0556ms 124.1370 Ops/s 123.9391 Ops/s $\color{#35bf28}+0.16\%$
test_vmap_transformer_speed[False-False] 8.0985ms 8.0213ms 124.6675 Ops/s 124.6670 Ops/s $+0.00\%$
test_vmap_transformer_speed_decorator[True-True] 19.1042ms 18.9825ms 52.6801 Ops/s 51.5885 Ops/s $\color{#35bf28}+2.12\%$
test_vmap_transformer_speed_decorator[True-False] 19.0998ms 19.0003ms 52.6307 Ops/s 51.4294 Ops/s $\color{#35bf28}+2.34\%$
test_vmap_transformer_speed_decorator[False-True] 19.0754ms 18.9750ms 52.7008 Ops/s 52.5118 Ops/s $\color{#35bf28}+0.36\%$
test_vmap_transformer_speed_decorator[False-False] 19.0530ms 18.9594ms 52.7442 Ops/s 52.7244 Ops/s $\color{#35bf28}+0.04\%$
test_to_module_speed[True] 1.3766ms 1.2723ms 785.9948 Ops/s 782.3934 Ops/s $\color{#35bf28}+0.46\%$
test_to_module_speed[False] 2.3732ms 1.2230ms 817.6807 Ops/s 808.4627 Ops/s $\color{#35bf28}+1.14\%$
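
In both tables the Ops column looks like the reciprocal of the Mean column. The sketch below checks that reading on two GPU rows. It assumes Mean is the mean wall-clock time per call and that the unit suffixes are `s`, `ms` and `μs`. The helper names `parse_duration` and `ops_per_second` are hypothetical and are not part of the benchmark tooling.

```python
# Seconds per unit for the suffixes that appear in the Mean column.
UNIT_SECONDS = {"ms": 1e-3, "μs": 1e-6, "s": 1.0}


def parse_duration(text: str) -> float:
    """Convert a string such as '12.6648μs' into seconds."""
    # Check the two-character suffixes first, since 'ms' also ends with 's'.
    for unit in ("ms", "μs", "s"):
        if text.endswith(unit):
            return float(text[: -len(unit)]) * UNIT_SECONDS[unit]
    raise ValueError(f"unrecognised duration: {text!r}")


def ops_per_second(mean_text: str) -> float:
    """Throughput implied by a mean time per call."""
    return 1.0 / parse_duration(mean_text)


# Mean values copied from the GPU table above.
print(f"{ops_per_second('12.6648μs') / 1e3:.4f} KOps/s")  # table: 78.9590 KOps/s
print(f"{ops_per_second('1.5491ms'):.4f} Ops/s")          # table: 645.5369 Ops/s
```

Small differences in the last decimals are expected, since the table prints a rounded Mean.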

@vmoens merged commit 551331d into main Feb 26, 2024
47 of 48 checks passed
@vmoens deleted the stack-non-tensor branch February 26, 2024 23:47
vmoens added a commit that referenced this pull request Mar 25, 2024
Labels: CLA Signed, enhancement