
[Feature] grad and data for tensorclasses #904

Merged: 2 commits merged into main on Jul 19, 2024.


@vmoens (Contributor) commented on Jul 19, 2024

No description provided.

@facebook-github-bot added the CLA Signed label on Jul 19, 2024. (This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.)
@vmoens added the enhancement label (New feature or request) on Jul 19, 2024.

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of CPU Benchmark Tests

Total Benchmarks: 144. Improved: $\large\color{#35bf28}24$. Worsened: $\large\color{#d91a1a}8$.


| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 51.0260μs | 21.1951μs | 47.1807 KOps/s | 46.3557 KOps/s | $\color{#35bf28}+1.78\%$ |
| test_plain_set_stack_nested | 54.4310μs | 21.5088μs | 46.4926 KOps/s | 46.1894 KOps/s | $\color{#35bf28}+0.66\%$ |
| test_plain_set_nested_inplace | 66.0230μs | 23.4562μs | 42.6327 KOps/s | 42.1527 KOps/s | $\color{#35bf28}+1.14\%$ |
| test_plain_set_stack_nested_inplace | 79.0180μs | 23.4014μs | 42.7324 KOps/s | 42.4058 KOps/s | $\color{#35bf28}+0.77\%$ |
| test_items | 29.3850μs | 2.6703μs | 374.4874 KOps/s | 384.9355 KOps/s | $\color{#d91a1a}-2.71\%$ |
| test_items_nested | 0.5178ms | 0.3658ms | 2.7335 KOps/s | 2.7672 KOps/s | $\color{#d91a1a}-1.22\%$ |
| test_items_nested_locked | 0.4740ms | 0.3648ms | 2.7409 KOps/s | 2.7505 KOps/s | $\color{#d91a1a}-0.35\%$ |
| test_items_nested_leaf | 0.1751ms | 87.4353μs | 11.4370 KOps/s | 11.5714 KOps/s | $\color{#d91a1a}-1.16\%$ |
| test_items_stack_nested | 0.5989ms | 0.3632ms | 2.7532 KOps/s | 2.7484 KOps/s | $\color{#35bf28}+0.18\%$ |
| test_items_stack_nested_leaf | 0.1525ms | 88.0684μs | 11.3548 KOps/s | 11.3414 KOps/s | $\color{#35bf28}+0.12\%$ |
| test_items_stack_nested_locked | 1.4865ms | 0.3666ms | 2.7275 KOps/s | 2.7348 KOps/s | $\color{#d91a1a}-0.27\%$ |
| test_keys | 29.5650μs | 3.8694μs | 258.4408 KOps/s | 250.0362 KOps/s | $\color{#35bf28}+3.36\%$ |
| test_keys_nested | 0.2478ms | 0.1438ms | 6.9538 KOps/s | 6.9634 KOps/s | $\color{#d91a1a}-0.14\%$ |
| test_keys_nested_locked | 0.8173ms | 0.1498ms | 6.6751 KOps/s | 6.6394 KOps/s | $\color{#35bf28}+0.54\%$ |
| test_keys_nested_leaf | 0.2116ms | 0.1226ms | 8.1594 KOps/s | 8.1674 KOps/s | $\color{#d91a1a}-0.10\%$ |
| test_keys_stack_nested | 0.2522ms | 0.1451ms | 6.8899 KOps/s | 6.9555 KOps/s | $\color{#d91a1a}-0.94\%$ |
| test_keys_stack_nested_leaf | 0.2143ms | 0.1229ms | 8.1343 KOps/s | 8.1141 KOps/s | $\color{#35bf28}+0.25\%$ |
| test_keys_stack_nested_locked | 0.2411ms | 0.1494ms | 6.6944 KOps/s | 6.6516 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_values | 8.6060μs | 1.1520μs | 868.0416 KOps/s | 856.5357 KOps/s | $\color{#35bf28}+1.34\%$ |
| test_values_nested | 89.9570μs | 50.1450μs | 19.9422 KOps/s | 19.8873 KOps/s | $\color{#35bf28}+0.28\%$ |
| test_values_nested_locked | 0.1297ms | 50.4074μs | 19.8384 KOps/s | 19.9931 KOps/s | $\color{#d91a1a}-0.77\%$ |
| test_values_nested_leaf | 82.2340μs | 45.3495μs | 22.0510 KOps/s | 22.3714 KOps/s | $\color{#d91a1a}-1.43\%$ |
| test_values_stack_nested | 94.5470μs | 50.6952μs | 19.7257 KOps/s | 19.7640 KOps/s | $\color{#d91a1a}-0.19\%$ |
| test_values_stack_nested_leaf | 0.1070ms | 45.7251μs | 21.8698 KOps/s | 22.2294 KOps/s | $\color{#d91a1a}-1.62\%$ |
| test_values_stack_nested_locked | 0.1336ms | 50.6324μs | 19.7502 KOps/s | 19.7601 KOps/s | $\color{#d91a1a}-0.05\%$ |
| test_membership | 2.4485μs | 0.7288μs | 1.3722 MOps/s | 1.1091 MOps/s | $\textbf{\color{#35bf28}+23.72\%}$ |
| test_membership_nested | 29.0350μs | 2.6812μs | 372.9733 KOps/s | 366.3534 KOps/s | $\color{#35bf28}+1.81\%$ |
| test_membership_nested_leaf | 49.6520μs | 2.6898μs | 371.7808 KOps/s | 365.2147 KOps/s | $\color{#35bf28}+1.80\%$ |
| test_membership_stacked_nested | 23.2440μs | 2.6834μs | 372.6587 KOps/s | 359.7256 KOps/s | $\color{#35bf28}+3.60\%$ |
| test_membership_stacked_nested_leaf | 38.2210μs | 2.7370μs | 365.3602 KOps/s | 328.8534 KOps/s | $\textbf{\color{#35bf28}+11.10\%}$ |
| test_membership_nested_last | 30.6070μs | 4.0451μs | 247.2103 KOps/s | 249.4387 KOps/s | $\color{#d91a1a}-0.89\%$ |
| test_membership_nested_leaf_last | 23.7240μs | 4.0440μs | 247.2775 KOps/s | 247.2145 KOps/s | $\color{#35bf28}+0.03\%$ |
| test_membership_stacked_nested_last | 35.7370μs | 4.5713μs | 218.7583 KOps/s | 252.3663 KOps/s | $\textbf{\color{#d91a1a}-13.32\%}$ |
| test_membership_stacked_nested_leaf_last | 24.2450μs | 4.6727μs | 214.0097 KOps/s | 248.4814 KOps/s | $\textbf{\color{#d91a1a}-13.87\%}$ |
| test_nested_getleaf | 34.1540μs | 10.9142μs | 91.6240 KOps/s | 90.8065 KOps/s | $\color{#35bf28}+0.90\%$ |
| test_nested_get | 41.3370μs | 10.5260μs | 95.0032 KOps/s | 97.2059 KOps/s | $\color{#d91a1a}-2.27\%$ |
| test_stacked_getleaf | 36.4580μs | 11.0031μs | 90.8832 KOps/s | 91.6695 KOps/s | $\color{#d91a1a}-0.86\%$ |
| test_stacked_get | 35.2860μs | 10.3175μs | 96.9229 KOps/s | 96.7558 KOps/s | $\color{#35bf28}+0.17\%$ |
| test_nested_getitemleaf | 43.5710μs | 11.3649μs | 87.9899 KOps/s | 87.5971 KOps/s | $\color{#35bf28}+0.45\%$ |
| test_nested_getitem | 46.4670μs | 10.5228μs | 95.0315 KOps/s | 95.7963 KOps/s | $\color{#d91a1a}-0.80\%$ |
| test_stacked_getitemleaf | 48.4410μs | 11.4198μs | 87.5670 KOps/s | 89.6079 KOps/s | $\color{#d91a1a}-2.28\%$ |
| test_stacked_getitem | 47.2780μs | 10.5456μs | 94.8266 KOps/s | 97.2471 KOps/s | $\color{#d91a1a}-2.49\%$ |
| test_lock_nested | 0.9992ms | 0.5075ms | 1.9703 KOps/s | 1.7058 KOps/s | $\textbf{\color{#35bf28}+15.50\%}$ |
| test_lock_stack_nested | 0.8167ms | 0.4819ms | 2.0752 KOps/s | 2.0634 KOps/s | $\color{#35bf28}+0.57\%$ |
| test_unlock_nested | 0.8084ms | 0.4281ms | 2.3357 KOps/s | 1.9573 KOps/s | $\textbf{\color{#35bf28}+19.33\%}$ |
| test_unlock_stack_nested | 0.6438ms | 0.3975ms | 2.5157 KOps/s | 2.5124 KOps/s | $\color{#35bf28}+0.13\%$ |
| test_flatten_speed | 0.2405ms | 0.1081ms | 9.2478 KOps/s | 9.4442 KOps/s | $\color{#d91a1a}-2.08\%$ |
| test_unflatten_speed | 0.5470ms | 0.4442ms | 2.2513 KOps/s | 2.2572 KOps/s | $\color{#d91a1a}-0.26\%$ |
| test_common_ops | 1.8211ms | 1.0812ms | 924.8665 Ops/s | 897.3385 Ops/s | $\color{#35bf28}+3.07\%$ |
| test_creation | 92.2520μs | 2.4792μs | 403.3505 KOps/s | 396.9578 KOps/s | $\color{#35bf28}+1.61\%$ |
| test_creation_empty | 54.6820μs | 17.4540μs | 57.2933 KOps/s | 53.9922 KOps/s | $\textbf{\color{#35bf28}+6.11\%}$ |
| test_creation_nested_1 | 63.0080μs | 21.0167μs | 47.5813 KOps/s | 46.1294 KOps/s | $\color{#35bf28}+3.15\%$ |
| test_creation_nested_2 | 57.7680μs | 24.7647μs | 40.3801 KOps/s | 38.9029 KOps/s | $\color{#35bf28}+3.80\%$ |
| test_clone | 72.2440μs | 16.9607μs | 58.9598 KOps/s | 57.2911 KOps/s | $\color{#35bf28}+2.91\%$ |
| test_getitem[int] | 0.9338ms | 12.6450μs | 79.0827 KOps/s | 78.5365 KOps/s | $\color{#35bf28}+0.70\%$ |
| test_getitem[slice_int] | 0.1222ms | 31.8758μs | 31.3718 KOps/s | 30.8818 KOps/s | $\color{#35bf28}+1.59\%$ |
| test_getitem[range] | 0.2701ms | 55.9735μs | 17.8656 KOps/s | 17.5680 KOps/s | $\color{#35bf28}+1.69\%$ |
| test_getitem[tuple] | 0.1838ms | 26.2366μs | 38.1147 KOps/s | 37.7443 KOps/s | $\color{#35bf28}+0.98\%$ |
| test_getitem[list] | 0.3431ms | 49.0621μs | 20.3823 KOps/s | 19.1984 KOps/s | $\textbf{\color{#35bf28}+6.17\%}$ |
| test_setitem_dim[int] | 55.9240μs | 31.0047μs | 32.2531 KOps/s | 31.1486 KOps/s | $\color{#35bf28}+3.55\%$ |
| test_setitem_dim[slice_int] | 0.1184ms | 66.9752μs | 14.9309 KOps/s | 14.1100 KOps/s | $\textbf{\color{#35bf28}+5.82\%}$ |
| test_setitem_dim[range] | 0.1307ms | 87.5227μs | 11.4256 KOps/s | 11.0484 KOps/s | $\color{#35bf28}+3.41\%$ |
| test_setitem_dim[tuple] | 85.1590μs | 54.7042μs | 18.2801 KOps/s | 17.3749 KOps/s | $\textbf{\color{#35bf28}+5.21\%}$ |
| test_setitem | 0.1104ms | 28.1436μs | 35.5320 KOps/s | 33.2765 KOps/s | $\textbf{\color{#35bf28}+6.78\%}$ |
| test_set | 0.1823ms | 27.3329μs | 36.5859 KOps/s | 34.6009 KOps/s | $\textbf{\color{#35bf28}+5.74\%}$ |
| test_set_shared | 3.3364ms | 0.2165ms | 4.6200 KOps/s | 4.6547 KOps/s | $\color{#d91a1a}-0.75\%$ |
| test_update | 0.1985ms | 33.6879μs | 29.6843 KOps/s | 27.8567 KOps/s | $\textbf{\color{#35bf28}+6.56\%}$ |
| test_update_nested | 0.1578ms | 43.2339μs | 23.1300 KOps/s | 22.0045 KOps/s | $\textbf{\color{#35bf28}+5.11\%}$ |
| test_update__nested | 0.1581ms | 34.0066μs | 29.4061 KOps/s | 28.9460 KOps/s | $\color{#35bf28}+1.59\%$ |
| test_set_nested | 0.1393ms | 30.1530μs | 33.1642 KOps/s | 31.3991 KOps/s | $\textbf{\color{#35bf28}+5.62\%}$ |
| test_set_nested_new | 0.1797ms | 34.8941μs | 28.6582 KOps/s | 27.1920 KOps/s | $\textbf{\color{#35bf28}+5.39\%}$ |
| test_select | 0.1221ms | 52.0430μs | 19.2149 KOps/s | 18.7216 KOps/s | $\color{#35bf28}+2.63\%$ |
| test_select_nested | 0.1528ms | 60.7521μs | 16.4603 KOps/s | 16.7550 KOps/s | $\color{#d91a1a}-1.76\%$ |
| test_exclude_nested | 0.1570ms | 80.4873μs | 12.4243 KOps/s | 12.6809 KOps/s | $\color{#d91a1a}-2.02\%$ |
| test_empty[True] | 0.7512ms | 0.3436ms | 2.9103 KOps/s | 2.9749 KOps/s | $\color{#d91a1a}-2.17\%$ |
| test_empty[False] | 13.7390μs | 1.2425μs | 804.8024 KOps/s | 796.8952 KOps/s | $\color{#35bf28}+0.99\%$ |
| test_unbind_speed | 0.5130ms | 0.3222ms | 3.1040 KOps/s | 3.1136 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_unbind_speed_stack0 | 0.7449ms | 0.3205ms | 3.1198 KOps/s | 3.1544 KOps/s | $\color{#d91a1a}-1.10\%$ |
| test_unbind_speed_stack1 | 83.6750ms | 0.8287ms | 1.2067 KOps/s | 1.3022 KOps/s | $\textbf{\color{#d91a1a}-7.33\%}$ |
| test_split | 76.4589ms | 2.2146ms | 451.5464 Ops/s | 442.7136 Ops/s | $\color{#35bf28}+2.00\%$ |
| test_chunk | 78.5357ms | 2.2177ms | 450.9204 Ops/s | 410.3240 Ops/s | $\textbf{\color{#35bf28}+9.89\%}$ |
| test_creation[device0] | 4.1083ms | 0.1223ms | 8.1791 KOps/s | 8.2124 KOps/s | $\color{#d91a1a}-0.41\%$ |
| test_creation_from_tensor | 0.2574ms | 0.1185ms | 8.4391 KOps/s | 8.3805 KOps/s | $\color{#35bf28}+0.70\%$ |
| test_add_one[memmap_tensor0] | 0.1606ms | 7.8955μs | 126.6552 KOps/s | 124.8608 KOps/s | $\color{#35bf28}+1.44\%$ |
| test_contiguous[memmap_tensor0] | 23.3440μs | 2.2221μs | 450.0332 KOps/s | 467.9648 KOps/s | $\color{#d91a1a}-3.83\%$ |
| test_stack[memmap_tensor0] | 78.0850μs | 5.9195μs | 168.9344 KOps/s | 169.4848 KOps/s | $\color{#d91a1a}-0.32\%$ |
| test_memmaptd_index | 1.2972ms | 0.4319ms | 2.3154 KOps/s | 2.3388 KOps/s | $\color{#d91a1a}-1.00\%$ |
| test_memmaptd_index_astensor | 0.7470ms | 0.5026ms | 1.9897 KOps/s | 1.9637 KOps/s | $\color{#35bf28}+1.32\%$ |
| test_memmaptd_index_op | 1.4077ms | 1.0214ms | 979.0341 Ops/s | 950.8310 Ops/s | $\color{#35bf28}+2.97\%$ |
| test_serialize_model | 0.2044s | 0.1407s | 7.1056 Ops/s | 7.8946 Ops/s | $\textbf{\color{#d91a1a}-9.99\%}$ |
| test_serialize_model_pickle | 0.4515s | 0.3966s | 2.5214 Ops/s | 2.4897 Ops/s | $\color{#35bf28}+1.27\%$ |
| test_serialize_weights | 0.1298s | 0.1245s | 8.0341 Ops/s | 7.1043 Ops/s | $\textbf{\color{#35bf28}+13.09\%}$ |
| test_serialize_weights_returnearly | 0.1852s | 0.1678s | 5.9585 Ops/s | 5.9026 Ops/s | $\color{#35bf28}+0.95\%$ |
| test_serialize_weights_pickle | 1.0533s | 0.7427s | 1.3464 Ops/s | 2.4235 Ops/s | $\textbf{\color{#d91a1a}-44.45\%}$ |
| test_serialize_weights_filesystem | 0.1550s | 0.1434s | 6.9743 Ops/s | 6.9419 Ops/s | $\color{#35bf28}+0.47\%$ |
| test_serialize_model_filesystem | 0.1548s | 0.1457s | 6.8647 Ops/s | 6.0947 Ops/s | $\textbf{\color{#35bf28}+12.63\%}$ |
| test_reshape_pytree | 85.7710μs | 39.6912μs | 25.1945 KOps/s | 26.0755 KOps/s | $\color{#d91a1a}-3.38\%$ |
| test_reshape_td | 0.1050ms | 50.5104μs | 19.7979 KOps/s | 20.1431 KOps/s | $\color{#d91a1a}-1.71\%$ |
| test_view_pytree | 88.3160μs | 39.6711μs | 25.2073 KOps/s | 25.1947 KOps/s | $\color{#35bf28}+0.05\%$ |
| test_view_td | 0.1440ms | 57.5327μs | 17.3814 KOps/s | 17.2109 KOps/s | $\color{#35bf28}+0.99\%$ |
| test_unbind_pytree | 97.2110μs | 36.0554μs | 27.7351 KOps/s | 27.3301 KOps/s | $\color{#35bf28}+1.48\%$ |
| test_unbind_td | 0.3761ms | 47.4005μs | 21.0968 KOps/s | 20.8849 KOps/s | $\color{#35bf28}+1.01\%$ |
| test_split_pytree | 80.1400μs | 38.7093μs | 25.8336 KOps/s | 26.2727 KOps/s | $\color{#d91a1a}-1.67\%$ |
| test_split_td | 0.5560ms | 61.3841μs | 16.2909 KOps/s | 16.4323 KOps/s | $\color{#d91a1a}-0.86\%$ |
| test_add_pytree | 90.5190μs | 44.1528μs | 22.6486 KOps/s | 22.4159 KOps/s | $\color{#35bf28}+1.04\%$ |
| test_add_td | 0.1481ms | 80.3172μs | 12.4506 KOps/s | 11.8108 KOps/s | $\textbf{\color{#35bf28}+5.42\%}$ |
| test_distributed | 1.4906ms | 0.1315ms | 7.6017 KOps/s | 7.3497 KOps/s | $\color{#35bf28}+3.43\%$ |
| test_tdmodule | 55.6340μs | 15.8907μs | 62.9297 KOps/s | 57.0194 KOps/s | $\textbf{\color{#35bf28}+10.37\%}$ |
| test_tdmodule_dispatch | 57.8780μs | 33.9663μs | 29.4409 KOps/s | 27.1012 KOps/s | $\textbf{\color{#35bf28}+8.63\%}$ |
| test_tdseq | 34.0540μs | 17.8646μs | 55.9766 KOps/s | 52.1978 KOps/s | $\textbf{\color{#35bf28}+7.24\%}$ |
| test_tdseq_dispatch | 64.8910μs | 37.9470μs | 26.3525 KOps/s | 23.9626 KOps/s | $\textbf{\color{#35bf28}+9.97\%}$ |
| test_instantiation_functorch | 1.8680ms | 1.5864ms | 630.3631 Ops/s | 623.8587 Ops/s | $\color{#35bf28}+1.04\%$ |
| test_instantiation_td | 81.1004ms | 1.2600ms | 793.6748 Ops/s | 856.6215 Ops/s | $\textbf{\color{#d91a1a}-7.35\%}$ |
| test_exec_functorch | 0.3169ms | 0.1820ms | 5.4955 KOps/s | 5.3777 KOps/s | $\color{#35bf28}+2.19\%$ |
| test_exec_functional_call | 0.3317ms | 0.1730ms | 5.7795 KOps/s | 5.8070 KOps/s | $\color{#d91a1a}-0.47\%$ |
| test_exec_td | 0.2820ms | 0.1733ms | 5.7687 KOps/s | 5.4209 KOps/s | $\textbf{\color{#35bf28}+6.42\%}$ |
| test_exec_td_decorator | 1.0195ms | 0.2565ms | 3.8988 KOps/s | 3.8601 KOps/s | $\color{#35bf28}+1.00\%$ |
| test_vmap_mlp_speed[True-True] | 1.0644ms | 0.6027ms | 1.6592 KOps/s | 1.6300 KOps/s | $\color{#35bf28}+1.79\%$ |
| test_vmap_mlp_speed[True-False] | 0.8950ms | 0.5963ms | 1.6770 KOps/s | 1.6479 KOps/s | $\color{#35bf28}+1.77\%$ |
| test_vmap_mlp_speed[False-True] | 0.7310ms | 0.4995ms | 2.0019 KOps/s | 1.9776 KOps/s | $\color{#35bf28}+1.23\%$ |
| test_vmap_mlp_speed[False-False] | 0.7960ms | 0.4956ms | 2.0178 KOps/s | 1.9839 KOps/s | $\color{#35bf28}+1.71\%$ |
| test_vmap_mlp_speed_decorator[True-True] | 1.0688ms | 0.6886ms | 1.4523 KOps/s | 1.4249 KOps/s | $\color{#35bf28}+1.93\%$ |
| test_vmap_mlp_speed_decorator[True-False] | 1.0353ms | 0.6887ms | 1.4521 KOps/s | 1.4359 KOps/s | $\color{#35bf28}+1.13\%$ |
| test_vmap_mlp_speed_decorator[False-True] | 1.0194ms | 0.5774ms | 1.7318 KOps/s | 1.7132 KOps/s | $\color{#35bf28}+1.08\%$ |
| test_vmap_mlp_speed_decorator[False-False] | 0.9246ms | 0.5804ms | 1.7229 KOps/s | 1.7162 KOps/s | $\color{#35bf28}+0.39\%$ |
| test_to_module_speed[True] | 2.8180ms | 1.7979ms | 556.2123 Ops/s | 557.2429 Ops/s | $\color{#d91a1a}-0.18\%$ |
| test_to_module_speed[False] | 2.0787ms | 1.7650ms | 566.5763 Ops/s | 571.7137 Ops/s | $\color{#d91a1a}-0.90\%$ |
| test_tc_init | 88.7970μs | 44.0693μs | 22.6915 KOps/s | 23.5764 KOps/s | $\color{#d91a1a}-3.75\%$ |
| test_tc_init_nested | 0.1585ms | 87.0463μs | 11.4881 KOps/s | 11.4958 KOps/s | $\color{#d91a1a}-0.07\%$ |
| test_tc_first_layer_tensor | 34.0440μs | 9.2608μs | 107.9824 KOps/s | 111.7676 KOps/s | $\color{#d91a1a}-3.39\%$ |
| test_tc_first_layer_nontensor | 32.6210μs | 9.1976μs | 108.7246 KOps/s | 110.2235 KOps/s | $\color{#d91a1a}-1.36\%$ |
| test_tc_second_layer_tensor | 32.0700μs | 2.8612μs | 349.5059 KOps/s | 360.9619 KOps/s | $\color{#d91a1a}-3.17\%$ |
| test_tc_second_layer_nontensor | 37.0490μs | 10.3962μs | 96.1885 KOps/s | 99.4335 KOps/s | $\color{#d91a1a}-3.26\%$ |
| test_unbind | 97.3084ms | 12.8342ms | 77.9171 Ops/s | 73.0647 Ops/s | $\textbf{\color{#35bf28}+6.64\%}$ |
| test_full_like | 11.7384ms | 7.9320ms | 126.0722 Ops/s | 134.4754 Ops/s | $\textbf{\color{#d91a1a}-6.25\%}$ |
| test_zeros_like | 11.5188ms | 7.6050ms | 131.4928 Ops/s | 143.3878 Ops/s | $\textbf{\color{#d91a1a}-8.30\%}$ |
| test_ones_like | 12.0563ms | 7.5870ms | 131.8052 Ops/s | 126.5863 Ops/s | $\color{#35bf28}+4.12\%$ |
| test_clone | 12.8221ms | 8.8912ms | 112.4714 Ops/s | 111.5478 Ops/s | $\color{#35bf28}+0.83\%$ |
| test_squeeze | 70.1910μs | 14.0237μs | 71.3080 KOps/s | 70.5200 KOps/s | $\color{#35bf28}+1.12\%$ |
| test_unsqueeze | 0.2108ms | 97.0056μs | 10.3087 KOps/s | 10.1014 KOps/s | $\color{#35bf28}+2.05\%$ |
| test_split | 0.4477ms | 0.2092ms | 4.7790 KOps/s | 4.8188 KOps/s | $\color{#d91a1a}-0.83\%$ |
| test_permute | 0.3574ms | 0.2290ms | 4.3674 KOps/s | 4.4706 KOps/s | $\color{#d91a1a}-2.31\%$ |
| test_stack | 28.5600ms | 24.4114ms | 40.9644 Ops/s | 40.4560 Ops/s | $\color{#35bf28}+1.26\%$ |
| test_cat | 29.5394ms | 24.0248ms | 41.6236 Ops/s | 41.0165 Ops/s | $\color{#35bf28}+1.48\%$ |
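The columns of this report appear to be related by simple arithmetic: `Ops` is the reciprocal of `Mean` (1 / 21.1951 μs ≈ 47.18 KOps/s for `test_plain_set_nested`), and `Change` is the relative difference between `Ops` and `Ops on Repo HEAD` ((47.1807 - 46.3557) / 46.3557 ≈ +1.78%). The CPU table has 144 rows, matching "Total Benchmarks: 144", and it contains exactly 24 bold green and 8 bold red entries, matching "Improved: 24. Worsened: 8." Every bold entry has |Change| above 5% and every plain one is below it, so the bot seems to count only changes beyond roughly ±5%. The sketch below reproduces that arithmetic; the helper names and the 5% threshold are assumptions inferred from the table, not the benchmark bot's actual code.

```python
# Hypothetical helpers; the 5% threshold is inferred from which entries are bold.
THRESHOLD_PCT = 5.0


def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean time per call (Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def percent_change(ops: float, ops_head: float) -> float:
    """Relative throughput change of the PR against repo HEAD, in percent."""
    return (ops - ops_head) / ops_head * 100.0


def classify(change_pct: float, threshold: float = THRESHOLD_PCT) -> str:
    """Label a change the way the summary line appears to count it."""
    if change_pct >= threshold:
        return "improved"
    if change_pct <= -threshold:
        return "worsened"
    return "unchanged"


# Rows copied from the CPU table: (name, Mean in seconds, Ops and Ops on HEAD in KOps/s).
# test_membership is listed in MOps/s in the table, converted here to KOps/s.
rows = [
    ("test_plain_set_nested", 21.1951e-6, 47.1807, 46.3557),
    ("test_membership", 0.7288e-6, 1372.2, 1109.1),
    ("test_membership_stacked_nested_last", 4.5713e-6, 218.7583, 252.3663),
]

for name, mean_s, ops_k, head_k in rows:
    change = percent_change(ops_k, head_k)
    implied_k = ops_per_second(mean_s) / 1e3
    print(f"{name}: {implied_k:.1f} KOps/s, {change:+.2f}%, {classify(change)}")
```

Running it reproduces the `Change` values in the table (+1.78%, +23.72%, -13.32%) and classifies only the latter two as counted changes.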

@vmoens merged commit 24efbcb into main on Jul 19, 2024
38 of 41 checks passed

$\color{#D29922}\textsf{\Large⚠\kern{0.2cm}\normalsize Warning}$ Result of GPU Benchmark Tests

Total Benchmarks: 219. Improved: $\large\color{#35bf28}13$. Worsened: $\large\color{#d91a1a}8$.


| Name | Max | Mean | Ops | Ops on Repo HEAD | Change |
|---|---|---|---|---|---|
| test_plain_set_nested | 40.1010μs | 16.6648μs | 60.0066 KOps/s | 56.9594 KOps/s | $\textbf{\color{#35bf28}+5.35\%}$ |
| test_plain_set_stack_nested | 0.1402ms | 16.7211μs | 59.8047 KOps/s | 56.7335 KOps/s | $\textbf{\color{#35bf28}+5.41\%}$ |
| test_plain_set_nested_inplace | 38.8410μs | 17.9962μs | 55.5673 KOps/s | 53.1535 KOps/s | $\color{#35bf28}+4.54\%$ |
| test_plain_set_stack_nested_inplace | 39.8310μs | 17.9466μs | 55.7208 KOps/s | 53.6446 KOps/s | $\color{#35bf28}+3.87\%$ |
| test_items | 17.4100μs | 4.7546μs | 210.3225 KOps/s | 211.0559 KOps/s | $\color{#d91a1a}-0.35\%$ |
| test_items_nested | 0.4440ms | 0.4008ms | 2.4952 KOps/s | 2.5619 KOps/s | $\color{#d91a1a}-2.60\%$ |
| test_items_nested_locked | 0.4222ms | 0.3998ms | 2.5012 KOps/s | 2.5249 KOps/s | $\color{#d91a1a}-0.94\%$ |
| test_items_nested_leaf | 0.1060ms | 85.7015μs | 11.6684 KOps/s | 11.5463 KOps/s | $\color{#35bf28}+1.06\%$ |
| test_items_stack_nested | 0.4458ms | 0.3936ms | 2.5408 KOps/s | 2.5276 KOps/s | $\color{#35bf28}+0.53\%$ |
| test_items_stack_nested_leaf | 0.1016ms | 86.5591μs | 11.5528 KOps/s | 11.5199 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_items_stack_nested_locked | 0.4302ms | 0.4021ms | 2.4869 KOps/s | 2.5312 KOps/s | $\color{#d91a1a}-1.75\%$ |
| test_keys | 17.2800μs | 4.3639μs | 229.1506 KOps/s | 228.7743 KOps/s | $\color{#35bf28}+0.16\%$ |
| test_keys_nested | 85.9730μs | 67.4913μs | 14.8167 KOps/s | 15.1929 KOps/s | $\color{#d91a1a}-2.48\%$ |
| test_keys_nested_locked | 0.9158ms | 72.3816μs | 13.8157 KOps/s | 13.5941 KOps/s | $\color{#35bf28}+1.63\%$ |
| test_keys_nested_leaf | 76.7720μs | 56.9047μs | 17.5732 KOps/s | 17.7862 KOps/s | $\color{#d91a1a}-1.20\%$ |
| test_keys_stack_nested | 0.2399ms | 66.5337μs | 15.0300 KOps/s | 15.0114 KOps/s | $\color{#35bf28}+0.12\%$ |
| test_keys_stack_nested_leaf | 0.2236ms | 56.2978μs | 17.7627 KOps/s | 17.3896 KOps/s | $\color{#35bf28}+2.15\%$ |
| test_keys_stack_nested_locked | 0.2599ms | 71.9810μs | 13.8926 KOps/s | 13.8236 KOps/s | $\color{#35bf28}+0.50\%$ |
| test_values | 62.4450μs | 1.7493μs | 571.6460 KOps/s | 562.6350 KOps/s | $\color{#35bf28}+1.60\%$ |
| test_values_nested | 49.1610μs | 33.8985μs | 29.4998 KOps/s | 29.6296 KOps/s | $\color{#d91a1a}-0.44\%$ |
| test_values_nested_locked | 0.2309ms | 35.6621μs | 28.0410 KOps/s | 27.9598 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_values_nested_leaf | 52.3610μs | 30.1087μs | 33.2129 KOps/s | 33.0408 KOps/s | $\color{#35bf28}+0.52\%$ |
| test_values_stack_nested | 57.6730μs | 34.6930μs | 28.8243 KOps/s | 29.1207 KOps/s | $\color{#d91a1a}-1.02\%$ |
| test_values_stack_nested_leaf | 0.1650ms | 30.8026μs | 32.4648 KOps/s | 32.6667 KOps/s | $\color{#d91a1a}-0.62\%$ |
| test_values_stack_nested_locked | 0.1367ms | 36.6219μs | 27.3061 KOps/s | 27.7118 KOps/s | $\color{#d91a1a}-1.46\%$ |
| test_membership | 1.3275μs | 0.5392μs | 1.8548 MOps/s | 1.8464 MOps/s | $\color{#35bf28}+0.45\%$ |
| test_membership_nested | 15.2210μs | 2.0978μs | 476.6923 KOps/s | 480.6010 KOps/s | $\color{#d91a1a}-0.81\%$ |
| test_membership_nested_leaf | 10.2505μs | 2.0172μs | 495.7457 KOps/s | 492.4195 KOps/s | $\color{#35bf28}+0.68\%$ |
| test_membership_stacked_nested | 19.5010μs | 2.0681μs | 483.5318 KOps/s | 478.4099 KOps/s | $\color{#35bf28}+1.07\%$ |
| test_membership_stacked_nested_leaf | 15.1300μs | 2.0815μs | 480.4138 KOps/s | 480.5678 KOps/s | $\color{#d91a1a}-0.03\%$ |
| test_membership_nested_last | 27.4010μs | 2.9802μs | 335.5507 KOps/s | 330.7265 KOps/s | $\color{#35bf28}+1.46\%$ |
| test_membership_nested_leaf_last | 27.6300μs | 2.9697μs | 336.7400 KOps/s | 330.8515 KOps/s | $\color{#35bf28}+1.78\%$ |
| test_membership_stacked_nested_last | 28.2210μs | 9.1427μs | 109.3775 KOps/s | 288.0346 KOps/s | $\textbf{\color{#d91a1a}-62.03\%}$ |
| test_membership_stacked_nested_leaf_last | 25.1810μs | 9.2178μs | 108.4861 KOps/s | 290.7614 KOps/s | $\textbf{\color{#d91a1a}-62.69\%}$ |
| test_nested_getleaf | 26.3300μs | 8.0977μs | 123.4926 KOps/s | 124.3144 KOps/s | $\color{#d91a1a}-0.66\%$ |
| test_nested_get | 23.1710μs | 7.6213μs | 131.2110 KOps/s | 132.2257 KOps/s | $\color{#d91a1a}-0.77\%$ |
| test_stacked_getleaf | 24.9810μs | 8.0739μs | 123.8556 KOps/s | 123.5019 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_stacked_get | 29.6410μs | 7.5431μs | 132.5716 KOps/s | 132.4665 KOps/s | $\color{#35bf28}+0.08\%$ |
| test_nested_getitemleaf | 64.6610μs | 8.1834μs | 122.1988 KOps/s | 122.3977 KOps/s | $\color{#d91a1a}-0.16\%$ |
| test_nested_getitem | 23.2510μs | 7.7221μs | 129.4992 KOps/s | 129.8298 KOps/s | $\color{#d91a1a}-0.25\%$ |
| test_stacked_getitemleaf | 24.3410μs | 8.2016μs | 121.9271 KOps/s | 121.4170 KOps/s | $\color{#35bf28}+0.42\%$ |
| test_stacked_getitem | 31.0110μs | 7.7392μs | 129.2129 KOps/s | 129.1854 KOps/s | $\color{#35bf28}+0.02\%$ |
| test_lock_nested | 1.0553ms | 0.4738ms | 2.1108 KOps/s | 2.1170 KOps/s | $\color{#d91a1a}-0.29\%$ |
| test_lock_stack_nested | 0.5442ms | 0.4218ms | 2.3707 KOps/s | 2.2934 KOps/s | $\color{#35bf28}+3.37\%$ |
| test_unlock_nested | 0.8169ms | 0.3921ms | 2.5507 KOps/s | 2.5439 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_unlock_stack_nested | 0.5283ms | 0.3418ms | 2.9260 KOps/s | 2.8395 KOps/s | $\color{#35bf28}+3.05\%$ |
| test_flatten_speed | 0.2164ms | 0.1049ms | 9.5337 KOps/s | 9.4190 KOps/s | $\color{#35bf28}+1.22\%$ |
| test_unflatten_speed | 0.3125ms | 0.2914ms | 3.4314 KOps/s | 3.3808 KOps/s | $\color{#35bf28}+1.50\%$ |
| test_common_ops | 1.5964ms | 1.3379ms | 747.4414 Ops/s | 725.8187 Ops/s | $\color{#35bf28}+2.98\%$ |
| test_creation | 16.8510μs | 1.9599μs | 510.2184 KOps/s | 514.7404 KOps/s | $\color{#d91a1a}-0.88\%$ |
| test_creation_empty | 33.7910μs | 17.1420μs | 58.3362 KOps/s | 53.0736 KOps/s | $\textbf{\color{#35bf28}+9.92\%}$ |
| test_creation_nested_1 | 0.1149ms | 19.1035μs | 52.3463 KOps/s | 47.7541 KOps/s | $\textbf{\color{#35bf28}+9.62\%}$ |
| test_creation_nested_2 | 41.7710μs | 22.0749μs | 45.3003 KOps/s | 42.3061 KOps/s | $\textbf{\color{#35bf28}+7.08\%}$ |
| test_clone | 0.1774ms | 30.3060μs | 32.9968 KOps/s | 32.7868 KOps/s | $\color{#35bf28}+0.64\%$ |
| test_getitem[int] | 1.2204ms | 16.5929μs | 60.2668 KOps/s | 59.7231 KOps/s | $\color{#35bf28}+0.91\%$ |
| test_getitem[slice_int] | 0.2035ms | 29.4975μs | 33.9012 KOps/s | 33.0570 KOps/s | $\color{#35bf28}+2.55\%$ |
| test_getitem[range] | 0.2385ms | 0.1123ms | 8.9043 KOps/s | 8.8276 KOps/s | $\color{#35bf28}+0.87\%$ |
| test_getitem[tuple] | 0.1704ms | 25.0368μs | 39.9411 KOps/s | 37.9400 KOps/s | $\textbf{\color{#35bf28}+5.27\%}$ |
| test_getitem[list] | 0.2503ms | 0.1024ms | 9.7613 KOps/s | 9.1077 KOps/s | $\textbf{\color{#35bf28}+7.18\%}$ |
| test_setitem_dim[int] | 0.1795ms | 53.7644μs | 18.5997 KOps/s | 16.9572 KOps/s | $\textbf{\color{#35bf28}+9.69\%}$ |
| test_setitem_dim[slice_int] | 0.2281ms | 82.2584μs | 12.1568 KOps/s | 11.7789 KOps/s | $\color{#35bf28}+3.21\%$ |
| test_setitem_dim[range] | 0.3050ms | 0.1467ms | 6.8177 KOps/s | 6.6574 KOps/s | $\color{#35bf28}+2.41\%$ |
| test_setitem_dim[tuple] | 0.2371ms | 73.9016μs | 13.5315 KOps/s | 12.9827 KOps/s | $\color{#35bf28}+4.23\%$ |
| test_setitem | 0.2289ms | 47.7231μs | 20.9542 KOps/s | 20.4589 KOps/s | $\color{#35bf28}+2.42\%$ |
| test_set | 0.2172ms | 46.4533μs | 21.5270 KOps/s | 20.7360 KOps/s | $\color{#35bf28}+3.81\%$ |
| test_set_shared | 0.4194ms | 54.6707μs | 18.2913 KOps/s | 17.7905 KOps/s | $\color{#35bf28}+2.81\%$ |
| test_update | 0.2011ms | 51.0447μs | 19.5907 KOps/s | 17.6888 KOps/s | $\textbf{\color{#35bf28}+10.75\%}$ |
| test_update_nested | 0.2461ms | 63.3124μs | 15.7947 KOps/s | 15.2540 KOps/s | $\color{#35bf28}+3.54\%$ |
| test_update__nested | 0.2423ms | 65.6895μs | 15.2231 KOps/s | 14.7663 KOps/s | $\color{#35bf28}+3.09\%$ |
| test_set_nested | 0.2234ms | 48.6427μs | 20.5581 KOps/s | 19.7251 KOps/s | $\color{#35bf28}+4.22\%$ |
| test_set_nested_new | 0.2316ms | 52.2692μs | 19.1317 KOps/s | 18.3201 KOps/s | $\color{#35bf28}+4.43\%$ |
| test_select | 0.2298ms | 68.3825μs | 14.6236 KOps/s | 14.2181 KOps/s | $\color{#35bf28}+2.85\%$ |
| test_select_nested | 0.1777ms | 53.3003μs | 18.7616 KOps/s | 18.4679 KOps/s | $\color{#35bf28}+1.59\%$ |
| test_exclude_nested | 0.1959ms | 72.6556μs | 13.7636 KOps/s | 13.5351 KOps/s | $\color{#35bf28}+1.69\%$ |
| test_empty[True] | 0.3540ms | 0.3030ms | 3.3001 KOps/s | 3.3263 KOps/s | $\color{#d91a1a}-0.79\%$ |
| test_empty[False] | 2.2810μs | 0.9304μs | 1.0748 MOps/s | 1.0942 MOps/s | $\color{#d91a1a}-1.77\%$ |
| test_to | 0.1481ms | 38.5576μs | 25.9353 KOps/s | 26.1420 KOps/s | $\color{#d91a1a}-0.79\%$ |
| test_to_nonblocking | 0.1095ms | 24.7754μs | 40.3626 KOps/s | 42.0701 KOps/s | $\color{#d91a1a}-4.06\%$ |
| test_unbind_speed | 0.5051ms | 0.3078ms | 3.2485 KOps/s | 3.3027 KOps/s | $\color{#d91a1a}-1.64\%$ |
| test_unbind_speed_stack0 | 0.3999ms | 0.2946ms | 3.3944 KOps/s | 3.3024 KOps/s | $\color{#35bf28}+2.79\%$ |
| test_unbind_speed_stack1 | 93.2608ms | 0.8276ms | 1.2083 KOps/s | 1.2804 KOps/s | $\textbf{\color{#d91a1a}-5.63\%}$ |
| test_split | 92.1612ms | 2.3225ms | 430.5648 Ops/s | 435.0392 Ops/s | $\color{#d91a1a}-1.03\%$ |
| test_chunk | 2.3739ms | 2.1285ms | 469.8204 Ops/s | 430.9545 Ops/s | $\textbf{\color{#35bf28}+9.02\%}$ |
| test_creation[device0] | 0.2900ms | 0.1038ms | 9.6372 KOps/s | 9.5877 KOps/s | $\color{#35bf28}+0.52\%$ |
| test_creation_from_tensor | 0.3145ms | 0.1060ms | 9.4317 KOps/s | 9.9745 KOps/s | $\textbf{\color{#d91a1a}-5.44\%}$ |
| test_add_one[memmap_tensor0] | 21.2510μs | 8.6594μs | 115.4819 KOps/s | 115.7026 KOps/s | $\color{#d91a1a}-0.19\%$ |
| test_contiguous[memmap_tensor0] | 0.1150ms | 2.1593μs | 463.1094 KOps/s | 461.2617 KOps/s | $\color{#35bf28}+0.40\%$ |
| test_stack[memmap_tensor0] | 55.8410μs | 6.5297μs | 153.1463 KOps/s | 152.4144 KOps/s | $\color{#35bf28}+0.48\%$ |
| test_memmaptd_index | 1.3826ms | 0.4227ms | 2.3659 KOps/s | 2.3942 KOps/s | $\color{#d91a1a}-1.18\%$ |
| test_memmaptd_index_astensor | 0.8574ms | 0.4873ms | 2.0523 KOps/s | 2.0741 KOps/s | $\color{#d91a1a}-1.05\%$ |
| test_memmaptd_index_op | 1.4819ms | 1.0346ms | 966.5912 Ops/s | 965.4060 Ops/s | $\color{#35bf28}+0.12\%$ |
| test_serialize_model | 0.1009s | 97.0387ms | 10.3052 Ops/s | 10.0918 Ops/s | $\color{#35bf28}+2.11\%$ |
| test_serialize_model_pickle | 1.3475s | 1.2375s | 0.8081 Ops/s | 0.8072 Ops/s | $\color{#35bf28}+0.11\%$ |
| test_serialize_weights | 96.0035ms | 92.7856ms | 10.7775 Ops/s | 9.1239 Ops/s | $\textbf{\color{#35bf28}+18.12\%}$ |
| test_serialize_weights_returnearly | 89.4512ms | 72.8553ms | 13.7258 Ops/s | 14.0488 Ops/s | $\color{#d91a1a}-2.30\%$ |
| test_serialize_weights_pickle | 1.3513s | 1.2237s | 0.8172 Ops/s | 0.8182 Ops/s | $\color{#d91a1a}-0.13\%$ |
| test_reshape_pytree | 0.1838ms | 38.8866μs | 25.7158 KOps/s | 25.8300 KOps/s | $\color{#d91a1a}-0.44\%$ |
| test_reshape_td | 84.6120μs | 45.8119μs | 21.8284 KOps/s | 21.8215 KOps/s | $\color{#35bf28}+0.03\%$ |
| test_view_pytree | 0.2745ms | 38.6121μs | 25.8986 KOps/s | 26.1689 KOps/s | $\color{#d91a1a}-1.03\%$ |
| test_view_td | 0.2486ms | 54.0185μs | 18.5122 KOps/s | 18.4220 KOps/s | $\color{#35bf28}+0.49\%$ |
| test_unbind_pytree | 0.1593ms | 36.6378μs | 27.2942 KOps/s | 26.2123 KOps/s | $\color{#35bf28}+4.13\%$ |
| test_unbind_td | 0.3751ms | 45.5833μs | 21.9379 KOps/s | 21.9666 KOps/s | $\color{#d91a1a}-0.13\%$ |
| test_split_pytree | 0.3469ms | 51.8724μs | 19.2781 KOps/s | 19.4907 KOps/s | $\color{#d91a1a}-1.09\%$ |
| test_split_td | 91.0628ms | 70.1126μs | 14.2628 KOps/s | 14.3498 KOps/s | $\color{#d91a1a}-0.61\%$ |
| test_add_pytree | 0.2050ms | 58.5318μs | 17.0847 KOps/s | 16.6386 KOps/s | $\color{#35bf28}+2.68\%$ |
| test_add_td | 0.4172ms | 0.1038ms | 9.6298 KOps/s | 9.6474 KOps/s | $\color{#d91a1a}-0.18\%$ |
| test_compile_add_one_nested[tensordict-compile] | 0.4137ms | 0.2062ms | 4.8489 KOps/s | 4.8347 KOps/s | $\color{#35bf28}+0.29\%$ |
| test_compile_add_one_nested[tensordict-eager] | 0.3188ms | 0.1758ms | 5.6875 KOps/s | 5.7340 KOps/s | $\color{#d91a1a}-0.81\%$ |
| test_compile_add_one_nested[pytree-compile] | 0.2845ms | 0.1439ms | 6.9495 KOps/s | 6.9284 KOps/s | $\color{#35bf28}+0.30\%$ |
| test_compile_add_one_nested[pytree-eager] | 0.3647ms | 0.1942ms | 5.1503 KOps/s | 5.2010 KOps/s | $\color{#d91a1a}-0.98\%$ |
| test_compile_copy_nested[tensordict-compile] | 0.1515ms | 21.5861μs | 46.3260 KOps/s | 45.8068 KOps/s | $\color{#35bf28}+1.13\%$ |
| test_compile_copy_nested[tensordict-eager] | 0.1885ms | 48.4879μs | 20.6237 KOps/s | 20.6822 KOps/s | $\color{#d91a1a}-0.28\%$ |
| test_compile_copy_nested[pytree-compile] | 0.1598ms | 72.5545μs | 13.7828 KOps/s | 13.8260 KOps/s | $\color{#d91a1a}-0.31\%$ |
| test_compile_copy_nested[pytree-eager] | 0.1208ms | 59.5846μs | 16.7829 KOps/s | 16.6599 KOps/s | $\color{#35bf28}+0.74\%$ |
| test_compile_add_one_flat[tensordict-compile] | 0.4343ms | 0.3242ms | 3.0849 KOps/s | 3.0942 KOps/s | $\color{#d91a1a}-0.30\%$ |
| test_compile_add_one_flat[tensordict-eager] | 0.3410ms | 0.2228ms | 4.4887 KOps/s | 4.4784 KOps/s | $\color{#35bf28}+0.23\%$ |
| test_compile_add_one_flat[tensorclass-compile] | 0.2926ms | 0.1344ms | 7.4425 KOps/s | 7.7399 KOps/s | $\color{#d91a1a}-3.84\%$ |
| test_compile_add_one_flat[tensorclass-eager] | 0.2516ms | 66.5338μs | 15.0299 KOps/s | 15.7708 KOps/s | $\color{#d91a1a}-4.70\%$ |
| test_compile_add_one_flat[pytree-compile] | 0.4283ms | 0.3244ms | 3.0825 KOps/s | 3.1017 KOps/s | $\color{#d91a1a}-0.62\%$ |
| test_compile_add_one_flat[pytree-eager] | 0.8663ms | 0.6635ms | 1.5071 KOps/s | 1.6174 KOps/s | $\textbf{\color{#d91a1a}-6.82\%}$ |
| test_compile_add_self_flat[tensordict-eager] | 0.4752ms | 0.2758ms | 3.6259 KOps/s | 3.6601 KOps/s | $\color{#d91a1a}-0.93\%$ |
| test_compile_add_self_flat[tensordict-compile] | 0.4904ms | 0.3276ms | 3.0529 KOps/s | 3.0578 KOps/s | $\color{#d91a1a}-0.16\%$ |
| test_compile_add_self_flat[tensorclass-eager] | 0.2813ms | 79.7626μs | 12.5372 KOps/s | 12.6710 KOps/s | $\color{#d91a1a}-1.06\%$ |
| test_compile_add_self_flat[tensorclass-compile] | 0.2977ms | 0.1340ms | 7.4638 KOps/s | 7.4508 KOps/s | $\color{#35bf28}+0.17\%$ |
| test_compile_add_self_flat[pytree-eager] | 0.7553ms | 0.5441ms | 1.8378 KOps/s | 1.8762 KOps/s | $\color{#d91a1a}-2.05\%$ |
| test_compile_add_self_flat[pytree-compile] | 0.4458ms | 0.3221ms | 3.1042 KOps/s | 3.0826 KOps/s | $\color{#35bf28}+0.70\%$ |
| test_compile_copy_flat[tensordict-compile] | 0.1419ms | 18.6080μs | 53.7404 KOps/s | 51.2897 KOps/s | $\color{#35bf28}+4.78\%$ |
| test_compile_copy_flat[tensordict-eager] | 67.1420μs | 32.9510μs | 30.3481 KOps/s | 30.7431 KOps/s | $\color{#d91a1a}-1.28\%$ |
| test_compile_copy_flat[pytree-compile] | 0.1097ms | 74.9015μs | 13.3509 KOps/s | 13.0662 KOps/s | $\color{#35bf28}+2.18\%$ |
| test_compile_copy_flat[pytree-eager] | 92.4730μs | 60.4134μs | 16.5526 KOps/s | 16.3672 KOps/s | $\color{#35bf28}+1.13\%$ |
| test_compile_assign_and_add[tensordict-compile] | 2.7822ms | 0.9734ms | 1.0273 KOps/s | 1.0475 KOps/s | $\color{#d91a1a}-1.93\%$ |
| test_compile_assign_and_add[tensordict-eager] | 3.6439ms | 3.3066ms | 302.4240 Ops/s | 308.8549 Ops/s | $\color{#d91a1a}-2.08\%$ |
| test_compile_assign_and_add[pytree-compile] | 2.5985ms | 0.9305ms | 1.0747 KOps/s | 1.0666 KOps/s | $\color{#35bf28}+0.76\%$ |
| test_compile_assign_and_add[pytree-eager] | 3.4319ms | 3.1719ms | 315.2719 Ops/s | 314.4910 Ops/s | $\color{#35bf28}+0.25\%$ |
| test_compile_indexing[tensor-tensordict-compile] | 0.2813ms | 0.1128ms | 8.8664 KOps/s | 9.1573 KOps/s | $\color{#d91a1a}-3.18\%$ |
| test_compile_indexing[tensor-tensordict-eager] | 0.2567ms | 66.7888μs | 14.9726 KOps/s | 16.2945 KOps/s | $\textbf{\color{#d91a1a}-8.11\%}$ |
| test_compile_indexing[tensor-tensorclass-compile] | 0.2508ms | 0.1022ms | 9.7803 KOps/s | 9.8404 KOps/s | $\color{#d91a1a}-0.61\%$ |
| test_compile_indexing[tensor-tensorclass-eager] | 0.1952ms | 44.8062μs | 22.3184 KOps/s | 22.4887 KOps/s | $\color{#d91a1a}-0.76\%$ |
| test_compile_indexing[tensor-pytree-compile] | 0.2494ms | 0.1024ms | 9.7612 KOps/s | 9.3526 KOps/s | $\color{#35bf28}+4.37\%$ |
| test_compile_indexing[tensor-pytree-eager] | 0.1923ms | 45.0072μs | 22.2187 KOps/s | 21.1833 KOps/s | $\color{#35bf28}+4.89\%$ |
| test_compile_indexing[slice-tensordict-compile] | 0.2761ms | 0.1383ms | 7.2308 KOps/s | 7.2459 KOps/s | $\color{#d91a1a}-0.21\%$ |
| test_compile_indexing[slice-tensordict-eager] | 0.1999ms | 26.2953μs | 38.0295 KOps/s | 38.7162 KOps/s | $\color{#d91a1a}-1.77\%$ |
| test_compile_indexing[slice-tensorclass-compile] | 0.2741ms | 0.1294ms | 7.7304 KOps/s | 7.7279 KOps/s | $\color{#35bf28}+0.03\%$ |
| test_compile_indexing[slice-tensorclass-eager] | 0.1351ms | 22.5439μs | 44.3578 KOps/s | 44.9007 KOps/s | $\color{#d91a1a}-1.21\%$ |
| test_compile_indexing[slice-pytree-compile] | 0.2802ms | 0.1294ms | 7.7274 KOps/s | 7.5000 KOps/s | $\color{#35bf28}+3.03\%$ |
| test_compile_indexing[slice-pytree-eager] | 53.8010μs | 22.8799μs | 43.7065 KOps/s | 44.3448 KOps/s | $\color{#d91a1a}-1.44\%$ |
| test_compile_indexing[int-tensordict-compile] | 0.2874ms | 0.1381ms | 7.2410 KOps/s | 7.2132 KOps/s | $\color{#35bf28}+0.39\%$ |
| test_compile_indexing[int-tensordict-eager] | 0.5210ms | 26.5643μs | 37.6445 KOps/s | 39.2206 KOps/s | $\color{#d91a1a}-4.02\%$ |
| test_compile_indexing[int-tensorclass-compile] | 0.2741ms | 0.1294ms | 7.7294 KOps/s | 7.6199 KOps/s | $\color{#35bf28}+1.44\%$ |
| test_compile_indexing[int-tensorclass-eager] | 0.1382ms | 23.0498μs | 43.3844 KOps/s | 44.9965 KOps/s | $\color{#d91a1a}-3.58\%$ |
| test_compile_indexing[int-pytree-compile] | 0.2809ms | 0.1293ms | 7.7350 KOps/s | 7.4246 KOps/s | $\color{#35bf28}+4.18\%$ |
| test_compile_indexing[int-pytree-eager] | 50.3610μs | 22.7091μs | 44.0352 KOps/s | 43.9705 KOps/s | $\color{#35bf28}+0.15\%$ |
| test_mod_add[eager] | 0.1835ms | 37.4056μs | 26.7339 KOps/s | 26.4104 KOps/s | $\color{#35bf28}+1.22\%$ |
| test_mod_add[compile] | 0.2450ms | 68.4863μs | 14.6015 KOps/s | 14.7263 KOps/s | $\color{#d91a1a}-0.85\%$ |
| test_mod_add[compile-overhead] | 0.2624ms | 0.1450ms | 6.8959 KOps/s | 6.6611 KOps/s | $\color{#35bf28}+3.52\%$ |
| test_mod_wrap[eager] | 0.4181ms | 0.2506ms | 3.9911 KOps/s | 4.0149 KOps/s | $\color{#d91a1a}-0.59\%$ |
| test_mod_wrap[compile] | 0.4621ms | 0.2984ms | 3.3511 KOps/s | 3.3508 KOps/s | $+0.01\%$ |
| test_mod_wrap[compile-overhead] | 8.1936ms | 4.3655ms | 229.0689 Ops/s | 233.5367 Ops/s | $\color{#d91a1a}-1.91\%$ |
| test_mod_wrap_and_backward[eager] | 1.7401ms | 1.4185ms | 704.9905 Ops/s | 700.1131 Ops/s | $\color{#35bf28}+0.70\%$ |
| test_mod_wrap_and_backward[compile] | 1.7857ms | 1.4743ms | 678.2800 Ops/s | 675.1183 Ops/s | $\color{#35bf28}+0.47\%$ |
| test_mod_wrap_and_backward[compile-overhead] | 1.4662ms | 0.9955ms | 1.0046 KOps/s | 992.5771 Ops/s | $\color{#35bf28}+1.21\%$ |
| test_seq_add[eager] | 0.2524ms | 0.1088ms | 9.1880 KOps/s | 8.9020 KOps/s | $\color{#35bf28}+3.21\%$ |
| test_seq_add[compile] | 0.2637ms | 87.6502μs | 11.4090 KOps/s | 11.6768 KOps/s | $\color{#d91a1a}-2.29\%$ |
| test_seq_add[compile-overhead] | 0.3037ms | 0.1263ms | 7.9160 KOps/s | 8.2274 KOps/s | $\color{#d91a1a}-3.78\%$ |
| test_seq_wrap[eager] | 0.6386ms | 0.4416ms | 2.2643 KOps/s | 2.3535 KOps/s | $\color{#d91a1a}-3.79\%$ |
| test_seq_wrap[compile] | 1.5464ms | 0.3401ms | 2.9402 KOps/s | 3.0192 KOps/s | $\color{#d91a1a}-2.62\%$ |
| test_seq_wrap[compile-overhead] | 0.3066s | 0.1467s | 6.8159 Ops/s | 6.7561 Ops/s | $\color{#35bf28}+0.89\%$ |
| test_func_call_runtime[False-eager] | 0.9729ms | 0.7410ms | 1.3495 KOps/s | 1.3618 KOps/s | $\color{#d91a1a}-0.90\%$ |
| test_func_call_runtime[False-compile] | 0.9901ms | 0.8278ms | 1.2080 KOps/s | 1.2049 KOps/s | $\color{#35bf28}+0.26\%$ |
| test_func_call_runtime[False-compile-overhead] | 0.5346ms | 0.3713ms | 2.6935 KOps/s | 2.6920 KOps/s | $\color{#35bf28}+0.06\%$ |
| test_func_call_runtime[True-eager] | 1.2591ms | 0.9927ms | 1.0074 KOps/s | 1.0121 KOps/s | $\color{#d91a1a}-0.47\%$ |
| test_func_call_runtime[True-compile] | 1.0333ms | 0.8712ms | 1.1478 KOps/s | 1.1567 KOps/s | $\color{#d91a1a}-0.77\%$ |
| test_func_call_runtime[True-compile-overhead] | 0.5792ms | 0.4128ms | 2.4225 KOps/s | 2.4325 KOps/s | $\color{#d91a1a}-0.41\%$ |
| test_distributed | 2.5607ms | 72.9434μs | 13.7093 KOps/s | 14.2064 KOps/s | $\color{#d91a1a}-3.50\%$ |
| test_tdmodule | 38.7410μs | 16.6809μs | 59.9489 KOps/s | 59.0896 KOps/s | $\color{#35bf28}+1.45\%$ |
| test_tdmodule_dispatch | 53.0410μs | 33.7473μs | 29.6320 KOps/s | 29.0043 KOps/s | $\color{#35bf28}+2.16\%$ |
| test_tdseq | 32.8810μs | 16.9850μs | 58.8753 KOps/s | 57.6200 KOps/s | $\color{#35bf28}+2.18\%$ |
| test_tdseq_dispatch | 54.2610μs | 35.9868μs | 27.7880 KOps/s | 27.2601 KOps/s | $\color{#35bf28}+1.94\%$ |
| test_instantiation_functorch | 2.2367ms | 2.0166ms | 495.8894 Ops/s | 504.9812 Ops/s | $\color{#d91a1a}-1.80\%$ |
| test_instantiation_td | 2.0427ms | 1.3104ms | 763.1180 Ops/s | 767.2721 Ops/s | $\color{#d91a1a}-0.54\%$ |
| test_exec_functorch | 0.3969ms | 0.2303ms | 4.3419 KOps/s | 4.5002 KOps/s | $\color{#d91a1a}-3.52\%$ |
| test_exec_functional_call | 0.4288ms | 0.2303ms | 4.3413 KOps/s | 4.6229 KOps/s | $\textbf{\color{#d91a1a}-6.09\%}$ |
test_exec_td 0.4263ms 0.2313ms 4.3240 KOps/s 4.6336 KOps/s $\textbf{\color{#d91a1a}-6.68\%}$
test_exec_td_decorator 0.5029ms 0.3053ms 3.2754 KOps/s 3.4324 KOps/s $\color{#d91a1a}-4.57\%$
test_vmap_mlp_speed[True-True] 1.1326ms 0.6959ms 1.4371 KOps/s 1.5111 KOps/s $\color{#d91a1a}-4.90\%$
test_vmap_mlp_speed[True-False] 0.8748ms 0.6935ms 1.4419 KOps/s 1.5160 KOps/s $\color{#d91a1a}-4.89\%$
test_vmap_mlp_speed[False-True] 0.7970ms 0.6104ms 1.6382 KOps/s 1.7002 KOps/s $\color{#d91a1a}-3.65\%$
test_vmap_mlp_speed[False-False] 0.7994ms 0.6070ms 1.6474 KOps/s 1.6603 KOps/s $\color{#d91a1a}-0.78\%$
test_vmap_mlp_speed_decorator[True-True] 1.1838ms 0.7565ms 1.3219 KOps/s 1.3534 KOps/s $\color{#d91a1a}-2.32\%$
test_vmap_mlp_speed_decorator[True-False] 0.9612ms 0.7567ms 1.3215 KOps/s 1.3580 KOps/s $\color{#d91a1a}-2.69\%$
test_vmap_mlp_speed_decorator[False-True] 0.9286ms 0.6454ms 1.5495 KOps/s 1.5403 KOps/s $\color{#35bf28}+0.60\%$
test_vmap_mlp_speed_decorator[False-False] 0.8328ms 0.6400ms 1.5625 KOps/s 1.5365 KOps/s $\color{#35bf28}+1.70\%$
test_vmap_transformer_speed[True-True] 9.2810ms 8.6221ms 115.9814 Ops/s 116.6427 Ops/s $\color{#d91a1a}-0.57\%$
test_vmap_transformer_speed[True-False] 8.7915ms 8.5221ms 117.3422 Ops/s 116.9780 Ops/s $\color{#35bf28}+0.31\%$
test_vmap_transformer_speed[False-True] 9.1360ms 8.5545ms 116.8972 Ops/s 117.5954 Ops/s $\color{#d91a1a}-0.59\%$
test_vmap_transformer_speed[False-False] 8.6737ms 8.4344ms 118.5621 Ops/s 117.2465 Ops/s $\color{#35bf28}+1.12\%$
test_vmap_transformer_speed_decorator[True-True] 21.4650ms 20.4838ms 48.8192 Ops/s 48.5833 Ops/s $\color{#35bf28}+0.49\%$
test_vmap_transformer_speed_decorator[True-False] 20.9730ms 20.4156ms 48.9822 Ops/s 48.7020 Ops/s $\color{#35bf28}+0.58\%$
test_vmap_transformer_speed_decorator[False-True] 20.8915ms 20.1900ms 49.5294 Ops/s 49.3271 Ops/s $\color{#35bf28}+0.41\%$
test_vmap_transformer_speed_decorator[False-False] 21.0686ms 20.2677ms 49.3395 Ops/s 48.9703 Ops/s $\color{#35bf28}+0.75\%$
test_to_module_speed[True] 1.6163ms 1.4898ms 671.2368 Ops/s 670.0949 Ops/s $\color{#35bf28}+0.17\%$
test_to_module_speed[False] 1.5940ms 1.4652ms 682.4839 Ops/s 678.1156 Ops/s $\color{#35bf28}+0.64\%$
test_tc_init 56.6120μs 36.9251μs 27.0818 KOps/s 25.2378 KOps/s $\textbf{\color{#35bf28}+7.31\%}$
test_tc_init_nested 0.1845ms 76.5208μs 13.0683 KOps/s 12.1407 KOps/s $\textbf{\color{#35bf28}+7.64\%}$
test_tc_first_layer_tensor 19.8310μs 3.9787μs 251.3371 KOps/s 251.4609 KOps/s $\color{#d91a1a}-0.05\%$
test_tc_first_layer_nontensor 26.4600μs 3.9895μs 250.6607 KOps/s 248.2448 KOps/s $\color{#35bf28}+0.97\%$
test_tc_second_layer_tensor 6.1252μs 1.3051μs 766.1978 KOps/s 776.4833 KOps/s $\color{#d91a1a}-1.32\%$
test_tc_second_layer_nontensor 20.2700μs 4.6085μs 216.9886 KOps/s 216.0545 KOps/s $\color{#35bf28}+0.43\%$
test_unbind 0.3207s 13.0766ms 76.4727 Ops/s 76.0125 Ops/s $\color{#35bf28}+0.61\%$
test_full_like 0.7636ms 0.5769ms 1.7333 KOps/s 1.7290 KOps/s $\color{#35bf28}+0.24\%$
test_zeros_like 0.3487ms 0.1979ms 5.0531 KOps/s 5.0469 KOps/s $\color{#35bf28}+0.12\%$
test_ones_like 0.3595ms 0.1979ms 5.0520 KOps/s 5.0510 KOps/s $\color{#35bf28}+0.02\%$
test_clone 0.5687ms 0.4143ms 2.4136 KOps/s 2.4034 KOps/s $\color{#35bf28}+0.43\%$
test_squeeze 0.1366ms 11.6507μs 85.8314 KOps/s 84.8725 KOps/s $\color{#35bf28}+1.13\%$
test_unsqueeze 0.2810ms 85.9941μs 11.6287 KOps/s 11.6640 KOps/s $\color{#d91a1a}-0.30\%$
test_split 0.4912ms 0.1850ms 5.4044 KOps/s 5.4564 KOps/s $\color{#d91a1a}-0.95\%$
test_permute 0.3748ms 0.2020ms 4.9515 KOps/s 5.0250 KOps/s $\color{#d91a1a}-1.46\%$
test_stack 1.3801ms 0.8985ms 1.1130 KOps/s 1.1143 KOps/s $\color{#d91a1a}-0.12\%$
test_cat 1.3710ms 1.2320ms 811.6871 Ops/s 811.6402 Ops/s $+0.01\%$
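The table does not say how its derived columns are computed. Judging from the numbers, **Ops** looks like the reciprocal of **Mean**, and **Change** looks like the relative difference between **Ops** and **Ops on Repo HEAD**. For example, `test_tc_init` has a mean of 36.9251μs, which gives about 27.08 KOps/s, and its change is (27.0818 − 25.2378) / 25.2378 ≈ +7.31%.

The sketch below checks these relationships against two rows copied from the table. The formulas are inferred from the data, not taken from the benchmark tooling. The 5% threshold for the bold highlighting is also an assumption: it fits the rows shown (−4.90% is not bold, −6.09% is), but the tool's actual cutoff is not stated. The helper names `ops_per_second`, `relative_change` and `is_flagged` are hypothetical.

```python
def ops_per_second(mean_seconds: float) -> float:
    """Throughput implied by a mean per-call time (assumed: Ops = 1 / Mean)."""
    return 1.0 / mean_seconds


def relative_change(ops: float, ops_head: float) -> float:
    """Percent change vs. HEAD (assumed: (Ops - Ops_HEAD) / Ops_HEAD * 100)."""
    return (ops - ops_head) / ops_head * 100.0


def is_flagged(change_pct: float, threshold_pct: float = 5.0) -> bool:
    """Hypothetical rule for bold highlighting; the 5% cutoff is an assumption."""
    return abs(change_pct) >= threshold_pct


# Rows copied from the table: (Mean in seconds, Ops in KOps/s, Ops on HEAD in KOps/s).
rows = {
    "test_tc_init": (36.9251e-6, 27.0818, 25.2378),
    "test_exec_td": (0.2313e-3, 4.3240, 4.6336),
}

for name, (mean_s, ops_k, head_k) in rows.items():
    implied_k = ops_per_second(mean_s) / 1e3  # convert Ops/s to KOps/s
    change = relative_change(ops_k, head_k)
    print(f"{name}: implied {implied_k:.2f} KOps/s, "
          f"change {change:+.2f}%, flagged={is_flagged(change)}")
```

If these assumptions hold, small mean-time differences produce the sub-1% changes seen in most rows. Only rows such as `test_exec_functional_call`, `test_exec_td`, `test_tc_init` and `test_tc_init_nested` exceed the assumed threshold.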

@vmoens vmoens deleted the tensorclass-data-grad branch October 21, 2024 14:01