
Conversation

@hmaarrfk
Contributor

Taken from discussions in #7224 (comment)

Thank you @Illviljan

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@github-actions bot added the run-benchmark (Run the ASV benchmark workflow) and topic-performance labels on Oct 29, 2022
@hmaarrfk
Contributor Author

$ asv run -E existing --quick --bench merge
· Discovering benchmarks
· Running 5 total benchmarks (1 commits * 1 environments * 5 benchmarks)
[  0.00%] ·· Benchmarking existing-py_home_mark_mambaforge_envs_mcam_dev_bin_python
[ 10.00%] ··· merge.DatasetAddVariable.time_dict_of_dataarrays_to_dataset                                             ok
[ 10.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           762±0μs
                       10          7.18±0ms
                      100          12.6±0ms
                      1000         89.1±0ms
              =================== ==========

[ 20.00%] ··· merge.DatasetAddVariable.time_dict_of_tuples_to_dataset                                                 ok
[ 20.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           889±0μs
                       10          2.01±0ms
                      100          1.34±0ms
                      1000         605±0μs
              =================== ==========

[ 30.00%] ··· merge.DatasetAddVariable.time_dict_of_variables_to_dataset                                              ok
[ 30.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           2.48±0ms
                       10          2.06±0ms
                      100          2.13±0ms
                      1000         2.38±0ms
              =================== ==========

[ 40.00%] ··· merge.DatasetAddVariable.time_merge_two_datasets                                                        ok
[ 40.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           814±0μs
                       10          945±0μs
                      100          2.42±0ms
                      1000         5.23±0ms
              =================== ==========

[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion                                                        ok
[ 50.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           1.10±0ms
                       10          954±0μs
                      100          1.88±0ms
                      1000         5.29±0ms
              =================== ==========
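
For context, a benchmark of this shape is roughly the following (an illustrative sketch only; the actual class lives in xarray's asv_bench/benchmarks/merge.py and may differ in detail):

import xarray as xr


class DatasetAddVariable:
    # One timing per number of variables already present in the dataset.
    param_names = ["existing_elements"]
    params = [[0, 10, 100, 1000]]

    def setup(self, existing_elements):
        self.dataset = xr.Dataset({f"var{i}": i for i in range(existing_elements)})
        self.other = xr.Dataset({f"var{i}": i for i in range(existing_elements)})

    def time_variable_insertion(self, existing_elements):
        # Add a single new variable to the pre-populated dataset.
        self.dataset["new_var"] = 0

    def time_merge_two_datasets(self, existing_elements):
        xr.merge([self.dataset, self.other])

The time_dict_of_*_to_dataset variants timed above build a Dataset from dicts of differently typed values in the same parametrized way.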

@hmaarrfk
Contributor Author

On the CI, the benchmarks report similar findings:

[ 67.73%] ··· ...dVariable.time_dict_of_dataarrays_to_dataset                 ok
[ 67.73%] ··· =================== =============
               existing_elements               
              ------------------- -------------
                       0            269±0.9μs  
                       10          2.21±0.01ms 
                      100          16.5±0.07ms 
                      1000          153±0.9ms  
              =================== =============

[ 67.88%] ··· ...etAddVariable.time_dict_of_tuples_to_dataset                 ok
[ 67.88%] ··· =================== ===========
               existing_elements             
              ------------------- -----------
                       0           269±0.6μs 
                       10          289±0.4μs 
                      100           293±1μs  
                      1000         346±0.4μs 
              =================== ===========

[ 68.02%] ··· ...ddVariable.time_dict_of_variables_to_dataset                 ok
[ 68.02%] ··· =================== =============
               existing_elements               
              ------------------- -------------
                       0             270±1μs   
                       10           329±0.6μs  
                      100            636±1μs   
                      1000         3.70±0.01ms 
              =================== =============

[ 68.17%] ··· ...e.DatasetAddVariable.time_merge_two_datasets                 ok
[ 68.17%] ··· =================== =============
               existing_elements               
              ------------------- -------------
                       0            104±0.5μs  
                       10           235±0.6μs  
                      100            1.05±0ms  
                      1000         9.02±0.02ms 
              =================== =============

[ 68.31%] ··· ...e.DatasetAddVariable.time_variable_insertion                 ok
[ 68.31%] ··· =================== =============
               existing_elements               
              ------------------- -------------
                       0             119±1μs   
                       10           225±0.7μs  
                      100            1.04±0ms  
                      1000         9.03±0.03ms 
              =================== =============

@headtr1ck
Collaborator

I think these different ways of creating a Dataset can be combined into a matrix test by passing a function that creates your data_vars.
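
A combined matrix test along those lines might look roughly like this (an illustrative sketch; the helper and class names are invented for the example, while the strategy/count parameters mirror the tables further down in the thread):

import numpy as np
import xarray as xr


def make_dataarrays(n):
    return {f"var{i}": xr.DataArray(np.zeros(10), dims="time") for i in range(n)}


def make_variables(n):
    return {f"var{i}": xr.Variable("time", np.zeros(10)) for i in range(n)}


def make_tuples(n):
    return {f"var{i}": ("time", np.zeros(10)) for i in range(n)}


class DatasetCreation:
    # Parametrize over both the construction strategy and the number of variables.
    param_names = ["strategy", "count"]
    params = [
        ["dict_of_DataArrays", "dict_of_Variables", "dict_of_Tuples"],
        [0, 1, 10, 100, 1000],
    ]

    def setup(self, strategy, count):
        makers = {
            "dict_of_DataArrays": make_dataarrays,
            "dict_of_Variables": make_variables,
            "dict_of_Tuples": make_tuples,
        }
        self.data_vars = makers[strategy](count)

    def time_dataset_creation(self, strategy, count):
        xr.Dataset(data_vars=self.data_vars)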

@hmaarrfk force-pushed the expand_merge_benchmarks branch from 6c7ff52 to 1a66d38 on October 29, 2022 17:10
@hmaarrfk
Contributor Author

With a wide enough terminal window, the combined benchmark table looks like:

[ 50.00%] ··· ==================== ========== ========== ========== ========== ==========
              --                                           count
              -------------------- ------------------------------------------------------
                    strategy           0          1          10        100        1000
              ==================== ========== ========== ========== ========== ==========
               dict_of_DataArrays   1.32±0ms   5.87±0ms   7.58±0ms   18.7±0ms   98.6±0ms
               dict_of_Variables    2.70±0ms   2.91±0ms   3.01±0ms   3.91±0ms   7.04±0ms
                 dict_of_Tuples     2.84±0ms   3.02±0ms   3.22±0ms   3.42±0ms   3.02±0ms
              ==================== ========== ========== ========== ========== ==========

@hmaarrfk force-pushed the expand_merge_benchmarks branch from f8d69f6 to 73d5f79 on October 29, 2022 18:02
@headtr1ck
Collaborator

I still think the DataArray version is "unfair": every dict item carries its own coords, so xarray has to check whether they are all the same.

@hmaarrfk
Contributor Author

What about just specifying "dims"?
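
Concretely, the contrast is between each dict item carrying its own coordinate (which the Dataset constructor has to compare across all items) and items declaring dimension names only (illustrative example):

import numpy as np
import xarray as xr

data = np.zeros(10)

# With coords: every DataArray carries a "time" coordinate that has to be
# checked for consistency against all the others.
with_coords = xr.DataArray(data, coords={"time": np.arange(10)}, dims="time")

# With dims only: no coordinate variables, so there is nothing to align.
dims_only = xr.DataArray(data, dims="time")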

@hmaarrfk
Contributor Author

[ 50.00%] ··· ==================== ========== ========== ========== ========== ==========
              --                                           count
              -------------------- ------------------------------------------------------
                    strategy           0          1          10        100        1000
              ==================== ========== ========== ========== ========== ==========
               dict_of_DataArrays   1.65±0ms   3.83±0ms   4.03±0ms   6.14±0ms   16.6±0ms
               dict_of_Variables    3.04±0ms   3.24±0ms   3.38±0ms   4.04±0ms   9.91±0ms
                 dict_of_Tuples     2.90±0ms   3.03±0ms   3.32±0ms   3.22±0ms   3.22±0ms
              ==================== ========== ========== ========== ========== ==========

As you thought, the numbers improve quite a bit.

I kinda want to understand why a no-op takes 1 ms! ^_^
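
One way to chase that fixed overhead down is to profile an empty construction directly (a generic profiling sketch, not something from this PR):

import cProfile

import xarray as xr

# Profile building a Dataset with no variables to see where the ~1 ms goes.
cProfile.run("xr.Dataset(data_vars={})", sort="cumtime")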

@hmaarrfk force-pushed the expand_merge_benchmarks branch from 6309b9f to 4658836 on October 29, 2022 18:59
@hmaarrfk force-pushed the expand_merge_benchmarks branch from 4658836 to 0be3712 on October 29, 2022 19:11
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
@hmaarrfk
Contributor Author

Well, now the benchmark numbers make more sense:

[ 50.00%] ··· ==================== ========== ========== ========== ========== ==========
              --                                           count
              -------------------- ------------------------------------------------------
                    strategy           0          1          10        100        1000
              ==================== ========== ========== ========== ========== ==========
               dict_of_DataArrays   1.56±0ms   3.60±0ms   5.83±0ms   16.6±0ms   67.3±0ms
               dict_of_Variables    1.65±0ms   3.11±0ms   4.03±0ms   6.22±0ms   18.9±0ms
                 dict_of_Tuples     2.42±0ms   3.11±0ms   982±0μs    5.17±0ms   17.2±0ms
              ==================== ========== ========== ========== ========== ==========

@Illviljan added the plan to merge (Final call for comments) label on Oct 30, 2022
@Illviljan merged commit bc35e39 into pydata:main on Oct 31, 2022
@Illviljan
Contributor

Thanks @hmaarrfk !
