
Conversation

@hmaarrfk
Contributor

Taken from discussions in #7224 (comment)

Thank you @Illviljan

  • Closes #xxxx
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@github-actions bot added the run-benchmark (Run the ASV benchmark workflow) and topic-performance labels on Oct 29, 2022
@hmaarrfk
Contributor Author

$ asv run -E existing --quick --bench merge
· Discovering benchmarks
· Running 5 total benchmarks (1 commits * 1 environments * 5 benchmarks)
[  0.00%] ·· Benchmarking existing-py_home_mark_mambaforge_envs_mcam_dev_bin_python
[ 10.00%] ··· merge.DatasetAddVariable.time_dict_of_dataarrays_to_dataset                                             ok
[ 10.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           762±0μs
                       10          7.18±0ms
                      100          12.6±0ms
                      1000         89.1±0ms
              =================== ==========

[ 20.00%] ··· merge.DatasetAddVariable.time_dict_of_tuples_to_dataset                                                 ok
[ 20.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           889±0μs
                       10          2.01±0ms
                      100          1.34±0ms
                      1000         605±0μs
              =================== ==========

[ 30.00%] ··· merge.DatasetAddVariable.time_dict_of_variables_to_dataset                                              ok
[ 30.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           2.48±0ms
                       10          2.06±0ms
                      100          2.13±0ms
                      1000         2.38±0ms
              =================== ==========

[ 40.00%] ··· merge.DatasetAddVariable.time_merge_two_datasets                                                        ok
[ 40.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           814±0μs
                       10          945±0μs
                      100          2.42±0ms
                      1000         5.23±0ms
              =================== ==========

[ 50.00%] ··· merge.DatasetAddVariable.time_variable_insertion                                                        ok
[ 50.00%] ··· =================== ==========
               existing_elements
              ------------------- ----------
                       0           1.10±0ms
                       10          954±0μs
                      100          1.88±0ms
                      1000         5.29±0ms
              =================== ==========
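
For context, a benchmark of this shape is roughly the following (an illustrative sketch only; the actual class lives in xarray's asv_bench/benchmarks/merge.py and may differ in detail):

import xarray as xr


class DatasetAddVariable:
    # One timing per number of variables already present in the dataset.
    param_names = ["existing_elements"]
    params = [[0, 10, 100, 1000]]

    def setup(self, existing_elements):
        self.dataset = xr.Dataset({f"var{i}": i for i in range(existing_elements)})
        self.other = xr.Dataset({f"var{i}": i for i in range(existing_elements)})

    def time_variable_insertion(self, existing_elements):
        # Add a single new variable to the pre-populated dataset.
        self.dataset["new_var"] = 0

    def time_merge_two_datasets(self, existing_elements):
        xr.merge([self.dataset, self.other])

The time_dict_of_*_to_dataset variants timed above build a Dataset from dicts of differently typed values in the same parametrized way.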

@hmaarrfk
Contributor Author

On the CI, the benchmarks report similar findings:

[ 67.73%] ··· ...dVariable.time_dict_of_dataarrays_to_dataset                 ok
[ 67.73%] ··· =================== =============
               existing_elements               
              ------------------- -------------
                       0            269±0.9μs  
                       10          2.21±0.01ms 
                      100          16.5±0.07ms 
                      1000          153±0.9ms  
              =================== =============

[ 67.88%] ··· ...etAddVariable.time_dict_of_tuples_to_dataset                 ok
[ 67.88%] ··· =================== ===========
               existing_elements             
              ------------------- -----------
                       0           269±0.6μs 
                       10          289±0.4μs 
                      100           293±1μs  
                      1000         346±0.4μs 
              =================== ===========

[ 68.02%] ··· ...ddVariable.time_dict_of_variables_to_dataset                 ok
[ 68.02%] ··· =================== =============
               existing_elements               
              ------------------- -------------
                       0             270±1μs   
                       10           329±0.6μs  
                      100            636±1μs   
                      1000         3.70±0.01ms 
              =================== =============

[ 68.17%] ··· ...e.DatasetAddVariable.time_merge_two_datasets                 ok
[ 68.17%] ··· =================== =============
               existing_elements               
              ------------------- -------------
                       0            104±0.5μs  
                       10           235±0.6μs  
                      100            1.05±0ms  
                      1000         9.02±0.02ms 
              =================== =============

[ 68.31%] ··· ...e.DatasetAddVariable.time_variable_insertion                 ok
[ 68.31%] ··· =================== =============
               existing_elements               
              ------------------- -------------
                       0             119±1μs   
                       10           225±0.7μs  
                      100            1.04±0ms  
                      1000         9.03±0.03ms 
              =================== =============

@headtr1ck
Collaborator

I think these different ways of creating a Dataset can be combined into a matrix test by passing a function that creates your data_vars.
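
A combined matrix test along those lines might look roughly like this (an illustrative sketch; the helper and class names are invented for the example, while the strategy/count parameters mirror the tables further down in the thread):

import numpy as np
import xarray as xr


def make_dataarrays(n):
    return {f"var{i}": xr.DataArray(np.zeros(10), dims="time") for i in range(n)}


def make_variables(n):
    return {f"var{i}": xr.Variable("time", np.zeros(10)) for i in range(n)}


def make_tuples(n):
    return {f"var{i}": ("time", np.zeros(10)) for i in range(n)}


class DatasetCreation:
    # Parametrize over both the construction strategy and the number of variables.
    param_names = ["strategy", "count"]
    params = [
        ["dict_of_DataArrays", "dict_of_Variables", "dict_of_Tuples"],
        [0, 1, 10, 100, 1000],
    ]

    def setup(self, strategy, count):
        makers = {
            "dict_of_DataArrays": make_dataarrays,
            "dict_of_Variables": make_variables,
            "dict_of_Tuples": make_tuples,
        }
        self.data_vars = makers[strategy](count)

    def time_dataset_creation(self, strategy, count):
        xr.Dataset(data_vars=self.data_vars)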

@hmaarrfk force-pushed the expand_merge_benchmarks branch from 6c7ff52 to 1a66d38 on October 29, 2022 17:10
@hmaarrfk
Contributor Author

With a wide enough terminal window, the combined benchmark table looks like:

[ 50.00%] ··· ==================== ========== ========== ========== ========== ==========
              --                                           count
              -------------------- ------------------------------------------------------
                    strategy           0          1          10        100        1000
              ==================== ========== ========== ========== ========== ==========
               dict_of_DataArrays   1.32±0ms   5.87±0ms   7.58±0ms   18.7±0ms   98.6±0ms
               dict_of_Variables    2.70±0ms   2.91±0ms   3.01±0ms   3.91±0ms   7.04±0ms
                 dict_of_Tuples     2.84±0ms   3.02±0ms   3.22±0ms   3.42±0ms   3.02±0ms
              ==================== ========== ========== ========== ========== ==========

@hmaarrfk force-pushed the expand_merge_benchmarks branch from f8d69f6 to 73d5f79 on October 29, 2022 18:02
@headtr1ck
Collaborator

I still think the DataArray version is "unfair": every dict item carries its own coords, so xarray has to check whether they are all the same.

@hmaarrfk
Contributor Author

What about just specifying "dims"?
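
Concretely, the contrast is between each dict item carrying its own coordinate (which the Dataset constructor has to compare across all items) and items declaring dimension names only (illustrative example):

import numpy as np
import xarray as xr

data = np.zeros(10)

# With coords: every DataArray carries a "time" coordinate that has to be
# checked for consistency against all the others.
with_coords = xr.DataArray(data, coords={"time": np.arange(10)}, dims="time")

# With dims only: no coordinate variables, so there is nothing to align.
dims_only = xr.DataArray(data, dims="time")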

@hmaarrfk
Contributor Author

[ 50.00%] ··· ==================== ========== ========== ========== ========== ==========
              --                                           count
              -------------------- ------------------------------------------------------
                    strategy           0          1          10        100        1000
              ==================== ========== ========== ========== ========== ==========
               dict_of_DataArrays   1.65±0ms   3.83±0ms   4.03±0ms   6.14±0ms   16.6±0ms
               dict_of_Variables    3.04±0ms   3.24±0ms   3.38±0ms   4.04±0ms   9.91±0ms
                 dict_of_Tuples     2.90±0ms   3.03±0ms   3.32±0ms   3.22±0ms   3.22±0ms
              ==================== ========== ========== ========== ========== ==========

As you thought, the numbers improve quite a bit.

I kinda want to understand why a no-op takes 1 ms! ^_^
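
One way to chase that fixed overhead down is to profile an empty construction directly (a generic profiling sketch, not something from this PR):

import cProfile

import xarray as xr

# Profile building a Dataset with no variables to see where the ~1 ms goes.
cProfile.run("xr.Dataset(data_vars={})", sort="cumtime")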

@hmaarrfk force-pushed the expand_merge_benchmarks branch from 6309b9f to 4658836 on October 29, 2022 18:59
@hmaarrfk force-pushed the expand_merge_benchmarks branch from 4658836 to 0be3712 on October 29, 2022 19:11
Co-authored-by: Illviljan <14371165+Illviljan@users.noreply.github.com>
@hmaarrfk
Contributor Author

Well, now the benchmark numbers make more sense:

[ 50.00%] ··· ==================== ========== ========== ========== ========== ==========
              --                                           count
              -------------------- ------------------------------------------------------
                    strategy           0          1          10        100        1000
              ==================== ========== ========== ========== ========== ==========
               dict_of_DataArrays   1.56±0ms   3.60±0ms   5.83±0ms   16.6±0ms   67.3±0ms
               dict_of_Variables    1.65±0ms   3.11±0ms   4.03±0ms   6.22±0ms   18.9±0ms
                 dict_of_Tuples     2.42±0ms   3.11±0ms   982±0μs    5.17±0ms   17.2±0ms
              ==================== ========== ========== ========== ========== ==========

@Illviljan added the plan to merge (Final call for comments) label on Oct 30, 2022
@Illviljan merged commit bc35e39 into pydata:main on Oct 31, 2022
@Illviljan
Contributor

Thanks @hmaarrfk !
