Performance tuning #77

kohei-noda-qcrg · 2024-01-19T12:57:26Z

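Profiled the program and tuned its performance.
- Specifically, the program was sped up by replacing `copy.deepcopy` with `pickle.dumps`/`pickle.loads`, replacing `re.split` with processing that does not use regular expressions, and looking up dicts with multiple keys instead of a single key built by joining strings
- Performance measurement results
  - Data used: Cm3+_phen.out (216 MB)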
```
 noda@DESKTOP-1C7AGIU ~/develop/sum_dirac_dfcoef (performance-tuning) $ ls -lh Cm3+_phen.out 
-rw-r--r-- 1 noda noda 216M Jan 19 19:08 Cm3+_phen.out
```
  - Benchmark script
    - Runs the same command 10 times and reports the average execution time
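```python
import subprocess
import timeit

cmd = "sum_dirac_dfcoef -i ./Cm3+_phen.out -g"
print(f"Start running \"{cmd}\" for 10 times...")
# Run the CLI 10 times; globals=globals() lets the timed statement see subprocess and cmd.
# check=True raises CalledProcessError if a run fails, so failures are not timed silently.
t = timeit.timeit("subprocess.run(cmd.split(), check=True)", globals=globals(), number=10)
print(f"Total time: {t} sec")
print(f"Average time: {t / 10} sec")
```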
  - Results
    - After tuning, the average run time dropped from 50.11 s to 39.87 s, about 1.26× faster
```
noda@DESKTOP-1C7AGIU ~/develop/sum_dirac_dfcoef (main) $ python run.py 
Start running "sum_dirac_dfcoef -i ./Cm3+_phen.out -g" for 10 times...
Total time: 501.0955908120377 sec
Average time: 50.10955908120377 sec
noda@DESKTOP-1C7AGIU ~/develop/sum_dirac_dfcoef (main) $ git checkout performance-tuning 
Switched to branch 'performance-tuning'
noda@DESKTOP-1C7AGIU ~/develop/sum_dirac_dfcoef (performance-tuning) $ python run.py 
Start running "sum_dirac_dfcoef -i ./Cm3+_phen.out -g" for 10 times...
Total time: 398.7336607740144 sec
Average time: 39.87336607740144 sec
```
- Profiling method
  - Function-level profiling can be done with `python -m cProfile -s cumtime -m sum_dirac_dfcoef -i ./Cm3+_phen.out -g > perf_result`
    - See the official Python documentation for details
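- Profiling results
  - The number of function calls dropped to about 72% of the original (208,295,271 → 150,940,333)
  - `add_coefficient` in privec_reader.py and the functions it calls are invoked an extremely large number of times and account for 75-80% of the total run time (73.4 of 97.6 s before, 59.1 of 74.2 s after)
    - For Cm3+_phen.out, it is called 2,767,590 times
    - The call count is this high because `add_coefficient` is called once per line of PRIVEC data that contains a coefficient
    - Processing each MO's PRIVEC data in parallel would leave the total number of calls unchanged but might shorten the overall run time. However, this looks like a large change, so it will be handled in a separate issue/PR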
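A minimal sketch of the per-MO parallelization idea, assuming the PRIVEC data has already been split into one list of coefficient lines per MO. `parse_coefficient_line`, `parse_mo_block` and the other names are hypothetical stand-ins, not functions from `privec_reader.py`, and the per-line work is reduced to parsing the last field as a float. The number of per-line calls stays the same; only the wall-clock time can drop, because independent MO blocks run in separate processes.

```python
from concurrent.futures import ProcessPoolExecutor


def parse_coefficient_line(line: str) -> float:
    # Hypothetical stand-in for the per-line work of add_coefficient:
    # take the last whitespace-separated field as the coefficient value.
    return float(line.split()[-1])


def parse_mo_block(lines: list[str]) -> float:
    # Process every coefficient line of one MO and return a per-MO summary
    # (here simply the sum of squared coefficients).
    return sum(parse_coefficient_line(line) ** 2 for line in lines)


def parse_all_serial(blocks: list[list[str]]) -> list[float]:
    # Current behaviour: one MO block after another in a single process.
    return [parse_mo_block(block) for block in blocks]


def parse_all_parallel(blocks: list[list[str]], workers: int = 4) -> list[float]:
    # MO blocks are independent, so they can be mapped onto worker processes.
    # executor.map returns the results in the same order as the input blocks.
    with ProcessPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(parse_mo_block, blocks))
```

`parse_all_parallel` should be called under an `if __name__ == "__main__":` guard, because with the spawn or forkserver start methods the worker processes re-import the main module. Whether it pays off also depends on the cost of sending each block's lines to the workers and of merging the per-MO results afterwards.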

Profiling data (there are many entries, so only those with cumtime ≥ 0.1 s are shown)

Latest main before this PR (this corresponds to v4.1.1, so the same source code can be obtained by pip-installing that version)
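The listings that follow keep only entries with cumtime ≥ 0.1 s. `pstats.Stats.print_stats()` restricts output by line count, fraction, or regular expression rather than by an absolute time, so a threshold like this has to be applied by hand. A sketch of one way to do it, reading the `Stats.stats` dict (an undocumented but long-standing attribute keyed by file, line and function name, so the two `add_coefficient` functions stay separate). It profiles a toy function `busy` in-process; for the real run, a file written with `python -m cProfile -o perf.prof -m sum_dirac_dfcoef ...` could be loaded with `pstats.Stats("perf.prof")` instead (the file name is only an example).

```python
import cProfile
import pstats


def busy(n: int) -> int:
    # Toy workload so that the example has something to profile.
    return sum(i * i for i in range(n))


def rows_above(stats: pstats.Stats, min_cumtime: float = 0.1) -> list[tuple[float, str]]:
    rows = []
    # Stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    # Built-in functions appear with filename "~" and lineno 0.
    for (filename, lineno, funcname), (_, _, _, cumtime, _) in stats.stats.items():
        if cumtime >= min_cumtime:
            rows.append((cumtime, f"{filename}:{lineno}({funcname})"))
    # Same ordering as "-s cumtime": largest cumulative time first.
    rows.sort(reverse=True)
    return rows


if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    busy(200_000)
    profiler.disable()
    for cumtime, location in rows_above(pstats.Stats(profiler), min_cumtime=0.0):
        print(f"{cumtime:8.3f} {location}")
```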

208295271 function calls (201462563 primitive calls) in 97.571 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
305/1 0.010 0.000 97.571 97.571 {built-in method builtins.exec}
1 0.000 0.000 97.571 97.571 runpy.py:200(run_module)
1 0.000 0.000 97.473 97.473 runpy.py:64(_run_code)
1 0.000 0.000 97.473 97.473 main.py:1()
1 0.000 0.000 97.473 97.473 sum_dirac_dfcoef.py:10(main)
1 7.278 7.278 97.253 97.253 privec_reader.py:40(read_privec_data)
2767590 6.719 0.000 73.422 0.000 privec_reader.py:171(add_coefficient)
2767590 1.522 0.000 39.345 0.000 coefficient.py:24(get_coefficient)
2767590 9.663 0.000 37.823 0.000 coefficient.py:38(parse_line)
5550790 4.343 0.000 20.790 0.000 utils.py:7(space_separated_parsing)
2767590 3.119 0.000 14.551 0.000 data.py:19(add_coefficient)
5551753 2.232 0.000 13.552 0.000 re.py:223(split)
2767590 6.510 0.000 12.505 0.000 coefficient.py:70()
2043517 3.635 0.000 11.441 0.000 main.py:736(setattr)
6708346/185212 4.883 0.000 10.822 0.000 copy.py:128(deepcopy)
368691/185151 1.523 0.000 9.806 0.000 copy.py:258(_reconstruct)
5551753 8.437 0.000 8.437 0.000 {method 'split' of 're.Pattern' objects}
302562/185211 0.781 0.000 8.355 0.000 copy.py:226(_deepcopy_dict)
11070360 3.833 0.000 4.673 0.000 utils.py:25(is_float)
4944208/4944205 2.081 0.000 3.619 0.000 {built-in method builtins.getattr}
2767593 0.917 0.000 3.534 0.000 main.py:154(init)
2767590 1.130 0.000 2.970 0.000 atoms.py:38(count_remaining_functions)
1610 0.017 0.000 2.919 0.002 privec_reader.py:200(add_current_mo_data_to_data_all_mo)
5557740 1.913 0.000 2.891 0.000 re.py:289(_compile)
2767593 2.617 0.000 2.617 0.000 {method 'validate_python' of 'pydantic_core._pydantic_core.SchemaValidator' objects}
2043517 2.602 0.000 2.602 0.000 {method 'validate_assignment' of 'pydantic_core._pydantic_core.SchemaValidator' objects}
1611 0.038 0.000 2.516 0.002 copy.py:200(_deepcopy_list)
117412 0.183 0.000 2.204 0.000 main.py:686(deepcopy)
5550790 2.187 0.000 2.187 0.000 utils.py:9()
5535181 2.060 0.000 2.060 0.000 {built-in method builtins.sum}
2767590 1.262 0.000 1.787 0.000 privec_reader.py:92(is_this_row_for_coefficients)
2043566 1.268 0.000 1.538 0.000 _model_construction.py:197(getattr)
20040821 1.530 0.000 1.530 0.000 {method 'get' of 'dict' objects}
16607799 1.379 0.000 1.379 0.000 {method 'strip' of 'str' objects}
11070360 1.322 0.000 1.322 0.000 {built-in method builtins.pow}
8170962/8170934 1.304 0.000 1.305 0.000 {built-in method builtins.isinstance}
2043530 0.594 0.000 0.917 0.000 _fields.py:262(is_valid_field_name)
2767590 0.887 0.000 0.887 0.000 atoms.py:28(decrement_function)
2775987 0.628 0.000 0.880 0.000 privec_reader.py:96(need_to_skip_this_line)
11070360 0.839 0.000 0.839 0.000 {method 'isdecimal' of 'str' objects}
8337054/8336850 0.727 0.000 0.727 0.000 {built-in method builtins.len}
5554462 0.710 0.000 0.710 0.000 {method 'rstrip' of 'str' objects}
8356750 0.588 0.000 0.588 0.000 {built-in method builtins.id}
790276 0.428 0.000 0.558 0.000 copy.py:242(_keep_alive)
2769740 0.488 0.000 0.488 0.000 {method 'replace' of 'str' objects}
5918070 0.341 0.000 0.341 0.000 copy.py:182(_deepcopy_atomic)
2767695 0.328 0.000 0.328 0.000 {method 'isdigit' of 'str' objects}
1610 0.006 0.000 0.324 0.000 data.py:35(fileter_coefficients_by_threshold)
2045536 0.323 0.000 0.323 0.000 {method 'startswith' of 'str' objects}
370302 0.095 0.000 0.317 0.000 copy.py:263()
2767590 0.311 0.000 0.311 0.000 {method 'values' of 'collections.OrderedDict' objects}
1610 0.259 0.000 0.288 0.000 data.py:36()
368691 0.275 0.000 0.275 0.000 {method 'reduce_ex' of 'object' objects}
2044093 0.270 0.000 0.270 0.000 {method 'get' of 'mappingproxy' objects}
1610 0.008 0.000 0.215 0.000 privec_reader.py:132(start_mo_section)
1610 0.003 0.000 0.188 0.000 data.py:27(reset)
3220 0.175 0.000 0.175 0.000 {method 'clear' of 'dict' objects}
8 0.000 0.000 0.168 0.021 init.py:1()
1 0.130 0.130 0.147 0.147 file_writer.py:39(write_mo_data)
117412 0.047 0.000 0.117 0.000 copy.py:66(copy)
27468 0.068 0.000 0.117 0.000 codecs.py:319(decode)
185150 0.079 0.000 0.112 0.000 copyreg.py:94(newobj)
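In the listing above, `re.py:223(split)` costs 13.552 s cumulative, which includes the pattern-cache lookups that module-level `re` functions perform on every call (`re.py:289(_compile)`), and its 5,551,753 calls closely track the 5,550,790 calls of `utils.py:7(space_separated_parsing)`. Commit 0a478cf replaced the regex with plain string processing. The sketch below contrasts the two approaches on a made-up whitespace-separated line; neither function is the actual code in `utils.py`.

```python
import re
import timeit


def split_with_regex(line: str) -> list[str]:
    # Module-level re.split looks the pattern up in re's cache on every call,
    # then splits on runs of whitespace; empty tokens are filtered out.
    return [token for token in re.split(r"\s+", line.strip()) if token]


def split_without_regex(line: str) -> list[str]:
    # str.split() with no argument splits on runs of whitespace and ignores
    # leading/trailing whitespace, so no extra filtering is needed.
    return line.split()


if __name__ == "__main__":
    sample = "   12  L  1s   Cm   0.1234567890   -0.0000012345  "
    assert split_with_regex(sample) == split_without_regex(sample)
    for func in (split_with_regex, split_without_regex):
        seconds = timeit.timeit(lambda: func(sample), number=100_000)
        print(f"{func.__name__}: {seconds:.3f} s for 100,000 calls")
```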

Last commit of this PR

150940333 function calls (150931166 primitive calls) in 74.187 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
305/1 0.011 0.000 74.188 74.188 {built-in method builtins.exec}
1 0.000 0.000 74.187 74.187 runpy.py:200(run_module)
1 0.000 0.000 74.088 74.088 runpy.py:64(_run_code)
1 0.000 0.000 74.088 74.088 main.py:1()
1 0.000 0.000 74.088 74.088 sum_dirac_dfcoef.py:10(main)
1 5.511 5.511 73.873 73.873 privec_reader.py:39(read_privec_data)
2767590 6.378 0.000 59.054 0.000 privec_reader.py:170(add_coefficient)
2767590 1.449 0.000 32.726 0.000 coefficient.py:24(get_coefficient)
2767590 8.835 0.000 31.277 0.000 coefficient.py:38(parse_line)
5535181 3.446 0.000 13.893 0.000 {built-in method builtins.sum}
2767590 2.607 0.000 13.844 0.000 data.py:19(add_coefficient)
2043517 3.617 0.000 11.248 0.000 main.py:736(setattr)
5550790 4.052 0.000 10.709 0.000 utils.py:8(space_separated_parsing)
13837950 6.916 0.000 10.447 0.000 coefficient.py:69()
5550790 3.481 0.000 3.481 0.000 utils.py:9()
2767593 0.894 0.000 3.325 0.000 main.py:154(init)
4089477/4089474 1.843 0.000 3.324 0.000 {built-in method builtins.getattr}
2767590 1.095 0.000 2.928 0.000 atoms.py:38(count_remaining_functions)
2043517 2.564 0.000 2.565 0.000 {method 'validate_assignment' of 'pydantic_core._pydantic_core.SchemaValidator' objects}
5551867 2.494 0.000 2.494 0.000 {method 'split' of 'str' objects}
2767593 2.430 0.000 2.430 0.000 {method 'validate_python' of 'pydantic_core._pydantic_core.SchemaValidator' objects}
93380 0.085 0.000 2.329 0.000 utils.py:16(fast_deepcopy_pickle)
11070360 2.287 0.000 2.287 0.000 utils.py:36(is_float)
2767590 1.204 0.000 1.705 0.000 privec_reader.py:91(is_this_row_for_coefficients)
2043566 1.225 0.000 1.481 0.000 _model_construction.py:197(getattr)
93380 1.361 0.000 1.404 0.000 {built-in method _pickle.dumps}
16607799 1.317 0.000 1.317 0.000 {method 'strip' of 'str' objects}
11070360 1.244 0.000 1.244 0.000 {built-in method builtins.pow}
1610 0.013 0.000 1.062 0.001 privec_reader.py:200(add_current_mo_data_to_data_all_mo)
2043530 0.590 0.000 0.938 0.000 _fields.py:262(is_valid_field_name)
2767590 0.884 0.000 0.884 0.000 atoms.py:28(decrement_function)
2775987 0.620 0.000 0.861 0.000 privec_reader.py:95(need_to_skip_this_line)
93380 0.784 0.000 0.841 0.000 {built-in method _pickle.loads}
8337038/8336836 0.695 0.000 0.695 0.000 {built-in method builtins.len}
5554462 0.683 0.000 0.683 0.000 {method 'rstrip' of 'str' objects}
6139190 0.543 0.000 0.543 0.000 {method 'get' of 'dict' objects}
2769740 0.475 0.000 0.475 0.000 {method 'replace' of 'str' objects}
2045536 0.349 0.000 0.349 0.000 {method 'startswith' of 'str' objects}
1610 0.007 0.000 0.338 0.000 data.py:35(fileter_coefficients_by_threshold)
2767695 0.317 0.000 0.317 0.000 {method 'isdigit' of 'str' objects}
1610 0.272 0.000 0.301 0.000 data.py:36()
2767590 0.289 0.000 0.289 0.000 {method 'values' of 'collections.OrderedDict' objects}
1610 0.009 0.000 0.266 0.000 privec_reader.py:131(start_mo_section)
2065355/2065327 0.265 0.000 0.265 0.000 {built-in method builtins.isinstance}
2044093 0.256 0.000 0.256 0.000 {method 'get' of 'mappingproxy' objects}
1610 0.003 0.000 0.227 0.000 data.py:27(reset)
3220 0.222 0.000 0.222 0.000 {method 'clear' of 'dict' objects}
8 0.000 0.000 0.180 0.022 init.py:1()
1 0.128 0.128 0.146 0.146 file_writer.py:39(write_mo_data)
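In this listing the `copy.py` entries are gone; instead `utils.py:16(fast_deepcopy_pickle)` is called 93,380 times for 2.329 s cumulative, compared with 10.822 s under `copy.py:128(deepcopy)` in the earlier listing. The body below is a guess at such a helper based on the message of commit 3079169, not the code in `utils.py`. The round trip through `pickle` only works for picklable objects, and it yields a deep copy because `pickle.loads` rebuilds every nested object.

```python
import copy
import pickle
from typing import TypeVar

T = TypeVar("T")


def fast_deepcopy_pickle(obj: T) -> T:
    # Serialize to bytes and immediately rebuild the object. Most of the work
    # happens in the C _pickle module, avoiding the Python-level bookkeeping
    # of copy.deepcopy (memo dict, _keep_alive, _reconstruct, ...).
    # Only valid for picklable objects; lambdas, open files, etc. will fail.
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


if __name__ == "__main__":
    import timeit

    data = {f"mo{i}": {"coef": [0.1 * j for j in range(20)], "atom": "Cm"} for i in range(50)}
    clone = fast_deepcopy_pickle(data)
    assert clone == data and clone is not data
    assert clone["mo0"]["coef"] is not data["mo0"]["coef"]
    for name, func in (("copy.deepcopy", copy.deepcopy), ("pickle round-trip", fast_deepcopy_pickle)):
        print(name, timeit.timeit(lambda: func(data), number=200))
```

How much faster the round trip is depends on the shape of the data; for plain nested dicts, lists and numbers it tends to win, which matches the drop seen between the two listings.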

kohei-noda-qcrg added 5 commits January 19, 2024 18:12

Don't use regex for performance when there is alternative processing.

0a478cf

Refactor is_float function to simplify and efficiently code.

e4195a9

Use multiple keys to improve search performance.

4523858

Don't unnecessarily pass an immutable List as an argument to the sum function.

fb07f74

Use pickle.dumps and pickle.loads instead of copy.deepcopy for performance reason.

3079169
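Commit 4523858 switched dict lookups from one string key built by joining labels to several separate keys. The PR does not show the data structure involved, so the sketch below assumes a hypothetical table indexed by atom label and subshell, stored once with joined-string keys and once as a nested dict. The nested form skips building and hashing a new string on every lookup.

```python
import timeit

# Hypothetical table: one value per (atom, subshell) pair.
atoms = ["Cm", "N", "C", "H"]
subshells = ["s", "p", "d", "f"]

# Single-key version: the two labels are joined into one string key.
joined: dict[str, int] = {
    f"{a}_{s}": 10 * i + j for i, a in enumerate(atoms) for j, s in enumerate(subshells)
}

# Multi-key version: one dict level per key, no string building needed.
nested: dict[str, dict[str, int]] = {
    a: {s: 10 * i + j for j, s in enumerate(subshells)} for i, a in enumerate(atoms)
}


def lookup_joined(atom: str, subshell: str) -> int:
    # A fresh string is formatted (and then hashed) on every call.
    return joined[f"{atom}_{subshell}"]


def lookup_nested(atom: str, subshell: str) -> int:
    # Two lookups with the caller's strings; CPython caches a str's hash.
    return nested[atom][subshell]


if __name__ == "__main__":
    assert all(lookup_joined(a, s) == lookup_nested(a, s) for a in atoms for s in subshells)
    for func in (lookup_joined, lookup_nested):
        seconds = timeit.timeit(lambda: func("Cm", "f"), number=500_000)
        print(f"{func.__name__}: {seconds:.3f} s")
```

A tuple key such as `table[(atom, subshell)]` is another multi-key layout; it avoids string formatting but still builds and hashes a tuple per lookup.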

kohei-noda-qcrg merged commit 54271c9 into main Jan 19, 2024
6 checks passed

kohei-noda-qcrg deleted the performance-tuning branch January 19, 2024 13:04

kohei-noda-qcrg mentioned this pull request Jan 20, 2024

Supports multi-process parallelization #78

Merged
