Performance tuning #77

Merged
merged 5 commits into from
Jan 19, 2024

@kohei-noda-qcrg kohei-noda-qcrg commented Jan 19, 2024

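  • Profiled the program and tuned its performance.
    • Specifically, the program was sped up by replacing copy.deepcopy with pickle.dumps and pickle.loads, replacing re.split with a non-regex approach, and looking up dicts with multiple keys instead of a single key built by joining strings.

    • Performance measurement results

      • Data used: Cm3+_phen.out (216MB)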
       noda@DESKTOP-1C7AGIU ~/develop/sum_dirac_dfcoef (performance-tuning) $ ls -lh Cm3+_phen.out 
      -rw-r--r-- 1 noda noda 216M Jan 19 19:08 Cm3+_phen.out
      • Benchmark script
        • The same command was run 10 times and the average execution time was measured.
      import subprocess
      import timeit
      
      cmd = "sum_dirac_dfcoef -i ./Cm3+_phen.out -g"
      print(f"Start running \"{cmd}\" for 10 times...")
      t = timeit.timeit("subprocess.run(cmd.split(), check=True)", globals=globals(), number=10)
      print(f"Total time: {t} sec")
      print(f"Average time: {t / 10} sec")
      • Results
        • After tuning, the program runs about 1.25 times faster (average 50.1 s on main, 39.9 s on performance-tuning).
      noda@DESKTOP-1C7AGIU ~/develop/sum_dirac_dfcoef (main) $ python run.py 
      Start running "sum_dirac_dfcoef -i ./Cm3+_phen.out -g" for 10 times...
      Total time: 501.0955908120377 sec
      Average time: 50.10955908120377 sec
      noda@DESKTOP-1C7AGIU ~/develop/sum_dirac_dfcoef (main) $ git checkout performance-tuning 
      Switched to branch 'performance-tuning'
      noda@DESKTOP-1C7AGIU ~/develop/sum_dirac_dfcoef (performance-tuning) $ python run.py 
      Start running "sum_dirac_dfcoef -i ./Cm3+_phen.out -g" for 10 times...
      Total time: 398.7336607740144 sec
      Average time: 39.87336607740144 sec
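    • Profiling method

      • Function-level profiling is possible with: python -m cProfile -s cumtime -m sum_dirac_dfcoef -i ./Cm3+_phen.out -g > perf_result
    • Profiling results

      • The number of function calls dropped to roughly 72% of the original (208,295,271 to 150,940,333).
      • add_coefficient in privec_reader.py and the functions it calls are invoked a very large number of times and account for 75-80% of the total execution time.
        • For this Cm3+_phen.out, add_coefficient is called 2,767,590 times.
        • The call count is high because add_coefficient is called once for every line in the PRIVEC data that contains a coefficient.
        • Processing the PRIVEC data of each MO in parallel would not change the total number of calls, but it might shorten the overall execution time. That change looks large, so it will be handled in a separate issue and PR.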
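The dict lookup change mentioned in the description can be read in more than one way. A minimal sketch of one reading, a tuple key instead of a formatted string key, is shown here. Every name in it (add_with_joined_key, add_with_tuple_key, the atom/index/orbital fields) is hypothetical and not taken from sum_dirac_dfcoef.

```python
from typing import Dict, Tuple

# All names here are hypothetical; they are not taken from sum_dirac_dfcoef.
JoinedStore = Dict[str, float]
TupleStore = Dict[Tuple[str, int, str], float]


def add_with_joined_key(store: JoinedStore, atom: str, index: int, orbital: str, value: float) -> None:
    # Single key: a new string is formatted on every call before the lookup.
    key = f"{atom}_{index}_{orbital}"
    store[key] = store.get(key, 0.0) + value


def add_with_tuple_key(store: TupleStore, atom: str, index: int, orbital: str, value: float) -> None:
    # Multiple keys: the parts are packed into a tuple, with no string formatting.
    key = (atom, index, orbital)
    store[key] = store.get(key, 0.0) + value


# Made-up rows used only to exercise both variants.
rows = [("Cm", 1, "s", 0.25), ("Cm", 1, "s", 0.5), ("N", 2, "px", 0.125)]
joined: JoinedStore = {}
tupled: TupleStore = {}
for atom, index, orbital, value in rows:
    add_with_joined_key(joined, atom, index, orbital, value)
    add_with_tuple_key(tupled, atom, index, orbital, value)

print(joined)
print(tupled)
```

Both variants accumulate the same sums. The tuple form skips the string formatting step on each of the millions of calls, and it also cannot confuse parts that happen to contain the separator character.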

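Profiling data (only entries with a cumtime of 0.1 s or more are shown, because the full output is large)

Latest main before this pull request (running pip install for v4.1.1 gives the same version of the source code)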

208295271 function calls (201462563 primitive calls) in 97.571 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
305/1 0.010 0.000 97.571 97.571 {built-in method builtins.exec}
1 0.000 0.000 97.571 97.571 runpy.py:200(run_module)
1 0.000 0.000 97.473 97.473 runpy.py:64(_run_code)
1 0.000 0.000 97.473 97.473 __main__.py:1(<module>)
1 0.000 0.000 97.473 97.473 sum_dirac_dfcoef.py:10(main)
1 7.278 7.278 97.253 97.253 privec_reader.py:40(read_privec_data)
2767590 6.719 0.000 73.422 0.000 privec_reader.py:171(add_coefficient)
2767590 1.522 0.000 39.345 0.000 coefficient.py:24(get_coefficient)
2767590 9.663 0.000 37.823 0.000 coefficient.py:38(parse_line)
5550790 4.343 0.000 20.790 0.000 utils.py:7(space_separated_parsing)
2767590 3.119 0.000 14.551 0.000 data.py:19(add_coefficient)
5551753 2.232 0.000 13.552 0.000 re.py:223(split)
2767590 6.510 0.000 12.505 0.000 coefficient.py:70()
2043517 3.635 0.000 11.441 0.000 main.py:736(__setattr__)
6708346/185212 4.883 0.000 10.822 0.000 copy.py:128(deepcopy)
368691/185151 1.523 0.000 9.806 0.000 copy.py:258(_reconstruct)
5551753 8.437 0.000 8.437 0.000 {method 'split' of 're.Pattern' objects}
302562/185211 0.781 0.000 8.355 0.000 copy.py:226(_deepcopy_dict)
11070360 3.833 0.000 4.673 0.000 utils.py:25(is_float)
4944208/4944205 2.081 0.000 3.619 0.000 {built-in method builtins.getattr}
2767593 0.917 0.000 3.534 0.000 main.py:154(__init__)
2767590 1.130 0.000 2.970 0.000 atoms.py:38(count_remaining_functions)
1610 0.017 0.000 2.919 0.002 privec_reader.py:200(add_current_mo_data_to_data_all_mo)
5557740 1.913 0.000 2.891 0.000 re.py:289(_compile)
2767593 2.617 0.000 2.617 0.000 {method 'validate_python' of 'pydantic_core._pydantic_core.SchemaValidator' objects}
2043517 2.602 0.000 2.602 0.000 {method 'validate_assignment' of 'pydantic_core._pydantic_core.SchemaValidator' objects}
1611 0.038 0.000 2.516 0.002 copy.py:200(_deepcopy_list)
117412 0.183 0.000 2.204 0.000 main.py:686(__deepcopy__)
5550790 2.187 0.000 2.187 0.000 utils.py:9()
5535181 2.060 0.000 2.060 0.000 {built-in method builtins.sum}
2767590 1.262 0.000 1.787 0.000 privec_reader.py:92(is_this_row_for_coefficients)
2043566 1.268 0.000 1.538 0.000 _model_construction.py:197(__getattr__)
20040821 1.530 0.000 1.530 0.000 {method 'get' of 'dict' objects}
16607799 1.379 0.000 1.379 0.000 {method 'strip' of 'str' objects}
11070360 1.322 0.000 1.322 0.000 {built-in method builtins.pow}
8170962/8170934 1.304 0.000 1.305 0.000 {built-in method builtins.isinstance}
2043530 0.594 0.000 0.917 0.000 _fields.py:262(is_valid_field_name)
2767590 0.887 0.000 0.887 0.000 atoms.py:28(decrement_function)
2775987 0.628 0.000 0.880 0.000 privec_reader.py:96(need_to_skip_this_line)
11070360 0.839 0.000 0.839 0.000 {method 'isdecimal' of 'str' objects}
8337054/8336850 0.727 0.000 0.727 0.000 {built-in method builtins.len}
5554462 0.710 0.000 0.710 0.000 {method 'rstrip' of 'str' objects}
8356750 0.588 0.000 0.588 0.000 {built-in method builtins.id}
790276 0.428 0.000 0.558 0.000 copy.py:242(_keep_alive)
2769740 0.488 0.000 0.488 0.000 {method 'replace' of 'str' objects}
5918070 0.341 0.000 0.341 0.000 copy.py:182(_deepcopy_atomic)
2767695 0.328 0.000 0.328 0.000 {method 'isdigit' of 'str' objects}
1610 0.006 0.000 0.324 0.000 data.py:35(fileter_coefficients_by_threshold)
2045536 0.323 0.000 0.323 0.000 {method 'startswith' of 'str' objects}
370302 0.095 0.000 0.317 0.000 copy.py:263()
2767590 0.311 0.000 0.311 0.000 {method 'values' of 'collections.OrderedDict' objects}
1610 0.259 0.000 0.288 0.000 data.py:36()
368691 0.275 0.000 0.275 0.000 {method '__reduce_ex__' of 'object' objects}
2044093 0.270 0.000 0.270 0.000 {method 'get' of 'mappingproxy' objects}
1610 0.008 0.000 0.215 0.000 privec_reader.py:132(start_mo_section)
1610 0.003 0.000 0.188 0.000 data.py:27(reset)
3220 0.175 0.000 0.175 0.000 {method 'clear' of 'dict' objects}
8 0.000 0.000 0.168 0.021 __init__.py:1(<module>)
1 0.130 0.130 0.147 0.147 file_writer.py:39(write_mo_data)
117412 0.047 0.000 0.117 0.000 copy.py:66(copy)
27468 0.068 0.000 0.117 0.000 codecs.py:319(decode)
185150 0.079 0.000 0.112 0.000 copyreg.py:94(__newobj__)
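
In the profile above, re.py:223(split) is called 5,551,753 times and costs 13.552 s of cumulative time. A minimal sketch of how regex splitting can be swapped for str.split is shown here. The function names and the sample line are made up for illustration, and the actual body of space_separated_parsing in utils.py may differ.

```python
import re
from typing import List


def split_with_regex(line: str) -> List[str]:
    # Regex-based splitting; leading and trailing whitespace yields empty
    # strings that have to be filtered out afterwards.
    return [word for word in re.split(r"\s+", line) if word]


def split_without_regex(line: str) -> List[str]:
    # str.split() with no arguments splits on runs of whitespace and
    # never returns empty strings.
    return line.split()


sample = "   1  Cm  s   0.25   -0.5\n"  # made-up line, not real DIRAC output
assert split_with_regex(sample) == split_without_regex(sample)
print(split_without_regex(sample))
```

The two functions return the same tokens for whitespace-separated input, so the swap only removes the regex machinery (pattern cache lookup plus the pattern's split method) from the hot path.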

Last commit of this pull request

150940333 function calls (150931166 primitive calls) in 74.187 seconds

Ordered by: cumulative time

ncalls tottime percall cumtime percall filename:lineno(function)
305/1 0.011 0.000 74.188 74.188 {built-in method builtins.exec}
1 0.000 0.000 74.187 74.187 runpy.py:200(run_module)
1 0.000 0.000 74.088 74.088 runpy.py:64(_run_code)
1 0.000 0.000 74.088 74.088 __main__.py:1(<module>)
1 0.000 0.000 74.088 74.088 sum_dirac_dfcoef.py:10(main)
1 5.511 5.511 73.873 73.873 privec_reader.py:39(read_privec_data)
2767590 6.378 0.000 59.054 0.000 privec_reader.py:170(add_coefficient)
2767590 1.449 0.000 32.726 0.000 coefficient.py:24(get_coefficient)
2767590 8.835 0.000 31.277 0.000 coefficient.py:38(parse_line)
5535181 3.446 0.000 13.893 0.000 {built-in method builtins.sum}
2767590 2.607 0.000 13.844 0.000 data.py:19(add_coefficient)
2043517 3.617 0.000 11.248 0.000 main.py:736(__setattr__)
5550790 4.052 0.000 10.709 0.000 utils.py:8(space_separated_parsing)
13837950 6.916 0.000 10.447 0.000 coefficient.py:69()
5550790 3.481 0.000 3.481 0.000 utils.py:9()
2767593 0.894 0.000 3.325 0.000 main.py:154(__init__)
4089477/4089474 1.843 0.000 3.324 0.000 {built-in method builtins.getattr}
2767590 1.095 0.000 2.928 0.000 atoms.py:38(count_remaining_functions)
2043517 2.564 0.000 2.565 0.000 {method 'validate_assignment' of 'pydantic_core._pydantic_core.SchemaValidator' objects}
5551867 2.494 0.000 2.494 0.000 {method 'split' of 'str' objects}
2767593 2.430 0.000 2.430 0.000 {method 'validate_python' of 'pydantic_core._pydantic_core.SchemaValidator' objects}
93380 0.085 0.000 2.329 0.000 utils.py:16(fast_deepcopy_pickle)
11070360 2.287 0.000 2.287 0.000 utils.py:36(is_float)
2767590 1.204 0.000 1.705 0.000 privec_reader.py:91(is_this_row_for_coefficients)
2043566 1.225 0.000 1.481 0.000 _model_construction.py:197(__getattr__)
93380 1.361 0.000 1.404 0.000 {built-in method _pickle.dumps}
16607799 1.317 0.000 1.317 0.000 {method 'strip' of 'str' objects}
11070360 1.244 0.000 1.244 0.000 {built-in method builtins.pow}
1610 0.013 0.000 1.062 0.001 privec_reader.py:200(add_current_mo_data_to_data_all_mo)
2043530 0.590 0.000 0.938 0.000 _fields.py:262(is_valid_field_name)
2767590 0.884 0.000 0.884 0.000 atoms.py:28(decrement_function)
2775987 0.620 0.000 0.861 0.000 privec_reader.py:95(need_to_skip_this_line)
93380 0.784 0.000 0.841 0.000 {built-in method _pickle.loads}
8337038/8336836 0.695 0.000 0.695 0.000 {built-in method builtins.len}
5554462 0.683 0.000 0.683 0.000 {method 'rstrip' of 'str' objects}
6139190 0.543 0.000 0.543 0.000 {method 'get' of 'dict' objects}
2769740 0.475 0.000 0.475 0.000 {method 'replace' of 'str' objects}
2045536 0.349 0.000 0.349 0.000 {method 'startswith' of 'str' objects}
1610 0.007 0.000 0.338 0.000 data.py:35(fileter_coefficients_by_threshold)
2767695 0.317 0.000 0.317 0.000 {method 'isdigit' of 'str' objects}
1610 0.272 0.000 0.301 0.000 data.py:36()
2767590 0.289 0.000 0.289 0.000 {method 'values' of 'collections.OrderedDict' objects}
1610 0.009 0.000 0.266 0.000 privec_reader.py:131(start_mo_section)
2065355/2065327 0.265 0.000 0.265 0.000 {built-in method builtins.isinstance}
2044093 0.256 0.000 0.256 0.000 {method 'get' of 'mappingproxy' objects}
1610 0.003 0.000 0.227 0.000 data.py:27(reset)
3220 0.222 0.000 0.222 0.000 {method 'clear' of 'dict' objects}
8 0.000 0.000 0.180 0.022 __init__.py:1(<module>)
1 0.128 0.128 0.146 0.146 file_writer.py:39(write_mo_data)
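
The fast_deepcopy_pickle row in the profile above, together with the _pickle.dumps and _pickle.loads rows, corresponds to the replacement of copy.deepcopy. A minimal sketch of such a helper is shown here. Only the function name comes from the profile. The body and the sample data are assumptions, and the real implementation in utils.py may differ.

```python
import copy
import pickle
from typing import Any


def fast_deepcopy_pickle(obj: Any) -> Any:
    # Assumed body: serialize and deserialize to obtain an independent copy.
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Made-up nested data standing in for per-MO data.
original = {"energy": -1.5, "coefficients": {("Cm", 1, "s"): 0.75}, "labels": ["a", "b"]}
via_pickle = fast_deepcopy_pickle(original)
via_deepcopy = copy.deepcopy(original)

# Both copies are equal to the original but share no mutable state with it.
assert via_pickle == via_deepcopy == original
via_pickle["labels"].append("c")
print(original["labels"])
```

This round trip only works for picklable objects, so it is not a general drop-in replacement for copy.deepcopy. For plain containers of numbers and strings it tends to be faster because the traversal runs in C.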

@kohei-noda-qcrg kohei-noda-qcrg merged commit 54271c9 into main Jan 19, 2024
6 checks passed
@kohei-noda-qcrg kohei-noda-qcrg deleted the performance-tuning branch January 19, 2024 13:04