Use parallel processing to speed up obs processing #733
# Check if the converter was successful
# if os.path.exists(yaml_output_file):
#     rm_p(yaml_output_file)

# run all bufr2ioda yamls in parallel
with mp.Pool(num_cores) as pool:
Note: the easiest way to do this was to split the converters into Python and YAML+executable groups. If the two groups are roughly equal in size, that is probably okay, but we may want to combine them all into the same pool.
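One way a combined pool could look, as a minimal sketch rather than the code in this PR: both groups are flattened into a single list of commands and handed to one `mp.Pool`, so workers can pick up jobs from either group. The helper names (`build_commands`, `run_command`, `run_all`) and the argument layout assumed for the YAML+executable jobs are hypothetical and only for illustration.

```python
import multiprocessing as mp
import subprocess
import sys


def build_commands(python_jobs, yaml_jobs, executable):
    """Merge both converter groups into one list of argv-style commands.

    python_jobs: (script, json_config) pairs, run as `script -c json_config`
    yaml_jobs:   YAML files, assumed here to be run as `executable yaml`
    """
    commands = [[script, "-c", config] for script, config in python_jobs]
    commands += [[executable, yaml] for yaml in yaml_jobs]
    return commands


def run_command(command):
    """Run one converter and return its exit code so failures can be reported."""
    return subprocess.run(command, check=False).returncode


def run_all(commands, num_cores):
    """Run every command in a single pool; results keep the input order."""
    with mp.Pool(num_cores) as pool:
        return pool.map(run_command, commands)


if __name__ == "__main__":
    # Stand-in commands that simply exit; in practice this list would come
    # from build_commands(...) with the real scripts, configs, and executable.
    demo = [[sys.executable, "-c", "pass"] for _ in range(4)]
    print(run_all(demo, num_cores=2))
```

Returning the exit codes from `pool.map` keeps the success check in one place after all jobs finish, instead of inside each group's loop.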
I'm not familiar with Python multiprocessing. Is running in parallel from Python the most efficient approach?
Copied ush/ioda/bufr2ioda/run_bufr2ioda.py to a working copy of gdas-validation. Ran gdasprepatmiodaobs. The log file indicates the jobs ran in parallel.
2023-11-16 00:54:20,329 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_gpsro_bufr_combined.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/gpsro_bufr_combined_2021080100.json
2023-11-16 00:54:20,329 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_satwind_scat.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/satwind_scat_2021080100.json
2023-11-16 00:54:20,330 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_adpsfc_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/adpsfc_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_adpupa_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/adpupa_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_conventional_prepbufr_ps.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/conventional_prepbufr_ps_2021080100.json
2023-11-16 00:54:20,330 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_sfcshp_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/sfcshp_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_satwind_amv_goes.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/satwind_amv_goes_2021080100.json
2023-11-16 00:54:20,331 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_acft_profiles_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/acft_profiles_prepbufr_2021080100.json
The total run time for the parallel prepatmiodaobs job was 06:28 (mm:ss).
The previous serial job took 11:06.
Nice reduction!
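For reference, a quick check of what the two reported wall-clock times imply. This is only arithmetic on the numbers quoted in this thread; the helper name `to_seconds` is made up for the example.

```python
def to_seconds(mmss):
    """Convert an 'mm:ss' wall-clock string to whole seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


serial = to_seconds("11:06")    # previous serial job
parallel = to_seconds("06:28")  # parallel prepatmiodaobs run

speedup = serial / parallel
saved = serial - parallel
print(f"speedup: {speedup:.2f}x, time saved: {saved} s ({saved / serial:.0%})")
# prints "speedup: 1.72x, time saved: 278 s (42%)"
```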
@RussTreadon-NOAA I'm not sure if it's the most efficient, but I think this will work fine provided we don't need to run on multiple nodes. This was the fastest way to speed everything up. Next, we may wish to combine the pools so that the
This PR uses Python multiprocessing to generate obs in parallel. Processing the current list of obs drops from 10+ minutes to ~5.5 minutes.