Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add common alias cores (x) for grdlandmask #2944

Merged
merged 6 commits into from
Jan 4, 2024
Merged

Add common alias cores (x) for grdlandmask #2944

merged 6 commits into from
Jan 4, 2024

Conversation

weiji14
Copy link
Member

@weiji14 weiji14 commented Jan 2, 2024

Description of proposed changes

Add alias cores (-x) for grdlandmask, and set limit of cores=2 for test_grdlandmask_no_outgrid to make benchmark execution deviate less.

Preview at https://pygmt-dev--2944.org.readthedocs.build/en/2944/api/generated/pygmt.grdlandmask.html

Extends #1423, may address #2942 partially.

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If wrapping a new module, open a 'Wrap new GMT module' issue and submit reasonably-sized PRs.
  • If adding new functionality, add an example to docstrings or tutorials.
  • Use underscores (not hyphens) in names of Python files and directories.

Slash Commands

You can write slash commands (/command) in the first line of a comment to perform
specific operations. Supported slash commands are:

  • /format: automatically format and lint the code
  • /test-gmt-dev: run full tests on the latest GMT development version

@weiji14 weiji14 added the enhancement Improving an existing feature label Jan 2, 2024
@weiji14 weiji14 added this to the 0.11.0 milestone Jan 2, 2024
@weiji14 weiji14 self-assigned this Jan 2, 2024
@@ -49,7 +49,7 @@ def test_grdlandmask_no_outgrid(expected_grid):
"""
Test grdlandmask with no set outgrid.
"""
result = grdlandmask(spacing=1, region=[125, 130, 30, 35])
result = grdlandmask(spacing=1, region=[125, 130, 30, 35], cores=2)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Linux runner for GitHub Actions has 2 CPU cores according to https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources, but not sure if we should set cores to 2 or 1 for consistent benchmarking.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using two cores is faster and can help check the OpenMP feature in GMT, so I prefer to use two cores if possible.

BTW, the macOS runner for GitHub Actions has 3 cores, so instead of cores=2, cores=True will use all available cores.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is only 1 core being used if we don't set cores, or use cores=None? I'm reading https://docs.generic-mapping-tools.org/6.4/gmt.html#core-full which says [default is to use all available cores], but is that only with cores=True (i.e. -x in GMT)?

The benchmark.yml workflow is running on Linux, but I guess we could set cores=True since this test would also run on macOS in ci_test.yml. My question was more around whether multi-threading with multiple-cores (2 or more) would provide a consistent benchmark compared to using only 1 core, but I guess we need to merge this PR to find out!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is only 1 core being used if we don't set cores, or use cores=None? I'm reading https://docs.generic-mapping-tools.org/6.4/gmt.html#core-full which says [default is to use all available cores], but is that only with cores=True (i.e. -x in GMT)?

Yes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is only 1 core being used if we don't set cores, or use cores=None? I'm reading https://docs.generic-mapping-tools.org/6.4/gmt.html#core-full which says [default is to use all available cores], but is that only with cores=True (i.e. -x in GMT)?

Yes.

Actually it uses all available cores if -x is not used.

$ gmt grdlandmask -R0/10/0/10 -Gabc.nc -I1/1 -Vi
grdlandmask [INFORMATION]: Enable all available threads (up to xxx)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe set cores=1 and see what the flame graph diff shows?

The test_grdlandmask_no_outgrid benchmark speedup went from 3x to 9.6x 🤣 It might be that there is some overhead with using threading (2 cores) vs no threading (1 core). Here's the flame graph diff from https://codspeed.io/GenericMappingTools/pygmt/branches/limit-cores again:

image

The extra gomp_team_barrier_wait_* calls are still there, but also some other stuff (in blue).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reminds me of the warnings message I saw when running the tests in random order (see https://github.com/GenericMappingTools/pygmt/actions/runs/7396374248/job/20121388367?pr=2936):

pygmt/tests/test_session_management.py::test_session_multiprocessing
pygmt/tests/test_session_management.py::test_session_multiprocessing
  /home/runner/micromamba/envs/pygmt/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=2433) is multi-threaded, use of fork() may lead to deadlocks in the child.
    self.pid = os.fork()

There are some related discussions at https://discuss.python.org/t/concerns-regarding-deprecation-of-fork-with-alive-threads/33555 and related PR at python/cpython#100228.

os.fork() in a multi-threaded application is a likely source of deadlocks on many platforms. We should raise a warning when people call os.fork() from a process that we know has other threads running.

Not sure if it's related to the issue here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pygmt/tests/test_session_management.py::test_session_multiprocessing
pygmt/tests/test_session_management.py::test_session_multiprocessing
  /home/runner/micromamba/envs/pygmt/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=2433) is multi-threaded, use of fork() may lead to deadlocks in the child.
    self.pid = os.fork()

That test_session_multiprocessing function is only running basemap and savefig right? It shouldn't have any multi-threading going on, so not sure what's happening.

I tried running some benchmarks using cores from -1 (all cores) to 0 (no-multithreading) to 16 (all cores on my laptop). This was running grdlandmask with a spacing of 0.05 arc degrees. The results do vary between runs, but in general, more cores means less time:

benchmark_grdlandmask_0.05arc_degree

Code to reproduce:

import time
import pandas as pd
import pygmt
import tqdm

timings = {}
for cores in tqdm.tqdm(range(-1, 16)):
    tic = time.perf_counter()
    pygmt.grdlandmask(spacing="0.05d", region=[0, 180, 0, 90], cores=cores)
    toc = time.perf_counter()
    timings[cores] = toc - tic

df = pd.DataFrame.from_dict(data=timings, orient="index", columns=["time"])

# %%
fig = pygmt.Figure()
region = pygmt.info(data=df.reset_index(), per_column=True, spacing=0.05)
fig.plot(
    x=df.index,
    y=df.time,
    region=region,
    pen="thick",
    frame=[
        "WSne+tBenchmark grdlandmask",
        "xaf+lCores",
        "yaf+lTime (s), lower is better",
    ],
)
fig.savefig("benchmark_grdlandmask.png")
fig.show()

If I change the spacing from 0.05 arc degrees to 1 arc degree, more cores actually take more time:

benchmark_grdlandmask_1arc_degree

For that test_grdlandmask_no_outgrid unit test which is using spacing=1 (i.e. 1 arc degree), I think we should just use cores=2, since cores=3 might not be much faster on such a low resolution. As long as we are benchmarking with a fixed number of core counts, it should be ok?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds ok

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, back to using cores=2 in 97637d4.

Copy link

codspeed-hq bot commented Jan 2, 2024

CodSpeed Performance Report

Merging #2944 will improve performances by ×3

Comparing limit-cores (97637d4) with main (3f9d14b)

Summary

⚡ 3 improvements
✅ 61 untouched benchmarks

Benchmarks breakdown

Benchmark main limit-cores Change
test_grdlandmask_no_outgrid 2,473.4 ms 819.2 ms ×3
test_grdsample_dataarray_out 104.1 ms 94.5 ms +10.11%
test_sph2grd_no_outgrid 4.7 s 2.2 s ×2.1

@seisman
Copy link
Member

seisman commented Jan 2, 2024

CodSpeed Performance Report

Merging #2944 will improve performances by ×3

Comparing limit-cores (137a04f) with main (88ab1ca)

Summary

⚡ 3 improvements ✅ 61 untouched benchmarks

Benchmarks breakdown

Benchmark main limit-cores Change
test_grdlandmask_no_outgrid 2,531.2 ms 842.1 ms ×3
test_grdsample_dataarray_out 106.3 ms 91.6 ms +16.05%
test_sph2grd_no_outgrid 4.6 s 2.3 s ×2

The benchmark report is quite confusing. It seems that test_sph2grd_no_outgrid also speedup by 2 times. Does this mean the -x option is a session-level option (similar to GMT's configuration settings) and affect all tests?

@weiji14
Copy link
Member Author

weiji14 commented Jan 2, 2024

The benchmark report is quite confusing. It seems that test_sph2grd_no_outgrid also speedup by 2 times. Does this mean the -x option is a session-level option (similar to GMT's configuration settings) and affect all tests?

Yeah, it seems to confirm the hypothesis in #2942 (comment) though. Does the setting affect GMT_MAX_CORES somehow?

@@ -49,7 +49,7 @@ def test_grdlandmask_no_outgrid(expected_grid):
"""
Test grdlandmask with no set outgrid.
"""
result = grdlandmask(spacing=1, region=[125, 130, 30, 35])
result = grdlandmask(spacing=1, region=[125, 130, 30, 35], cores=1, verbose="i")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the debug info at https://github.com/GenericMappingTools/pygmt/actions/runs/7403072686/job/20142175894?pr=2944#step:8:111 is showing that there are 4 threads on GitHub Actions??

  grdlandmask [INFORMATION]: Enable 1 threads of 4 available
  grdlandmask [INFORMATION]: Selected ice front line as Antarctica coastline
  grdlandmask [INFORMATION]: GSHHG version 2.3.7
  grdlandmask [INFORMATION]: Derived from World Vector Shoreline, CIA WDB-II, and Atlas of the Cryosphere
  grdlandmask [INFORMATION]: Processed by Paul Wessel and Walter H. F. Smith, 1994-2017
  grdlandmask [INFORMATION]: Nodes in the oceans will be set to
  0
  grdlandmask [INFORMATION]: Nodes on land will be set to
  1
  grdlandmask [INFORMATION]: Nodes in lakes will be set to
  0
  grdlandmask [INFORMATION]: Nodes in islands will be set to
  1
  grdlandmask [INFORMATION]: Nodes in ponds will be set to
  0
  grdlandmask [INFORMATION]: Apply linear scale to avoid wrap-around: -Jx100id; this temporarily changes the domain
  grdlandmask [INFORMATION]: Writing grid to file /tmp/pygmt-0c0skk20.nc
  grdlandmask [INFORMATION]: Level 0 set for 33 nodes
  grdlandmask [INFORMATION]: Level 1 set for 3 nodes
  grdlandmask [INFORMATION]: Enable 1 threads of 4 available
  grdlandmask [INFORMATION]: Selected ice front line as Antarctica coastline
  grdlandmask [INFORMATION]: GSHHG version 2.3.7
  grdlandmask [INFORMATION]: Derived from World Vector Shoreline, CIA WDB-II, and Atlas of the Cryosphere
  grdlandmask [INFORMATION]: Processed by Paul Wessel and Walter H. F. Smith, 1994-2017
  grdlandmask [INFORMATION]: Nodes in the oceans will be set to
  0
  grdlandmask [INFORMATION]: Nodes on land will be set to
  1
  grdlandmask [INFORMATION]: Nodes in lakes will be set to
  0
  grdlandmask [INFORMATION]: Nodes in islands will be set to
  1
  grdlandmask [INFORMATION]: Nodes in ponds will be set to
  0
  grdlandmask [INFORMATION]: Apply linear scale to avoid wrap-around: -Jx100id; this temporarily changes the domain
  grdlandmask [INFORMATION]: Writing grid to file /tmp/pygmt-i3n6zkcg.nc
  grdlandmask [INFORMATION]: Level 0 set for 33 nodes
  grdlandmask [INFORMATION]: Level 1 set for 3 nodes

So the 2 CPUs mentioned at https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources might have hyperthreading, and we could potentially set cores=4, though see #2944 (comment) on how more cores might not actually improve runtime.

@weiji14 weiji14 marked this pull request as ready for review January 4, 2024 01:29
@weiji14 weiji14 enabled auto-merge (squash) January 4, 2024 01:38
@weiji14 weiji14 disabled auto-merge January 4, 2024 01:54
@weiji14 weiji14 merged commit 8be628e into main Jan 4, 2024
18 of 25 checks passed
@weiji14 weiji14 deleted the limit-cores branch January 4, 2024 01:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants