Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set GMT_SESSION_NAME to a unique name on Windows for multiprocessing support #2938

Merged
merged 4 commits into from
Jan 2, 2024

Conversation

seisman
Copy link
Member

@seisman seisman commented Jan 1, 2024

This PR adds the workaround in #217 (comment) as a test to make sure that the workaround for multiprocessing support (i.e., reload pygmt in each process) works as expected.

As mentioned in #217 (comment), the workaround didn't work on Windows. This PR also addresses this issue by setting GMT_SESSION_NAME to a unique name on Windows.

xref: https://github.com/GenericMappingTools/gmt/blob/104bcdbafd13eb96260b445910ce3587a362dbc0/src/gmt_api.c#L1036

The newly added test fails on Windows (https://github.com/GenericMappingTools/pygmt/actions/runs/7378717925/job/20074178686?pr=2938) and the commit 07ae8f1 fixes it (see https://github.com/GenericMappingTools/pygmt/actions/runs/7378788411/job/20074335235?pr=2938).

@seisman seisman force-pushed the windows-multiprocessing branch from 7d09bcd to 12383e1 Compare January 1, 2024 16:55
@seisman seisman changed the title Set GMT_SESSION_NAME to a unique name on Windows Set GMT_SESSION_NAME to a unique name on Window for partial multiprocessing support Jan 1, 2024
@seisman seisman added the bug Something isn't working label Jan 1, 2024
@seisman seisman added this to the 0.11.0 milestone Jan 1, 2024
pygmt/tests/test_session_management.py Outdated Show resolved Hide resolved
pygmt/tests/test_session_management.py Outdated Show resolved Hide resolved
Comment on lines +19 to +21
# On Windows, need to set GMT_SESSION_NAME to a unique value
if sys.platform == "win32":
os.environ["GMT_SESSION_NAME"] = unique_name()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I thought this should best be handled upstream in GMT, but then I saw the note from https://github.com/GenericMappingTools/gmt/blob/104bcdbafd13eb96260b445910ce3587a362dbc0/src/gmt_api.c#L1050-L1057:

	/* OK, the trouble is the following. On Win, if the Windows executables are run from within a bash window
	   gmtapi_get_ppid returns different values for each call, and this completely breaks the idea
	   of using the constant PPID (parent PID) to create unique file names for each session.
	   So, given that we didn't yet find a way to make this work from within MSYS (and likely Cygwin)
	   we are forcing PPID = 0 in all Windows variants unless set via GMT_SESSION_NAME. A corollary of this
	   is that Windows users running many bash windows concurrently should use GMT_SESSION_NAME in their scripts
	   to give unique values to different scripts.  */

Would setting a unique GMT_SESSION_NAME here in PyGMT result in any issues then? I couldn't run git blame on those lines to find out more context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This note (https://docs.generic-mapping-tools.org/dev/begin.html#note-on-unix-shells) is also useful. So, the key point for how GMT modern mode works is having a unique and invariable value so that GMT modules know where to find the session directory and the historical information. The value can be either the constant PPID or a number manually set by GMT_SESSION_NAME.

Would setting a unique GMT_SESSION_NAME here in PyGMT result in any issues then?

All tests still pass, so I believe it has no side effects.

Copy link
Member

@weiji14 weiji14 Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but maybe we should check if GMT_SESSION_NAME is set by the user first before overriding it. Something like this maybe (untested):

Suggested change
# On Windows, need to set GMT_SESSION_NAME to a unique value
if sys.platform == "win32":
os.environ["GMT_SESSION_NAME"] = unique_name()
# On Windows, need to set GMT_SESSION_NAME to a unique value
if sys.platform == "win32" and not "GMT_SESSION_NAME" in os.environ:
os.environ["GMT_SESSION_NAME"] = unique_name()

Copy link
Member Author

@seisman seisman Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users should NEVER set GMT_SESSION_NAME as a global environment variable. Even if they set GMT_SESSION_NAME, we should override it in PyGMT, otherwise multiprocessing won't work.

Here are steps to understand why GMT_SESSION_NAME should be overridden. Open a terminal, define the variable:

export GMT_SESSION_NAME=123456

and run gmt begin. Then you'll see the session directory ~/.gmt/sessions/gmt_session.123456/; all information will be saved in this directory. So, if a global GMT_SESSION_NAME is set, processes will write information and output to the same directory, leading to crashes.

Comment on lines +89 to +91
p.map(_gmt_func_wrapper, [f"{prefix}-1.png", f"{prefix}-2.png"])
Path(f"{prefix}-1.png").unlink()
Path(f"{prefix}-2.png").unlink()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably use GMTTempFile or a try-except block to remove the files automatically, even if the multiprocessing function fails.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likely not necessary here. If anything wrong happens, the two PNG files won't be generated, and Path(...).unlink() will raise an exception.

Copy link

codspeed-hq bot commented Jan 2, 2024

CodSpeed Performance Report

Merging #2938 will degrade performances by 10.27%

⚠️ No base runs were found

Falling back to comparing windows-multiprocessing (04646ad) with main (29f87b7)

Summary

❌ 1 regressions
✅ 63 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main windows-multiprocessing Change
test_grdlandmask_no_outgrid 2.3 s 2.5 s -10.27%

@seisman seisman added enhancement Improving an existing feature needs review This PR has higher priority and needs review. and removed bug Something isn't working labels Jan 2, 2024
weiji14
weiji14 previously approved these changes Jan 2, 2024
@weiji14 weiji14 added final review call This PR requires final review and approval from a second reviewer and removed needs review This PR has higher priority and needs review. labels Jan 2, 2024
@weiji14 weiji14 dismissed their stale review January 2, 2024 06:10

Had another thought

@weiji14 weiji14 added needs review This PR has higher priority and needs review. and removed final review call This PR requires final review and approval from a second reviewer labels Jan 2, 2024
@seisman seisman changed the title Set GMT_SESSION_NAME to a unique name on Window for partial multiprocessing support Set GMT_SESSION_NAME to a unique name on Windows for multiprocessing support Jan 2, 2024
@seisman seisman merged commit 88ab1ca into main Jan 2, 2024
16 of 17 checks passed
@seisman seisman deleted the windows-multiprocessing branch January 2, 2024 06:59
@seisman seisman removed the needs review This PR has higher priority and needs review. label Jan 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants