WIP: Fix random failures in Windows CI #758

Closed
wants to merge 18 commits into from

Conversation

seisman
Member

@seisman seisman commented Dec 21, 2020

Description of proposed changes

All PyGMT tests, excluding a few marked as xfail or skip, should pass on all platforms (Linux/macOS/Windows).

However, the tests sometimes fail on Windows. Generally, there are two different types of failures:

  1. As reported in "Error while using the DEM.grd as a numpy variable" (#717), calling the GMT library sometimes raises the error
    OSError: exception: access violation reading 0x000001E880146FBC.
    It's difficult to debug, as none of us develops PyGMT on Windows.
  2. Tests that call grdcut and/or grdfilter sometimes fail with the error message shown below. It turns out that the output file exists but has zero size, so it's an invalid netCDF file. It's still unclear why the file is empty.
WARNING: D:\a\pygmt\pygmt\examples\tutorials\3d-perspective-image.py failed to execute correctly: Traceback (most recent call last):
  File "D:\a\pygmt\pygmt\examples\tutorials\3d-perspective-image.py", line 12, in <module>
    grid = pygmt.datasets.load_earth_relief(resolution="05m", region=[-108, -103, 35, 40])
  File "C:\Miniconda3\envs\test\lib\site-packages\pygmt\helpers\decorators.py", line 411, in new_module
    return module_func(*args, **kwargs)
  File "C:\Miniconda3\envs\test\lib\site-packages\pygmt\datasets\earth_relief.py", line 103, in load_earth_relief
    grid = grdcut(f"@earth_relief_{resolution}{reg}", region=region)
  File "C:\Miniconda3\envs\test\lib\site-packages\pygmt\helpers\decorators.py", line 270, in new_module
    return module_func(*args, **kwargs)
  File "C:\Miniconda3\envs\test\lib\site-packages\pygmt\helpers\decorators.py", line 411, in new_module
    return module_func(*args, **kwargs)
  File "C:\Miniconda3\envs\test\lib\site-packages\pygmt\gridops.py", line 112, in grdcut
    with xr.open_dataarray(outgrid) as dataarray:
  File "C:\Miniconda3\envs\test\lib\site-packages\xarray\backends\api.py", line 701, in open_dataarray
    dataset = open_dataset(
  File "C:\Miniconda3\envs\test\lib\site-packages\xarray\backends\api.py", line 572, in open_dataset
    store = opener(filename_or_obj, **extra_kwargs, **backend_kwargs)
  File "C:\Miniconda3\envs\test\lib\site-packages\xarray\backends\netCDF4_.py", line 364, in open
    return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
  File "C:\Miniconda3\envs\test\lib\site-packages\xarray\backends\netCDF4_.py", line 314, in __init__
    self.format = self.ds.data_model
  File "C:\Miniconda3\envs\test\lib\site-packages\xarray\backends\netCDF4_.py", line 373, in ds
    return self._acquire()
  File "C:\Miniconda3\envs\test\lib\site-packages\xarray\backends\netCDF4_.py", line 367, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "C:\Miniconda3\envs\test\lib\contextlib.py", line 113, in __enter__
    return next(self.gen)
  File "C:\Miniconda3\envs\test\lib\site-packages\xarray\backends\file_manager.py", line 187, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "C:\Miniconda3\envs\test\lib\site-packages\xarray\backends\file_manager.py", line 205, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "netCDF4\_netCDF4.pyx", line 2357, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4\_netCDF4.pyx", line 1925, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -51] NetCDF: Unknown file format: b'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\pygmt-ix6yj56l.nc'
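
To make the second failure mode easier to spot, the temporary file's size can be checked before it is handed to xarray. The sketch below is only an illustration using the standard library: `describe_grid_file` is a hypothetical helper, not part of PyGMT, and the file name is made up.

```python
import os
import tempfile


def describe_grid_file(path):
    """Classify a grid file as 'missing', 'empty', or 'non-empty'.

    Hypothetical diagnostic helper (not part of PyGMT).
    """
    if not os.path.exists(path):
        return "missing"
    if os.path.getsize(path) == 0:
        # A zero-byte file cannot be a valid netCDF file; this is the
        # situation behind the "Unknown file format" error in the traceback.
        return "empty"
    return "non-empty"


# Reproduce the symptom with a zero-byte file that has a .nc suffix.
with tempfile.TemporaryDirectory() as tmpdir:
    empty_grid = os.path.join(tmpdir, "pygmt-example.nc")
    open(empty_grid, "wb").close()
    status = describe_grid_file(empty_grid)
    print(status)
```

Logging this status right before `xr.open_dataarray(outgrid)` would distinguish "GMT wrote nothing" from "GMT wrote something unreadable".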

Reminders

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst.
  • Write detailed docstrings for all functions/methods.
  • If adding new functionality, add an example to docstrings or tutorials.

Notes

  • You can write /format in the first line of a comment to lint the code automatically

@seisman seisman changed the title WIP: Debug grdcut WIP: Fix random failures in Windows CI Dec 22, 2020
@weiji14
Member

weiji14 commented Feb 3, 2021

/test-gmt-dev try 5

@weiji14
Member

weiji14 commented Feb 3, 2021

Windows tests still failing with PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\RUNNER~1\\AppData\\Local\\Temp\\pygmt-9nbjsh8w.nc' (see https://github.com/GenericMappingTools/pygmt/runs/1821361588?check_suite_focus=true#step:14:433).

I've been reading through the threads at astropy/astropy#7404 and joblib/joblib#806 and think it might be related to how we read in the grids using xarray.open_dataarray, via a hidden memmap setting. It's really hard to debug because 1) few of us are comfortable on Windows and 2) the issue is very random/intermittent. Even if we manage to find a way to disable memmap reading, we need a good way to set memmap=False only on Windows. Memmap is actually a useful feature for conserving RAM, so we might want to keep it enabled on Linux/macOS if possible.
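
A platform-specific read path could look roughly like the sketch below. It is an assumption-laden illustration, not PyGMT's implementation: `should_load_eagerly` and `read_grid` are hypothetical names, and it assumes that the lingering open file handle (rather than memmap specifically) is what triggers the error. `xarray.load_dataarray` reads the whole file into memory and closes it, while `xarray.open_dataarray` keeps the file open and reads lazily.

```python
import sys


def should_load_eagerly(platform=sys.platform):
    """Return True on Windows, where an open handle can block file deletion."""
    return platform == "win32"


def read_grid(path):
    """Read a netCDF grid, eagerly on Windows and lazily elsewhere.

    Hypothetical helper, not PyGMT's actual implementation.
    """
    # Imported here so the platform check above has no third-party dependency.
    import xarray as xr

    if should_load_eagerly():
        # Reads all values into memory and closes the underlying file.
        return xr.load_dataarray(path)
    # Keeps the file open and reads values on demand, which saves RAM.
    return xr.open_dataarray(path)
```

This keeps the lazy behaviour on Linux/macOS and only pays the memory cost on Windows.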

@seisman
Member Author

seisman commented Feb 3, 2021

> The issue is very random/intermittent

I think the tests fail randomly with GMT 6.1.1, but always fail with GMT dev, and the error messages are also different.

@weiji14
Member

weiji14 commented Feb 3, 2021

> > The issue is very random/intermittent
>
> I think the tests fail randomly with GMT 6.1.1, but always fail with GMT dev, and the error messages are also different.

Ok, then it's best to test on GMT dev (6.2.0), which fails consistently (rather than waiting for random failures). The PermissionError I got above is actually from the GMT Latest/Dev tests. I'll need to try to resolve #829 locally on my end to be able to debug this 😞

Keep it outside the `with GMTTempFile()` block
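
The PermissionError quoted earlier ([WinError 32]) is what Windows raises when a file is deleted while a handle to it is still open. A minimal sketch of a safe ordering is shown below, assuming the data can be fully read before the temporary file is removed. `temp_file` is a stand-in written for illustration; it only mimics the "delete on exit" behaviour of `GMTTempFile` and is not PyGMT code.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_file(suffix=".nc"):
    """Yield a unique temporary file name and delete the file on exit.

    Illustrative stand-in for a GMTTempFile-like helper.
    """
    handle, name = tempfile.mkstemp(prefix="pygmt-", suffix=suffix)
    os.close(handle)  # keep only the name; the caller opens the file itself
    try:
        yield name
    finally:
        if os.path.exists(name):
            os.remove(name)


with temp_file() as name:
    with open(name, "wb") as outfile:
        outfile.write(b"grid bytes")
    # Read everything and close the handle while the file still exists.
    with open(name, "rb") as infile:
        data = infile.read()

# The file has been deleted, but the in-memory copy is still usable here.
print(data, os.path.exists(name))
```

Any handle that is still open when the outer block exits would make `os.remove` fail on Windows, so all reads are finished and closed inside the block.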
@weiji14
Member

weiji14 commented Feb 4, 2021

Ok, so the tests on GMT Dev 6.2.0 started failing on 13 January 2021 (see https://github.com/GenericMappingTools/pygmt/runs/1691984838?check_suite_focus=true#step:12:437); the previous run on 12 January 2021 was fine (https://github.com/GenericMappingTools/pygmt/runs/1685025652?check_suite_focus=true#step:12:727):

image

The commits on GMT around that date (see https://github.com/GenericMappingTools/gmt/commits/master?after=1456fa60dd7ff0722e53b1e93e983c35b54b10a4+34&branch=master) are as follows:

image

The Python dependencies (e.g. xarray) don't appear to have changed, and there weren't any commits to PyGMT's master branch between 8 Jan and 16 Jan 2021. So maybe GenericMappingTools/gmt#4646 or GenericMappingTools/gmt#4647 introduced a bug (leaning towards the latter)?

@seisman
Member Author

seisman commented Feb 4, 2021

> So maybe GenericMappingTools/gmt#4646 or GenericMappingTools/gmt#4647 introduced a bug (leaning towards the latter)?

We're using the GMT dev build from conda-forge's dev channel. It seems that on that date we bumped gmt-6.2.0.dev5+c94e83f to gmt-6.2.0.dev6+98bb060 (conda-forge/gmt-feedstock#129).

@weiji14
Member

weiji14 commented Feb 4, 2021

@seisman
Member Author

seisman commented Feb 4, 2021

There are 117 commits between 29 Nov 2020 and 12 January 2021

Perhaps it's easier to test on the GMT side. The GMT CI builds GMT from source on Windows, so what we need to do is:

  • Install anaconda and pygmt dependencies (except gmt)
  • Install PyGMT
  • Let PyGMT use the GMT built from source
  • Run PyGMT tests
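
The last two steps in this list could be driven from Python roughly as follows. This is a sketch under stated assumptions: the GMT build directory is a made-up path, `build_test_env` and `run_pygmt_tests` are hypothetical helpers, and it relies on PyGMT honoring the `GMT_LIBRARY_PATH` environment variable when it looks for the GMT shared library and on the test suite being included in the installed package.

```python
import os
import subprocess
import sys


def build_test_env(gmt_library_dir, base_env=None):
    """Return a copy of the environment with GMT_LIBRARY_PATH set."""
    env = dict(os.environ if base_env is None else base_env)
    # Directory that contains the GMT shared library built from source.
    env["GMT_LIBRARY_PATH"] = gmt_library_dir
    return env


def run_pygmt_tests(gmt_library_dir):
    """Run the installed PyGMT test suite against the given GMT build."""
    command = [sys.executable, "-m", "pytest", "--pyargs", "pygmt"]
    return subprocess.run(command, env=build_test_env(gmt_library_dir), check=False)


# Hypothetical location of a GMT build; adjust to the real one.
env = build_test_env(r"C:\gmt-build\bin", base_env={"PATH": "example"})
print(env["GMT_LIBRARY_PATH"])
```

Calling `run_pygmt_tests(...)` with the real build directory would then exercise PyGMT against that specific GMT build.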

Edit: Trying in GenericMappingTools/gmt#4745

@weiji14
Member

weiji14 commented Feb 4, 2021

> Perhaps it's easier to test on the GMT side. The GMT CI builds GMT from source on Windows, so what we need to do is:
>
> * Install anaconda and pygmt dependencies (except gmt)
> * Install PyGMT
> * Let PyGMT use the GMT built from source
> * Run PyGMT tests

Not quite sure I get you. Do you mean compiling each of the 117 GMT dev versions and testing it with PyGMT? That seems like a lot of computing power needed...

@seisman
Member Author

seisman commented Feb 4, 2021

> Do you mean compiling each of the 117 GMT dev versions and testing it with PyGMT? That seems like a lot of computing power needed...

We can use binary search to reduce the number of builds. Ideally, we can find the bug in at most 7 builds (2**7 = 128 > 117, so 7 halvings are enough).
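
The binary search described here can be sketched as follows. It is a minimal illustration, not the procedure actually used in the GMT CI: `find_first_bad` and `is_bad` are hypothetical names, and `is_bad` stands in for "build this GMT commit and run the PyGMT tests". The sketch assumes the commits are ordered oldest to newest, the last one is known to be bad, and every commit after the first bad one is also bad.

```python
from typing import Callable, Sequence, Tuple


def find_first_bad(
    commits: Sequence[str], is_bad: Callable[[str], bool]
) -> Tuple[str, int]:
    """Return the first bad commit and the number of builds it took to find.

    Assumes commits[-1] is bad and that badness is monotonic over the list.
    """
    lo, hi = 0, len(commits) - 1
    builds = 0
    while lo < hi:
        mid = (lo + hi) // 2
        builds += 1  # one build-and-test cycle per probe
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid + 1  # the first bad commit is after mid
    return commits[lo], builds


# Simulated history of 117 commits; index 80 is an arbitrary "culprit".
commits = [f"commit-{i}" for i in range(117)]
culprit, builds = find_first_bad(
    commits, lambda name: int(name.split("-")[1]) >= 80
)
print(culprit, builds)
```

Each probe at least halves the candidate range, so 117 commits never need more than 7 builds. `git bisect` automates the same search over a Git history.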

@weiji14
Member

weiji14 commented Feb 4, 2021

> > Do you mean compiling each of the 117 GMT dev versions and testing it with PyGMT? That seems like a lot of computing power needed...
>
> We can use binary search to reduce the number of builds. Ideally, we can find the bug in at most 7 builds (2**7 = 128 > 117, so 7 halvings are enough).

Very smart! Let us know when you find it.

@seisman
Member Author

seisman commented Feb 4, 2021

Based on the tests in GenericMappingTools/gmt#4745, GenericMappingTools/gmt@b16cc28 (i.e., GenericMappingTools/gmt#4581) is the commit that introduced the bug.

@weiji14 weiji14 added the bug Something isn't working label Feb 6, 2021
@weiji14 weiji14 mentioned this pull request Feb 7, 2021
@weiji14 weiji14 added the upstream Bug or missing feature of upstream core GMT label Feb 28, 2021
@weiji14 weiji14 added this to the 0.4.0 milestone Feb 28, 2021
@weiji14
Member

weiji14 commented Feb 28, 2021

Just following up: the issue appears to be GMT not closing a file properly, which should be fixed in GenericMappingTools/gmt#4777.

What we should do here in PyGMT is to remove the xfails for GMT > 6.2.0 (i.e. for the dev tests). The GMT conda-forge dev version merged at conda-forge/gmt-feedstock#134 (and conda-forge/gmt-feedstock#135) should include this fix so we can test it in our CI.

@seisman
Member Author

seisman commented Feb 28, 2021

> Just following up: the issue appears to be GMT not closing a file properly, which should be fixed in GenericMappingTools/gmt#4777.
>
> What we should do here in PyGMT is to remove the xfails for GMT > 6.2.0 (i.e. for the dev tests). The GMT conda-forge dev version merged at conda-forge/gmt-feedstock#134 (and conda-forge/gmt-feedstock#135) should include this fix so we can test it in our CI.

This PR tries to fix tests that fail randomly but sometimes still pass. These tests are not marked as xfail or skip, so there is nothing we can do here. The good news is that I rarely see these random failures in "GMT Dev Tests", which may mean that they are already fixed.

@weiji14
Member

weiji14 commented Feb 28, 2021

Right, got a bit confused as we were trying so many things to see what was broken. So can we close this PR then?

@seisman seisman closed this Feb 28, 2021
@seisman seisman deleted the debug-grdcut branch February 28, 2021 21:59
@seisman seisman modified the milestones: 0.4.0, 0.3.1 Mar 1, 2021