Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Invalid Polygon from Contouring (Issue #2370) #2373

Closed
wants to merge 11 commits into from

Conversation

Clarmy
Copy link

@Clarmy Clarmy commented Apr 19, 2024

Rationale

Resolve the issue mentioned in #2370

Implications

This change will affect the behavior of contourf and essentially fixes a bug in the contourf function.

@CLAassistant
Copy link

CLAassistant commented Apr 19, 2024

CLA assistant check
All committers have signed the CLA.

Copy link
Member

@rcomer rcomer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Clarmy thanks for digging into this and sorry I didn't get a chance to think about it properly till now. Your change makes sense to me and I just have a couple of minor comments about the code structure.

Ideally this should also get a test but, since this problem is so data dependent, I'm struggling to think how it could be written without uploading a lot of sample data (which we don't want to do!)

One other thing I'm not sure about is which branch the PR should target. @greglucas since you have handled recent releases can you comment on that?

lib/cartopy/mpl/patch.py Outdated Show resolved Hide resolved
lib/cartopy/mpl/patch.py Outdated Show resolved Hide resolved
@rcomer rcomer linked an issue Apr 19, 2024 that may be closed by this pull request
@greglucas greglucas changed the base branch from v0.23.x to main April 20, 2024 13:30
@greglucas
Copy link
Contributor

One other thing I'm not sure about is which branch the PR should target. @greglucas since you have handled recent releases can you comment on that?

We should target main and then backport the commit to the other branch if we want to make a v0.23.1 release. I've updated this PR to target main now.

Do we know why we are getting duplicate points/bad geometries from our other code paths? This is likely OK to fix here, but it might be nice to at some point to figure out why we are producing the bad geometries in the first place and see if we can fix this issue at the source.

@Clarmy
Copy link
Author

Clarmy commented Apr 20, 2024

Do we know why we are getting duplicate points/bad geometries from our other code paths? This is likely OK to fix here, but it might be nice to at some point to figure out why we are producing the bad geometries in the first place and see if we can fix this issue at the source.

It might be somewhat difficult for me to identify the root cause of this issue, but I can provide some helpful clues. Based on the Traceback information I initially encountered:

shapely.errors.GEOSException: TopologyException: side location conflict at 97 16.5. This can occur if the input geometry is invalid.

According to the hint, the conflicting coordinate is at (97, 16.5). Therefore, when I catch the GEOSException, I map the polygon object to geojson and save it, with the file available for download: invalid.geojson.zip

In the invalid.geojson, we can look for clues at the (97, 16.5) coordinate. Here is a segment from the file:

[97.25, 16.517156862745097], [97.0, 16.5], [97.0, 16.25], [97.0, 16.25], [97.0, 16.25], [97.0, 16.5], [96.75, 16.507075471698112]

It is evident that the consecutive points from [97.0, 16.5] to [97.0, 16.5] are actually along the meridian at longitude 97.0 (i.e., on the same line), which is where the topology error in the polygon arises.

@Clarmy Clarmy requested a review from rcomer April 20, 2024 15:00
@Clarmy
Copy link
Author

Clarmy commented Apr 20, 2024

@rcomer I'm not very familiar with the CI process for cartopy, but I see that some tests have failed in the CI, and pre-commit.ci has also been failing continuously. I'm not sure what impact this has on the final merge, and how to check and resolve the issue of inconsistent test drawing results under some versions in the CI?

@lgolston
Copy link
Contributor

pre-commit is failing on the ruff step, with the issue here just being that import shapely should go above import shapely.geometry as sgeom.

For the test failures on two of the builds, it shouldn't affect this PR as it appears related to an unexplained issue where some builds are intermittently failing (currently being investigated in #2367).

@lgolston
Copy link
Contributor

Guess it also is looking for two empty lines after the the last import, instead of one.

@Clarmy
Copy link
Author

Clarmy commented Apr 21, 2024

@lgolston It worked! Thank you for your guidance.

@rcomer
Copy link
Member

rcomer commented Apr 22, 2024

Do we know why we are getting duplicate points/bad geometries from our other code paths? This is likely OK to fix here, but it might be nice to at some point to figure out why we are producing the bad geometries in the first place and see if we can fix this issue at the source.

If we plot the data with Matplotlib's contour and zoom in, we can see the problem loop in the middle:

image

So I agree with @Clarmy that the problem does not originate in Cartopy and the best we can do here is handle it when it comes up. We could open an issue with Contourpy to see if there is anything to be done there though.

@rcomer
Copy link
Member

rcomer commented Apr 22, 2024

I have opened an issue at contourpy/contourpy#377, and in the process came up with an example that I think is small enough to make into a test (though I see you have also added a test in the meantime @Clarmy):

import matplotlib.pyplot as plt
import numpy as np
import cartopy.crs as ccrs

lons = [96.75, 97.  , 97.25]
lats = np.arange(17.25, 15.9, -0.25)

lons, lats = np.meshgrid(lons, lats)

data = [[26.9, 43.7, 26.8],
        [33.2, 65.8, 34.6],
        [54.7, 66.9, 55.5],
        [65.3, 65. , 65.7],
        [65.9, 65. , 65.6],
        [65.8, 65.2, 65.5]]

fig = plt.figure()
ax = fig.add_subplot(projection=ccrs.Mercator())
ax.contourf(lons, lats, data, levels=[60, 65], transform=ccrs.PlateCarree())

@Clarmy
Copy link
Author

Clarmy commented Apr 22, 2024

@rcomer Thank you for your continued effort on this issue; your test case is superbly done. The test case I added was specifically for testing the function path_to_geos, whereas yours more closely matches the actual plotting scenario. However, I'm unsure about the best script to insert your test case into. Could you possibly add a suggestion during the code review?

@rcomer
Copy link
Member

rcomer commented Apr 22, 2024

test case is superbly done

I just took your data and sliced it as much as I could while still reproducing the error! It's great to have the example by the way - the error has been reported previously (e.g. #1546) but we did not have enough information to reproduce.

We have a test file for contour plots here:
https://github.com/SciTools/cartopy/blob/main/lib/cartopy/tests/mpl/test_contour.py

@rcomer
Copy link
Member

rcomer commented Apr 22, 2024

By the way, you can also install pre-commit locally to help with linting each time you commit: https://scitools.org.uk/cartopy/docs/latest/contribute.html

@Clarmy
Copy link
Author

Clarmy commented Apr 22, 2024

Okay, I understand. Thank you very much!

@Clarmy
Copy link
Author

Clarmy commented Apr 24, 2024

Sorry, I've found that the modified code introduced a new bug in actual use, and I'm currently addressing it. I will provide a more detailed description later.

@Clarmy
Copy link
Author

Clarmy commented Apr 24, 2024

The situation we are now facing is more complex than it was at the beginning. I've discovered a new issue: if we use this data to plot a graph, it still causes an error when run on my modified cartopy code. The error it generates is:

Traceback (most recent call last):
  File "/opt/main.py", line 105, in <module>
    rc.plot(
  File "/opt/plot.py", line 237, in plot
    ax.contourf(
  File "/usr/local/lib/python3.9/site-packages/cartopy/mpl/geoaxes.py", line 307, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/cartopy/mpl/geoaxes.py", line 351, in wrapper
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/cartopy/mpl/geoaxes.py", line 1653, in contourf
    self.update_datalim(result.get_datalim(self.transData))
  File "/usr/local/lib/python3.9/site-packages/matplotlib/collections.py", line 267, in get_datalim
    paths = [transform.transform_path_non_affine(p) for p in paths]
  File "/usr/local/lib/python3.9/site-packages/matplotlib/collections.py", line 267, in <listcomp>
    paths = [transform.transform_path_non_affine(p) for p in paths]
  File "/usr/local/lib/python3.9/site-packages/matplotlib/transforms.py", line 2426, in transform_path_non_affine
    return self._a.transform_path_non_affine(path)
  File "/usr/local/lib/python3.9/site-packages/cartopy/mpl/geoaxes.py", line 175, in transform_path_non_affine
    geoms = cpatch.path_to_geos(src_path)
  File "/usr/local/lib/python3.9/site-packages/cartopy/mpl/patch.py", line 216, in path_to_geos
    geom = sgeom.Polygon(external_geom.exterior, exteriors)
AttributeError: 'MultiPolygon' object has no attribute 'exterior'

The reason for this error is that in my modified code, the step collection[-1][0].buffer(0) that fixes topological errors results in a MultiPolygon object (originally it was a Polygon), meaning the polygon object type changes during topological repair.

To investigate the cause, I exported both the original Polygon and the new MultiPolygon objects into GeoJSON files, and then displayed them in QGIS:

image

It seems there is no apparent problem, so why does it turn into multiple polygons? After reading the MultiPolygon object, I found it consists of two separate polygons. The smaller of these polygons is located around the coordinates [53.0, 47.5] and is a small polygon composed of over 20 points. When I zoom into this area in QGIS, I can see that there is actually a breakpoint that has cut the polygon into two parts. This is somewhat similar to the 'conflict points' we encountered before, but the impact of this conflict point is different this time.

image

I'm sharing this information for now, and as for the solution, I am still exploring.

fix_polygon.geojson.zip
raw_polygon.geojson.zip

# is invalid. In this case, we can't use the contains method.
# Therefore, we need to perform a repair on the Polygon object
# and then attempt the contains method again. Related Issue: #2370
polygon = collection[-1][0].buffer(0.001).buffer(-0.001)
Copy link
Author

@Clarmy Clarmy Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have tried using the .buffer(0.001).buffer(-0.001) to perform a dilation and contraction operation on the polygon, and so far, the results are quite satisfactory. Although the code is not particularly elegant, it can solve the actual problem. At the same time, it's important to note that this kind of scaling operation produces a slight rounded effect in the inner corners. When I set the scaling parameter small enough (for example, 0.001), the difference is almost negligible.

image

image

@greglucas
Copy link
Contributor

@Clarmy Are you able to recursively or iteratively add the geometries to the original set of geometries? I think it makes sense to have multiple distinct polygons if we need them and not buffer to add artificial gaps to connect geometries.

if geom is multi-polygon:
    for poly in geom:
        # Might need to check for linestrings here too?
        append to collection
elif geom is Polygon:
    append to collection

Perhaps using "make_valid" is better than try/except as well? contourpy/contourpy#377
Also, see the two examples here:
https://shapely.readthedocs.io/en/stable/manual.html#validation.make_valid
which I think show the two potential issues we're running into: An bowtie geometry creating two different polygons and an extra linestring hanging around.

@rcomer
Copy link
Member

rcomer commented Apr 24, 2024

Just to note that I tried make_valid on @Clarmy's original bad polygon and it took about 50x longer than the buffer approach. Also make_valid only came in at Shapely v1.8 and we currently test against v1.7.

I was also thinking that try/except should be faster than is_valid since the vast majority of these contours are valid, but I don't think I actually checked that.

@Clarmy
Copy link
Author

Clarmy commented Apr 24, 2024

@greglucas I have referred to the example in contourpy/contourpy#377, but unfortunately, based on my actual testing, I found that the make_valid function does not guarantee the return type as a shapely Polygon object. Depending on the input data, it could also return a MultiPolygon or even a GeometryCollection type , which obviously results in uncontrollable outcomes and introduces more complexity and unpredictability to the program. Any data type other than Polygon could cause subsequent program failures. Additionally, regarding the use of try/except, I initially wanted to check each geom object with the .is_valid function, but I found that the check itself incurs significant time overhead, making the function almost unusable, as @rcomer described. If using try/except, it skips the normal polygons and only handles the problematic parts, minimizing the program's additional overhead.

@Clarmy
Copy link
Author

Clarmy commented Apr 24, 2024

I'm not sure if I'm misunderstanding the contour/contourf function, but I've always thought that each Path in the collection object returned by the matplotlib contour/contourf function represents a separate polygon, without any disconnected polygons appearing within the same Path object (or even if there was a "break" as mentioned earlier, they should theoretically belong to the same polygon, just due to some kind of "coincidence"). So, my approach is not to preserve the separated structure when a MultiPolygon occurs, but rather to think about how to merge them together.

@rcomer
Copy link
Member

rcomer commented Apr 25, 2024

contour/contourf used to create one path per contour as you say but there was a change at Matplotlib v3.8 so they now create one path per level.
https://matplotlib.org/stable/api/prev_api_changes/api_changes_3.8.0.html#contourset-is-now-a-single-collection

Note that this part of Cartopy is quite general and handles anything that is drawn with a path (which I think is most things?) I think the main other example of needing to sort out internal/external polygons is for the FeatureArtist, hence the code comments about land masses with lakes in. So at this point in the code we don't actually know if we are working with contours or something else.

It does bother me that, specifically for the contour case the overall process is effectively

  • ContourPy creates individual contours
  • Matplotlib concatenates the contours and makes paths
  • Cartopy splits the paths back down and tries to figure out which contours are inside each other

It would be nice if we could somehow intercept the contours straight out of ContourPy when it is presumably more obvious what the individual contours are. I did have a couple of attempts studying the relevant parts of Matplotlib to figure that out, but I didn't come up with anything.

@Clarmy
Copy link
Author

Clarmy commented Apr 25, 2024

I understand now, so path_to_geos is a very general function that handles all Path objects built with matplotlib, converting them into shapely geometric objects for further projection transformations and other tasks. This function can be applied not only to contours but also to national borders, inland water systems, and more. Is my understanding correct?

If so, I believe that the modifications in this PR should not affect other functionalities (assuming they were previously working properly). The issue of generating invalid polygon objects would likely only occur when generating contours, as these are interpolated on the spot from grid data. For other data like those used in FeatureArtist, which are pre-generated, the input graphic paths should theoretically already be free from topological errors and valid.

Of course, I cannot guarantee that my understanding is completely accurate, so please point out any obvious errors if there are any.

Moreover, if we directly obtain the paths from ContourPy, does it imply that this part of the code needs to be refactored?

@rcomer
Copy link
Member

rcomer commented Apr 25, 2024

I think your understanding is correct

if we directly obtain the paths from ContourPy, does it imply that this part of the code needs to be refactored?

This is really just a vague aspiration and I don't know if it's even possible, so I wouldn't worry about that.

@greglucas
Copy link
Contributor

If we are getting 1 Path per level now from contouring and that is causing issues (?), would we need to somehow create MultiPolygons per level, while also accounting for holes? It looks like we create MultiLineStrings, but not MultiPolygons later in the function. However, I think that is after the errors that are cropping up, so probably not related.

The +/- buffering seems like a hack to me and may not be suitable for all resolutions / scales. One other thought that comes to mind is that this is due to the contains check failing. Is there anything we can say about the contains and set is_inside = False if it raises the exception and avoid modifying the geometry?

@Clarmy
Copy link
Author

Clarmy commented Apr 26, 2024

The +/- buffering seems like a hack to me and may not be suitable for all resolutions / scales. One other thought that comes to mind is that this is due to the contains check failing. Is there anything we can say about the contains and set is_inside = False if it raises the exception and avoid modifying the geometry?

I can start by running some tests to see.

@Clarmy
Copy link
Author

Clarmy commented Apr 26, 2024

@greglucas Regarding your suggestions, I have conducted the following work:

  1. To understand if it is possible to directly set is_inside to False when a GEOSException occurs, I checked the actual value of is_inside after using buffer(0) for repair when the program encounters the exception. I found that in my tests, the actual value is True. This result at least indicates that setting it to False directly when an error occurs could lead to some results that are inconsistent with expectations (even if these inconsistencies might be very slight).
  2. After repairing the polygon to a multipolygon using buffer(0), I tried to perform type judgment and then handle the MultiPolygon separately based on my understanding, attempting to integrate it back into the original collection. However, this ultimately failed, resulting in significant color loss in the rendered images. My current view is that the order of polygons in the collection is important. If we try to integrate the MultiPolygon individually, it will disrupt the original order of the list in the collection, leading to unpredictable graphical issues.
  3. Regarding your comment that a single buffer value is not suitable for all scales, I agree. Therefore, I attempted to write some code to dynamically generate a sufficiently small buffer scale parameter based on the actual size of the polygon to address this issue.

@rcomer
Copy link
Member

rcomer commented Apr 26, 2024

I tried @Clarmy's original example with an old environment we have lying around (Cartopy 0.18, Matplotlib 3.3.4) and the error is the same, so I don't think the recent ContourSet changes caused this.

Does the contains method work on the MultiPolygon and GeometryCollection? If so I wonder if we could use the output of buffer(0) to define is_inside, but not actually replace the original polygon. We don't see errors when plotting the same data with contour, so presumably the actual transformation and plotting steps don't care if the polygon is invalid. Downside would be that we could end up in the except multiple times for a given bad polygon.

[I see see @Clarmy pushed another change while I was typing this so maybe this suggestion is redundant.]

@Clarmy
Copy link
Author

Clarmy commented Apr 26, 2024

If so I wonder if we could use the output of buffer(0) to define is_inside, but not actually replace the original polygon.

I just tried locally to only judge the value of is_inside without modifying the collection, but in the end, it still throws a GEOSException elsewhere, as you mentioned, which might require multiple exception catches across various modules.

@greglucas
Copy link
Contributor

Where does this PR stand? Is it ready for review or were we going to update some of the other areas of code to handle these cases instead?
#2373 (comment)

@MHBalsmeier
Copy link
Contributor

MHBalsmeier commented Jun 25, 2024

Hi all,
I encountered the same problem when plotting wave data from the German CWAM wave model, this is a small script reproducing it (cartopy 0.23.0):

import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import numpy as np
import netCDF4 as nc

ds = nc.Dataset("cwam_sig_wave_height.nc", "r", format = "NETCDF4")
lat_deg = ds["lat_deg"][:]
lon_deg = ds["lon_deg"][:]
swh = ds["wave_height"][:]
ds.close()

fig = plt.figure()
ax = plt.axes()
ax.contourf(lon_deg, lat_deg, swh)
plt.show()

fig = plt.figure()
proj = ccrs.Mercator(central_longitude = 11.8)
ax = plt.axes(projection = proj)
ax.contourf(lon_deg, lat_deg, swh, transform = ccrs.PlateCarree())

cwam_nc.zip

Maybe this helps for testing.
It looks like a complex issue, currently I cannot help any further, but maybe I'll find the time to dig into it as well.

@greglucas greglucas changed the title fix: Issue #2370 fix: Invalid Polygon from Contouring (Issue #2370) Oct 23, 2024
@greglucas
Copy link
Contributor

Sorry for losing track of this @Clarmy, are you interested in coming back to finish this up?

@rcomer just moved/renamed and refactored this function a bit in #2455 and now we depend on shapely>=2.0 so some things might be able to be simplified here now.

@Clarmy
Copy link
Author

Clarmy commented Oct 28, 2024

@greglucas Sorry for not paying attention to this PR for quite some time. It seems that there have been significant changes in Cartopy's code since this PR was submitted. I'm also not sure if the issue that this PR aims to address still exists. I can run some tests with the latest code and the previous data to see if the issue has already been resolved. It might turn out that there's nothing more for me to do.

@Clarmy
Copy link
Author

Clarmy commented Oct 28, 2024

@greglucas @rcomer Due to this PR being outdated, there are some conflicts with the latest code as versions have iterated. Therefore, I made modifications based on the latest code and submitted a new PR: #2463.
I have tested the code myself, and it does solve the previous issues, although some of the implementations are not very elegant. It's up to your discretion; if you prioritize solving the problem over the elegance of the code, you can approve and merge this PR.

@Clarmy
Copy link
Author

Clarmy commented Oct 28, 2024

New PR: #2463

@Clarmy Clarmy closed this Oct 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A GEOSException error occurred when calling contourf
6 participants