JVM crash on large polyfills #21
So part of this is simply the relative memory consumption of bare C versus Java with its object wrapping (which is why the C and Java tests differ in how fine a resolution you can reach). Part of it is the inefficient (but guaranteed correct) way polyfill operates: it determines the minimum circle necessary to surround the bounding box of your polygon, then determines the "k-ring"-sized hexagon-like fill necessary to surround that, and then performs point-in-polygon searches across that set to determine the actual set of hexagons contained within the polygon. And then there's the practical consideration that there are over 33 billion hexagons on the planet at resolution 10, so even the most compact representation (all at the same hexagon resolution) would require over 250GB of RAM to contain. (The highest resolution, 15, has over 569 trillion hexagons on the planet and would require a little over 4 petabytes of RAM to store.) It would be possible to represent this with far fewer hexagons if the output set were mixed-resolution (a compacted set), but I'm a bit hesitant to have polyfill return compacted results.
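A quick back-of-the-envelope check of those figures, assuming the H3 3.x C API (`numHexagons` and the 8-byte `H3Index` type are real; the program itself is just an illustrative sketch):

```c
#include <inttypes.h>
#include <stdio.h>
#include "h3api.h"  // H3 3.x C API

int main(void) {
    // Memory needed to hold one 64-bit H3Index per cell -- the most
    // compact single-resolution representation of full coverage.
    int resolutions[] = {10, 15};
    for (int i = 0; i < 2; i++) {
        int res = resolutions[i];
        int64_t cells = numHexagons(res);
        double gb = (double)cells * sizeof(H3Index) / 1e9;
        printf("res %2d: %" PRId64 " cells, ~%.0f GB of raw indexes\n",
               res, cells, gb);
    }
    return 0;  // res 10 lands around 270 GB; res 15 in the petabytes
}
```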
It's definitely true that meter accuracy of that particular shape is unnecessary. I was running polyfill at meter-level accuracy on the Census Bureau's census blocks for Massachusetts to try to replace the geometries with hexagons -- resolution 10 followed by a compact seemed reasonable to make sure urban center blocks were well represented. Unfortunately, it turns out that the Census Bureau also represents the edge of all of Outer Cape Cod as a single block, hence the crash. I suppose that case is particularly extreme for the circle/k-ring approximation, since it's a long thin strip. Does it seem right that a workaround (without asking h3 to worry about returning compacted results and opening that can of worms) is for me to first run a guesstimate on the polygon using the same process maxPolyfillSize uses internally; if that's too large, subdivide the polygon until the guesstimate is acceptable, polyfill and compact each piece, and potentially uncompact the combined set if maxUncompactSize's estimate for it seems reasonable?
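A sketch of that guarded flow, assuming the H3 3.x C API: `maxPolyfillSize`, `polyfill`, and `compact` are real functions, but the budget constant and the split-and-retry convention are hypothetical application-side choices, and the polygon subdivision itself is left to the caller:

```c
#include <stdlib.h>
#include "h3api.h"  // H3 3.x C API

#define HEX_BUDGET 4000000  // illustrative per-pass limit

// Returns the number of indexes handed to compact, or a negative
// code. The caller is expected to subdivide the polygon and retry
// when -1 is returned; `compacted` needs HEX_BUDGET slots.
int polyfillOrSplit(const GeoPolygon* poly, int res, H3Index* compacted) {
    int estimate = maxPolyfillSize(poly, res);
    if (estimate > HEX_BUDGET) return -1;  // too big: split polygon first

    // polyfill leaves unused slots zeroed, so allocate the estimate,
    // fill, then pack the populated entries before compacting.
    H3Index* raw = calloc(estimate, sizeof(H3Index));
    if (raw == NULL) return -2;
    polyfill(poly, res, raw);

    int count = 0;
    for (int i = 0; i < estimate; i++) {
        if (raw[i] != 0) raw[count++] = raw[i];
    }
    int err = compact(raw, compacted, count);
    free(raw);
    return err != 0 ? -3 : count;
}
```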
@willcohen that would definitely work around this issue for the time being, but the current polyfill is pretty bad in both space and time usage (it was just the simplest to implement correctly across icosahedron boundaries). This pull request is actually a stepping stone to a possible solution. If we can return to a simple 2D representation of the hexagons, then we can first trace "lines" of hexagons around the bounding box and then simply test those hexagons and all of the IJ coordinates within the defined space to determine which belong in the polyfill. (That implementation does not support crossing icosahedron edges, however, so it's only part of the solution.) The big advantage with that is twofold: it improves on the current approach in both space and time, since no oversized k-ring superset has to be generated and tested.
The disadvantage is the icosahedron constraint. There was talk of an automatic "unfolding" of icosahedrons and coordinate rotation to handle crossing a single edge, but we couldn't figure out a way to handle multiple icosahedrons simultaneously without weird glitches. Another solution we proposed internally was to just intersect the polygon per icosahedron, perform the point-in-poly on each generated polygon, and then merge the results, but we'll have to implement polygon intersections, precise icosahedron polygon generation (maybe we already have that?), and the 2D grid (@isaacbrodsky has already done most of that). Unfortunately I don't work at Uber anymore, so I only get to spend my free time on this. That's a rough outline of one discussed approach to improving polyfill if someone wants to tackle it (or maybe even provide an even better algorithm? ;))
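For what it's worth, a sketch of the single-icosahedron IJ scan described above, assuming the experimental local-IJ functions that shipped in H3 3.x (`experimentalLocalIjToH3`, `h3ToGeo`); `pointInPolygon` is a hypothetical application-side test, since the core library does not export one:

```c
#include "h3api.h"  // H3 3.x C API

// Hypothetical point-in-polygon test supplied by the application.
int pointInPolygon(const GeoPolygon* poly, const GeoCoord* pt);

// Scan a rectangle of IJ coordinates anchored at `origin`, keeping
// the cells whose centers fall inside the polygon. Fills `out` and
// returns the number of cells found.
int scanIjRect(H3Index origin, const GeoPolygon* poly,
               int iMin, int iMax, int jMin, int jMax,
               H3Index* out, int outCap) {
    int found = 0;
    for (int i = iMin; i <= iMax; i++) {
        for (int j = jMin; j <= jMax; j++) {
            CoordIJ ij = {i, j};
            H3Index cell;
            // Fails (nonzero) when the cell is too far from the origin
            // or crosses an icosahedron edge -- the stated constraint.
            if (experimentalLocalIjToH3(origin, &ij, &cell) != 0) continue;
            GeoCoord center;
            h3ToGeo(cell, &center);
            if (pointInPolygon(poly, &center) && found < outCap) {
                out[found++] = cell;
            }
        }
    }
    return found;
}
```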
Thanks, @dfellis. I'll probably need to follow along with the discussion for a while before jumping in. Separately, thanks to everyone for your work on the library. We're finding it quite useful!
Glad to hear you're finding it useful! I think the approach of splitting up the polygon if it's too large for a single pass makes sense as a workaround. I found the C test case was failing on a stack allocation of the output array.
When was that switched from heap to stack and why? |
You can overflow the stack. Didn't realize how large this was.
@dfellis I prefer stack allocation to heap whenever possible. Yes, some programmers disagree with the practice, but I find stack allocation is potentially faster and much less leak-prone than using `malloc`.
@isaachier That's not a good reason to put array allocations on the stack, in my mind. You'll get strange stack overflows at some point where a simple refactor causes an extra stack frame to be generated, and the failure will be nonsensical to most developers -- even if you're playing it safe with the stack allocations -- because of how much "smaller" the function-call-usable stack becomes. I can understand stack allocations for a fixed number of structs or predictably-bounded arrays, but doing it with a user-defined input size is just asking for trouble. Can we revert the stack allocations where the sizes depend on user input?
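To illustrate the two failure modes being contrasted -- a sketch, not code from the library; the H3 calls are real 3.x API, but both wrapper functions are hypothetical:

```c
#include <stdlib.h>
#include "h3api.h"  // H3 3.x C API

void fillOnStack(const GeoPolygon* poly, int res) {
    // VLA sized by user input: maxPolyfillSize can return millions
    // for a large polygon at res 10, and a multi-megabyte frame
    // blows through a typical 1-8 MB thread stack with a segfault.
    H3Index out[maxPolyfillSize(poly, res)];  // crash waiting to happen
    polyfill(poly, res, out);
}

void fillOnHeap(const GeoPolygon* poly, int res) {
    // Heap allocation degrades gracefully: calloc returns NULL
    // instead of faulting, so the caller can subdivide and retry.
    int max = maxPolyfillSize(poly, res);
    H3Index* out = calloc(max, sizeof(H3Index));
    if (out == NULL) return;  // handle allocation failure
    polyfill(poly, res, out);
    free(out);
}
```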
It is a fair point. I will check where the functions are using user-defined sizes and see what I can do. |
See uber/h3#100. |
When the following is added to `TestH3Core.java`, the JVM crashes out on `h3Api.polyfill` before completing the tests. The polyfill works at resolution 9. Is this an issue with the Java wrapper, or is this impossible to do with the core library?

I don't know C, but adding a similar test to h3, naively copying the existing styles in `testPolyfill.c`, still seems to work at level 11. If I increase the resolution past 11 it eventually segfaults there too.
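To see the growth that triggers the crash, a minimal standalone probe might look like the sketch below, assuming the H3 3.x C API; the coordinates are made up to roughly mimic a long, thin coastal block and are not the actual census geometry:

```c
#include <stdio.h>
#include "h3api.h"  // H3 3.x C API

int main(void) {
    // A long, thin rectangle standing in for the Outer Cape Cod block.
    GeoCoord verts[] = {
        {degsToRads(42.05), degsToRads(-70.25)},
        {degsToRads(42.05), degsToRads(-70.00)},
        {degsToRads(41.65), degsToRads(-70.00)},
        {degsToRads(41.65), degsToRads(-70.25)},
    };
    Geofence fence = {.numVerts = 4, .verts = verts};
    GeoPolygon poly = {.geofence = fence, .numHoles = 0, .holes = NULL};

    // The bounding-circle/k-ring estimate grows roughly 7x per
    // resolution; a stack-allocated output array of this size is
    // what segfaults once the count reaches the millions, while a
    // heap buffer either succeeds or fails cleanly at allocation.
    for (int res = 9; res <= 12; res++) {
        printf("res %d: maxPolyfillSize = %d\n",
               res, maxPolyfillSize(&poly, res));
    }
    return 0;
}
```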