-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LUCENE-10470: [Tessellator] Prevent bridges that introduce collinear edges #756
Conversation
Thank you for fixing this @iverase. However I just ran into the same Tessellation error using this branch on a different polygon (repro here, and I also attached the geojson to https://issues.apache.org/jira/browse/LUCENE-10470). Interestingly, the error only starts happening after the changes in this branch, and does not happen on the current main branch (tested using my branch that runs different Tessellators), so it's probably a side effect of the changes in this PR. Do you know what might be causing this error? |
@yixunx thanks for the input. I revised the issue again and it seems the fix is much simpler. We are actually tessellating properly the polygons but we are not detecting it properly because at the end of the process we still have some edges but those edges are just a line that can be discarded. I have added a check that test if the final edges have an area and if they don't, it won't fail. Test are passing and visual inspection of the tessellation looks good. |
@iverase As mentioned in my latest comment on Jira, I found a shape that fails after the latest fix because the final edges still have an area. I was able to get that shape to work by re-introducing your |
Thanks @Yinux! your dataset is very interesting because it seems your geometries have many parallel edges which are challenging the tessellator. In order to fix that I went back to the PR that changed the behaviour of the tessellator. In that PR we added the capability of labelling the edges of the triangles to know if they belong to the original polygon or they have been introduced by the tessellation process. One side effect is that the method In this case the issue was a collinear edge where one of the edges was returning over the previous edge and had a point on top of that edge. We can actually remove that collinearity as well so we end up with only one edge. I added that change in line 1306 and now all test are passing. I run this version over my dataset (over 400 million polygons) and all passed too. Let me know if you find more issues. |
# Conflicts: # lucene/CHANGES.txt
@yixunx if there is no further input I am planning to push this change shortly. |
@iverase The latest run of our indexing pipeline revealed some more shapes that fail to tessellate. This time I made changes so that we get all failing shapes instead of just one. There are 1.3 million failing shapes, and I'm planning to submit a sample of them this week. However, I haven't looked into the shapes yet so I'm not sure if the failures are related to this issue. I can open a separate ticket with the new shapes if you are going to merge this PR soon. Thanks! |
Ups I was not expecting so many failures. I prefer to wait until we find what is hopefully some pathological issue that can be fixed. |
Hi @iverase it turns out that many of my failed shapes failed because they contain self-intersections and I set
I ran it through different tessellator versions using my branch and this is the first shape I got that failed in Lucene 8.2.0 (all previous shapes only failed on Lucene 8.3.0 or higher). By the way I have a tangential question: the Tessellator javadoc says the polygon cannot have self-intersections, but it seems that the tessellator runs fine on shapes with self intersections unless I set |
Hi @yixunx, I had a look to the polygon and it seems to be a totally different type of error and I would prefer to handle it as a separate issue. In the meanwhile I propose to push this change as it is as it contains good fixes for the other cases, what do you think?
I was not the original contributor for this class so I cannot answer that question on the javadocs and I have always found estrange that javadocs says that polygons should not have self-intersections and then we have a method called I would porpoase to open another issue with this subject so we can capture the intention of the javadocs and see if we should take any action. |
Sounds good, thank you for looking into this @iverase. I think it makes sense to merge this PR as you suggested, and I'll open separate tickets for the latest failing shape and for my questions about self intersections. |
# Conflicts: # lucene/CHANGES.txt
…r edges (#756) Check if polygon has been successfully tessellated before we fail (we are failing some valid tessellations) and allow filtering edges that fold on top of the previous one
* main: (68 commits) LUCENE-10292: Suggest: Fix FreeTextSuggester so that getCount() returned results consistent with lookup() during concurrent build() LUCENE-10525 Improve WindowsFS emulation to catch invalid file names (apache#829) LUCENE-10508: Use MIN_WIDE_EXTENT for all wide rectangles (apache#845) LUCENE-10470: [Tessellator] Fix some failing polygons due to collinear edges (apache#756) LUCENE-10508: Fix error for rectangles with an extent close to 180 degrees (apache#824) LUCENE-10529: Fix TestTaxonomyFacetAssociations NPE when randomly indexing no documents for dim fix path to jar file in demo documentation LUCENE-10499: reduce unnecessary copy data overhead when growing array size (apache#786) LUCENE-10535: upgrade com.palantir.consistent-versions to 2.10.0 LUCENE-10533: SpellChecker.formGrams is missing bounds check (apache#836) Upgrade spotless and use runToFixMessage for 'gradlew tidy' hint. (apache#834) Fix JVM error branch logic. (apache#835) LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori (apache#805) LUCENE-8836: Speed up TermsEnum#lookupOrd on increasing sequences of ords. (apache#827) LUCENE-10528: use Xvfb in test to avoid messing up user's desktop (apache#828) LUCENE-10315: Speed up DocIdsWriter by ForUtil (apache#797) LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types (apache#812) Add two facet tests (apache#826) fail clearly on too-new JDK (apache#819) improve spotless error to suggest running 'gradlew tidy' (apache#817) ...
@iverase I opened https://issues.apache.org/jira/browse/LUCENE-10563 with the latest failing shape I attached above plus a couple more shapes that I found that also cause tessellation failures. |
Similar to LUCENE-9580, It might happen that the new edge introduced when removing a tile is collinear with edges from the resulting polygon. This situation makes this edges not eligible for filtering because of LUCENE-8620 and it makes the logic fail.
This change makes sure we don't introduce collinear edges when removing holes.