Commit

add uniquification of link ref ids with diff URLs and same text
vsch committed May 18, 2023
1 parent f63c795 commit b6d484c
Showing 6 changed files with 86 additions and 16 deletions.
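
For orientation before the diff: the changelog entry in this commit describes making a reference id unique by appending an increasing integer, with the suffix format supplied by a configurable `BiFunction<String, Integer, String>`. The following is a minimal, self-contained sketch of that loop under stated assumptions: the class `RefIdUniquifier`, the method `uniqueRefId`, and the `usedIds` set are hypothetical names used only for illustration and are not part of flexmark-java.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.BiFunction;

// Hypothetical illustration only; not flexmark-java API.
final class RefIdUniquifier {
    // Same shape as the default described in the changelog: "<refId>_<index>".
    static final BiFunction<String, Integer, String> DEFAULT_GENERATOR =
            (refId, index) -> String.format("%s_%d", refId, index);

    // Returns `text` when it is unused; otherwise tries generator(text, 1),
    // generator(text, 2), ... until an unused id is found. The chosen id is
    // recorded in `usedIds` so later calls see it as taken.
    static String uniqueRefId(Set<String> usedIds, String text,
                              BiFunction<String, Integer, String> generator) {
        String candidate = text;
        for (int i = 1; usedIds.contains(candidate); i++) {
            candidate = generator.apply(text, i);
        }
        usedIds.add(candidate);
        return candidate;
    }

    public static void main(String[] args) {
        Set<String> used = new HashSet<>();
        System.out.println(uniqueRefId(used, "Link 1", DEFAULT_GENERATOR)); // Link 1
        System.out.println(uniqueRefId(used, "Link 1", DEFAULT_GENERATOR)); // Link 1_1
        System.out.println(uniqueRefId(used, "Link 1", DEFAULT_GENERATOR)); // Link 1_2

        // A custom generator in the style of the test option added by this commit.
        Set<String> usedCustom = new HashSet<>();
        BiFunction<String, Integer, String> dashed =
                (refId, index) -> String.format("%s - %d", refId, index);
        System.out.println(uniqueRefId(usedCustom, "Link 1", dashed)); // Link 1
        System.out.println(uniqueRefId(usedCustom, "Link 1", dashed)); // Link 1 - 1
    }
}
```

One caveat that follows from the loop shape: in this sketch, a generator that ignores its `index` argument would never produce a fresh candidate, so the loop would not terminate once a collision occurs.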
9 changes: 7 additions & 2 deletions VERSION.md
@@ -119,7 +119,12 @@

## 0.64.8

-* [ ] Fix: reference links in html to md conversion use the same ref when link text is the same.
+* Fix: reference links in html to md conversion use the same ref when link text is the same. Now
+the reference id will be generated by adding `_xxx` suffix, where `xxx` is an increasing
+integer, starting at 1, and incremented until a unique reference id is generated.
+* Add: `UNIQUE_LINK_REF_ID_GENERATOR`, default `(refId, index) -> String.format("%s_%d",
+refId, index)`, a `BiFunction<String, Integer, String>` taking refId string and integer
+index and returning a string for the "uniquified" ref id to use for a reference link.

## 0.64.6

@@ -2165,6 +2170,7 @@
[NodeInsertingPostProcessorSample.java]: https://github.com/vsch/flexmark-java/blob/master/flexmark-java-samples/src/com/vladsch/flexmark/java/samples/NodeInsertingPostProcessorSample.java
[PdfLandscapeConverter.java]: https://github.com/vsch/flexmark-java/blob/master/flexmark-java-samples/src/com/vladsch/flexmark/java/samples/PdfLandscapeConverter.java
[Prevent StringIndexOutOfBounds in ext-resizable-image by MiniDigger · Pull Request #503 · vsch/flexmark-java · GitHub]: https://github.com/vsch/flexmark-java/pull/503 "Prevent StringIndexOutOfBounds in ext-resizable-image by MiniDigger · Pull Request #503 · vsch/flexmark-java · GitHub"
+[TextCollectingVisitor works better with code blocks by roxspring · Pull Request #575 · vsch/flexmark-java · GitHub]: https://github.com/vsch/flexmark-java/pull/575 "TextCollectingVisitor works better with code blocks by roxspring · Pull Request #575 · vsch/flexmark-java · GitHub"
[Update to latest maven bundle plugin. Fix for #529 by cziegeler · Pull Request #530 · vsch/flexmark-java · GitHub]: https://github.com/vsch/flexmark-java/pull/530 "Update to latest maven bundle plugin. Fix for #529 by cziegeler · Pull Request #530 · vsch/flexmark-java · GitHub"
[YouTrack: IDEA-207453]: https://youtrack.jetbrains.com/issue/IDEA-207453 "Add Conversion of ref anchor to UrlFilter for file line navigation"
[ext-resizable-image: fix images inside links by e-im · Pull Request #543 · vsch/flexmark-java · GitHub]: https://github.com/vsch/flexmark-java/pull/543 "ext-resizable-image: fix images inside links by e-im · Pull Request #543 · vsch/flexmark-java · GitHub"
@@ -2175,5 +2181,4 @@
[migrate flexmark-java 0_42_x to 0_50_0.xml]: https://github.com/vsch/flexmark-java/blob/master/assets/migrations/migrate%20flexmark-java%200_42_x%20to%200_50_0.xml
[test parsing long sequence of underscores by niklasf · Pull Request #495 · vsch/flexmark-java · GitHub]: https://github.com/vsch/flexmark-java/pull/495 "test parsing long sequence of underscores by niklasf · Pull Request #495 · vsch/flexmark-java · GitHub"
[update plugins and configure for Reproducible Builds by hboutemy · Pull Request #507 · vsch/flexmark-java · GitHub]: https://github.com/vsch/flexmark-java/pull/507 "update plugins and configure for Reproducible Builds by hboutemy · Pull Request #507 · vsch/flexmark-java · GitHub"
-[TextCollectingVisitor works better with code blocks by roxspring · Pull Request #575 · vsch/flexmark-java · GitHub]: https://github.com/vsch/flexmark-java/pull/575 "TextCollectingVisitor works better with code blocks by roxspring · Pull Request #575 · vsch/flexmark-java · GitHub"
@@ -34,6 +34,7 @@
import org.jsoup.nodes.TextNode;

import java.util.*;
+import java.util.function.BiFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

@@ -80,6 +81,10 @@ public class FlexmarkHtmlConverter {
final public static DataKey<String> EOL_IN_TITLE_ATTRIBUTE = new DataKey<>("EOL_IN_TITLE_ATTRIBUTE", " ");
final public static DataKey<String> THEMATIC_BREAK = new DataKey<>("THEMATIC_BREAK", "*** ** * ** ***");

+// Format to resolve duplicate ref id for links, use %s for the RefID and %d for the numeric addition to the text
+final public static DataKey<BiFunction<String, Integer, String>> UNIQUE_LINK_REF_ID_GENERATOR = new DataKey<>("UNIQUE_LINK_REF_ID_GENERATOR"
+, (refId, index) -> String.format("%s_%d", refId, index));
+
// Render HTML contents - UNWRAPPED
final public static DataKey<String[]> UNWRAPPED_TAGS = new DataKey<>("UNWRAPPED_TAGS", new String[] {
"article",
@@ -544,6 +549,7 @@ private class MainHtmlConverter extends HtmlNodeConverterSubContext {
private @Nullable Parser myParser = null;
final private @NotNull HtmlLinkResolver[] myHtmlLinkResolvers;
final private @NotNull HashMap<String, Reference> myReferenceUrlToReferenceMap; // map of URL to reference node
+final private @NotNull HashMap<String, Reference> myReferenceIdToReferenceMap; // map of RefId to reference node
final private @NotNull HashSet<Reference> myExternalReferences; // map of URL to reference node

@Override
@@ -576,6 +582,7 @@ public HtmlConverterState getState() {
//myTrace = true;
myStateStack = new Stack<>();
myReferenceUrlToReferenceMap = new HashMap<>();
+myReferenceIdToReferenceMap = new HashMap<>();
myExternalReferences = new HashSet<>();
myState = null;

@@ -630,16 +637,16 @@ private class SubHtmlNodeConverter extends HtmlNodeConverterSubContext implement
}

@Override
-public @NotNull DataHolder getOptions() {return myOptions;}
+public @NotNull DataHolder getOptions() { return myOptions; }

@Override
-public @NotNull HtmlConverterOptions getHtmlConverterOptions() {return myMainNodeRenderer.getHtmlConverterOptions();}
+public @NotNull HtmlConverterOptions getHtmlConverterOptions() { return myMainNodeRenderer.getHtmlConverterOptions(); }

@Override
-public @NotNull Document getDocument() {return myMainNodeRenderer.getDocument();}
+public @NotNull Document getDocument() { return myMainNodeRenderer.getDocument(); }

@Override
-public HtmlConverterPhase getFormattingPhase() {return myMainNodeRenderer.getFormattingPhase();}
+public HtmlConverterPhase getFormattingPhase() { return myMainNodeRenderer.getFormattingPhase(); }

@Override
public void render(@NotNull Node node) {
@@ -765,6 +772,11 @@ public void delegateRender() {
return myMainNodeRenderer.getReferenceUrlToReferenceMap();
}

+@Override
+public @NotNull HashMap<String, Reference> getReferenceIdToReferenceMap() {
+return myMainNodeRenderer.getReferenceIdToReferenceMap();
+}
+
@Override
public @NotNull HashSet<Reference> getExternalReferences() {
return myMainNodeRenderer.getExternalReferences();
@@ -891,6 +903,11 @@ public void setTrace(boolean trace) {
return myReferenceUrlToReferenceMap;
}

+@Override
+public @NotNull HashMap<String, Reference> getReferenceIdToReferenceMap() {
+return myReferenceIdToReferenceMap;
+}
+
@Override
public @NotNull HashSet<Reference> getExternalReferences() {
return myExternalReferences;
@@ -938,10 +955,10 @@ public Reference getOrCreateReference(@NotNull String url, @NotNull String text,
// create a new one with URL and if no conflict with text as id
String referenceId = text;

-if (myReferenceUrlToReferenceMap.containsKey(referenceId)) {
+if (myReferenceIdToReferenceMap.containsKey(referenceId)) {
for (int i = 1; ; i++) {
-referenceId = text + "_" + i;
-if (!myReferenceUrlToReferenceMap.containsKey(referenceId)) {
+referenceId = myHtmlConverterOptions.uniqueLinkRefIdGenerator.apply(text, i);
+if (!myReferenceIdToReferenceMap.containsKey(referenceId)) {
break;
}
}
@@ -958,6 +975,7 @@ public Reference getOrCreateReference(@NotNull String url, @NotNull String text,
if (firstChild instanceof Reference) {
reference = (Reference) firstChild;
myReferenceUrlToReferenceMap.put(url, reference);
+myReferenceIdToReferenceMap.put(referenceId, reference);
return reference;
}
return null;
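
The two hunks above move the collision check from the URL-keyed map to a new id-keyed map and register each new reference in both. A rough sketch of that bookkeeping follows. It makes two assumptions: the maps hold plain strings instead of `Reference` nodes, and a URL that already has a reference keeps its existing id (the part of `getOrCreateReference` that would confirm this is outside the shown hunk). `ReferenceRegistry` and `refIdFor` are hypothetical names, not flexmark-java API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical stand-in for the converter's reference bookkeeping.
final class ReferenceRegistry {
    private final Map<String, String> urlToId = new HashMap<>(); // URL -> reference id
    private final Map<String, String> idToUrl = new HashMap<>(); // reference id -> URL
    private final BiFunction<String, Integer, String> generator;

    ReferenceRegistry(BiFunction<String, Integer, String> generator) {
        this.generator = generator;
    }

    String refIdFor(String url, String text) {
        // Assumption: a URL that was seen before reuses its existing id.
        String existing = urlToId.get(url);
        if (existing != null) {
            return existing;
        }

        // The collision check runs against ids, not URLs.
        String id = text;
        for (int i = 1; idToUrl.containsKey(id); i++) {
            id = generator.apply(text, i);
        }

        // Register the new reference under both keys.
        urlToId.put(url, id);
        idToUrl.put(id, url);
        return id;
    }

    public static void main(String[] args) {
        ReferenceRegistry registry =
                new ReferenceRegistry((refId, index) -> String.format("%s_%d", refId, index));
        System.out.println(registry.refIdFor("http://example.com", "Link 1"));       // Link 1
        System.out.println(registry.refIdFor("http://example.com/link2", "Link 1")); // Link 1_1
        System.out.println(registry.refIdFor("http://example.com", "Link 1"));       // Link 1
    }
}
```

In this sketch's terms, the pre-commit check corresponds to `urlToId.containsKey(id)`. Because `Link 1` is never a key in the URL map, that check would not fire and the second call would hand out `Link 1` again for a different URL.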
@@ -9,6 +9,7 @@
import org.jetbrains.annotations.NotNull;

import java.util.Map;
+import java.util.function.BiFunction;
import java.util.regex.Pattern;

@SuppressWarnings({ "WeakerAccess" })
@@ -61,6 +62,7 @@ public class HtmlConverterOptions implements MutableDataSetter {
public String nbspText;
public String thematicBreak;
public String outputAttributesNamesRegex;
+public BiFunction<String, Integer, String> uniqueLinkRefIdGenerator;
public Pattern outputAttributesNamesRegexPattern;
public String outputIdAttributeRegex;
public Pattern outputIdAttributeRegexPattern;
@@ -125,6 +127,7 @@ public HtmlConverterOptions(HtmlConverterOptions other) {
nbspText = other.nbspText;
thematicBreak = other.thematicBreak;
outputAttributesNamesRegex = other.outputAttributesNamesRegex;
+uniqueLinkRefIdGenerator = other.uniqueLinkRefIdGenerator;
outputAttributesNamesRegexPattern = other.outputAttributesNamesRegexPattern;
tableCellAlignmentMap = other.tableCellAlignmentMap;
tableOptions = other.tableOptions;
@@ -193,6 +196,7 @@ public HtmlConverterOptions(DataHolder options) {
thematicBreak = FlexmarkHtmlConverter.THEMATIC_BREAK.get(options);
outputAttributesNamesRegex = FlexmarkHtmlConverter.OUTPUT_ATTRIBUTES_NAMES_REGEX.get(options);
outputAttributesNamesRegexPattern = Pattern.compile(outputAttributesNamesRegex);
+uniqueLinkRefIdGenerator = FlexmarkHtmlConverter.UNIQUE_LINK_REF_ID_GENERATOR.get(options);
outputIdAttributeRegex = FlexmarkHtmlConverter.OUTPUT_ID_ATTRIBUTE_REGEX.get(options);
outputIdAttributeRegexPattern = Pattern.compile(outputIdAttributeRegex);
tableCellAlignmentMap = FlexmarkHtmlConverter.TABLE_CELL_ALIGNMENT_MAP.get(options);
@@ -254,6 +258,7 @@ public MutableDataHolder setIn(@NotNull MutableDataHolder dataHolder) {
dataHolder.set(FlexmarkHtmlConverter.NBSP_TEXT, nbspText);
dataHolder.set(FlexmarkHtmlConverter.THEMATIC_BREAK, thematicBreak);
dataHolder.set(FlexmarkHtmlConverter.OUTPUT_ATTRIBUTES_NAMES_REGEX, outputAttributesNamesRegex);
+dataHolder.set(FlexmarkHtmlConverter.UNIQUE_LINK_REF_ID_GENERATOR, uniqueLinkRefIdGenerator);
dataHolder.set(FlexmarkHtmlConverter.TABLE_CELL_ALIGNMENT_MAP, tableCellAlignmentMap);
dataHolder.set(FlexmarkHtmlConverter.OUTPUT_ID_ATTRIBUTE_REGEX, outputIdAttributeRegex);
dataHolder.set(FlexmarkHtmlConverter.EXT_MATH, extMath);
@@ -104,6 +104,8 @@ public interface HtmlNodeConverterContext extends NodeContext<Node, HtmlNodeConv

@NotNull HashMap<String, Reference> getReferenceUrlToReferenceMap();

+@NotNull HashMap<String, Reference> getReferenceIdToReferenceMap();
+
@NotNull HashSet<Reference> getExternalReferences();

boolean isTrace();
@@ -66,6 +66,7 @@ public abstract class HtmlConverterTest extends ComboSpecTestCase {
optionsMap.put("links-none", new MutableDataSet().set(FlexmarkHtmlConverter.EXT_INLINE_LINK, LinkConversion.NONE));
optionsMap.put("links-exp", new MutableDataSet().set(FlexmarkHtmlConverter.EXT_INLINE_LINK, LinkConversion.MARKDOWN_EXPLICIT));
optionsMap.put("links-ref", new MutableDataSet().set(FlexmarkHtmlConverter.EXT_INLINE_LINK, LinkConversion.MARKDOWN_REFERENCE));
+optionsMap.put("links-ref-uniquifier", new MutableDataSet().set(FlexmarkHtmlConverter.UNIQUE_LINK_REF_ID_GENERATOR, (refId, index) -> String.format("%s - %d", refId, index)));
optionsMap.put("links-text", new MutableDataSet().set(FlexmarkHtmlConverter.EXT_INLINE_LINK, LinkConversion.TEXT));
optionsMap.put("links-html", new MutableDataSet().set(FlexmarkHtmlConverter.EXT_INLINE_LINK, LinkConversion.HTML));
optionsMap.put("img-none", new MutableDataSet().set(FlexmarkHtmlConverter.EXT_INLINE_IMAGE, LinkConversion.NONE));
@@ -1399,9 +1399,47 @@ As ref
````````````````````````````````


+References with text as ref id
+
+```````````````````````````````` example(Links: 31) options(links-ref)
+[Link 1][]
+[Link 1]: http://example.com
+.
+<a href="http://example.com">Link 1</a>
+````````````````````````````````
+
+
+References with text as ref id, duplicated
+
+```````````````````````````````` example(Links: 32) options(links-ref)
+[Link 1][] [Link 1][Link 1_1]
+[Link 1]: http://example.com
+[Link 1_1]: http://example.com/link2
+.
+<a href="http://example.com">Link 1</a> <a href="http://example.com/link2">Link 1</a>
+````````````````````````````````
+
+
+References with text as ref id, duplicated
+
+```````````````````````````````` example(Links: 33) options(links-ref, links-ref-uniquifier)
+[Link 1][] [Link 1][Link 1 - 1]
+[Link 1]: http://example.com
+[Link 1 - 1]: http://example.com/link2
+.
+<a href="http://example.com">Link 1</a> <a href="http://example.com/link2">Link 1</a>
+````````````````````````````````
+
+
custom resolver

-```````````````````````````````` example(Links: 31) options(links-ref, link-resolver)
+```````````````````````````````` example(Links: 34) options(links-ref, link-resolver)
[http://example.com][]
[http://example.com]: https://example.com 'Title'
@@ -1411,14 +1449,14 @@ custom resolver
````````````````````````````````


-```````````````````````````````` example(Links: 32) options(links-ref, link-resolver)
+```````````````````````````````` example(Links: 35) options(links-ref, link-resolver)
<https://example.com>
.
<a href="http://example.com">http://example.com</a>
````````````````````````````````


-```````````````````````````````` example(Links: 33) options(links-ref)
+```````````````````````````````` example(Links: 36) options(links-ref)
[\[Text **Bold**\]][]
[\[Text **Bold**\]]: http://example.com
@@ -1428,7 +1466,7 @@ custom resolver
````````````````````````````````


-```````````````````````````````` example(Links: 34) options(links-ref)
+```````````````````````````````` example(Links: 37) options(links-ref)
[![alt](image.png)](http://example.com)
.
<a href="http://example.com"><img src="image.png" alt="alt"></a>
@@ -1437,20 +1475,20 @@ custom resolver

As ref re-use document

-```````````````````````````````` example(Links: 35) options(no-autolinks, links-ref, for-document)
+```````````````````````````````` example(Links: 38) options(no-autolinks, links-ref, for-document)
[http://example.com][example.com]
.
<a href="http://example.com">http://example.com</a>
````````````````````````````````


-```````````````````````````````` example(Links: 36) options(links-none)
+```````````````````````````````` example(Links: 39) options(links-none)
.
<a href="http://example.com">http://example.com</a>
````````````````````````````````


-```````````````````````````````` example Links: 37
+```````````````````````````````` example Links: 40
[](#30xxx93---bug-fix-release)
.
<a href="#30xxx93---bug-fix-release"></a>
@@ -3544,6 +3582,7 @@ special symbols: \*~^&<>[]|`
</code></pre>
````````````````````````````````


## Skipped Fenced Code

```````````````````````````````` example(Skipped Fenced Code: 1) options(skip-fenced-code)