Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Java.Interop.Tools.JavaSource] Improve <a> parsing #1126

Merged
merged 3 commits into from
Jul 12, 2023

Conversation

pjcollins
Copy link
Member

@pjcollins pjcollins commented Jun 16, 2023

Parsing for <a/> elements has been improved to fix all 83 cases where
conversion to a <see/> element would fail. Some examples of such
failures include:

## Unable to parse HTML element: <see href=https://www.sqlite.org/pragma.html#pragma_journal_mode>here</see>
System.Xml.XmlException: 'https' is an unexpected token. The expected token is '"' or '''. Line 1, position 11.

## Unable to parse HTML element: <see href="https://github.com/google/libphonenumber>libphonenumber</see>
System.Xml.XmlException: '<', hexadecimal value 0x3C, is an invalid attribute character. Line 1, position 67.

## Unable to parse HTML element: <see href="https://material.io/guidelines/components/progress-activity.html#progress-activity-types-of-indicators">
Progress & activity</see>
System.Xml.XmlException: An error occurred while parsing EntityName. Line 2, position 11.

## Unable to parse HTML element: <see href=#BROKEN> broken</see>
System.Xml.XmlException: '#' is an unexpected token. The expected token is '"' or '''. Line 1, position 11.

## Unable to parse HTML element: <see href=http://www.ietf.org/rfc/rfc2045.txt>RFC 2045</see>
System.Xml.XmlException: 'http' is an unexpected token. The expected token is '"' or '''. Line 1, position 11.

When we encounter an <a/> element that points to code or a local path
we will now only include the element value in the javadoc, and not the
full href attribute value.

Parsing for `<a/>` elements has been improved to fix all 83 cases where
conversion to a `<see/>` element would fail.  Some examples of such
failures include:

    ## Unable to parse HTML element: <see href=https://www.sqlite.org/pragma.html#pragma_journal_mode>here</see>
    System.Xml.XmlException: 'https' is an unexpected token. The expected token is '"' or '''. Line 1, position 11.

    ## Unable to parse HTML element: <see href="https://github.com/google/libphonenumber>libphonenumber</see>
    System.Xml.XmlException: '<', hexadecimal value 0x3C, is an invalid attribute character. Line 1, position 67.

    ## Unable to parse HTML element: <see href="https://material.io/guidelines/components/progress-activity.html#progress-activity-types-of-indicators">
    Progress & activity</see>
    System.Xml.XmlException: An error occurred while parsing EntityName. Line 2, position 11.

    ## Unable to parse HTML element: <see href=#BROKEN> broken</see>
    System.Xml.XmlException: '#' is an unexpected token. The expected token is '"' or '''. Line 1, position 11.

    ## Unable to parse HTML element: <see href=http://www.ietf.org/rfc/rfc2045.txt>RFC 2045</see>
    System.Xml.XmlException: 'http' is an unexpected token. The expected token is '"' or '''. Line 1, position 11.

When we encounter an `<a/>` element that points to code or a local path
we will now only include the element value in the javadoc, and not the
full `href` attribute value.
@pjcollins pjcollins requested a review from jonpryor June 28, 2023 20:06
@pjcollins pjcollins changed the title [Java.Interop.Tools.JavaSource] Improve <a/> parsing [Java.Interop.Tools.JavaSource] Improve <a> parsing Jul 11, 2023
@jonpryor
Copy link
Member

Parsing of `<a/>` elements would occasionally fail when they didn't
match our expectations/requirements:

  * Unquoted URLs, a'la `android/database/sqlite/SQLiteDatabase.java`:

        * <p> See <a href=https://www.sqlite.org/pragma.html#pragma_journal_mode>here</a> for more

    Resulting in:

        System.Xml.XmlException: 'https' is an unexpected token. The expected token is '"' or '''. Line 1, position 11.

    or a'la `java/io/PipedOutputStream.java`

        * @exception IOException if the pipe is <a href=#BROKEN> broken</a>,

    resulting in:

        System.Xml.XmlException: '#' is an unexpected token. The expected token is '"' or '''. Line 1, position 11.


  * Improperly quoted attributes, a'la `android/telephony/PhoneNumberUtils.java`:

        * Matching is based on <a href="https://github.com/google/libphonenumber>libphonenumber</a>.

    Resulting in:

        System.Xml.XmlException: '<', hexadecimal value 0x3C, is an invalid attribute character. Line 1, position 67.

  * Use of "raw" `&`, a'la `android/widget/ProgressBar.java`:

        * <a href="https://material.io/guidelines/components/progress-activity.html#progress-activity-types-of-indicators">
        * Progress & activity</a>.

    Resulting in:

        System.Xml.XmlException: An error occurred while parsing EntityName. Line 2, position 11.


Fix this by updating updating the `InlineHyperLinkOpenTerm` terminal
to *not* require `href`, and updating the `InlineHyperLinkDeclaration`
rule to better deal with whatever chaos is there.

When we encounter an `<a/>` element that points to code or a local
path we will now only include the element value in the javadoc, and
not the full `href` attribute value.

Replace the `IgnorableDeclaration` rule with an
`IgnorableCharTerminal` terminal.  TODO: why?  What's this *do*?

@pjcollins
Copy link
Member Author

pjcollins commented Jul 12, 2023

Replace the IgnorableDeclaration rule with an
IgnorableCharTerminal terminal. TODO: why? What's this do?

I was trying to lean on what was done in UnknownHtmlElementStartTerminal to "skip over" reserved characters with a low priority rule. The Irony docs are sparse and I didn't see a way to do this without extending Terminal.

This change fixed parsing for the <a href="mailto:nobody@google.com">nobody</a> cases, which were otherwise failing with:

 ParserMessages:
  Error (1:23): Syntax error, expected: #PCDATA, <tt>, <TT>, <i>, <I>, <a attr=, <code>, <CODE>, {@code, {@docRoot}, {@inheritDoc}, {@link, {@linkplain, {@literal, {@see, {@value}, {@value, @ , {, }, {@param, UnknownHtmlElementStart, </a>, </A>, </tt>, </TT>, </i>, </I>, </code>, </CODE>
<a href="mailto:nobody@google.com">nobody</a>
                        ^
ParserTrace:
  input=`<a href= (<a attr=)`; error? False; message=Shift to S2
  input=`"mailto:nobody (#PCDATA)`; error? False; message=Shift to S8
  input=``; error? False; message=Reduce on '<html inline decl> -> #PCDATA '
  input=`<html inline decl>`; error? False; message=Popped state from stack, pushing <html inline decl>
  input=``; error? False; message=Reduce on 'InlineDeclaration -> <html inline decl> '
  input=`InlineDeclaration`; error? False; message=Popped state from stack, pushing InlineDeclaration
  input=``; error? False; message=Reduce on 'InlineDeclaration+ -> InlineDeclaration '
  input=`InlineDeclaration+`; error? False; message=Popped state from stack, pushing InlineDeclaration+
  input=`@google.com">nobody</a> (InlineValue)`; error? True; message=Syntax error, expected: #PCDATA, <tt>, <TT>, <i>, <I>, <a attr=, <code>, <CODE>, {@code, {@docRoot}, {@inheritDoc}, {@link, {@linkplain, {@literal, {@see, {@value}, {@value, @ , {, }, {@param, UnknownHtmlElementStart, </a>, </A>, </tt>, </TT>, </i>, </I>, </code>, </CODE>
  input=`@google.com">nobody</a> (InlineValue)`; error? False; message=RECOVERING: popping stack, looking for state with error shift
  input=`@google.com">nobody</a> (InlineValue)`; error? False; message=FAILED TO RECOVER

@jonpryor jonpryor merged commit 6a9f5cb into main Jul 12, 2023
@jonpryor jonpryor deleted the dev/pjc/javadoc-ahref branch July 12, 2023 17:48
jonpryor pushed a commit to dotnet/android that referenced this pull request Jul 14, 2023
Fixes: dotnet/java-interop#1071

Context: dotnet/java-interop@d0231c5

Changes: dotnet/java-interop@e1121ea...151b03e

  * dotnet/java-interop@151b03ee: [Java.Interop.Tools.JavaSource] Improve `<code>` parsing  (dotnet/java-interop#1125)
  * dotnet/java-interop@6a9f5cbb: [Java.Interop.Tools.JavaSource] Improve `<a>` parsing (dotnet/java-interop#1126)
  * dotnet/java-interop@d0231c5c: [generator] Override methods should match base deprecated info (dotnet/java-interop#1130)

Commit dotnet/java-interop@d0231c5c updated `override` methods'
"obsolete" data to match its base method.  This results in several
`[ObsoleteOSPlatform]` attributes being switched to `[Obsolete]`
attributes.  Updated `acceptable-breakages` to allow this.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jonathan Pobst <jonathan.pobst@microsoft.com>
@github-actions github-actions bot locked and limited conversation to collaborators Apr 12, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants