Fix Extended_Pictographic in UnicodeProperty subclasses #97

Open
macchiati opened this issue Jul 16, 2021 · 7 comments
@macchiati
Member

macchiati commented Jul 16, 2021

This affects both the JSPs and the regular unicodetools.

There is a special performance hack that works for most properties: if a property is marked as hasUniformUnassigned() = true, some of the internals test only one character in the unassigned range for that property and assume the result holds for the rest.

If the value is false, then every unassigned character is tested. It is the responsibility of each UnicodeProperty subclass to mark any exceptions. For example:

protected ICUProperty(String propName, int propEnum) {
    setName(propName);
    this.propEnum = propEnum;
    setType(internalGetPropertyType(propEnum));
    // Exceptions to the shortcut: for these properties every unassigned
    // code point is tested individually instead of sampling just one.
    if (propEnum == UProperty.DEFAULT_IGNORABLE_CODE_POINT
            || propEnum == UProperty.BIDI_CLASS
            || propEnum == UProperty.BLOCK
            || propEnum == UProperty.EAST_ASIAN_WIDTH
            || propEnum == UProperty.LINE_BREAK
            || propEnum == UProperty.NONCHARACTER_CODE_POINT
            || propEnum == UProperty.PATTERN_SYNTAX
            || propEnum == UProperty.PATTERN_WHITE_SPACE
            || propEnum == UProperty.CHANGES_WHEN_CASEFOLDED
            || propEnum == UProperty.EMOJI
            || propEnum == UProperty.EMOJI_MODIFIER
            || propEnum == UProperty.EMOJI_MODIFIER_BASE
            || propEnum == UProperty.EMOJI_PRESENTATION
            ) {
        setUniformUnassigned(false);
    }
}

New properties like Extended_Pictographic need to be added to lists like this. The way to fix it is to search for setUniformUnassigned(false) and check that each list includes every property whose value is not the same for all unassigned characters.

NOTE: This performance hack dates from a long time ago and may not be needed anymore, but we should analyze the impact before removing it.
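To make the failure mode concrete, here is a minimal sketch of the shortcut in isolation. It is a toy model, not the unicodetools implementation: the class name, the `collect` method, and both predicates are hypothetical, and "assigned" is faked as U+0000..U+FFFF so the result does not depend on any particular Unicode version.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Toy model of the "uniform unassigned" shortcut; all names here are hypothetical. */
final class UniformUnassignedSketch {
    static final int MAX_CP = 0x10FFFF;

    /**
     * Collects every code point for which the property is true.
     * With uniformUnassigned = true, only the first unassigned code point is
     * actually tested and its answer is reused for all other unassigned ones.
     */
    static BitSet collect(IntPredicate property, IntPredicate isAssigned,
                          boolean uniformUnassigned) {
        BitSet result = new BitSet(MAX_CP + 1);
        Boolean sampled = null; // cached answer for the single sampled unassigned code point
        for (int cp = 0; cp <= MAX_CP; cp++) {
            boolean value;
            if (uniformUnassigned && !isAssigned.test(cp)) {
                if (sampled == null) {
                    sampled = property.test(cp);
                }
                value = sampled;
            } else {
                value = property.test(cp);
            }
            if (value) {
                result.set(cp);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption for the demo: pretend only U+0000..U+FFFF is assigned.
        IntPredicate isAssigned = cp -> cp <= 0xFFFF;
        // A property that is also true on a reserved (unassigned) range,
        // which is the situation described for Extended_Pictographic.
        IntPredicate pictographicLike =
                cp -> (cp >= 0x2600 && cp <= 0x26FF) || (cp >= 0x1FC00 && cp <= 0x1FFFD);

        BitSet fast = collect(pictographicLike, isAssigned, true);
        BitSet exact = collect(pictographicLike, isAssigned, false);
        System.out.println("shortcut: " + fast.cardinality()
                + " code points, exact: " + exact.cardinality());
    }
}
```

In this model the shortcut silently drops the whole reserved range, because the one sampled unassigned code point happens to be false for the property. That is the kind of discrepancy a missing setUniformUnassigned(false) would produce.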

@srl295 srl295 self-assigned this Jul 16, 2021
@srl295
Member

srl295 commented Jul 16, 2021

What would be best is a test that verifies the heuristic is correct by testing all unassigned code points.

@macchiati
Member Author

I agree. Could be pretty simple.

  1. Create a subclass of UnicodeProperty that wraps any other, but resets it with setUniformUnassigned(false), so that every unassigned code point is tested.
  2. Walk through all the properties that have hasUniformUnassigned() = true, and create a wrapped class.
  3. For each such case, walk through all Unicode code points to verify that both instances have identical values.
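A rough sketch of what such a check boils down to: comparing a property against a copy that has the shortcut disabled amounts to asking whether all unassigned code points share a single value. The version below checks that condition directly rather than building the wrapper. It does not use the real UnicodeProperty API; the class, the `firstMismatch` method, and the functional-interface stand-ins for a property and for "is assigned" are all assumptions made for illustration.

```java
import java.util.Objects;
import java.util.function.IntFunction;
import java.util.function.IntPredicate;

/** Hypothetical checker: does a property really have one value on all unassigned code points? */
final class UniformUnassignedCheck {
    static final int MAX_CP = 0x10FFFF;

    /**
     * Returns the first unassigned code point whose value differs from the
     * value of the first unassigned code point, or -1 if they all agree
     * (that is, the "uniform unassigned" claim holds).
     */
    static int firstMismatch(IntFunction<String> property, IntPredicate isAssigned) {
        String sampled = null;
        boolean haveSample = false;
        for (int cp = 0; cp <= MAX_CP; cp++) {
            if (isAssigned.test(cp)) {
                continue; // only unassigned code points matter for this check
            }
            String value = property.apply(cp);
            if (!haveSample) {
                sampled = value;
                haveSample = true;
            } else if (!Objects.equals(sampled, value)) {
                return cp;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // One possible stand-in for "assigned", based on the JDK's own Unicode data
        // (which may lag behind the Unicode version under test).
        IntPredicate isAssigned = cp -> Character.getType(cp) != Character.UNASSIGNED;
        // A property with a single value everywhere trivially passes.
        IntFunction<String> constant = cp -> "No";
        int mismatch = firstMismatch(constant, isAssigned);
        System.out.println(mismatch < 0
                ? "uniform on unassigned code points"
                : "mismatch at U+" + Integer.toHexString(mismatch).toUpperCase());
    }
}
```

A real test would loop over every property that reports hasUniformUnassigned() = true and fail on the first one for which a mismatch is found, reporting the offending code point.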

@nedley
Contributor

nedley commented Jul 20, 2021

Is this issue distinct from #53?

@macchiati
Member Author

macchiati commented Jul 20, 2021 via email

@srl295
Member

srl295 commented Jul 21, 2021

@macchiati Done with the quick fix, but a unit test is still needed, so keeping this open.

@macchiati
Member Author

macchiati commented Jul 21, 2021 via email

@macchiati
Member Author

macchiati commented Jul 21, 2021 via email
