Add nicer API based on a CodePoint class #18

Merged
merged 2 commits into from
Jan 31, 2023

@cketti (Owner) commented Jan 27, 2023

This PR adds kotlin-codepoints-deluxe (I'm open to suggestions for a better name), a separate library built on top of kotlin-codepoints.

@JakeWharton (Contributor) left a comment

Honestly I would ship this as the regular and only library. It's the superior API!

Comment on lines +45 to +61
/**
 * `true` if this code point is a surrogate code unit.
 */
val isSurrogate: Boolean
    get() = !isSupplementary && value.toChar().isSurrogate()

/**
 * `true` if this code point is a high surrogate code unit.
 */
val isHighSurrogate: Boolean
    get() = !isSupplementary && value.toChar().isHighSurrogate()

/**
 * `true` if this code point is a low surrogate code unit.
 */
val isLowSurrogate: Boolean
    get() = !isSupplementary && value.toChar().isLowSurrogate()

Contributor

What would these be used for? They seem to go against my intuition of what a code point is since I thought they were exclusively properties of characters.

Owner Author

I think of the value of a code point as the index into the Unicode data table. Surrogate characters are part of that table, too.

Most of the time we don't care about the individual surrogates in a pair and just want the code point that the two surrogate code points encode. But occasionally you might be interested in dealing with surrogate characters on their own. Maybe we're missing a combineWith() method that returns the value of CodePoints.toCodePoint(high, low) when called like high.combineWith(low).
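
For illustration, the combineWith() idea could look roughly like the sketch below. The `CodePoint` class here is a simplified stand-in written as a plain data class, not the class from this PR; `isSupplementary` is defined by assumption as "above U+FFFF", and `combineWith()` is hypothetical. The arithmetic is the standard UTF-16 surrogate pair decoding.

```kotlin
// Simplified stand-in for the PR's CodePoint class (assumption: not the real implementation).
data class CodePoint(val value: Int) {
    // Assumed definition: code points above U+FFFF need two UTF-16 code units.
    val isSupplementary: Boolean
        get() = value >= 0x10000

    val isHighSurrogate: Boolean
        get() = !isSupplementary && value.toChar().isHighSurrogate()

    val isLowSurrogate: Boolean
        get() = !isSupplementary && value.toChar().isLowSurrogate()

    // Hypothetical method: the receiver is the high surrogate, the argument the low one.
    fun combineWith(low: CodePoint): CodePoint {
        require(isHighSurrogate) { "receiver must be a high surrogate" }
        require(low.isLowSurrogate) { "argument must be a low surrogate" }
        // Each surrogate carries 10 bits; the result is offset by 0x10000.
        val combined = 0x10000 + ((value - 0xD800) shl 10) + (low.value - 0xDC00)
        return CodePoint(combined)
    }
}

fun main() {
    val high = CodePoint(0xD83E)
    val low = CodePoint(0xDD95)
    println(high.combineWith(low).value.toString(16)) // prints 1f995
}
```

Rejecting non-surrogate inputs with `require` is one possible design; returning a nullable result would be another.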

Owner Author

The Unicode standard contains this:

> Not all assigned code points represent abstract characters; only Graphic, Format, Control and Private-use do. Surrogates and Noncharacters are assigned code points but are not assigned to abstract characters.
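
The distinction in that passage can be observed from Kotlin's standard library, which reports a general category for surrogate and noncharacter values just like for ordinary characters. This is a small sketch using `Char.category`; the helper name `categoryOf` is made up for this example and only covers values up to U+FFFF.

```kotlin
// Hypothetical helper: general category of a code point in the Basic Multilingual Plane.
fun categoryOf(bmpCodePoint: Int): CharCategory {
    require(bmpCodePoint in 0..0xFFFF) { "only BMP values are supported in this sketch" }
    return bmpCodePoint.toChar().category
}

fun main() {
    println(categoryOf(0xD800))     // SURROGATE: a code point, but not an abstract character
    println(categoryOf(0xFFFF))     // UNASSIGNED: noncharacters fall under category Cn
    println(categoryOf('A'.code))   // UPPERCASE_LETTER: an ordinary graphic character
}
```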

Contributor

Good to know!

Uses the basic functionality in `kotlin-codepoints` to provide a nicer API to work with Unicode code points.
@cketti cketti merged commit 1b30575 into main Jan 31, 2023
@cketti cketti deleted the value_class branch January 31, 2023 19:00