Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

LibUnicode+LibWeb: Rework bidirectional_class() API a bit #25458

Merged
merged 2 commits into from
Nov 24, 2024

Conversation

nico
Copy link
Contributor

@nico nico commented Nov 23, 2024

Before this commit, GenerateUnicodeData generated a
BidirectionalClass enum directly from UnicodeData.txt.

Now, GenerateUnicodeData instead generates BidirectionalClassInternal
and CharacterTypes.h has an explicit enum BidiClass with nicer names
-- no underscores, but also names more similar to ICUs names.

(Since CharacterTypes.h is included in some of LibUnicode's generators,
it can't refer to generated BidirectionalClassInternal, so we instead
have a big manual mapping switch in UnicodeData.cpp.)

bidirectional_class() used to return an Optional<BidirectionalClass>
for when unicode data was disabled. Now we return BidiClass and
just return BidiClass::LeftToRight instead of making every client
do this.

It also updates LibWeb's Element.cpp to this new system.

Also remove now-unused bidirectional_class_from_string().

This kind of cherry-picks the 4th commit of
LadybirdBrowser/ladybird#239 (aa3a30870b58c47cb37bce1418d7e6bee7af71d9):
The CharacterTypes.h, Element.cpp and TestUnicodeCharacterTypes.cpp changes are
by trflynn.

Co-authored-by: Tim Flynn trflynn89@serenityos.org

Changes the generated UnicodeData.h from

    enum class BidirectionalClass : BidirectionalClassUnderlyingType {
        AL,
        AN,
        B,
        ...
    };

to

    enum class BidirectionalClass : BidirectionalClassUnderlyingType {
        AL,
        AN,
        B,
        ...
        Arabic_Letter = AL,
        Arabic_Number = AN,
        Boundary_Neutral = BN,
        ...
    };
@nico nico requested a review from trflynn89 as a code owner November 23, 2024 19:24
@github-actions github-actions bot added 👀 pr-needs-review PR needs review from a maintainer or community member labels Nov 23, 2024
@nico nico force-pushed the unicode-bidi-alias branch from 473063c to 499c074 Compare November 23, 2024 19:34
@nico
Copy link
Contributor Author

nico commented Nov 23, 2024

@trflynn89 Do you want to look at LibUnicode changes over here, or should I just merge?

@trflynn89
Copy link
Member

Code LGTM. This does break the ability to build with UCD downloads disabled - I generally tried to keep that config building/non-crashing, but if we want to step away from that, that's fine. (I don't know if the build even currently works with UCD downloads disabled, it was just something I tested every couple months or so, or if someone using the option pointed out that it broke).

@nico
Copy link
Contributor Author

nico commented Nov 23, 2024

I didn't test it, but I intended that that keeps working. There's still a weak symbol of the generated function. Is there more I have to take care of?

(I suppose I could actually try building with it disabled...)

@nico
Copy link
Contributor Author

nico commented Nov 23, 2024

...oh, probably because I'm using the generated enum values unconditionally. I'll tweak that, thanks!

Before this commit, GenerateUnicodeData generated a
`BidirectionalClass` enum directly from UnicodeData.txt.

Now, GenerateUnicodeData instead generates `BidirectionalClassInternal`
and CharacterTypes.h has an explicit enum `BidiClass` with nicer names
-- no underscores, but also names more similar to ICUs names.

(Since CharacterTypes.h is included in some of LibUnicode's generators,
it can't refer to generated `BidirectionalClassInternal`, so we instead
have a big manual mapping switch in UnicodeData.cpp.)

`bidirectional_class()` used to return an `Optional<BidirectionalClass>`
for when unicode data was disabled. Now we return `BidiClass` and
just return `BidiClass::LeftToRight` instead of making every client
do this.

It also updates LibWeb's Element.cpp to this new system.

Also remove now-unused `bidirectional_class_from_string()`.

This kind of cherry-picks the 4th commit of
LadybirdBrowser/ladybird#239 (aa3a30870b58c47cb37bce1418d7e6bee7af71d9):
The CharacterTypes.h, Element.cpp and TestUnicodeCharacterTypes.cpp changes are
by trflynn.

Co-authored-by: Tim Flynn <trflynn89@serenityos.org>
@nico nico force-pushed the unicode-bidi-alias branch from 499c074 to b28a411 Compare November 24, 2024 00:13
@nico
Copy link
Contributor Author

nico commented Nov 24, 2024

Done!

(I did try changing ENABLE_UNICODE_DATABASE_DOWNLOAD to OFF in Build/lagom/CMakeCache.txt and checked that Meta/serenity.sh build lagom pdf failed to build, and after the tweak it builds fine.)

@nico nico merged commit 0197bfc into SerenityOS:master Nov 24, 2024
12 checks passed
@nico nico deleted the unicode-bidi-alias branch November 24, 2024 00:24
@github-actions github-actions bot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label Nov 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants