chore: Some tokenizer as language module #1186

miurahr · 2024-11-10T23:52:55Z

A tokenizer is used only in projects whose source or target language matches the language the tokenizer is provided for.
It is therefore feasible to move each tokenizer to the corresponding language module.
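
To illustrate the idea, here is a minimal sketch of how a language module could contribute its tokenizer so that it is only looked up when a project uses that language. This is not OmegaT's actual API: `TokenizerRegistry`, its nested `Tokenizer` interface, and the `register` and `lookup` methods are all hypothetical names made up for this example.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names are OmegaT APIs. */
final class TokenizerRegistry {

    /** Hypothetical minimal tokenizer contract, for illustration only. */
    interface Tokenizer {
        String[] tokenize(String text);
    }

    // Factories keyed by lower-cased language code, e.g. "ja", "zh", "pl".
    private static final Map<String, Supplier<Tokenizer>> BY_LANGUAGE =
            new ConcurrentHashMap<>();

    private TokenizerRegistry() {
    }

    /** A language module would call this when it is loaded. */
    static void register(String languageCode, Supplier<Tokenizer> factory) {
        BY_LANGUAGE.put(languageCode.toLowerCase(Locale.ROOT), factory);
    }

    /** Returns a tokenizer only if a module registered one for the language. */
    static Optional<Tokenizer> lookup(String languageCode) {
        Supplier<Tokenizer> factory =
                BY_LANGUAGE.get(languageCode.toLowerCase(Locale.ROOT));
        return Optional.ofNullable(factory).map(Supplier::get);
    }
}
```

With this shape, a Japanese module would call something like `TokenizerRegistry.register("ja", ...)` on load, and a project whose source or target language is not Japanese would never need that module or its analyzer library on the classpath.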

Pull request type

Build/release
refactoring

Which ticket is resolved?

What does this PR change?

Move the Japanese, Chinese and Polish tokenizers to the corresponding language modules.
They are derived from third-party analyzer libraries that are not required for other languages.

Other information

Depends on #1183

github-actions · 2024-11-25T03:43:40Z

❌ Quality checks failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/pumbahasjqq7o

github-actions · 2024-11-25T03:43:54Z

❌ Acceptance Tests failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/ovcrp7uznerco

github-actions · 2024-11-25T05:29:58Z

❌ Acceptance Tests failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/fzg4iulhtvxnk

github-actions · 2024-11-25T05:30:02Z

❌ Quality checks failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/c3koek5ldd6em

miurahr · 2024-11-25T05:34:15Z

There is no obvious benefit in moving all tokenizer classes to language modules and changing their package paths. Most tokenizers come from the Lucene-analyzer-common library, which is a mandatory library of OmegaT core. The change can be justified for only three languages: Japanese, Chinese and Polish, each of which is provided with its own special analyzer.

Signed-off-by: Hiroshi Miura <miurahr@linux.com>

github-actions · 2024-11-25T06:00:25Z

❌ Quality checks failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/ky7t64iz5p6rw

github-actions · 2024-11-25T06:00:36Z

❌ Acceptance Tests failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/omkyj6vdomduu

miurahr · 2024-11-25T08:58:31Z

It is not feasible to split the tokenizers into modules. I want to give up here.

miurahr force-pushed the topic/miurahr/tokenizer/tokenizer-as-language-modules branch 2 times, most recently from 50fac34 to 3ce1c96 Compare November 25, 2024 03:42

omegat-org deleted a comment from github-actions bot Nov 25, 2024

chore: move JA, ZH and PT tokenizer to language-module

3492567

Signed-off-by: Hiroshi Miura <miurahr@linux.com>

miurahr force-pushed the topic/miurahr/tokenizer/tokenizer-as-language-modules branch from e4b94c2 to 3492567 Compare November 25, 2024 05:45

miurahr changed the title ~~tokenizer as language module~~ chore: Some tokenizer as language module Nov 25, 2024

This comment was marked as outdated.

Update tests and dependencies

649e8e0

Signed-off-by: Hiroshi Miura <miurahr@linux.com>

miurahr added the refactoring label Nov 25, 2024

miurahr closed this Nov 25, 2024
