chore: Some tokenizer as language module #1186

Closed

Conversation

miurahr
Member

@miurahr miurahr commented Nov 10, 2024

A tokenizer is only used in projects whose source or target language matches the language the tokenizer supports.
It is therefore feasible to move tokenizers to the corresponding language modules.

Pull request type

  • Build/release
  • refactoring

Which ticket is resolved?

What does this PR change?

  • Move the Japanese, Chinese and Polish tokenizers to the corresponding language modules.
  • These are derived from third-party analyzer libraries that are not required for other languages.

Other information

Depends on #1183

@miurahr miurahr force-pushed the topic/miurahr/tokenizer/tokenizer-as-language-modules branch 2 times, most recently from 50fac34 to 3ce1c96 Compare November 25, 2024 03:42
@omegat-org omegat-org deleted a comment from github-actions bot Nov 25, 2024

❌ Quality checks failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/pumbahasjqq7o


❌ Acceptance Tests failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/ovcrp7uznerco


❌ Acceptance Tests failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/fzg4iulhtvxnk


❌ Quality checks failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/c3koek5ldd6em

@miurahr
Member Author

miurahr commented Nov 25, 2024

There is no obvious benefit to moving the tokenizer classes to language modules and changing their package paths. Most tokenizers come from the Lucene-analyzer-common library, which is a mandatory dependency of OmegaT core. Only three languages justify the change: Japanese, Chinese and Polish, each of which is provided with its own special analyzer.

Signed-off-by: Hiroshi Miura <miurahr@linux.com>
@miurahr miurahr force-pushed the topic/miurahr/tokenizer/tokenizer-as-language-modules branch from e4b94c2 to 3492567 Compare November 25, 2024 05:45
@miurahr miurahr changed the title from "tokenizer as language module" to "chore: Some tokenizer as language module" Nov 25, 2024


Signed-off-by: Hiroshi Miura <miurahr@linux.com>

❌ Quality checks failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/ky7t64iz5p6rw


❌ Acceptance Tests failed.

Please look at the Gradle Scan page for details:
https://gradle.com/s/omkyj6vdomduu

@miurahr
Member Author

miurahr commented Nov 25, 2024

It is not feasible to split the tokenizers into separate modules. I will give up here.

@miurahr miurahr closed this Nov 25, 2024