normalize_ligature not having the right format #74
Comments
Thank you for your comment, |
Thank you for your response, but I didn't really understand the function's role. In the documentation, it is described as 'Normalize Lam Alef ligatures into two letters.' Does this mean it is supposed to separate them? It seems that the input and output are always the same. |
Hello. It's important to note that this function addresses the encoding of Lam Alif ligatures in certain contexts and software. In these cases, a Lam Alif ligature may be represented as a single character, potentially causing confusion during word processing. The function is designed to convert such ligatures, defined by char codes like:
# Ligatures
LAM_ALEF = u'\ufefb'
LAM_ALEF_HAMZA_ABOVE = u'\ufef7'
LAM_ALEF_HAMZA_BELOW = u'\ufef9'
LAM_ALEF_MADDA_ABOVE = u'\ufef5'
into two separate letters, Lam and Alif, represented by char codes like:
SIMPLE_LAM_ALEF = u'\u0644\u0627'
SIMPLE_LAM_ALEF_HAMZA_ABOVE = u'\u0644\u0623'
SIMPLE_LAM_ALEF_HAMZA_BELOW = u'\u0644\u0625'
SIMPLE_LAM_ALEF_MADDA_ABOVE = u'\u0644\u0622'
This conversion ensures proper handling of Lam Alif ligatures in contexts where individual letters are required. |
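To make the explanation above concrete, here is a minimal sketch of the described conversion. It uses only the code points quoted in the comment. The names `LAM_ALEF_TO_LETTERS`, `split_lam_alef` and `code_points` are hypothetical and are not claimed to be pyarabic's actual implementation. It also shows why text typed with ordinary letters comes back unchanged: there is no ligature code point in it to replace.

```python
# Mapping built from the code points quoted above (single ligature
# character -> Lam followed by the matching Alef variant).
LAM_ALEF_TO_LETTERS = {
    "\ufefb": "\u0644\u0627",  # LAM_ALEF
    "\ufef7": "\u0644\u0623",  # LAM_ALEF_HAMZA_ABOVE
    "\ufef9": "\u0644\u0625",  # LAM_ALEF_HAMZA_BELOW
    "\ufef5": "\u0644\u0622",  # LAM_ALEF_MADDA_ABOVE
}


def split_lam_alef(text):
    """Hypothetical helper: replace each ligature code point by two letters."""
    for ligature, letters in LAM_ALEF_TO_LETTERS.items():
        text = text.replace(ligature, letters)
    return text


def code_points(text):
    """List the code points of a string so the change is visible."""
    return [f"U+{ord(ch):04X}" for ch in text]


# A single ligature character becomes two letters.
print(code_points(split_lam_alef("\ufefb")))  # ['U+0644', 'U+0627']

# Text that already uses the two separate letters is returned unchanged,
# even though a font may render it as a ligature on screen.
plain = "\u0644\u0627"
print(split_lam_alef(plain) == plain)  # True
```

Printing code points rather than the rendered string is deliberate: both forms usually look identical on screen, so comparing the visible text cannot show whether anything was converted.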
I see. Just Perfect.
output :
My question is: are these the only ligatures, or can I add my own? |
I'm using pypdf to extract Arabic text, and there are some ligatures that are not handled very well, such as:
So I'm trying to find a way to add them to LIGUATURES without touching the library. Is there a way to extend the list of constants? |
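One way to handle extra ligatures without modifying the library is to keep a separate, user-side replacement table and apply it to the extracted text. The sketch below assumes exactly that. `BASE_LIGATURES`, `EXTRA_LIGATURES` and `make_ligature_normalizer` are hypothetical names, not part of pyarabic, and the extra entry (U+FDF2) is only an illustration of a ligature you might meet in PDF output.

```python
# Lam Alef ligatures quoted earlier in this thread.
BASE_LIGATURES = {
    "\ufefb": "\u0644\u0627",
    "\ufef7": "\u0644\u0623",
    "\ufef9": "\u0644\u0625",
    "\ufef5": "\u0644\u0622",
}

# Hypothetical user additions; extend this dict with whatever code points
# your PDF extraction actually produces.
EXTRA_LIGATURES = {
    "\ufdf2": "\u0627\u0644\u0644\u0647",  # ARABIC LIGATURE ALLAH ISOLATED FORM
}


def make_ligature_normalizer(*tables):
    """Merge the given tables and return a function that applies them."""
    merged = {}
    for table in tables:
        merged.update(table)  # later tables override earlier ones
    # str.maketrans accepts a dict of single characters -> replacement strings.
    translation = str.maketrans(merged)

    def normalize(text):
        return text.translate(translation)

    return normalize


normalize_extended = make_ligature_normalizer(BASE_LIGATURES, EXTRA_LIGATURES)

sample = "\ufefb \ufdf2"
print([f"U+{ord(ch):04X}" for ch in normalize_extended(sample)])
```

Using `str.translate` means the text is scanned once regardless of how many entries the table holds, and the library's own constants are never touched.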
Hi, let me suggest Python's unicodedata.normalize (https://docs.python.org/3/library/unicodedata.html#unicodedata.normalize) to solve the problem in general. Some Arabic-specific functionality seems to be provided on top of that by https://camel-tools.readthedocs.io/en/latest/api/utils/normalize.html |
@otakar-smrz |
The ligatures actually need the NFKC or NFKD normalization mode to be broken down to the standard letters: |
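As a small illustration of this point, the standard-library sketch below compares NFC with NFKC on the Lam Alef ligature code points quoted earlier in the thread. The helper name `decompose_ligatures` is hypothetical. Only `unicodedata.normalize` is a real standard-library call.

```python
import unicodedata


def decompose_ligatures(text):
    """Hypothetical helper: apply NFKC compatibility normalization."""
    return unicodedata.normalize("NFKC", text)


def code_points(text):
    """List the code points of a string so the change is visible."""
    return [f"U+{ord(ch):04X}" for ch in text]


lam_alef = "\ufefb"

# NFC applies only canonical mappings, so the ligature is left as it is.
print(code_points(unicodedata.normalize("NFC", lam_alef)))  # ['U+FEFB']

# NFKC also applies compatibility mappings, which split the ligature.
print(code_points(decompose_ligatures(lam_alef)))  # ['U+0644', 'U+0627']

# The hamza and madda variants are broken down in the same way.
for ligature in ("\ufef7", "\ufef9", "\ufef5"):
    print(code_points(decompose_ligatures(ligature)))
```

Note that NFKC is not limited to Lam Alef: it folds every compatibility character in the text (other presentation forms, full-width digits, and so on), which may or may not be what you want for a given pipeline.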
@otakar-smrz Thank you. |
I'm trying the example below, but I'm getting the same result as the input text:
from pyarabic.araby import normalize_ligature
text = u"لانها لالء الاسلام"
normalize_ligature(text)
I'm getting the output: لانها لالء الاسلام instead of "لانها لالئ الاسلام"
And thanks for your help, this is a very helpful library.