[QUESTION] Arabic Text Similarity and Sentiment Analyzer #151

Open
HussamHallak opened this issue Jan 2, 2025 · 1 comment

@HussamHallak

Thanks to the CAMeL Tools team for your great work.

I am wondering whether CAMeL Tools provides a way to measure the similarity between two Arabic texts (sentences or documents).

I have another question about the default sentiment analyzer in CAMeL Tools (AraBERT). Can it be used for Classical Arabic, mainly for verses of the Holy Quran, Prophetic Hadiths, and Fatwas?

I am trying to see whether CAMeL Tools can be used to identify Quranic verses, Prophetic Hadiths, and previous Fatwas that are related to a query of one or more sentences. This would be a question-answering system that, instead of generating text, simply returns stored text that is semantically similar to the question.

@nizarhabash1
Collaborator

Hi Hussam -

(1) Textual similarity: there is no dedicated utility for this in CAMeL Tools, but text similarity is now easy to measure with any model that provides embeddings (see https://www.newscatcherapi.com/blog/ultimate-guide-to-text-similarity-with-python). From the CAMeL Tools point of view, you can use our BERT models to get such embeddings, but you can also use other models. If you are interested in Classical Arabic, CAMeLBERT-CA may be a good model to test. Alternatively, you can use CAMeL Tools to lemmatize both texts and compare them in lemma space (but this is an old-fashioned approach).
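A minimal sketch of the embedding approach in (1), not an official CAMeL Tools utility. It assumes `torch` and `transformers` are installed and that the Classical Arabic model is published on Hugging Face as `CAMeL-Lab/bert-base-arabic-camelbert-ca`. Mean pooling over the last hidden layer is my own assumption about how to turn token vectors into a sentence vector, and the helper names (`embed`, `cosine_similarity`, `rank_by_similarity`) are hypothetical. Raw BERT embeddings that were not fine-tuned for similarity may rank passages only roughly.

```python
import math
from typing import List, Sequence, Tuple

# Assumed Hugging Face model ID for CAMeLBERT-CA (Classical Arabic).
MODEL_NAME = "CAMeL-Lab/bert-base-arabic-camelbert-ca"


def embed(texts: List[str], model_name: str = MODEL_NAME) -> List[List[float]]:
    """Return one mean-pooled last-layer vector per input text."""
    # Imported lazily so the pure-Python helpers below work without torch.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, dim)
    # Average only over real tokens, ignoring padding positions.
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return pooled.tolist()


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_by_similarity(
    query_vec: Sequence[float], passage_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs, most similar first."""
    scored = [(i, cosine_similarity(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With the dependencies installed, `rank_by_similarity(embed([query])[0], embed(passages))` would order stored passages by similarity to a query. In practice the passage embeddings would be computed once and cached rather than recomputed per query.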

(2) Sentiment analysis on Classical Arabic: sure, you can use the tools; just be aware that the results may be weak, since the models were not trained on Classical Arabic texts. Benchmarking on Classical Arabic would be great to see (a research question).
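The benchmarking idea in (2) could be sketched roughly as follows. This assumes CAMeL Tools and its pretrained sentiment data are installed, and it uses `SentimentAnalyzer.pretrained()` and `predict()` as described in the CAMeL Tools documentation. The `benchmark` helper and the hand-labelled gold set it expects are hypothetical; no such Classical Arabic dataset is implied to exist.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Predictor = Callable[[List[str]], List[str]]


def load_camel_predictor() -> Predictor:
    """Load the default pretrained CAMeL Tools sentiment model."""
    # Imported lazily so benchmark() can be tried with any predictor.
    from camel_tools.sentiment import SentimentAnalyzer

    analyzer = SentimentAnalyzer.pretrained()
    return analyzer.predict


def benchmark(predict: Predictor, gold: Sequence[Tuple[str, str]]) -> Dict[str, object]:
    """Compare predictions with (text, label) gold pairs.

    Returns overall accuracy and a confusion count keyed by
    (gold_label, predicted_label).
    """
    texts = [text for text, _ in gold]
    predictions = predict(texts)
    confusion = Counter((label, pred) for (_, label), pred in zip(gold, predictions))
    correct = sum(n for (label, pred), n in confusion.items() if label == pred)
    accuracy = correct / len(gold) if gold else 0.0
    return {"accuracy": accuracy, "confusion": dict(confusion)}
```

Calling `benchmark(load_camel_predictor(), gold_pairs)` on a small set of verses or Hadiths labelled by hand would give a first indication of how far the model's labels drift on Classical Arabic.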

Regarding your project, it sounds very interesting. It may also be useful to automatically generate offline clusters of Quran verses and Hadiths around common themes. That seems like a reasonable thing to do with the ideas I suggested in (1) above.
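One possible shape for the offline clustering step, assuming each verse or Hadith has already been turned into an embedding vector by some model. This is a simple single-pass "leader" clustering chosen only to keep the sketch dependency-free; k-means or agglomerative clustering from a library such as scikit-learn would be a more typical choice. The function name and the `threshold` value are illustrative assumptions.

```python
import math
from typing import List, Sequence


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def leader_clusters(
    vectors: Sequence[Sequence[float]], threshold: float = 0.8
) -> List[List[int]]:
    """Group vector indices into clusters in a single pass.

    Each vector joins the cluster whose centroid is most similar to it,
    provided the similarity is at least `threshold`; otherwise it starts
    a new cluster. The result depends on the input order.
    """
    clusters: List[List[int]] = []
    centroids: List[List[float]] = []
    for index, vec in enumerate(vectors):
        best, best_score = None, threshold
        for c, centroid in enumerate(centroids):
            score = _cosine(vec, centroid)
            if score >= best_score:
                best, best_score = c, score
        if best is None:
            clusters.append([index])
            centroids.append(list(vec))
        else:
            # Update the running mean of the chosen cluster's centroid.
            n = len(clusters[best])
            centroids[best] = [(m * n + x) / (n + 1) for m, x in zip(centroids[best], vec)]
            clusters[best].append(index)
    return clusters
```

Each returned list holds the indices of passages that ended up in the same theme cluster, so the clusters can be computed once offline and stored alongside the texts.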

Best wishes
Nizar
