Not showing proper results #2
Comments
We must have a recommended way to add words to the dictionaries.
@asdofindia नमस्ते is not present in the dictionary, which is why it gives wrong output. Since it is a very common word, shouldn't we be choosing a better dictionary?
I think we can build a dictionary from the Hindi Wikipedia dump.
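A minimal sketch of what building a word list from a dump could look like, assuming the dump has already been converted to plain text lines. The names `count_words`, `build_wordlist`, and the `min_count` threshold are hypothetical and not part of this project; the regex simply keeps runs of Devanagari letters and signs, and skips the danda and Devanagari digits.

```python
import re
import unicodedata
from collections import Counter

# Runs of Devanagari characters, excluding danda/double danda and
# Devanagari digits (U+0964 to U+096F).
DEVANAGARI_WORD = re.compile(r"[\u0900-\u0963\u0970-\u097F]+")


def count_words(lines):
    """Count Devanagari tokens in an iterable of plain-text lines."""
    counts = Counter()
    for line in lines:
        # Normalize so visually identical words compare equal.
        line = unicodedata.normalize("NFC", line)
        counts.update(DEVANAGARI_WORD.findall(line))
    return counts


def build_wordlist(lines, min_count=2):
    """Keep words seen at least min_count times (hypothetical cutoff
    meant to drop one-off typos that are common in crawled text)."""
    counts = count_words(lines)
    return sorted(word for word, n in counts.items() if n >= min_count)


sample = ["नमस्ते दुनिया। नमस्ते", "hello नमस्ते 123 दुनिया"]
wordlist = build_wordlist(sample)
```

A frequency cutoff like this is only one possible filter; misspellings that occur often in the corpus would still get through.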
@stultus, can I work on making better dictionaries using the Wikipedia dump or any other large corpus?
I already wrote scripts to get data from Wikipedia, but I don't think Wikipedia will be a good data set for spell checking.
@jishnu7 Maybe Wiktionary will be. Also, since you have added priority, can we use that in any way?
I was wondering whether we could merge the Shabdanjali dictionary with the output from a large corpus of crawled data.
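One way such a merge might work, as a rough sketch: keep every word from the curated list, add corpus words that are frequent enough, and use the corpus count as a priority value. The function `merge_wordlists`, the `min_count` cutoff, and the idea of using raw frequency as the priority are all assumptions for illustration; the priority format this project actually uses is not shown in this thread.

```python
from collections import Counter


def merge_wordlists(curated_words, corpus_counts, min_count=5):
    """Merge a curated word list with corpus frequencies.

    Curated words are always kept (with count 0 if unseen in the corpus).
    Corpus-only words are kept only if they occur at least min_count times.
    Returns (word, count) pairs, most frequent first.
    """
    merged = {}
    for word in curated_words:
        merged[word] = corpus_counts.get(word, 0)
    for word, count in corpus_counts.items():
        if count >= min_count:
            # No-op for curated words, which are already present.
            merged.setdefault(word, count)
    # Sort by descending count, then alphabetically for stable output.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


curated = ["नमस्ते", "शब्द"]
corpus = Counter({"नमस्ते": 10, "दुनिया": 6, "टाइपो": 1})
merged = merge_wordlists(curated, corpus)
```

Here the rare corpus-only token is dropped, while the curated word that never appears in the corpus is kept with the lowest priority.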
If you are planning to use third-party dictionaries, the first step is making sure the dictionary is licensed under a free software license (FSF/OSI approved). Otherwise we cannot use it.
@copyninja AFAIK lists of words don't carry copyright. So as long as we are not replicating an actual dictionary (i.e. as long as we don't use the 'word - meaning' pattern), we can take the list of words from third-party dictionaries.
@stultus Is there any article which confirms this? I don't want legal hassle at a later point. :)
If we have a list of words, how can someone prove that it was taken from "abc dictionary" and not from Wiktionary? And if the 'word:meaning' pairs are not distributed, how is it going to affect the potential commercial/personal gain of a dictionary curator? (I'll try to get citations, but I'm posting the above arguments after confirming with a lawyer friend.)
Update: I also got the opinion that this can be a copyright infringement, so let's hold this until it is clearer to us.
@stultus @copyninja Can we use the merged outputs of publicly available datasets, like ILCI, ILMT, WikiDump, and IITB Hindi Wordnet, and make dictionaries from them?