Not showing proper results #2

Open
arushidogra opened this issue Mar 5, 2016 · 13 comments

Comments

@arushidogra
Contributor

[Image: screenshot from 2016-03-05 11 38 18]

@asdofindia
Collaborator

We must have a recommended way to add words to the dictionaries.

@arushidogra
Contributor Author

@asdofindia नमस्ते is not present in the dictionary, which is why it gives wrong output. Since it is a very common word, shouldn't we choose a better dictionary?

@arushidogra
Contributor Author

I think we can build a dictionary from the Wikipedia dump for Hindi.
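A minimal sketch of what building a word list from corpus text could look like, assuming the dump has already been converted to plain text. The names `DEVANAGARI_WORD`, `build_word_counts`, and `min_count` are hypothetical and not part of this project; the regular expression simply treats any run of Devanagari letters and signs as a word, which is a simplification.

```python
import re
import unicodedata
from collections import Counter

# Runs of Devanagari letters and signs (U+0900-U+0963, U+0971-U+097F).
# The danda marks (U+0964, U+0965), the Devanagari digits (U+0966-U+096F)
# and the abbreviation sign (U+0970) are deliberately left out.
DEVANAGARI_WORD = re.compile(r"[\u0900-\u0963\u0971-\u097F]+")


def build_word_counts(text, min_count=1):
    """Count Devanagari tokens in `text`, keeping those seen at least `min_count` times.

    Hypothetical helper for illustration only.
    """
    # Normalize so that visually identical words share one code point sequence.
    normalized = unicodedata.normalize("NFC", text)
    counts = Counter(DEVANAGARI_WORD.findall(normalized))
    return {word: n for word, n in counts.items() if n >= min_count}
```

Raising `min_count` is one way to drop rare tokens, which in crawled text are often typos. That matters for a spell checker, because a misspelling that ends up in the dictionary will be accepted as correct.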

@arushidogra
Contributor Author

@stultus, can I work on building better dictionaries using the Wikipedia dump or another large corpus?

@jishnu7
Member

jishnu7 commented Mar 8, 2016

I already wrote scripts to get data from Wikipedia.
https://github.com/androidtweak/dictionaries

But I don't think Wikipedia will be a good data set for spell check.

@copyninja
Member

@jishnu7 Maybe Wiktionary will be. Also, since you have added priority, can we use that in any way?

@arushidogra
Contributor Author

I was wondering whether we could merge the Shabdanjali dictionary with the output from a large corpus of crawled data.

@copyninja
Member

If you are planning to use third-party dictionaries, the first step is to make sure the dictionary is licensed under a free software license (FSF/OSI approved). Otherwise we cannot use it.

@stultus
Member

stultus commented Mar 9, 2016

@copyninja AFAIK a list of words isn't subject to copyright, so as long as we are not replicating an actual dictionary (i.e. as long as we don't use the 'word - meaning' pattern), we can take the list of words from third-party dictionaries.

@copyninja
Member

@stultus Is there any article that confirms this? I don't want legal hassle at a later point. :)

@stultus
Member

stultus commented Mar 9, 2016

If we have a list of words, how can someone prove that it was taken from "abc dictionary" and not from Wiktionary? And if the 'word:meaning' pairs are not distributed, how would it affect the potential commercial/personal gain of a dictionary curator?

(I'll try to get citations, but I'm posting the above arguments after confirming with a lawyer friend.)

@stultus
Member

stultus commented Mar 9, 2016

Update: I also got the opinion that this can be copyright infringement, so let's hold this until it is clearer to us.

ref: http://paste.debian.net/plain/413510

@arushidogra
Contributor Author

@stultus @copyninja Can we use the merged output of publicly available datasets, like ILCI, ILMT, WikiDump, and IITB Hindi Wordnet, and make dictionaries from them?
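A rough sketch of what merging per-dataset word counts could look like, assuming each dataset has already been reduced to a word-to-frequency mapping and that its license permits this use. `merge_word_counts` and `min_sources` are hypothetical names, not an existing API of this project.

```python
from collections import Counter


def merge_word_counts(sources, min_sources=1):
    """Sum word frequencies across `sources` (an iterable of word -> count mappings).

    Only words that occur in at least `min_sources` of the mappings are kept.
    Hypothetical helper for illustration only.
    """
    total = Counter()    # summed frequency of each word
    seen_in = Counter()  # number of sources each word appears in
    for counts in sources:
        for word, n in counts.items():
            total[word] += n
            seen_in[word] += 1
    return {word: n for word, n in total.items() if seen_in[word] >= min_sources}
```

Requiring a word to appear in more than one source (`min_sources=2`) is one possible guard against noise from a single crawled corpus. The summed frequency could also serve as a priority when ranking suggestions, if the spell checker supports that.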
