Multiple Homonym from Morphology Request #135
Comments
I think this will be a good solution. I don't know if we really need to track whether words are homonyms or not. If not, we can simply rename This solution will require the minimum amount of work because all components are already capable of working with Homonym. Even if we still need to track which words in the object are homonyms, we can probably create some special objects (like maps or arrays) within that renamed object to track that. That will arguably take less effort than creating a wrapper around a group of homonyms. Creating another object that groups Homonyms is a valid solution, but it will complicate things a lot, I'm afraid. We will have to change many libraries (ClientAdapters, Components, Inflection tables, probably Data Models) to accommodate that. That will be a pretty big change. It will also add another layer of complexity (a grouping object for homonyms) that we might not need. I think we should find the simplest solution that satisfies our requirements and is extensible. What do you think? I need to think about the UI part a little and will try to write about it tomorrow. I'm wondering whether we need to show homonyms together in the UI or not. @balmas, do you know about that? |
I don't quite agree, because I believe that a Homonym has its own meaning as a unit. But for Chinese we need to get several Homonyms for different target words (each of which could have more than one lexeme). @balmas gave a fully explained example for that here. So I don't agree that we could simply make Homonym a little more complex. |
Thanks for pointing to the example, that was really helpful. I just don't understand why 安 would produce two different homonyms. Is that because a homonym is "a word that is spelled the same and sounds the same as another, but is different in meaning or origin" (I took that definition from a Longman dictionary) and the two 安, while obviously spelled the same, are probably pronounced differently? Do you know the answer? Anyway, I agree that if keeping the homonym as a unit is important for us, then grouping several homonyms into a higher-level object as you suggest is the best way to go. I think an object (as opposed to an array) is the best choice because we will probably need additional props or methods on it. It must be extensible. Technically speaking, an Array is an object too and we can attach props and methods to it as well, but that would be super confusing and we should never ever do that. Do you agree? |
As far as I know, a lexeme is a unit that has its own morphological properties, definitions, and inflections. So if we add pronunciation to morphology, then we could say that a given form has a given pronunciation. For example, 'run' is a lexeme and 'ran' is another form of this lexeme; the two have different pronunciations and different morphological forms. A Homonym is only a union of lexemes that have the same form (targetWord and language) or we could get such an example I don't know whether Homonym is an official term (I studied only Russian linguistics many years ago, and English without really deep linguistic background, and I never came across such a term before). That's why I think of a Homonym as a set of lexemes from the same targetWord within the same language. |
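A minimal sketch of that reading of Homonym: a set of lexemes that share one targetWord and one language. The class shape, the `hasSameForm` helper, the language codes, and the placeholder lemmas are all assumptions made for illustration; the actual data-model classes are richer.

```javascript
// Hypothetical, simplified shape; the real data-model class has more fields.
class Homonym {
  constructor (targetWord, languageCode, lexemes = []) {
    this.targetWord = targetWord
    this.languageCode = languageCode
    this.lexemes = lexemes
  }

  // Two homonyms describe the same form when word and language both match.
  hasSameForm (other) {
    return this.targetWord === other.targetWord &&
      this.languageCode === other.languageCode
  }
}

// Placeholder lexemes: the lemma values are illustrative, not parser output.
const exampleHomonym = new Homonym('run', 'eng', [
  { lemma: 'run (verb)' },
  { lemma: 'run (noun)' }
])
```

Under this definition, pronunciation and meaning live on the lexemes, so they never split one form into two homonyms.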
Completely! That's why I think that a simple object with keys (only because Map is not correctly trackable by Vue components and Vuex) is better here - as a result from the adapter. |
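As a rough sketch of what such an adapter result could look like, assuming each Homonym has a unique targetWord: a plain object keyed by targetWord. The function name `groupByTargetWord` and the result shape are hypothetical, and the two target words are only illustrative values taken from the 安之若素 discussion.

```javascript
// Hypothetical adapter result: a plain object keyed by target word,
// used instead of a Map so that the result stays a simple, trackable object.
function groupByTargetWord (homonyms) {
  const result = {}
  for (const homonym of homonyms) {
    result[homonym.targetWord] = homonym
  }
  return result
}

// Illustrative input; real homonyms would carry parsed lexemes.
const adapterResult = groupByTargetWord([
  { targetWord: '安之若素', languageCode: 'zho', lexemes: [] },
  { targetWord: '安', languageCode: 'zho', lexemes: [] }
])
```

If two homonyms could ever share a targetWord, the later one would overwrite the earlier one here, so that assumption would need to be checked first.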
I'm not sure what the best UI for multiple homonyms would be, so I would like to throw in some of my thoughts for discussion; I think together we'll find what works best. I like the idea of reusing the current morph component for each individual homonym. This is a very scalable approach and it needs only minimal code changes. I would maybe just rename the UI component to something like Ideally, I think, in a tab-like UI it would be best to show more than just a word in a tab title. The problem with tab-based UIs, or other types of UI where part of the content is hidden, is that the user has to click on each tab to find out whether that is the word (i.e. the meaning of the word) that he or she is expecting. If a query returns, for example, two 安 words, there will be two tabs with the 安 character in them and there will be no way to know which is the appropriate one without clicking on both first. If it is somehow possible to avoid that, it will be a chance for us to make our UI better. I also think that tabs are not very mobile friendly. Multiple tabs on small screens spill into several rows and that is not nice at all, in my opinion. Tabs are not often used in mobile UIs and people are not used to them. The more natural solution, in my opinion, would be collapsible sections similar to the ones used on Android settings pages. A similar thing is the accordion element from Bootstrap: https://getbootstrap.com/docs/4.3/components/collapse/#accordion-example. Because the header control elements are wide, there will be plenty of space to put the word itself along with maybe a short meaning, a pinyin, or other info that will hint to the user whether this is the word he or she is interested in. It also scales well on mobile devices with different screen widths. Another advantage of accordions is that we can use a variant that allows several sections to be open at once. That is definitely not possible with tabs. 
Because of that, the user will be able to have two different words open at the same time and compare them, for example. For mobile users, if the text is long, they have to scroll up or down, which is naturally accepted behavior on mobile devices. If we decide to use a different UI for desktop, we can maybe use a side-tabs approach. I'm using a custom shell for a Longman dictionary. It was created by a Japanese teacher, Taku Fukada, and is very ergonomic in use, in my opinion; the reason it was created in the first place was that the original Longman UI is so bad. Here is how it looks: The only drawback of this approach, in my opinion, is that it is not possible to open two dictionary entries at the same time. That can be helpful for comparing words with each other. Please let me know your thoughts on the best UI approach. |
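To make the tabs-versus-accordion difference concrete, here is a small framework-independent sketch of the open/closed state. The helper name `toggleSection` and the `allowMultiple` flag are hypothetical; with `allowMultiple` set to `false` the state behaves like tabs (one visible entry at most), and with `true` it behaves like an accordion that lets two words stay open for comparison.

```javascript
// Hypothetical state helper: returns a new list of open section ids.
function toggleSection (openIds, id, allowMultiple = true) {
  if (openIds.includes(id)) {
    // Clicking an open section closes it.
    return openIds.filter(openId => openId !== id)
  }
  // Accordion mode keeps the others open; tab-like mode replaces them.
  return allowMultiple ? [...openIds, id] : [id]
}

const accordionState = toggleSection(['first'], 'second', true)
const tabLikeState = toggleSection(['first'], 'second', false)
```

A plain array of ids is used here rather than a Set, on the assumption that simple arrays are easier for the UI layer to observe.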
I think we need to design it to be expandable, because who knows what uses and functions we will need for it in the future? It might be hard to foresee now. So it's better to have some room to expand than not, and an object will give us that room. Even now (I may be wrong, and maybe not all or even none of the uses listed below make sense, so please correct me) we might be interested in:
|
Each homonym has its own targetWord, so there could not be a situation where two tabs have the same targetWord in the tab title.
I like this idea. But I don't think it is useful to place anything there besides targetWord, because only targetWord and language are guaranteed to be the same for all lexemes.
For now we don't have scrollable popups (I believe).
It seems to me that this is not similar data, because I can see lexemes / forms / examples, which is much more detailed information, inside the tab. Both variants have advantages and disadvantages. Maybe tabs for desktop and collapsible sections for mobile? |
If that holds true, then the 安 word from @balmas's example given at alpheios-project/client-adapters#40 (comment) must form a single Homonym, because the targetWord is the same for both and the language is the same too. Yet Bridget suggests they should form two different homonyms. Based on the Wikipedia definition, both the meaning and the practical use of the term homonym are broad. But mostly, as I understand it, homonym means the same spelling and the same pronunciation. So I was thinking that maybe in the case of 安 it's the pronunciation that sets those two words apart. Maybe it does not matter much, and I'm not a linguist, but I'm just trying to understand the specifics as well as I can in order to construct the underlying data model so that it reflects all the complexities and nuances of the actual data correctly. |
It seems to me that all these abilities are for working with texts and not single words. For now we have features and tools for working with single targetWords. If we are going to add features for analyzing whole texts, they could of course be useful. |
@kirlat, oh, you are right. According to the analysis workflow, only the pinyin and the definition differ. |
But according to the Cambridge dictionary Each different meaning should have its own homonym |
Oh, I guessed what homonym really means in Russian :) Here is an article by a philologist about it |
I think that we need @balmas's expert answer :) |
I would think this will be the best approach at the moment. Let's wait to see what @balmas would say.
I'm just trying to think ahead, and I think it's good to be able to add additional capabilities in the future. If you're not against using the object, then that will be the safest bet, in my opinion. |
Of course I am not against it. |
Assigning this to myself so that I remember I need to comment on it! I will try to provide some feedback soon. |
Ok, first to clarify what we are talking about here.
That isn't quite right. To clarify the original example, the following is a compound Chinese word: 安之若素 (a verb meaning "to bear hardship with equanimity") However, the same 4 characters are also 4 distinct words 安 (a noun meaning "peace") When a user encounters a compound Chinese word like this, we want to give them as accurate a response as possible. In this case, the only meaning of those 4 characters when they appear together in that order is the single compound word 安之若素. However, there could be other cases of compound words where multiple words are possible and only from reading the full context is it possible to know which is meant. My Chinese isn't good enough to give an example of this, but take the following contrived example in English: "con man" is a compound word meaning a person who deceives. If you look at those two words in isolation, that is the only probable meaning, but in a sentence they could (even if it's unlikely) be used as separate words: "it was a con man I was fooled" (which makes more sense if you have a comma after "con", but I think it makes the point). "con" and "man" are not homonyms; they are individual words which, taken together, make a third word. So back to the original example: when a user mouses over the 安 in 安之若素, by including the context forward, we get 2 distinct words: 安 (安之 and 安之若 are not valid words) Further, in the old cedict source 安 had 2 distinct entries, which is why I said it should produce 2 lexemes. I see that has been corrected in the newer source, so that isn't the case here, but it IS possible that a single Chinese character is a homonym with multiple lexemes (in the same way "sum" in Latin is a homonym with 3 lexemes). |
So, that means we need to account for the possibility that multiple words might be analyzed together. This is required for Chinese, but can also be very useful for other languages as well (for example, to parse an entire sentence at once). I think our data model as it stands is correct, i.e. we represent a Homonym as a distinct word made up of one or more Lexemes, and I don't think we want to change that. What we need to do is account for the following scenarios:
(the case of 安之若素)
For scenario 2, we actually don't handle this correctly right now. We treat the entire user input "veni vidi vici" as if it was a single word. Our morphological parser is smart enough to understand that it is 3 words and we actually get back parses for all three of them, but we treat them as a single Homonym with the target word "veni vidi vici" which is incorrect. |
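A minimal sketch of the corrected behavior for this scenario, assuming whitespace is an adequate word boundary (true for "veni vidi vici", not for Chinese): each word of the input gets its own Homonym-like entry instead of one entry for the whole string. The function name `homonymsFromInput`, the plain-object shape, and the `'lat'` language code are hypothetical stand-ins.

```javascript
// Hypothetical helper: one homonym placeholder per whitespace-separated word.
// The lexemes would be filled in from the parser response for each word.
function homonymsFromInput (input, languageCode) {
  return input
    .trim()
    .split(/\s+/)
    .filter(word => word.length > 0)
    .map(word => ({ targetWord: word, languageCode, lexemes: [] }))
}

const latinHomonyms = homonymsFromInput('veni vidi vici', 'lat')
const latinTargetWords = latinHomonyms.map(homonym => homonym.targetWord)
```

In practice the word boundaries would more likely come from the parser's own response than from splitting the input, since the parser already recognizes the three words.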
Thanks for a very detailed explanation! It puts many pieces of information into the right places for me. So, if I understand correctly, the change needed right now is to create a superstructure over the Homonym object that will group several Homonyms together (i.e. something like a word group)? And then we would need to change all our components so that they will work with this word group, not with individual homonyms as of now. |
Yes, I think so. But it is probably worth thinking carefully about. It might be a good time for us to step back and document the current architecture a bit better so that we can see it more clearly before jumping into refactoring. |
I think it's a great idea |
We also need this feature for treebank queries, where sometimes a single word in a source text is linked to multiple words in a treebank. E.g. "virumque" is a combination of the 2 words "virum" and the enclitic "que", and in a treebank these would likely be reflected in two separate tokens. |
Another example of compound words that need to be supported, from #541: for the Greek word προἀπολέγω the morpheus parser returns 3 possible lexemes, each of which has a compound word as the lemma: πρό,ἀπό-λέγω1 => 3 separate words: πρό , ἀπό , and λέγω1 For a total of 5 distinct lemmas, each of which has an entry in our shortdefs file: "πρό|gen. before", Both our data model and the query need to be able to account for this. For each lexeme, we should be able to show the user the separate components, with their definitions. @kirlat this is something that needs to be handled properly in the lexical query refactoring and annotation code. |
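As a rough sketch of pulling the components out of such a lemma: the separators (a comma and a hyphen) are inferred only from the single example πρό,ἀπό-λέγω1 above, and the function name `splitCompoundLemma` is hypothetical. Real parser output may use other conventions.

```javascript
// Hypothetical helper: split a compound lemma into its component lemmas.
// Assumes components are separated by "," or "-", as in the example above.
function splitCompoundLemma (lemma) {
  return lemma
    .split(/[,-]/)
    .map(part => part.trim())
    .filter(part => part.length > 0)
}

const lemmaParts = splitCompoundLemma('πρό,ἀπό-λέγω1')
```

Each resulting part could then be looked up on its own in the shortdefs file, which is what would let the UI show the separate components with their definitions.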
That's an interesting case to test with. Thanks for bringing it in here! |
For compound words I think we need to modify our data model to introduce the concept of a CompoundLexeme, which is a type of Lexeme that has constituent lexemes. |
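One way the CompoundLexeme idea could be sketched, with everything here hypothetical: the `Lexeme` class is a simplified stand-in for the real data-model class, and `constituents` and `shortDef` are illustrative field names. Only the "gen. before" definition for πρό comes from the shortdefs entry quoted above; the other definitions are left empty rather than guessed.

```javascript
// Simplified stand-in for the data-model Lexeme (assumption, not the real class).
class Lexeme {
  constructor (lemma, shortDef = null) {
    this.lemma = lemma
    this.shortDef = shortDef
  }
}

// Hypothetical CompoundLexeme: a Lexeme that also lists its constituent lexemes.
class CompoundLexeme extends Lexeme {
  constructor (lemma, constituents = []) {
    super(lemma)
    this.constituents = constituents
  }

  // Lemmas of the components, in order, for display next to their definitions.
  get constituentLemmas () {
    return this.constituents.map(lexeme => lexeme.lemma)
  }
}

const compoundExample = new CompoundLexeme('πρό,ἀπό-λέγω1', [
  new Lexeme('πρό', 'gen. before'),
  new Lexeme('ἀπό'),
  new Lexeme('λέγω1')
])
```

Because CompoundLexeme is still a Lexeme, components that already accept a Lexeme could keep working and opt in to showing constituents where they exist.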
For the feature - alpheios-project/client-adapters#40
We need to add an ability
We need to refactor:
I see the following steps to do here:
I suggest an object here, because Map is not trackable by Vue components.
@balmas and @kirlat, what do you think?