Mutiple Homonym from Morphology Request #135

Open
irina060981 opened this issue Nov 13, 2019 · 27 comments
Labels
chinese chinese-specific components in the core alpheios components enhancement New feature or request treebank

Comments

@irina060981
Member

For the feature - alpheios-project/client-adapters#40

We need to add the ability:

  • to get several homonyms from the lexicalRequest,
  • to publish all of them to the popup (and maybe to the panel with full definitions, although we don't have that ability for Chinese yet),
  • and to save all of them to the WordList.

We need to refactor:

  • LexicalQuery
  • UIController modules, Vuex parameters
  • WordListController
  • Popup/Panel components

I see the following steps to do here:

  • as we have less space, I suggest creating tabbed popups; each tab would be a separate morph.vue component with the current behaviour
  • as for UIController modules and Vuex, I suggest converting the single homonym and other related attributes to objects (key = targetWord + LanguageCode, value = homonym). Maybe we should create a specific class here, like HomonymSet (in data-modules)?
    I suggest an object here because a Map is not trackable by Vue components
  • convert the LexicalQuery result to homonym sets
  • update WordListController to work with homonym sets
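A minimal sketch of the keyed-object idea, assuming hypothetical names (`homonymKey`, `buildHomonymSet`) and simplified homonym shapes that are not taken from the Alpheios code. It uses a plain object because Vue 2's reactivity system does not observe `Map` entries.

```javascript
// Hypothetical helper: builds the "targetWord + LanguageCode" key suggested above.
function homonymKey (targetWord, languageCode) {
  return `${targetWord}-${languageCode}`
}

// Collects homonyms into a plain object (not a Map) so Vue 2 / Vuex can track it.
function buildHomonymSet (homonyms) {
  const set = {}
  for (const homonym of homonyms) {
    set[homonymKey(homonym.targetWord, homonym.languageCode)] = homonym
  }
  return set
}

// Simplified stand-ins for real Homonym objects (assumed shape).
const homonymSet = buildHomonymSet([
  { targetWord: '安', languageCode: 'zho', lexemes: [] },
  { targetWord: '安之若素', languageCode: 'zho', lexemes: [] }
])

console.log(Object.keys(homonymSet).length) // 2
```

In a Vue 2 application, keys added after the object becomes reactive would still need `Vue.set` (or replacing the whole object) to be picked up.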

@balmas and @kirlat, what do you think?

@irina060981 irina060981 self-assigned this Nov 13, 2019
@kirlat
Member

kirlat commented Nov 13, 2019

as for UIController modules and Vuex, I suggest converting the single homonym and other related attributes to objects (key = targetWord + LanguageCode, value = homonym). Maybe we should create a specific class here, like HomonymSet (in data-modules)?
I suggest an object here because a Map is not trackable by Vue components

I think this will be a good solution. I don't know if we really need to track if words are homonyms or not. If not, we can simply rename Homonym to something like a WordGroup, or WordSet, or WordList. The current Homonym already can hold a list of words, and this is exactly what we need. Its name implies that the words inside are homonyms. This was true for the languages we supported up until now, but we need to extend that meaning for Chinese, so the renaming can do that.

This solution will require a minimal amount of work because all components are already capable of working with Homonym. Even if we still need to track which words in the object are homonyms, we can probably create some special objects (like maps or arrays) within that renamed object to track that. That will arguably take less effort than creating a wrapper around a group of homonyms.

Creating another object that will group Homonyms is a valid solution, but it will complicate things a lot, I'm afraid. We will have to change so many libraries (ClientAdapters, Components, Inflection tables, probably Data Models) to accommodate that. That will be a pretty big change. It will also add another layer of complexity (a grouping object for homonyms) that we might not need. I think we should find the simplest solution that satisfies our requirements and is extensible.

What do you think?

I need to think about the UI part a little, will try to write on that tomorrow. I'm wondering if we need to show homonyms together in the UI or not. @balmas, do you know about that?

@irina060981
Member Author

irina060981 commented Nov 14, 2019

I don't quite agree, because I believe a homonym has a meaning of its own.
A Homonym is a set of lexemes for the same targetWord.

But for Chinese we need to get several Homonyms for different target words (each of which could have more than one lexeme).

@balmas gave a fully explained example of that here
alpheios-project/client-adapters#40

So I don't agree that we could simply make Homonym a little more complex.
I believe that we should work with an array, an object, or a specific class for the set of homonyms returned from the service.

@kirlat
Member

kirlat commented Nov 14, 2019

@balmas gave a fully explained example of that here
alpheios-project/client-adapters#40

Thanks for pointing to the example, that was really helpful. I just don't understand why 安 would produce two different homonyms. Is that because a homonym is "a word that is spelled the same and sounds the same as another, but is different in meaning or origin" (took that def from a Longman dictionary) and two 安, while obviously spelled the same, are probably pronounced differently? Do you know an answer?

Anyway, I agree that if keeping homonym as a unit is important for us, then grouping several homonyms into a higher-level object as you suggest is the best way to go. I think an object (as opposed to an array) is the best choice because we will probably need additional props or methods on it. It must be extensible. Technically speaking, an Array is an object too and we could attach props and methods to it as well, but that would be super confusing and we should never ever do that. Do you agree?

@irina060981
Member Author

irina060981 commented Nov 14, 2019

I just don't understand why 安 would produce two different homonyms. Is that because a homonym is "a word that is spelled the same and sounds the same as another, but is different in meaning or origin" (took that def from a Longman dictionary) and two 安, while obviously spelled the same, are probably pronounced differently? Do you know an answer?

As far as I know, a lexeme is a unit that has its own morphological properties, definitions, and inflections. So if we add pronunciation to the morphology, then we could say that a particular form has a particular pronunciation. For example, 'run' is a lexeme and 'ran' is another form of that lexeme; the two have different pronunciations and are different morphological forms of the word.

A Homonym is only a union of lexemes that have the same form (targetWord and language),
for example,
work - as a noun
work - as a verb
they have the same source, which is why they have similar meanings, but quite different morphological properties

or we could take an example like
bat - as an animal
bat - as a wooden thing
both are nouns, but they have quite different meanings (maybe at one time the meanings were closer to each other)

I don't know if Homonym is an official term (I studied only Russian linguistics many years ago, and English without a really deep linguistic background, and I hadn't come across such a term before)

That's why I think of a Homonym as a set of lexemes for the same targetWord within the same language.
Hope it was useful for you, @kirlat
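
To illustrate this definition, here is a small sketch that groups flat lexeme records into homonyms by form and language. The record shape and the `groupIntoHomonyms` function are assumptions made for the example, not the actual Alpheios data model.

```javascript
// Illustrative only: flat lexeme records as a parser might return them (hypothetical shape).
const lexemeRecords = [
  { form: 'work', languageCode: 'eng', partOfSpeech: 'noun' },
  { form: 'work', languageCode: 'eng', partOfSpeech: 'verb' },
  { form: 'bat', languageCode: 'eng', partOfSpeech: 'noun', meaning: 'animal' },
  { form: 'bat', languageCode: 'eng', partOfSpeech: 'noun', meaning: 'wooden thing' }
]

// A homonym here is every lexeme that shares the same form (targetWord) and language.
function groupIntoHomonyms (records) {
  const homonyms = {}
  for (const { form, languageCode, ...lexeme } of records) {
    const key = `${form}-${languageCode}`
    if (!homonyms[key]) {
      homonyms[key] = { targetWord: form, languageCode, lexemes: [] }
    }
    homonyms[key].lexemes.push(lexeme)
  }
  return Object.values(homonyms)
}

const grouped = groupIntoHomonyms(lexemeRecords)
console.log(grouped.map(h => `${h.targetWord}: ${h.lexemes.length} lexemes`))
```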

@irina060981
Member Author

irina060981 commented Nov 14, 2019

Technically speaking, Array is an object too and we can attach props and methods to it as well, but that will be super confusing and we should never ever do that. Do you agree?

Completely!
I think that a specific class is better if we need some specific methods for them.
But I am not sure that we really have any specific attributes or methods, because these homonyms from Chinese don't have any real common parts (the way the lexemes of a homonym share targetWord and LanguageID).
They are united only because these words could be built from the characters in the text; there is no other uniting reason.
Such a union is not natural and has no real source.
Once we get such a set of homonyms, we won't be able to reproduce it unless we make the same selection from the same text.

That's why I think that a simple object with keys (only because Map is not correctly trackable by Vue components and Vuex) is better here, as the result from the adapter.

@kirlat
Member

kirlat commented Nov 14, 2019

I'm not sure what the best UI for multiple homonyms would be, so I would like to throw in some of my thoughts for discussion; I think together we'll find what works best.

I like the idea of reusing the current morph component for each individual homonym. This is a very scalable approach and it requires minimal code changes. I would maybe just rename the UI component to something like HomonymMorph or even Homonym to reflect the fact that it displays information about a single homonym.

Ideally, I think, a tabbed UI should show more than just a word in the tab title. The problem with tabs, or any UI where part of the content is hidden, is that the user has to click on each tab to find out whether it holds the word (i.e. the meaning of the word) that he or she is expecting. If a query returns, for example, two 安 words, there will be two tabs with the 安 character in them and there will be no way to know which is the appropriate one without clicking on both first. If it were somehow possible to avoid that, it would be a chance for us to make our UI better.

I also think that tabs are not so mobile friendly. Multiple tabs on small screens spill into several rows and that is not nice at all, in my opinion. Tabs are not often used in mobile UIs and people are not used to them. The more natural solution, in my opinion, would be collapsible sections similar to the ones used on Android settings pages. A similar thing is the accordion element from Bootstrap: https://getbootstrap.com/docs/4.3/components/collapse/#accordion-example. Because the header control elements are wide, there will be plenty of space to put the word itself along with maybe a short meaning, a pinyin, or other info that hints to the user whether this is the word he or she is interested in. It also scales well on mobile devices with different screen widths.

Another advantage of accordions is that we can use a variant that allows several sections to be open at once. That is definitely not possible with tabs. The user will then be able to have two different words open at the same time and compare them, for example. For mobile users, if the text is long, they have to scroll up or down, which is naturally accepted behavior on mobile devices.

If we decide to use a different UI for desktop, maybe we can use a side-tabs approach. I'm using a custom shell for a Longman dictionary. It was created by a Japanese teacher, Taku Fukada, and is very ergonomic to use, in my opinion; the reason it was created in the first place was that the original Longman UI is so bad. Here is what it looks like:
image
It has tabs at the left that also show some additional information about the word and help to select the right one more easily. It has a nicely integrated search bar at the top. It also has a cool widget at the top right that links to related resources. I also like how it uses colors and other formatting to separate different types of information from each other. It's very easy to find what you need in almost no time.

The only drawback of this approach, in my opinion, is that it is not possible to open two dictionary entries at the same time. That can be helpful for comparing words with each other.

Please let me know your thoughts on the best UI approach.

@kirlat
Member

kirlat commented Nov 14, 2019

I think that a specific class is better if we need some specific methods for them.
But I am not sure that we really have any specific attributes or methods, because these homonyms from Chinese don't have any real common parts (the way the lexemes of a homonym share targetWord and LanguageID).
They are united only because these words could be built from the characters in the text; there is no other uniting reason.
Such a union is not natural and has no real source.
Once we get such a set of homonyms, we won't be able to reproduce it unless we make the same selection from the same text.
That's why I think that a simple object with keys (only because Map is not correctly trackable by Vue components and Vuex) is better here, as the result from the adapter.

I think we need to design it to be expandable, because who knows what uses and functions we will need for it in the future? That might be hard to foresee now. So it's better to have some room to expand than not, and an object will give us that room.

Even now (I could be wrong, and maybe not all, or even none, of the uses listed below make sense, so please correct me) we might be interested in:

  • Storing a selection and context (forward and backward) for the whole word group (why would we duplicate it within each homonym?)
  • Having a method that will return a word that is closest to the selection (or the opposite of that)
  • Some service methods that will extract all lexemes (or any other lower level information) from all homonyms within the group and return them as a flat array. We might really need to flatten information from within the group for several purposes.
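
A rough sketch of what such a grouping object could look like. The class name `HomonymGroup`, its fields, and the reading of "closest to the selection" as "smallest difference in length from the selected text" are all assumptions made for illustration, not an agreed design.

```javascript
// Hypothetical grouping object; none of these names exist in the Alpheios code.
class HomonymGroup {
  constructor ({ selection, contextBackward = '', contextForward = '' }) {
    // Selection and context are stored once for the whole group.
    this.selection = selection
    this.contextBackward = contextBackward
    this.contextForward = contextForward
    this.homonyms = {} // plain object keyed by targetWord
  }

  add (homonym) {
    this.homonyms[homonym.targetWord] = homonym
    return this
  }

  // Assumption: "closest" means the target word whose length differs least from the selection.
  closestToSelection () {
    const all = Object.values(this.homonyms)
    if (all.length === 0) return null
    const distance = h => Math.abs(h.targetWord.length - this.selection.length)
    return all.reduce((best, h) => (distance(h) < distance(best) ? h : best))
  }

  // Flattens the lexemes of every homonym in the group into one array.
  allLexemes () {
    return Object.values(this.homonyms).flatMap(h => h.lexemes)
  }
}

const group = new HomonymGroup({ selection: '安', contextForward: '之若素' })
  .add({ targetWord: '安', lexemes: [{ lemma: '安' }] })
  .add({ targetWord: '安之若素', lexemes: [{ lemma: '安之若素' }] })

console.log(group.closestToSelection().targetWord, group.allLexemes().length)
```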

@irina060981
Member Author

If a query returns, for example, two 安 words, there will be two tabs with the 安 character in them and there will be no way to know which is the appropriate one without clicking on both first. If it were somehow possible to avoid that, it would be a chance for us to make our UI better.

Each homonym has its own targetWord, so there couldn't be a situation where two tabs have the same targetWord in the tab title.

The more natural solution, in my opinion, would be collapsible sections similar to the ones used on Android settings pages.

I like this idea. But I don't think it is useful to place anything there besides the targetWord, because only targetWord and language are guaranteed to be the same for all lexemes

Another advantage of accordions is that we can use a variant that allows several sections to be open at once.

For now we don't have scrollable popups (I believe).
With such an ability we would have to change the complex workflow of aligning the popup inside the viewport.

It has tabs at the left that also show some additional information about the word and help to select the right one more easily. It has a nicely integrated search bar at the top. It also has a cool widget at the top right that links to related resources. I also like how it uses colors and other formatting to separate different types of information from each other. It's very easy to find what you need in almost no time.

It seems to me that this is not similar data, because I can see lexemes / forms / examples, much more detailed information, inside the tab.
In our case we have only homonyms with very short data, with all lexemes inside the homonym

Both variants have advantages and disadvantages:
tabs - not good for mobile, but useful for desktop
collapsible - good for mobile, but needs a complex aligning workflow on the desktop

maybe tabs for desktop and collapsible for mobile?

@kirlat
Member

kirlat commented Nov 14, 2019

That's why I think about Homonym as about a set of lexemes from the same targetWord inside the same Language.

If that holds true then a 安 word from @balmas example given at alpheios-project/client-adapters#40 (comment) must form the same Homonym because targetWord for both is the same and language is the same too. Yet Bridget suggests they should form two different homonyms.

Based on the Wikipedia definition, both the meaning and the practical use of the term homonym are broad. But mostly, as I understand it, homonym means the same spelling and the same pronunciation. So I was thinking that maybe in the case of 安 it's the pronunciation that sets those two words apart.

Maybe it does not matter much, and I'm not a linguist, but I'm just trying to understand the specifics as well as I can, so that the underlying data model reflects all the complexities and nuances of the actual data correctly.

@irina060981
Member Author

Storing a selection and context (forward and backward) for the whole word group (why would we duplicate it within each homonym?)
Having a method that will return a word that is closest to the selection (or the opposite of that)
Some service methods that will extract all lexemes (or any other lower level information) from all homonyms within the group and return them as a flat array. We might really need to flatten information from within the group for several purposes.

It seems to me that all these abilities are for working with texts and not with single words.

For now we have features and tools to work with single targetWords:
- its morph properties
- its definition (without taking into account the surrounding words)
- its inflections
- its wordUsage
- its current context

If we are going to add features for analyzing the whole text, it could of course be useful

@irina060981
Member Author

If that holds true then a 安 word from @balmas example given at alpheios-project/client-adapters#40 (comment) must form the same Homonym because targetWord for both is the same and language is the same too. Yet Bridget suggests they should form two different homonyms.

@kirlat, oh, you are right
but it is really strange, because

image

According to the analysis workflow, only the pinyin and the definition differ;
the other properties are given for the whole word in one source row.
And @balmas said that it is connected to the homonym.
I will try to find her comment

@irina060981
Member Author

But according to the Cambridge dictionary
https://dictionary.cambridge.org/us/dictionary/english/homonym

each distinct meaning should be its own homonym.
But that is not applicable to our workflow, because we don't have the meaning as a property of the homonym; it is a property of the lexeme

@irina060981
Member Author

irina060981 commented Nov 14, 2019

Oh, I think I see what homonym really means in Russian :)
In our linguistic approach not all words are homonyms; only some can be called homonyms
(similar to English bat and bat, as I wrote before).
But in our extension we call every word a homonym, even if there is no other word with the same spelling but a different meaning.
I think this is the source of the misunderstanding.

Here is an article by a philologist about it
https://moluch.ru/archive/112/28258/

@irina060981
Member Author

I think that we need @balmas's expert answer :)

@irina060981
Member Author

irina060981 commented Nov 14, 2019

Here is one more example
sum
image

you can see that the first lexeme and the second lexeme have the same form = targetWord
and different meanings; both are verbs.
According to the Homonym definition and the Chinese example, they shouldn't be inside the same homonym.

@kirlat
Member

kirlat commented Nov 14, 2019

maybe tabs for desktop and collapsible for mobile?

I think this would be the best approach at the moment. Let's wait and see what @balmas says.

If we are going to add features for analyzing the whole text, it could of course be useful

I'm just trying to think ahead, and I think it's good to be able to add capabilities in the future. If you're not against using the object, then that will be the safest bet, in my opinion.

@irina060981
Member Author

I'm just trying to think ahead, and I think it's good to be able to add capabilities in the future. If you're not against using the object, then that will be the safest bet, in my opinion.

Of course I am not against it.
If you and Bridget think that this additional work is useful, I will create such an object and integrate it into the multiple Homonym task.
I was simply trying to explain my thoughts about it!

@balmas balmas self-assigned this Nov 18, 2019
@balmas
Member

balmas commented Nov 18, 2019

assigning this to myself so that I remember I need to comment on it! I will try to provide some feedback soon.

@balmas
Member

balmas commented Nov 20, 2019

Ok, first to clarify what we are talking about here.

If that holds true then a 安 word from @balmas example given at alpheios-project/client-adapters#40 (comment) must form the same Homonym because targetWord for both is the same and language is the same too. Yet Bridget suggests they should form two different homonyms.

That isn't quite right.

To clarify the original example, the following is a compound Chinese word:

安之若素 (a verb meaning "to bear hardship with equanimity")

However, the same 4 characters are also 4 distinct words

安 (a noun meaning "peace")
之 (a particle)
若 (a verb meaning "to seem")
素 (a noun meaning "vegetable")

When a user encounters a compound Chinese word like this we want to give them as accurate a response as possible.

In this case, the only meaning of those 4 characters when they appear together in that order is the single compound word 安之若素.

However, there could be other cases of compound words where multiple words are possible and only from reading the full context is it possible to know which is meant.

My Chinese isn't good enough to give an example of this, but take the following contrived example in English:

"con man" is a compound word meaning a person who deceives

if you look at those two words in isolation, that is the only probable meaning

but in a sentence they could (even if it's unlikely) be used as separate words:

"it was a con man I was fooled" (which makes more sense if you have a comma after "con", but I think it makes the point)

"con" and "man" are not homonyms, they are individual words, which taken together make a third word.

So, back to the original example: when a user mouses over the 安 in 安之若素, by including the context forward we get 2 distinct words:

安
and
安之若素

(安之 and 安之若 are not valid words)
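
The lookup described here can be sketched as follows: start from the selected character, extend it one character at a time with the forward context, and keep every candidate that is a valid dictionary entry. The `dictionary` set is toy data standing in for a CEDICT-like source, and `findWords` is a hypothetical function, not the actual adapter logic.

```javascript
// Toy dictionary standing in for a CEDICT-like source (hypothetical data).
const dictionary = new Set(['安', '之', '若', '素', '安之若素'])

// Returns every dictionary word that starts at the selection and extends into the forward context.
function findWords (selection, contextForward, dict) {
  const found = []
  let candidate = selection
  if (dict.has(candidate)) found.push(candidate)
  for (const ch of Array.from(contextForward)) {
    candidate += ch
    if (dict.has(candidate)) found.push(candidate)
  }
  return found
}

console.log(findWords('安', '之若素', dictionary)) // [ '安', '安之若素' ]
```

The intermediate candidates 安之 and 安之若 are tried but dropped because they are not in the dictionary.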

Further, in the old cedict source 安 had 2 distinct entries, which is why I said it should produce 2 lexemes. I see that has been corrected in the newer source, so that isn't the case here, but it IS possible that a single Chinese character is a homonym with multiple lexemes (in the same way "sum" in Latin is a homonym with 3 lexemes)

@balmas
Member

balmas commented Nov 20, 2019

So, that means we need to account for the possibility that multiple words might be analyzed together. This is required for Chinese, but can also be very useful for other languages as well (for example, to parse an entire sentence at once).

I think our data model as it stands is correct, i.e. we represent a Homonym as a distinct word made up of one or more Lexemes, and I don't think we want to change that. What we need to do is account for the following scenarios:

  1. analysis of a single word results in more than 1 Homonym when its context is taken into account
    (the case of 安之若素)

  2. analysis of multiple words results in a separate Homonym for each word
    (e.g. if a user enters "veni vidi vici" in the lookup, that should return 3 Homonyms).

For scenario 2, we actually don't handle this correctly right now. We treat the entire user input "veni vidi vici" as if it was a single word. Our morphological parser is smart enough to understand that it is 3 words and we actually get back parses for all three of them, but we treat them as a single Homonym with the target word "veni vidi vici" which is incorrect.
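
A sketch of how scenario 2 could be handled: group the parses by the word form they belong to, so that each word gets its own Homonym instead of one Homonym for the whole input. The `parses` array is a made-up, simplified stand-in for parser output, and `homonymsFromParses` is a hypothetical helper.

```javascript
// Simplified stand-in for what a morphological parser might return for "veni vidi vici".
const parses = [
  { form: 'veni', lemma: 'venio' },
  { form: 'vidi', lemma: 'video' },
  { form: 'vici', lemma: 'vinco' },
  { form: 'vici', lemma: 'vicus' }
]

// Builds one homonym per distinct word form; each keeps all lexemes parsed for that form.
function homonymsFromParses (parseList, languageCode) {
  const byWord = {}
  for (const parse of parseList) {
    if (!byWord[parse.form]) {
      byWord[parse.form] = { targetWord: parse.form, languageCode, lexemes: [] }
    }
    byWord[parse.form].lexemes.push({ lemma: parse.lemma })
  }
  return Object.values(byWord)
}

const latinHomonyms = homonymsFromParses(parses, 'lat')
console.log(latinHomonyms.map(h => h.targetWord)) // [ 'veni', 'vidi', 'vici' ]
```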

@kirlat
Member

kirlat commented Nov 20, 2019

Thanks for a very detailed explanation! It puts many pieces of information into the right places for me.

So, if I understand correctly, the change needed right now is to create a superstructure over the Homonym object that will group several Homonyms together (i.e. something like a word group)? And then we would need to change all our components so that they will work with this word group, not with individual homonyms as of now.

@balmas
Member

balmas commented Nov 20, 2019

So, if I understand correctly, the change needed right now is to create a superstructure over the Homonym object that will group several Homonyms together (i.e. something like a word group)? And then we would need to change all our components so that they will work with this word group, not with individual homonyms as of now.

Yes, I think so. But it is probably worth thinking carefully about. It might be a good time for us to step back and document the current architecture a bit better so that we can see it more clearly before jumping into refactoring.

@kirlat
Member

kirlat commented Nov 20, 2019

It might be a good time for us to step back and document the current architecture a bit better so that we can see it more clearly before jumping into refactoring.

I think it's a great idea

@balmas balmas transferred this issue from alpheios-project/components Feb 11, 2020
@balmas balmas added chinese chinese-specific components in the core alpheios components labels Feb 11, 2020
@balmas balmas added the enhancement New feature or request label Feb 11, 2020
@balmas
Member

balmas commented Apr 8, 2020

we need this feature also for treebank queries, where sometimes a single word in a source text is linked to multiple words in a treebank.

E.g. "virumque" contains the enclitic "-que": it is a combination of the 2 words "virum" and "que", and in a treebank these would likely be reflected as two separate tokens.

@balmas balmas changed the title Mutiple Homonym from Morphology Request (Chinese) Mutiple Homonym from Morphology Request Apr 8, 2020
@balmas balmas added the treebank label Apr 8, 2020
@balmas
Member

balmas commented Dec 10, 2020

Another example of compound words that need to be supported, from #541

For the Greek word προἀπολέγω the Morpheus parser returns 3 possible lexemes, each of which has a compound word as the lemma:

πρό,ἀπό-λέγω1 => 3 separate words: πρό , ἀπό , and λέγω1
πρό-ἀπολέγω1 => 2 separate words πρό, and ἀπολέγω1
πρό-ἀπολέγω2 => 2 separate words πρό, and ἀπολέγω2

For a total of 5 distinct lemmas, each of which has an entry in our shortdefs file:

"πρό|gen. before",
"ἀπό|gen from",
"λέγω1|say, speak",
"ἀπολέγω1|to pick out from",
"ἀπολέγω2|to decline, refuse"

Both our data model and the query need to be able to account for this

For each lexeme, we should be able to show the user the separate components, with their definitions.
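
As a sketch of that last point, the compound lemmas above can be split on `,` and `-` and each component looked up in the short definitions. The function names are hypothetical, and the assumption that `,` and `-` are the only separators is taken from the three examples in this comment rather than from the Morpheus documentation.

```javascript
// Short definitions copied from the entries quoted above, in "lemma|definition" form.
const shortdefs = [
  'πρό|gen. before',
  'ἀπό|gen from',
  'λέγω1|say, speak',
  'ἀπολέγω1|to pick out from',
  'ἀπολέγω2|to decline, refuse'
]

// Index definitions by lemma; NFC normalization guards against differently encoded accents.
const defsByLemma = {}
for (const entry of shortdefs) {
  const sep = entry.indexOf('|')
  defsByLemma[entry.slice(0, sep).normalize('NFC')] = entry.slice(sep + 1)
}

// Assumption: components of a compound lemma are separated by "," or "-".
function splitCompoundLemma (lemma) {
  return lemma.normalize('NFC').split(/[,-]/)
}

// Returns each component with its short definition (null when none is found).
function describeCompound (lemma) {
  return splitCompoundLemma(lemma).map(part => ({
    lemma: part,
    definition: defsByLemma[part] || null
  }))
}

const compoundLemmas = ['πρό,ἀπό-λέγω1', 'πρό-ἀπολέγω1', 'πρό-ἀπολέγω2']
for (const lemma of compoundLemmas) {
  console.log(lemma, describeCompound(lemma))
}
```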

@kirlat this is something that needs to be handled properly in the lexical query refactoring and annotation code.

@kirlat
Member

kirlat commented Dec 11, 2020

That's an interesting case to test with. Thanks for bringing it in here!

This was referenced Dec 11, 2020
@balmas
Member

balmas commented Dec 17, 2020

for compound words I think we need to modify our data model to introduce the concept of a CompoundLexeme, which is a type of Lexeme which has constituent lexemes
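
One possible shape for this, as a sketch only: the `Lexeme` class below is a stripped-down stand-in rather than the real data-model class, and `CompoundLexeme` with its `constituents` field is an assumption about how the concept might be expressed.

```javascript
// Stripped-down stand-in for the real Lexeme class (assumed shape).
class Lexeme {
  constructor (lemma, definition = null) {
    this.lemma = lemma
    this.definition = definition
  }
}

// A CompoundLexeme is a Lexeme that also carries its constituent lexemes.
class CompoundLexeme extends Lexeme {
  constructor (lemma, constituents) {
    super(lemma)
    this.constituents = constituents
  }

  // Lists each component with its definition, e.g. for display to the user.
  get componentDefinitions () {
    return this.constituents.map(c => `${c.lemma}: ${c.definition}`)
  }
}

const compound = new CompoundLexeme('πρό-ἀπολέγω2', [
  new Lexeme('πρό', 'gen. before'),
  new Lexeme('ἀπολέγω2', 'to decline, refuse')
])

console.log(compound.componentDefinitions)
```

Because `CompoundLexeme` extends `Lexeme`, code that only expects a `Lexeme` could keep working while newer code checks for `constituents`.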
