provide human-readable sense IDs in *tab mapping files for Princeton WN #14
Comments
Remark: I think the numerical IDs are synset IDs, the human-readable IDs are sense IDs. If this is right, the question can be rephrased as "extend the ILI concept mapping to sense IDs".
Note: The pull request provides a linking between sense IDs and original synset IDs. This can be used in conjunction with ILI mappings, but is not directly integrated into ILI mapping files.
These mappings are already released as part of the existing Princeton WordNet releases (in the
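A minimal sketch of how a sense-to-synset linking could be used in conjunction with a synset-to-ILI mapping, as the note describes. The function name `compose_mappings` and the two in-memory dictionaries are hypothetical; the single entry reuses the `election` identifiers quoted in this thread, and real data would be read from the respective mapping files.

```python
def compose_mappings(sense_to_synset, synset_to_ili):
    """Chain sense key -> synset ID and synset ID -> ILI into sense key -> ILI.

    Sense keys whose synset has no ILI entry are skipped.
    """
    return {
        sense_key: synset_to_ili[synset_id]
        for sense_key, synset_id in sense_to_synset.items()
        if synset_id in synset_to_ili
    }


# Illustrative entries only (hypothetical tables, not actual file contents).
sense_to_synset = {"election%1:04:01::": "00181781-n"}
synset_to_ili = {"00181781-n": "i36368"}

sense_to_ili = compose_mappings(sense_to_synset, synset_to_ili)
print(sense_to_ili)
```

Keeping the two tables separate and joining them on demand leaves the ILI mapping files untouched, which matches the way the pull request is described here.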
Thanks, @chiarcos, for your work on this issue. My opinion is the same as @jmccrae's, except that I would be against including the changes here. CILI is meant to be an interlingual resource and not tied to any one wordnet (even though the descriptions are in English and there are mappings to WordNet synset IDs so that they may be used with wordnets produced via the "expand" methodology). So linking the ILIs to words in one of the English wordnets seems misplaced. As @jmccrae mentioned, this data is encoded in the
>>> import wn
>>> en = wn.Wordnet('omw-en') # wn.download('omw-en') if you don't have it
>>> s = en.senses()[0] # just get the first sense as an example
>>> s # sense ids are not sense keys
Sense('omw-en--apos-hood-08641944-n')
>>> s.metadata() # but for omw-en and omw-en31 they are stored in the metadata
{'identifier': "'hood%1:15:00::"}
>>> sense_key_map = { # build the mapping
... s.metadata()['identifier']: s
... for s in en.senses()
... if 'identifier' in s.metadata() # in case some senses do not have keys
... }
>>> sense_key_map['election%1:04:01::']
Sense('omw-en-election-00181781-n')
This provides a mapping from the sense keys to the Sense objects, but you can then get to the synsets for other kinds of mappings:
>>> sense_key_map['election%1:04:01::'].synset() # synset objects
Synset('omw-en-00181781-n')
>>> sense_key_map['election%1:04:01::'].synset().metadata() # NLTK-style identifiers
{'identifier': 'election.n.01'}
>>> sense_key_map['election%1:04:01::'].synset().ili # ILI
ILI('i36368')
Building this mapping may be a bit more manual of a process than it should be. I'm not sure Wn needs a custom function to build this mapping, but it could be useful to include it as a recipe in the documentation. Hope this helps! I'm also interested to hear @fcbond's opinion.
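As a sketch of what such a recipe might look like, the dictionary comprehension from the session above can be wrapped in a small helper. The name `build_sense_key_map` and its `field` parameter are hypothetical and not part of Wn; the helper only assumes that each sense object has a `metadata()` method returning a dict, as the session shows.

```python
def build_sense_key_map(senses, field="identifier"):
    """Map the sense key stored in each sense's metadata to the sense object.

    `senses` is any iterable of objects with a metadata() method that returns
    a dict (an assumption based on the session above). Senses without the
    requested metadata field are skipped.
    """
    mapping = {}
    for sense in senses:
        key = sense.metadata().get(field)  # metadata() is called once per sense
        if key is not None:
            mapping[key] = sense
    return mapping


# Intended use with the objects from the session above (not executed here):
#   sense_key_map = build_sense_key_map(wn.Wordnet('omw-en').senses())
```

Compared with the inline comprehension, this variant reads the metadata once per sense instead of twice.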
Sure, decide as you see fit. I'm also not sure whether CILI is the best place to provide that information, but the sad truth is that such a declarative mapping for older WordNets in a conventional format (it is included in the
Also, the problem is not so much PWN 3.0 or newer resources, as these are easily accessible. For the data at hand (SemCor), I need that for PWN 1.6, so that's why I created these mappings. (Or have these been more stable than synset IDs, so that I can just use [P]WN 3.0 sense IDs with PWN 1.6?)
PS: One part that I couldn't figure out was how to create correct
I could see the value of having mappings between sense keys and ILI IDs included in this repository, as many resources use these instead of the offset identifier. @chiarcos yes, the
Traditionally, Princeton WordNet used two concurrent types of sense identification:
00182630-n
election%1:04:01::
In the mapping files, only the former are covered.
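To make the two identifier types concrete, here is a rough sketch of parsers for both. It assumes the sense key layout `lemma%ss_type:lex_filenum:lex_id:head_word:head_id` from the WordNet sense index documentation and an `offset-pos` shape for the numerical IDs. The names `SenseKey`, `parse_sense_key` and `parse_synset_id` are hypothetical.

```python
from typing import NamedTuple, Tuple

# ss_type codes used in WordNet sense keys, mapped to part-of-speech letters.
SS_TYPE_TO_POS = {1: "n", 2: "v", 3: "a", 4: "r", 5: "s"}


class SenseKey(NamedTuple):
    lemma: str
    pos: str
    lex_filenum: int
    lex_id: int
    head_word: str  # only filled for adjective satellites
    head_id: str    # only filled for adjective satellites


def parse_sense_key(key: str) -> SenseKey:
    """Split a sense key such as 'election%1:04:01::' into its fields."""
    lemma, sep, lex_sense = key.rpartition("%")
    if not sep:
        raise ValueError(f"not a sense key: {key!r}")
    fields = lex_sense.split(":")
    if len(fields) != 5:
        raise ValueError(f"expected 5 colon-separated fields in {key!r}")
    ss_type, lex_filenum, lex_id, head_word, head_id = fields
    return SenseKey(lemma, SS_TYPE_TO_POS[int(ss_type)],
                    int(lex_filenum), int(lex_id), head_word, head_id)


def parse_synset_id(synset_id: str) -> Tuple[int, str]:
    """Split an offset-based ID such as '00182630-n' into (offset, pos)."""
    offset, sep, pos = synset_id.partition("-")
    if not sep or not offset.isdigit() or not pos:
        raise ValueError(f"not a synset ID: {synset_id!r}")
    return int(offset), pos


print(parse_sense_key("election%1:04:01::"))
print(parse_synset_id("00182630-n"))
```

The parsed sense key carries the lemma and part of speech but no offset, so going from one identifier type to the other still requires a lookup table.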
Request:
older-wn-mappings (and, potentially, all others)
Objective:
Use case: