Semantic dependency relationship extractor for Indonesian (including slang and alay ;))
Optional: supports extraction from Indonesian into the RelEx (English) sentence structure.
- i(first, singular) 'aku', 'saya'
- you(second, singular) 'kamu', 'Anda'
- he(third, singular) 'dia', 'ia'
- she(third, singular) 'dia', 'ia'
- we_sm(first, plural) 'kami'
- we_lg(first, plural) 'kita'
- you_plural(second, plural) 'kalian'
- they(third, plural) 'mereka'
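The pronoun entries above can be held in a small lookup structure. The following is a minimal sketch in Python, not the project's implementation: the names `PRONOUNS`, `_add` and `lookup_pronoun` are hypothetical, and only the surface forms and features are taken from the entries above.

```python
# Hypothetical lookup table built from the pronoun entries above.
# key: Indonesian surface form (lowercased)
# value: list of (pronoun id, person, number) readings
PRONOUNS = {}


def _add(pronoun_id, person, number, *forms):
    """Register every surface form under the given pronoun reading."""
    for form in forms:
        PRONOUNS.setdefault(form.lower(), []).append((pronoun_id, person, number))


_add("i", "first", "singular", "aku", "saya")
_add("you", "second", "singular", "kamu", "Anda")
_add("he", "third", "singular", "dia", "ia")
_add("she", "third", "singular", "dia", "ia")
_add("we_sm", "first", "plural", "kami")
_add("we_lg", "first", "plural", "kita")
_add("you_plural", "second", "plural", "kalian")
_add("they", "third", "plural", "mereka")


def lookup_pronoun(word):
    """Return all candidate readings for a word, or an empty list."""
    return PRONOUNS.get(word.lower(), [])


print(lookup_pronoun("Kami"))  # a single reading
print(lookup_pronoun("dia"))   # ambiguous between he and she
```

Returning a list rather than a single reading keeps the he/she ambiguity of 'dia' and 'ia' visible to whatever step resolves it later.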
@prefix rdf: <> .
@prefix rdfs: <> .
@prefix owl: <> .
@prefix lemon: <> .
@prefix dbpedia-owl: <> .
@prefix dbpedia: <> .
@prefix schema: <> .
@prefix wordnet-ontology: <> .
@prefix wn31: <> .
@prefix wn20: <> .
@prefix uby: <> .
Note: it is hard to find a dataset for the wn30 namespace, and nobody uses the wn30 version, so we are using wn31.
If we need to use wn-msa someday, we will need to make it wn31-compliant.
But wn31 currently already provides ind translations. :)
BTW namespace always redirects to wn31
dbpedia-owl:Animal 'binatang', 'hewan'
All resources here are assumed to be a dbpedia-owl:Animal
dbpedia:Elephant 'gajah'
Aku melihat binatang gajah di kebun binatang.
noun verb 'binatang' dbpedia-owl:Animal 'di' 'kebun binatang' . => S( (NP 1) (VP 2 (NP 3 4)) (at dbpedia:Zoo) . )
Kamu melihat binatang apa di kebun binatang?
noun verb 'binatang' 'apa' 'di' 'kebun binatang' ? => S( (NP 1) (VP 2 (NP 3 what=dbpedia-owl:Animal)) (at dbpedia:Zoo) ? )
The type is fixed to dbpedia-owl:Animal here to limit the search space for answers.
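One way to picture the lexicon lookup behind the two patterns above is a greedy longest-match tagger. This is a sketch under assumptions, not the project's code: `LEXICON` and `tag` are hypothetical names, and the mapping of 'kebun binatang' to dbpedia:Zoo is inferred from the `(at dbpedia:Zoo)` output shown above.

```python
# Hypothetical lexicon: surface phrase -> resource, taken from the examples above.
LEXICON = {
    "binatang": "dbpedia-owl:Animal",
    "hewan": "dbpedia-owl:Animal",
    "gajah": "dbpedia:Elephant",
    "kebun binatang": "dbpedia:Zoo",  # inferred from "(at dbpedia:Zoo)"
}
MAX_WORDS = max(len(phrase.split()) for phrase in LEXICON)


def tag(sentence):
    """Greedy longest-match tagging; words not in the lexicon map to None."""
    words = sentence.rstrip(".?!").lower().split()
    tagged, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shrink to a single word.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in LEXICON or n == 1:
                tagged.append((phrase, LEXICON.get(phrase)))
                i += n
                break
    return tagged


for phrase, resource in tag("Aku melihat binatang gajah di kebun binatang."):
    print(phrase, "->", resource)
```

Matching the longest phrase first keeps the 'binatang' inside 'kebun binatang' from being tagged as dbpedia-owl:Animal a second time.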
Gajah memiliki 4 jumlah kaki.
Animal 'memiliki' int 'jumlah' 'kaki' . => S( (NP 1) (VP has (NP 4 feet)) .)
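The count pattern above (subject, 'memiliki', an integer, 'jumlah', a noun) can be approximated with a regular expression. This is only an illustrative sketch: `HAS_COUNT` and `parse_has_count` are hypothetical names, the result is a plain dictionary rather than the bracketed structure shown above, and the noun is left untranslated.

```python
import re

# Hypothetical pattern for: <subject> 'memiliki' <int> 'jumlah' <noun> .
HAS_COUNT = re.compile(
    r"^(\w+)\s+memiliki\s+(\d+)\s+jumlah\s+(\w+)\s*\.$", re.IGNORECASE
)


def parse_has_count(sentence):
    """Return a has-relation for a matching sentence, or None if it does not match."""
    match = HAS_COUNT.match(sentence.strip())
    if match is None:
        return None
    return {
        "subject": match.group(1),
        "relation": "has",
        "count": int(match.group(2)),
        "object": match.group(3),  # still Indonesian, e.g. 'kaki'
    }


print(parse_has_count("Gajah memiliki 4 jumlah kaki."))
print(parse_has_count("Kuda suka makan rumput."))  # does not match the pattern
```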
Gajah suaranya "auuuk…"
Gajah memiliki hidung panjang yang sering disebut belalai.
Hewan di samping namanya ular.
Tubuhnya panjang dan tidak memiliki kaki dan tangan.
Ular memiliki racun yang sering disebut bisa.
Warna ular disamping adalah hijau.
Kakak menunggangi hewan yang bernama kuda.
Kuda suaranya "yihheeekk".
Kuda suka mengangkut andong / bendi.
Kuda suka makan rumput.
Tubuh Nelly berwarna oranye/jingga/oren/kuning yang indah.
Nelly memiliki kuku/cakar yang runcing/tajam untuk mencakar mangsa.
Nelly suka makan ikan asin.
Hewan di samping namanya adalah kucing.
Cuicit adalah burung pipit Kesayangan Icha
Warnanya kuning dan oranye.
Cuicit suka berkicau/bernyanyi/bersuara di pagi hari.
Suara nya cit cit cit / cuit cuit cuit / merdu. :)
WordNet 3.1 RDF (77 MB gzip, 1.2 GB expanded) comes in N-Triples format, which is too big to parse directly. One option is to convert it to Turtle first using rdfcat, as shown below.
However, this is slow and generates unusable data anyway. Skip to TDB + arq.
time JVM_ARGS='-Xms6g -Xmx6g' ~/apache-jena-2.11.2/bin/rdfcat -out ttl -rdfxml ~/Downloads/wn31.nt > ~/Downloads/wn31.ttl
That took 7m15s on an i7. A 4 GB heap also works, but no less. It generates a 480 MB Turtle file without nsPrefixes (sigh!). :(
Use tdbloader2 to load the WordNet 3.1 data.
ceefour@amanah:/media/ceefour/passport/project_passport/Lumen/wn31 > tdbloader2 --loc ~/tmp/wn31 wn31.nt
This took 108 seconds on an i7 :) (8,574,807 tuples!) and generated 735 MB of data.
ceefour@amanah:~ > tdbquery --loc=$HOME/wn31_tdb --file ~/git/relex-id/core/elephant.sparql
------------------------------------------------------------------
| y                               | z |
==================================================================
| rdf:type                        | wordnet-ontology:Synset |
| wordnet-ontology:translation    | "象"@zho |
| wordnet-ontology:translation    | "éléphant"@fra |
| wordnet-ontology:translation    | "elefante"@glg |
| wordnet-ontology:translation    | "elefante"@ita |
| wordnet-ontology:translation    | "biram"@zsm |
| wordnet-ontology:translation    | "elefante"@por |
| wordnet-ontology:translation    | "elefant"@dan |
| wordnet-ontology:translation    | "elefanta"@por |
| wordnet-ontology:translation    | "biram"@ind |
| wordnet-ontology:translation    | "ゾウ"@jpn |
| wordnet-ontology:translation    | "elefant"@nob |
| wordnet-ontology:translation    | "ช้าง"@tha |
| wordnet-ontology:translation    | "فیل"@fas |
| wordnet-ontology:translation    | "gajah"@zsm |
| wordnet-ontology:translation    | "elefante"@spa |
| wordnet-ontology:translation    | "ช้างสาร"@tha |
| wordnet-ontology:translation    | "פִּיל"@heb |
| wordnet-ontology:translation    | "象さん"@jpn |
| wordnet-ontology:translation    | "elefante"@eus |
| wordnet-ontology:translation    | "gajah"@ind |
| wordnet-ontology:translation    | "象"@jpn |
| wordnet-ontology:translation    | "norsu"@fin |
| wordnet-ontology:translation    | "elefantti"@fin |
| wordnet-ontology:translation    | "پیل"@fas |
| wordnet-ontology:translation    | "Elefantes"@por |
| wordnet-ontology:translation    | "elephantidae"@spa |
| wordnet-ontology:translation    | "éléphantidés"@fra |
| wordnet-ontology:translation    | "elefant"@nno |
| wordnet-ontology:translation    | "elefant"@cat |
| wordnet-ontology:translation    | "หัตถี"@tha |
| wordnet-ontology:hyponym        | wn31:102507401-n |
| wordnet-ontology:hyponym        | wn31:102506644-n |
| wordnet-ontology:hyponym        | wn31:102509414-n |
| wordnet-ontology:hyponym        | wn31:102506387-n |
| wordnet-ontology:hyponym        | wn31:102507089-n |
| wordnet-ontology:synset_member  | wn31:elephant-n |
| wordnet-ontology:gloss          | "five-toed pachyderm"@eng |
| wordnet-ontology:part_of_speech | wordnet-ontology:noun |
| owl:sameAs                      | wn20:synset-elephant-noun-1 |
| owl:sameAs                      | uby:WN_Synset_13287 |
| rdfs:label                      | "elephant"@eng |
| wordnet-ontology:lexical_domain | wordnet-ontology:noun.animal |
| wordnet-ontology:hypernym       | wn31:102505758-n |
| wordnet-ontology:hypernym       | wn31:102455739-n |
| wordnet-ontology:part_holonym   | wn31:101468354-n |
| wordnet-ontology:part_holonym   | wn31:102455598-n |
| wordnet-ontology:member_meronym | wn31:102505944-n |
------------------------------------------------------------------
Yay! :)
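To pull just the Indonesian translations out of text output like the table above, a small parser is enough. The following is a rough sketch, assuming the two-column `| y | z |` layout shown above; `parse_rows`, `translations` and `SAMPLE` are hypothetical names, and `SAMPLE` is a shortened copy of a few rows from that output. It would break on cell values that themselves contain a `|`.

```python
def parse_rows(text):
    """Yield (predicate, object) pairs from a two-column text result table."""
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the ---- and ==== ruler lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) == 2 and cells != ["y", "z"]:  # drop the header row
            yield cells[0], cells[1]


def translations(text, lang):
    """Return the translation literals tagged with the given language code."""
    suffix = "@" + lang
    found = []
    for predicate, obj in parse_rows(text):
        if predicate == "wordnet-ontology:translation" and obj.endswith(suffix):
            found.append(obj[: -len(suffix)].strip('"'))
    return found


# A few rows copied from the output above.
SAMPLE = """
------------------------------------------------------------------
| y                            | z                       |
==================================================================
| rdf:type                     | wordnet-ontology:Synset |
| wordnet-ontology:translation | "biram"@ind             |
| wordnet-ontology:translation | "gajah"@zsm             |
| wordnet-ontology:translation | "gajah"@ind             |
| rdfs:label                   | "elephant"@eng          |
------------------------------------------------------------------
"""

print(translations(SAMPLE, "ind"))  # ['biram', 'gajah']
```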
Put in ~/.bashrc
export PATH=$PATH:$HOME/apache-jena-2.11.2/bin:$HOME/jena-fuseki-1.0.2
Then execute:
chmod +x ~/jena-fuseki-1.0.2/s-*
~/jena-fuseki-1.0.2/fuseki-server --update --loc ~/wn31_tdb /ds
Go to http://localhost:3030/sparql.tpl and upload the WordNet 3.1 data.
(You can also use tdbloader2 to load the WordNet 3.1 data.)
> s-query --output text --service http://localhost:3030/ds/query --file ~/git/relex-id/core/elephant.sparql
---------------------------------------------------------------------------------------------------------------------------
| s                | p                               | o |
===========================================================================================================================
|                  | rdf:type                        | wordnet-ontology:Synset |
|                  | wordnet-ontology:translation    | "象"@zho |
|                  | wordnet-ontology:translation    | "éléphant"@fra |
|                  | wordnet-ontology:translation    | "elefante"@glg |
|                  | wordnet-ontology:translation    | "elefante"@ita |
|                  | wordnet-ontology:translation    | "biram"@zsm |
|                  | wordnet-ontology:translation    | "elefante"@por |
|                  | wordnet-ontology:translation    | "elefant"@dan |
|                  | wordnet-ontology:translation    | "elefanta"@por |
|                  | wordnet-ontology:translation    | "biram"@ind |
|                  | wordnet-ontology:translation    | "ゾウ"@jpn |
|                  | wordnet-ontology:translation    | "elefant"@nob |
|                  | wordnet-ontology:translation    | "ช้าง"@tha |
|                  | wordnet-ontology:translation    | "فیل"@fas |
|                  | wordnet-ontology:translation    | "gajah"@zsm |
|                  | wordnet-ontology:translation    | "elefante"@spa |
|                  | wordnet-ontology:translation    | "ช้างสาร"@tha |
|                  | wordnet-ontology:translation    | "פִּיל"@heb |
|                  | wordnet-ontology:translation    | "象さん"@jpn |
|                  | wordnet-ontology:translation    | "elefante"@eus |
|                  | wordnet-ontology:translation    | "gajah"@ind |
|                  | wordnet-ontology:translation    | "象"@jpn |
|                  | wordnet-ontology:translation    | "norsu"@fin |
|                  | wordnet-ontology:translation    | "elefantti"@fin |
|                  | wordnet-ontology:translation    | "پیل"@fas |
|                  | wordnet-ontology:translation    | "Elefantes"@por |
|                  | wordnet-ontology:translation    | "elephantidae"@spa |
|                  | wordnet-ontology:translation    | "éléphantidés"@fra |
|                  | wordnet-ontology:translation    | "elefant"@nno |
|                  | wordnet-ontology:translation    | "elefant"@cat |
|                  | wordnet-ontology:translation    | "หัตถี"@tha |
|                  | wordnet-ontology:hyponym        | wn31:102507401-n |
|                  | wordnet-ontology:hyponym        | wn31:102506644-n |
|                  | wordnet-ontology:hyponym        | wn31:102509414-n |
|                  | wordnet-ontology:hyponym        | wn31:102506387-n |
|                  | wordnet-ontology:hyponym        | wn31:102507089-n |
|                  | wordnet-ontology:synset_member  | wn31:elephant-n |
|                  | wordnet-ontology:gloss          | "five-toed pachyderm"@eng |
|                  | wordnet-ontology:part_of_speech | wordnet-ontology:noun |
|                  | owl:sameAs                      | wn20:synset-elephant-noun-1 |
|                  | owl:sameAs                      | uby:WN_Synset_13287 |
|                  | rdfs:label                      | "elephant"@eng |
|                  | wordnet-ontology:lexical_domain | wordnet-ontology:noun.animal |
|                  | wordnet-ontology:hypernym       | wn31:102505758-n |
|                  | wordnet-ontology:hypernym       | wn31:102455739-n |
|                  | wordnet-ontology:part_holonym   | wn31:101468354-n |
|                  | wordnet-ontology:part_holonym   | wn31:102455598-n |
|                  | wordnet-ontology:member_meronym | wn31:102505944-n |
| wn31:101468354-n | wordnet-ontology:part_meronym   |  |
| wn31:102505944-n | wordnet-ontology:member_holonym |  |
| wn31:102505758-n | wordnet-ontology:hyponym        |  |
| wn31:102455739-n | wordnet-ontology:hyponym        |  |
| wn31:102455598-n | wordnet-ontology:part_meronym   |  |
| wn31:102507401-n | wordnet-ontology:hypernym       |  |
| wn31:102506644-n | wordnet-ontology:hypernym       |  |
| wn31:102509414-n | wordnet-ontology:hypernym       |  |
| wn31:102506387-n | wordnet-ontology:hypernym       |  |
| wn31:102507089-n | wordnet-ontology:hypernym       |  |
| <>               | lemon:reference                 |  |
---------------------------------------------------------------------------------------------------------------------------
Yay! :)
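The same endpoint can also be queried from a program. The sketch below uses only the Python standard library and the standard SPARQL Protocol (an HTTP GET with a `query` parameter, asking for JSON results). It assumes the Fuseki server from the steps above is running at `http://localhost:3030/ds/query`; `build_request` and `run_select` are hypothetical helper names, and the example query is a placeholder because the contents of `elephant.sparql` are not shown here.

```python
import json
import urllib.parse
import urllib.request

SERVICE = "http://localhost:3030/ds/query"  # same endpoint as the s-query call above


def build_request(query, service=SERVICE):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = service + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_select(query, service=SERVICE):
    """Run a SELECT query and return each row as a dict of variable -> value."""
    with urllib.request.urlopen(build_request(query, service)) as response:
        data = json.load(response)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


# Example call (needs the running server, so it is left commented out):
# for row in run_select("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"):
#     print(row)
```

Keeping request construction separate from the network call makes the URL and headers easy to inspect without a running server.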
WordNet only contains nouns, verbs, adjectives, and adverbs. For other parts of speech, we need to use something else (probably DBpedia Wiktionary) or create our own data (but still using an ontology).
However, the data still needs corrections, especially for false inclusions:
tdbupdate -v --loc ~/wn31_tdb --update ~/git/relex-id/core/wn31patch.sparql
s-update -v --service http://localhost:3030/ds/update --file ~/git/relex-id/core/wn31patch.sparql
tdbquery --results text --loc ~/wn31_tdb --file ~/git/relex-id/core/me.sparql
Required to run
Extract []( to
Extract the indexes to $HOME (will create subdirectories inside
. For testing you can use the small indexes only:-
BabelNet API v1.0.1 + Path indexes v1.0.1: (I think we can use 1.1.1 instead, but not 2.0+)
and set babelnet.dir
. -
and set knowledge.graph.pathIndex
WordNet 3.1 RDF
