ipavec

This is the home of my bachelor thesis, IPA alignment using vector representations, which presents several different methods for obtaining embeddings of IPA segments in vector space for the purposes of aligning IPA-encoded sound sequences. The repo contains the code implementation, the training and evaluation data, and the thesis paper.

setup

# clone this repository
git clone https://github.com/pavelsof/ipavec
cd ipavec

# create a python3 virtual environment
python3 -m venv meta/venv
source meta/venv/bin/activate

# install the dependencies
pip install -r requirements.txt

# check the unit tests
python -m unittest discover code

usage

# activate the virtual env if it is not already
source meta/venv/bin/activate

# ensure reproducibility of the results
export PYTHONHASHSEED=42

# use train.py to produce embeddings for one of the trainable methods
python train.py --help

# use run.py to run a method on a dataset and output aligned pairs
python run.py --help

# use eval.py to evaluate the output if you have the gold-standard alignments
python eval.py --help

phoible

python run.py data/bdpa/slavic.psa --vectors phoible --output output/slavic-phoible.psa
python eval.py data/bdpa/slavic.psa output/slavic-phoible.psa | less

phon2vec

python train.py phon2vec data/northeuralex/ipa --output models/phon2vec
python run.py data/bdpa/slavic.psa --vectors phon2vec --output output/slavic-phon2vec.psa
python eval.py data/bdpa/slavic.psa output/slavic-phon2vec.psa | less

nn+rnn

python train.py nn data/northeuralex/ipa --extra epochs=10,batch_size=128
python run.py data/bdpa/slavic.psa --vectors nn --output output/slavic-nn.psa
python eval.py data/bdpa/slavic.psa output/slavic-nn.psa | less

python train.py rnn data/northeuralex/word_pairs --extra batch_size=128,from_model=models/nn
python run.py data/bdpa/slavic.psa --vectors rnn --output output/slavic-rnn.psa
python eval.py data/bdpa/slavic.psa output/slavic-rnn.psa | less

evaluation

The table summarises the accuracy achieved by the implemented methods on the BDPA datasets. The last two columns list the results of the PMI and SCA methods.

	one-hot	phoible	phon2vec	nn	nn+rnn	pmi	sca
covington	60.61%	82.42%	80.18%	82.52%	82.52%	87.80%	90.24%
andean	85.66%	87.31%	97.25%	99.34%	99.50%	95.21%	99.67%
bai	52.55%	62.77%	61.25%	74.72%	75.52%	-	83.45%
bulgarian	60.54%	80.54%	77.98%	82.55%	86.70%	81.70%	89.34%
dutch	14.16%	25.65%	26.00%	32.50%	32.50%	36.67%	42.20%
french	42.94%	62.92%	68.94%	74.30%	77.04%	71.98%	80.90%
germanic	39.93%	51.78%	54.59%	71.83%	72.55%	75.32%	83.48%
japanese	53.56%	65.04%	73.74%	62.71%	71.08%	68.26%	82.19%
norwegian	59.39%	78.87%	73.69%	83.53%	89.06%	78.11%	91.77%
ob-ugrian	59.58%	77.87%	73.35%	78.04%	82.55%	82.09%	86.04%
romance	40.48%	71.28%	63.16%	76.37%	77.55%	84.51%	95.62%
sinitic	27.34%	28.57%	30.75%	72.46%	74.04%	-	98.95%
slavic	76.96%	90.73%	84.22%	89.89%	96.81%	89.36%	94.15%
global	51.83%	66.64%	66.99%	75.88%	78.45%	-	84.84%

license

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Name		Name	Last commit message	Last commit date
Latest commit History 140 Commits
code		code
data		data
paper		paper
scripts		scripts
.editorconfig		.editorconfig
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
eval.py		eval.py
requirements.txt		requirements.txt
run.py		run.py
train.py		train.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ipavec

setup

usage

phoible

phon2vec

nn+rnn

evaluation

license

About

Languages

License

pavelsof/ipavec

Folders and files

Latest commit

History

Repository files navigation

ipavec

setup

usage

phoible

phon2vec

nn+rnn

evaluation

license

About

Resources

License

Stars

Watchers

Forks

Languages