tarekeldeeb/GloVe-Arabic

 
 


GloVe: Global Vectors for Word Representation (Arabic Model)

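Nearest neighbors (الكلمة = word, أقاربها = its neighbors, البعد جتا = cosine distance)

Word              Neighbor                      Cosine distance
التقوى (piety)    الاخلاص (sincerity)           0.571273
                  الايمان (faith)               0.556925
                  الطاعة (obedience)            0.536685
                  الورع (devoutness)            0.524090
                  الاستقامة (uprightness)       0.523659
                  الخشية (reverent fear)        0.512803
كلب (dog)         حيوان (animal)                0.568289
                  حمار (donkey)                 0.547309
                  اسد (lion)                    0.519542
                  صيد (hunting)                 0.498536
                  الذئب (the wolf)              0.482568

Word analogies (علاقة كلمتين = relation between two words)

Example pair                          Query -> ? answer
كبير -> اكبر (big -> bigger)          جميل -> ? اجمل (beautiful -> more beautiful)
                                      عالي -> ? اعلى (high -> higher)
                                      رائع -> ? اروع (wonderful -> more wonderful)
رجل -> رجال (man -> men)              امراة -> ? نساء (woman -> women)
                                      قول -> ? اقوال (saying -> sayings)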
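
The neighbor scores and analogy answers above are the kind of result obtained from cosine comparisons between word vectors. The following is a minimal sketch of that idea, not the repository's evaluation script. The three-dimensional vectors in `toy` are made up for illustration, and the helper names `cosine`, `nearest`, and `analogy` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(vectors, query, k=3, exclude=()):
    """Return the k words whose vectors score highest against `query`."""
    scored = [
        (cosine(query, vec), word)
        for word, vec in vectors.items()
        if word not in exclude
    ]
    scored.sort(reverse=True)
    return [(word, score) for score, word in scored[:k]]

def analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three input words are excluded so they cannot be the answer.
    return nearest(vectors, target, k=1, exclude=(a, b, c))[0][0]

# Made-up toy vectors (assumption): real vectors come from a trained model.
toy = {
    "كبير": [1.0, 0.0, 0.0],   # big
    "اكبر": [1.0, 1.0, 0.0],   # bigger
    "جميل": [0.0, 0.0, 1.0],   # beautiful
    "اجمل": [0.0, 1.0, 1.0],   # more beautiful
    "كلب": [0.5, -1.0, 0.2],   # dog
}

answer = analogy(toy, "كبير", "اكبر", "جميل")
print(answer)
```

With trained vectors, `nearest(vectors, vectors["كلب"], exclude=("كلب",))` would produce a neighbor list in the style of the table above, though the exact scores depend on the model.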

We provide an implementation of the GloVe model for learning word representations and describe how to download vectors trained on web datasets or train your own. See the paper for more information on GloVe vectors.

Download pre-trained word vectors

The links below contain word vectors obtained from the respective corpora. If you want word vectors trained on other data sets, feel free to edit this script. Pre-trained word vectors are made available under the Waqf v2.0 Public License.
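
Once a vector file is downloaded, it has to be parsed before use. The sketch below assumes the file uses the plain-text layout that the standard GloVe tools write (one word per line followed by its space-separated components); check the actual download before relying on this. The function name `load_vectors` is hypothetical, and an in-memory sample stands in for a real file.

```python
import io

def load_vectors(handle):
    """Parse 'word v1 v2 ... vd' lines into a dict of word -> list of floats.

    Assumes the plain-text GloVe layout; binary vector files need a
    different reader.
    """
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

# In-memory sample with made-up numbers, standing in for a downloaded file.
sample = io.StringIO("كلب 0.1 -0.2 0.3\nحيوان 0.0 0.5 -0.1\n")
vectors = load_vectors(sample)
print(len(vectors), len(vectors["كلب"]))
```

For a real file, pass a handle opened with `open(path, encoding="utf-8")`, where `path` points to the downloaded vectors; the explicit encoding matters because the vocabulary is Arabic.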

Train word vectors on a new corpus

If the web datasets above don't match the semantics of your end use case, you can train word vectors on your own corpus.

$ git clone https://github.com/tarekeldeeb/GloVe-Arabic
$ cd GloVe-Arabic && make
$ ./demo.sh

The demo.sh script downloads an Arabic corpus drawn from a mix of sources. It collects unigram counts, constructs and shuffles cooccurrence data, and trains a simple version of the GloVe model. It also runs a word analogy evaluation script in Python to verify word vector quality. More details about training on your own corpus can be found in demo.sh or src/README.md.
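
To make the first two pipeline stages concrete, here is a small Python sketch of unigram counting and windowed cooccurrence counting. It is an illustration only: the repository performs these steps with its own compiled tools, and the window size, the symmetric counting, and the 1/distance weighting used here are assumptions rather than the settings in demo.sh. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def unigram_counts(tokens):
    """Count how often each token occurs in the corpus."""
    return Counter(tokens)

def cooccurrence(tokens, window=2):
    """Accumulate symmetric cooccurrence weights within a context window.

    Each pair of tokens `offset` positions apart contributes 1/offset,
    so closer pairs count more (an assumed weighting for this sketch).
    """
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for offset in range(1, window + 1):
            j = i - offset
            if j < 0:
                break  # no more context to the left
            counts[(word, tokens[j])] += 1.0 / offset
            counts[(tokens[j], word)] += 1.0 / offset
    return dict(counts)

# Tiny placeholder corpus; a real run would tokenize the Arabic corpus.
tokens = "a b a c".split()
vocab = unigram_counts(tokens)
counts = cooccurrence(tokens, window=2)
print(vocab["a"], counts[("a", "b")])
```

The later stages (shuffling the cooccurrence records and fitting the vectors) are omitted here; see demo.sh for how the actual tools are chained.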

License

All source code in this package is licensed under the Apache License, Version 2.0. See the included LICENSE file.
