Merge pull request PaddlePaddle#18 from joey12300/add_google_news
add google news word embedding
ZeyuChen authored Feb 9, 2021
2 parents 21dd000 + 4b826bb commit 5f0821c
Showing 2 changed files with 9 additions and 1 deletion.
7 changes: 7 additions & 0 deletions docs/embeddings.md
@@ -49,6 +49,12 @@ PaddleNLP provides several open-source pretrained word embedding models; users need only, when using `p

## English word embeddings

### Word2Vec

| Corpus | Name |
|------|------|
| Google News | w2v.google_news.target.word-word.dim300.en |

### GloVe

| Corpus | 25-dim | 50-dim | 100-dim | 200-dim | 300-dim |
@@ -129,6 +135,7 @@ token_embedding = TokenEmbedding(embedding_name="fasttext.wiki-news.target.word-
| w2v.sikuquanshu.target.word-bigram.dim300 | 20.77 MB | 19529 |
| w2v.mixed-large.target.word-char.dim300 | 1.35 GB | 1292552 |
| w2v.mixed-large.target.word-word.dim300 | 1.35 GB | 1292483 |
| w2v.google_news.target.word-word.dim300.en | 1.61 GB | 3000000 |
| glove.wiki2014-gigaword.target.word-word.dim50.en | 73.45 MB | 400002 |
| glove.wiki2014-gigaword.target.word-word.dim100.en | 143.30 MB | 400002 |
| glove.wiki2014-gigaword.target.word-word.dim200.en | 282.97 MB | 400002 |
3 changes: 2 additions & 1 deletion paddlenlp/embeddings/constant.py
@@ -84,7 +84,8 @@
# Mix-large
"w2v.mixed-large.target.word-char.dim300",
"w2v.mixed-large.target.word-word.dim300",

# GOOGLE NEWS
"w2v.google_news.target.word-word.dim300.en",
# GloVe
"glove.wiki2014-gigaword.target.word-word.dim50.en",
"glove.wiki2014-gigaword.target.word-word.dim100.en",
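The names touched by this commit appear to follow a dotted pattern of model, corpus, embedding type, context type, `dim<N>`, and an optional language suffix such as `.en`. The sketch below splits a name into those fields. It is a minimal illustration, not part of PaddleNLP: `parse_embedding_name`, `EmbeddingName`, and `KNOWN_NAMES` are hypothetical helpers, and the field meanings are an assumption inferred only from the names visible in this diff.

```python
from typing import NamedTuple, Optional

# Names copied from the constant.py hunk in this commit.
# The real list in paddlenlp/embeddings/constant.py is longer.
KNOWN_NAMES = [
    "w2v.mixed-large.target.word-char.dim300",
    "w2v.mixed-large.target.word-word.dim300",
    "w2v.google_news.target.word-word.dim300.en",
    "glove.wiki2014-gigaword.target.word-word.dim50.en",
    "glove.wiki2014-gigaword.target.word-word.dim100.en",
]


class EmbeddingName(NamedTuple):
    """Fields of an embedding name (meanings are assumed, not documented here)."""
    model: str               # e.g. "w2v", "glove"
    corpus: str              # e.g. "google_news"
    embedding_type: str      # e.g. "target"
    context: str             # e.g. "word-word"
    dim: int                 # e.g. 300
    language: Optional[str]  # "en" when suffixed, None when there is no suffix


def parse_embedding_name(name: str) -> EmbeddingName:
    """Split a dotted embedding name into its assumed fields."""
    parts = name.split(".")
    # Expect five fields plus an optional language suffix.
    if len(parts) not in (5, 6) or not parts[4].startswith("dim"):
        raise ValueError(f"unexpected embedding name format: {name!r}")
    model, corpus, embedding_type, context, dim_field = parts[:5]
    language = parts[5] if len(parts) == 6 else None
    # int() raises ValueError if the text after "dim" is not a number.
    return EmbeddingName(model, corpus, embedding_type, context,
                         int(dim_field[3:]), language)


if __name__ == "__main__":
    for known in KNOWN_NAMES:
        print(parse_embedding_name(known))
```

Under these assumptions, the entry added here parses as model `w2v`, corpus `google_news`, dimension 300, language `en`. Names without a suffix, such as `w2v.mixed-large.target.word-word.dim300`, yield `language=None`.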