Skip to content

Latest commit

 

History

History
62 lines (40 loc) · 1.84 KB

File metadata and controls

62 lines (40 loc) · 1.84 KB

ASReview multilingual feature extractor

This extension to ASReview implements a multilingual feature extractor algorithm. This algorithm allows for the usage of records in multiple languages. These languages are:

Arabic, Chinese, Dutch, English, French, German, Italian, Korean, Polish, Portuguese, Russian, Spanish, Turkish.

The extension implements sentence-transformers/distiluse-base-multilingual-cased-v1. This is a sentence-transformers model and maps sentences to a 512 dimensional dense vector space and is multilingual. For more information about the feature extraction method, see

Reimers, N., & Gurevych, I. (2019). Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. ArXiv, abs/1908.10084. https://arxiv.org/abs/1908.10084

Installation

Install the multilingual feature extractor with:

pip install .

or

pip install git+https://github.com/asreview/asreview-multilingual-feature-extractor.git

Usage

ASReview LAB

ASReview LAB users can select the model in the Model Selection step of the project setup. Select "Multilingual Sentence Transformer" under "Feature extraction".

Simulation

The new feature extractor Multilingual Sentence Transformer is defined in asreviewcontrib/models/distiluse-base-multilingual.py and can be used in a simulation.

asreview simulate example_data_file.csv -e multilingual

Test the feature extractor with:

asreview simulate benchmark:van_de_Schoot_2017 -e multilingual -m svm

License

MIT license

Contact

For any questions or remarks, please send an email to asreview@uu.nl or open an issue.