Rust SBert

Rust port of sentence-transformers using rust-bert and tch-rs.

Supports both rust-tokenizers and Hugging Face's tokenizers.

Supported models

distiluse-base-multilingual-cased: Supported languages: Arabic, Chinese, Dutch, English, French, German, Italian, Korean, Polish, Portuguese, Russian, Spanish, Turkish. Performance on the extended STS2017: 80.1
DistilRoBERTa-based classifiers

Usage

Example

The API is made to be very easy to use and enables you to create quality multilingual sentence embeddings in a straightforward way.

Load SBert model with weights by specifying the directory of the model:

let mut home: PathBuf = env::current_dir().unwrap();
home.push("path-to-model");

You can use different versions of the models that use different tokenizers:

// To use Hugging Face tokenizer
let sbert_model = SBertHF::new(home.to_str().unwrap(), None);

// To use Rust-tokenizers
let sbert_model = SBertRT::new(home.to_str().unwrap(), None);

Now, you can encode your sentences:

let texts = ["You can encode",
             "As many sentences",
             "As you want",
             "Enjoy ;)"];

let batch_size = 64;

let output = sbert_model.forward(texts.to_vec(), batch_size).unwrap();

The parameter batch_size can be left to None to let the model use its default value.

Then you can use the output sentence embedding in any application you want.

Convert models from Python to Rust

Firstly, get a model provided by UKPLabs (all models are here):

mkdir -p models/distiluse-base-multilingual-cased

wget -P models https://public.ukp.informatik.tu-darmstadt.de/reimers/sentence-transformers/v0.2/distiluse-base-multilingual-cased.zip

unzip models/distiluse-base-multilingual-cased.zip -d models/distiluse-base-multilingual-cased

Then, you need to convert the model in a suitable format (requires pytorch):

python utils/prepare_distilbert.py models/distiluse-base-multilingual-cased

A dockerized environment is also available for running the conversion script:

docker build -t tch-converter -f utils/Dockerfile .

docker run \
  -v $(pwd)/models/distiluse-base-multilingual-cased:/model \
  tch-converter:latest \
  python prepare_distilbert.py /model

Finally, set "output_attentions": true in distiluse-base-multilingual-cased/0_distilbert/config.json.

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
benches		benches
src		src
tests		tests
utils		utils
.dockerignore		.dockerignore
.gitignore		.gitignore
.travis.yml		.travis.yml
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Rust SBert

Supported models

Usage

Example

Convert models from Python to Rust

About

Releases 1

Packages

Contributors 5

Languages

License

cpcdoy/rust-sbert

Folders and files

Latest commit

History

Repository files navigation

Rust SBert

Supported models

Usage

Example

Convert models from Python to Rust

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Contributors 5

Languages

Packages