Add Train and set up a custom machine translation model docs #281

Merged 4 commits on Sep 1, 2023
16 changes: 13 additions & 3 deletions src/tools/pontoon/managing_pretranslation.md
@@ -22,10 +22,20 @@ Note that disabling a project would always involve a conversation with reviewers

Access Pontoon’s [admin console](https://pontoon.mozilla.org/admin/), and select the project: at the bottom of the page there is a section dedicated to *Pretranslation*.

**IMPORTANT**: if this is the first project for a locale, the first step is to [train and set up the custom engine model](#train-and-set-up-a-custom-engine-model) in Google AutoML Translation.
**IMPORTANT**: if this is the first project for a locale, the first step is to [train and set up the custom machine translation model](#train-and-set-up-a-custom-machine-translation-model) in Google AutoML Translation.

Use the checkbox `PRETRANSLATION ENABLED` to enable the feature for the project, then move the requested locales from the `Available` list to `Chosen`. Clicking the `PRETRANSLATE` button immediately pretranslates all missing strings in the enabled locales; otherwise, pretranslation runs automatically as soon as new strings are added to the project.

## Train and set up a custom engine model
## Train and set up a custom machine translation model

(TBD)
To improve the performance of the machine translation engine powering the pretranslation feature, custom machine translation models are trained for each locale using Pontoon’s translation memory. This results in better translation quality than the generic machine translation engine provides.

To create a custom translation model, first go to the [team page](https://mozilla-l10n.github.io/localizer-documentation/tools/pontoon/teams_projects.html#team-page) of the locale for which you are creating the custom translation model and download its [translation memory file](https://mozilla-l10n.github.io/localizer-documentation/tools/pontoon/translate.html#downloading-and-uploading-translations).

Next, go to the [Google Cloud console](https://console.cloud.google.com/translation/datasets?project=moz-fx-pontoon-prod) (requires permission) and follow the [official instructions](https://cloud.google.com/translate/automl/docs/create-machine-translation-model) for creating a translation dataset from the uploaded translation memory file and training an AutoML translation model.

When choosing the name for the dataset, follow the pattern used by existing datasets: `dataset_LOCALE_YYYY_MM_DD`. When choosing the Cloud Storage path where the uploaded files are to be stored, pick `pontoon-prod-model-data-c1107144`.
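
As a small illustration of the naming pattern, the helper below builds a dataset name for a given locale and date. The function name `dataset_name` is hypothetical, and replacing hyphens in locale codes with underscores is an assumption rather than something confirmed by the existing datasets, so check the names already in the console before relying on it.

```python
from datetime import date


def dataset_name(locale: str, day: date) -> str:
    """Build a name following the dataset_LOCALE_YYYY_MM_DD pattern."""
    # Assumption: hyphens in locale codes (e.g. "pt-BR") become underscores.
    safe_locale = locale.replace("-", "_")
    # date objects accept strftime-style format specs inside f-strings.
    return f"dataset_{safe_locale}_{day:%Y_%m_%d}"


print(dataset_name("sl", date(2023, 9, 1)))
```

Passing `date.today()` as the second argument would name the dataset after the day it is created.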

Note that creating the model is a background job that takes a few hours, and models for at most 4 locales can be trained concurrently. Once the model is created, store its name (usually starting with `NM`, followed by a series of integers) under *Google automl model* in [Django’s admin interface](https://pontoon.mozilla.org/a/) for the locale.
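
Before pasting the model name into the admin interface, a quick sanity check can catch copy-and-paste mistakes such as pasting the dataset name instead. The sketch below is only a loose heuristic: the function name and the pattern (`NM` followed by letters or digits) are assumptions derived from the description above, not a documented format, so a failed check should prompt a second look rather than be treated as an error.

```python
import re

# Assumption: model names start with "NM" followed by letters or digits.
# The text above only says this is *usually* the case.
MODEL_NAME_RE = re.compile(r"NM[0-9A-Za-z]+")


def looks_like_model_name(value: str) -> bool:
    """Return True if the value resembles an AutoML model name."""
    # Strip surrounding whitespace that often sneaks in when copying.
    return MODEL_NAME_RE.fullmatch(value.strip()) is not None


# "NM1234567890" is a made-up placeholder, not a real model name.
for candidate in ["NM1234567890", "dataset_sl_2023_09_01", ""]:
    print(repr(candidate), looks_like_model_name(candidate))
```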

From that point on, Machinery will start using the custom machine translation model instead of the generic one and you’ll be set to enable pretranslation for the locale.
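
For context, the Cloud Translation API (v3) identifies a custom model by a resource path of the form `projects/PROJECT/locations/LOCATION/models/MODEL`. The sketch below shows how the stored model name would map onto such a path. The function name is hypothetical, the project ID is taken from the console link above, `us-central1` as the location is an assumption, and this is not necessarily how Machinery builds its requests.

```python
def model_path(
    project_id: str,
    model_name: str,
    location: str = "us-central1",  # Assumption: region hosting the model.
) -> str:
    """Build the resource path used to reference a custom model."""
    return f"projects/{project_id}/locations/{location}/models/{model_name}"


# "NM1234567890" is a made-up placeholder for the stored model name.
print(model_path("moz-fx-pontoon-prod", "NM1234567890"))
```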