From c3157cc7aa923adb1f2da0b2f8f936e5f83370ee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Matja=C5=BE=20Horvat?=
Date: Thu, 31 Aug 2023 22:23:21 +0200
Subject: [PATCH 1/4] Add Train and set up a custom machine translation model docs

---
 src/tools/pontoon/managing_pretranslation.md | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/tools/pontoon/managing_pretranslation.md b/src/tools/pontoon/managing_pretranslation.md
index 5c11d8e2..cd64406a 100644
--- a/src/tools/pontoon/managing_pretranslation.md
+++ b/src/tools/pontoon/managing_pretranslation.md
@@ -22,10 +22,20 @@ Note that disabling a project would always involve a conversation with reviewers

 Access Pontoon’s [admin console](https://pontoon.mozilla.org/admin/), and select the project: at the bottom of the page there is a section dedicated to *Pretranslation*.

-**IMPORTANT**: if this is the first project for a locale, the first step is to [train and set up the custom engine model](#train-and-set-up-a-custom-engine-model) in Google AutoML Translation.
+**IMPORTANT**: if this is the first project for a locale, the first step is to [train and set up the custom machine translation model](#train-and-set-up-a-custom-machine-translation-model) in Google AutoML Translation.

 Use the checkbox `PRETRANSLATION ENABLED` to enable the feature for the project, then move the requested locales from the `Available` list to `Chosen`. Clicking the `PRETRANSLATE` button will pretranslate immediately all missing strings in enabled locales, otherwise pretranslation will run automatically as soon as new strings are added to the project.

-## Train and set up a custom engine model
+## Train and set up a custom machine translation model

-(TBD)
+To improve performance of the machine translation engine powering the pretranslation feature, custom machine translation models are trained for each locale using Pontoon’s translation memory. That results in better translation quality than what’s suggested by the generic machine translation engine.
+
+To create a custom translation model, first go to the [team page](https://mozilla-l10n.github.io/localizer-documentation/tools/pontoon/teams_projects.html#team-page) of the locale you are creating custom translation model for and download its [translation memory file](https://mozilla-l10n.github.io/localizer-documentation/tools/pontoon/translate.html#downloading-and-uploading-translations).
+
+Next, go to the [Google Cloud console](https://console.cloud.google.com/translation/datasets?project=moz-fx-pontoon-prod) and follow the [official instructions](https://cloud.google.com/translate/automl/docs/create-machine-translation-model) for creating a translation dataset from the uploaded translation memory file and training an AutoML translation model.
+
+When choosing the name for the dataset, follow the pattern used by existing datasets - `dataset_LOCALE_YYYY_MM_DD`. When choosing the Cloud Storage path where the uploaded files are to be stored, pick `pontoon-prod-model-data-c1107144`.
+
+Note that creating the model takes a few hours. When the model is created, store its name (usally starting with `NM`, followed by a series of integers) under *Google automl model* in the [Django’s admin interface](https://pontoon.mozilla.org/a/) of the locale.
+
+From that point on, Machiney will start using the custom machine translation model instead of the generic one and you’ll be set to enable pretranslation for the locale.

From 7d2a195ed9437841917fd52d5cfdcf65b601146b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Matja=C5=BE=20Horvat?=
Date: Thu, 31 Aug 2023 22:27:33 +0200
Subject: [PATCH 2/4] suggested -> provided

---
 src/tools/pontoon/managing_pretranslation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/tools/pontoon/managing_pretranslation.md b/src/tools/pontoon/managing_pretranslation.md
index cd64406a..23602524 100644
--- a/src/tools/pontoon/managing_pretranslation.md
+++ b/src/tools/pontoon/managing_pretranslation.md
@@ -28,7 +28,7 @@ Use the checkbox `PRETRANSLATION ENABLED` to enable the feature for the project,

 ## Train and set up a custom machine translation model

-To improve performance of the machine translation engine powering the pretranslation feature, custom machine translation models are trained for each locale using Pontoon’s translation memory. That results in better translation quality than what’s suggested by the generic machine translation engine.
+To improve performance of the machine translation engine powering the pretranslation feature, custom machine translation models are trained for each locale using Pontoon’s translation memory. That results in better translation quality than what’s provided by the generic machine translation engine.

 To create a custom translation model, first go to the [team page](https://mozilla-l10n.github.io/localizer-documentation/tools/pontoon/teams_projects.html#team-page) of the locale you are creating custom translation model for and download its [translation memory file](https://mozilla-l10n.github.io/localizer-documentation/tools/pontoon/translate.html#downloading-and-uploading-translations).

From 35d91b2d7c73fddd1a75677155ef0166691f74fb Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Matja=C5=BE=20Horvat?=
Date: Thu, 31 Aug 2023 22:28:01 +0200
Subject: [PATCH 3/4] Add (requires permission)

---
 src/tools/pontoon/managing_pretranslation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/tools/pontoon/managing_pretranslation.md b/src/tools/pontoon/managing_pretranslation.md
index 23602524..2e43fc46 100644
--- a/src/tools/pontoon/managing_pretranslation.md
+++ b/src/tools/pontoon/managing_pretranslation.md
@@ -32,7 +32,7 @@ To improve performance of the machine translation engine powering the pretransla

 To create a custom translation model, first go to the [team page](https://mozilla-l10n.github.io/localizer-documentation/tools/pontoon/teams_projects.html#team-page) of the locale you are creating custom translation model for and download its [translation memory file](https://mozilla-l10n.github.io/localizer-documentation/tools/pontoon/translate.html#downloading-and-uploading-translations).

-Next, go to the [Google Cloud console](https://console.cloud.google.com/translation/datasets?project=moz-fx-pontoon-prod) and follow the [official instructions](https://cloud.google.com/translate/automl/docs/create-machine-translation-model) for creating a translation dataset from the uploaded translation memory file and training an AutoML translation model.
+Next, go to the [Google Cloud console](https://console.cloud.google.com/translation/datasets?project=moz-fx-pontoon-prod) (requires permission) and follow the [official instructions](https://cloud.google.com/translate/automl/docs/create-machine-translation-model) for creating a translation dataset from the uploaded translation memory file and training an AutoML translation model.

 When choosing the name for the dataset, follow the pattern used by existing datasets - `dataset_LOCALE_YYYY_MM_DD`. When choosing the Cloud Storage path where the uploaded files are to be stored, pick `pontoon-prod-model-data-c1107144`.


From 2464eb9db862b5e6e3549efbbae2d978570a4f6e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Matja=C5=BE=20Horvat?=
Date: Fri, 1 Sep 2023 07:54:01 +0200
Subject: [PATCH 4/4] Fix typo and mention concurrency

---
 src/tools/pontoon/managing_pretranslation.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/tools/pontoon/managing_pretranslation.md b/src/tools/pontoon/managing_pretranslation.md
index 2e43fc46..3a2d65b5 100644
--- a/src/tools/pontoon/managing_pretranslation.md
+++ b/src/tools/pontoon/managing_pretranslation.md
@@ -36,6 +36,6 @@ Next, go to the [Google Cloud console](https://console.cloud.google.com/translat

 When choosing the name for the dataset, follow the pattern used by existing datasets - `dataset_LOCALE_YYYY_MM_DD`. When choosing the Cloud Storage path where the uploaded files are to be stored, pick `pontoon-prod-model-data-c1107144`.

-Note that creating the model takes a few hours. When the model is created, store its name (usally starting with `NM`, followed by a series of integers) under *Google automl model* in the [Django’s admin interface](https://pontoon.mozilla.org/a/) of the locale.
+Note that creating the model is a background job which takes a few hours, and models for at most 4 locales can be trained concurrently. When the model is created, store its name (usally starting with `NM`, followed by a series of integers) under *Google automl model* in the [Django’s admin interface](https://pontoon.mozilla.org/a/) of the locale.

-From that point on, Machiney will start using the custom machine translation model instead of the generic one and you’ll be set to enable pretranslation for the locale.
+From that point on, Machinery will start using the custom machine translation model instead of the generic one and you’ll be set to enable pretranslation for the locale.
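
As a rough illustration of the dataset naming pattern added by this series (`dataset_LOCALE_YYYY_MM_DD`), the name could be generated as sketched below. The helper `dataset_name` is hypothetical and not part of Pontoon or any Google Cloud tooling. Using the locale code unchanged is also an assumption: check existing datasets for how codes containing hyphens are written.

```python
from datetime import date


def dataset_name(locale: str, day: date) -> str:
    """Build a name following the dataset_LOCALE_YYYY_MM_DD pattern.

    Hypothetical helper: the locale code is inserted as given,
    with no normalization applied.
    """
    # date supports strftime-style format specs inside f-strings.
    return f"dataset_{locale}_{day:%Y_%m_%d}"


if __name__ == "__main__":
    # Example: a dataset for Slovenian created on 1 September 2023.
    print(dataset_name("sl", date(2023, 9, 1)))
```

Passing the date explicitly, rather than reading the current date inside the function, keeps the helper deterministic and easy to check.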