This repository has been archived by the owner on Aug 11, 2022. It is now read-only.

Try automatic machine translation #6

Closed
zeke opened this issue May 30, 2017 · 14 comments


@zeke
Contributor

zeke commented May 30, 2017

Crowdin can automatically apply machine translation to new content. This could be a big time-saver.

To use this, we'll need to add a machine translation API key from Google or Microsoft (or Yandex?) to our Crowdin account.

[screenshot: 2017-05-30 at 12:31:33 pm]

https://support.crowdin.com/advanced-workflows/#accessing-the-workflows-feature

cc @alebourne

@alebourne

Hi @zeke,

I did some research on translation memory and Crowdin. Both Google and Microsoft offer neural machine translation and are fairly similar; however, Microsoft is free for up to 2 million characters per month. I think that amount can cover Electron's use of machine translation, especially in the beginning when there are only a few languages.

@zeke
Contributor Author

zeke commented May 31, 2017

I just set up Microsoft Translator, added the API key to our Crowdin project, and set up automatic machine translation for the Spanish locale.

@alebourne can you take a look at https://crowdin.com/project/electron/es-ES and tell me what you think of the translated Spanish?

Notice also that the projects/electron page has a nice callout saying that the machine translations need to be approved:

[screenshot: 2017-05-30 at 9:20:20 pm]

Cost

MS Translate is free for the first 2 million characters, and $10 for each million characters after that. They also have bundles for larger amounts.

This is the word count:

[screenshot: 2017-05-30 at 9:16:19 pm]

(40,000 words) * (~5 letters per English word) = 200,000 characters per locale

So we can auto-translate roughly ten locales before we have to start paying for it. The next question I guess will be figuring out which languages to prioritize for translation.
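
For anyone who wants to redo this estimate with different numbers, here is a rough sketch of the arithmetic above. The function names are made up for illustration, and it assumes things not confirmed in this thread: that all locales are translated within a single billing month, that overage is billed linearly at $10 per million characters, and that spaces and punctuation are ignored, as in the ~5 letters per word estimate.

```python
# Back-of-the-envelope estimate only; figures are the ones quoted in this comment.
FREE_CHARS_PER_MONTH = 2_000_000   # free tier: first 2 million characters
USD_PER_EXTRA_MILLION = 10         # price per million characters after that
WORDS_PER_LOCALE = 40_000          # word count from the Crowdin screenshot
CHARS_PER_WORD = 5                 # rough average for English (assumption)


def chars_per_locale(words=WORDS_PER_LOCALE, chars_per_word=CHARS_PER_WORD):
    """Approximate number of source characters sent for one locale."""
    return words * chars_per_word


def free_locales():
    """How many locales fit entirely inside the free tier."""
    return FREE_CHARS_PER_MONTH // chars_per_locale()


def estimated_cost_usd(locales):
    """Estimated cost, assuming linear (pro-rated) overage billing."""
    total_chars = locales * chars_per_locale()
    extra_chars = max(0, total_chars - FREE_CHARS_PER_MONTH)
    return extra_chars / 1_000_000 * USD_PER_EXTRA_MILLION


if __name__ == "__main__":
    print(chars_per_locale())      # characters per locale
    print(free_locales())          # locales covered by the free tier
    print(estimated_cost_usd(15))  # estimated cost for 15 locales
```

Under those assumptions the free tier covers ten locales, and fifteen locales would cost about $10.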

@Toinane
Member

Toinane commented May 31, 2017

I suppose it's most important to pre-translate the languages that don't have translators yet, or that have fewer translators than other languages.

@alebourne

@zeke
I think that @Toinane makes a good point about pre-translating some, but not all, languages, especially since the person translating can apply machine translation on a string-by-string basis as needed.
With that said, machine translation is still not as accurate as human translation, so the translations will need to be cleaned up.
I will definitely have a look.

@alebourne

@zeke, I checked 3 files and here are my observations:

  • More than 95% of the translations in each file need to be "retranslated" (aka cleaned up). So doing the machine pre-translation is not really saving us from translating the text.
  • The cleanup can be done in 2 ways. I can "vote down" the machine translation and propose a new one, OR I can delete the machine translation and then type another one. In either case, another person needs to review and vote for the "new translation" that was added. Also, I'm not sure if everyone can "delete" translations, or if that is specific to my permissions in Crowdin.

Here are some additional thoughts:

  • I think the best leverage/re-use will come from the translation memory. This will be available after the first Electron project is completed.

@zeke
Contributor Author

zeke commented Jun 2, 2017

> doing the machine pre-translation is not really saving us

Thanks for this feedback. I wonder if we can easily undo the machine translation in bulk.

@Toinane
Member

Toinane commented Jun 2, 2017

I'm in the same situation as @alebourne: I don't have permission to delete pre-translations.

@zeke
Contributor Author

zeke commented Jun 2, 2017

Question to Crowdin support:

Is there a way to undo machine translations in bulk? We have done an experiment with machine translating Spanish and French, and would like to revert the machine translations without losing the human translations.

Crowdin response:

Yes. Please select a language on the project page -> click the Activity tab -> click Undo across pre-translation action. Direct URLs:
https://crowdin.com/project/electron/fr/activity
https://crowdin.com/project/electron/es-ES/activity
No worries about human translations, they will be preserved

@zeke
Contributor Author

zeke commented Jun 2, 2017

[screenshot: 2017-06-02 at 8:30:50 am]

@Toinane I've removed the machine translations for French.

@funkyboy
Contributor

I went through it manually, basically editing the suggestions provided by Crowdin. It's not that bad, but it takes some time :) I've finished just one file so far, and I have no idea how the process works.
Will a PR be sent to GH when the whole translation is finished (and, I guess, approved) on Crowdin? Or can you approve/merge just one file (or a few) at a time?

@zeke
Contributor Author

zeke commented Jun 16, 2017

> Will a PR be sent to GH when the whole translation is finished

Crowdin opens a single PR and automatically updates it every time a new translation is submitted on Crowdin to any file in any language. There's a way to configure Crowdin to require manager approval first, but that is disabled (by default). When a PR is merged, Crowdin opens a new one automatically.

Yesterday I set up some automated tests with Travis to make sure that translated files coming from Crowdin are properly named and formatted.

Here's an example of the most recent automated PR: #20

@zeke
Contributor Author

zeke commented Jun 16, 2017

Gonna close this out.

TLDR: Bulk machine translation seems to require just as much human work as translating from scratch, if not more. So for now we're disabling it.

@zeke zeke closed this as completed Jun 16, 2017
@zeke
Contributor Author

zeke commented Jun 29, 2017

The Azure trial account I created for Microsoft's Machine Translation service is expiring. No immediate plans to renew it.

[screenshot: 2017-06-29 at 11:56:33 am]

@vaartis

vaartis commented Jul 8, 2017

Google Translate actually works pretty well, at least for English->Russian.
