Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DOCS] Translation Workflow for Docs #10119

Closed
franzihei opened this issue Oct 27, 2020 · 14 comments
Closed

[DOCS] Translation Workflow for Docs #10119

franzihei opened this issue Oct 27, 2020 · 14 comments
Assignees

Comments

@franzihei
Copy link
Member

franzihei commented Oct 27, 2020

Problem

We have a lot of rather outdated (and often incomplete) translations of the Solidity docs, which are a great start but require quite a few updates and additions. Currently, these translations are not actively maintained and some of them not integrated into the readthedocs page.

Here are the rather outdated issues that tried to keep track of the translations:

(Maybe we can consider closing these once we agreed on a new approach)

Solution Approach / Ideas 💡

The ethereum.org page has a vast amount of translations which seem to be working great.

The ethereum.org team is using Crowdin to keep track of translations. You can see their project here: https://crowdin.com/project/ethereumfoundation.

They advertise the translation program here: https://ethereum.org/en/contributing/translation-program/
And here is a recent blog post on their milestone of reaching 30 languages: https://blog.ethereum.org/2020/07/29/ethdotorg-translation-milestone/

Next steps

I will set up a call with the guys from ethereum.org to learn more about how they do it. :)

@franzihei
Copy link
Member Author

franzihei commented Nov 2, 2020

Update

This is how the translation workflow works for the ethereum.org translations:

[Translation]->[Review]->[Sync]->[Upload online]

  1. Uploading the content files on Crowdin (manually). The Progress bar shows 0%.
  2. People come to the Crowdin Project page through various channels. (Ethereum.org Translation Program page, Github, etc).
  3. People choose languages and files to translate.
  4. When the translation rate reaches 100%, we ask our professional translation company (paid reviewers) to verify the translation quality, spelling and grammar, etc.
  5. When the review is completed, download the translated file and make a GH PR.
  6. A GitHub developer who speaks the language synchronizes the translation file with the original website.
  7. When synchronization is complete, finally upload it online.

The questions / considerations for us are:

  • Do we want to use crowdin as a translation tracking tool or not? (Here is a link to the Crowdin feature overview page)
  • Without crowdin, there is no easy way to check how much of the original document has been translated and in which state it is (no % coverage or indication on which version it is based)
  • Most of our translators might be more interested in keeping the whole workflow in Github though, since crowdin would require them to "sign up to an additional tool"

Once we agree if we want to use crowdin or not we can define the next steps.

@franzihei
Copy link
Member Author

Questions from the standup:

  • Which languages do we need / How can we validate the need?
  • Are there any translation related tools or plugins for readthedocs?
  • Could we get a dedicated translation lead contributor for the languages?
  • How do we do version management?

@franzihei
Copy link
Member Author

Insights into language usage:

@franzihei
Copy link
Member Author

Read the Docs Workflow

Localizing the content would mean that the different translations would be included in the original documentation and not in separate repos. One can then select the desired language in the bottom flyout menu.
image

@Karocyt
Copy link
Contributor

Karocyt commented Nov 10, 2020

French Doc State

I started the French Docs during my first year as an IT student.

I translated half of it in one go in late 2018 and only got it up to date last June (synced with release v0.6.8), but still in a partially translated state.

Do we need it

I experienced firsthand the English skills of French IT engineers, sometimes giving up on a bare Readme.
I do believe that having a French documentation pushes adoption (including in some parts of Africa) and will help bring great projects to life.
So do we (as the Ethereum ecosystem) need it, I don't know, but we (as French developers) definitely do.

Usage

  • 1000 pages views every month
  • about 300 montly users
  • half are recuring ones.

I recently heard that my school was working on certifying degrees on the Ethereum blockchain, and that they were using... "the official French documentation"! I also learned that some of our students referred to my doc on a regular basis.
It doesn't sound like much, but it's crazy rewarding to know that people are indeed using it. This is part of what pushed me to bring it up to date last June, even if it created "translation holes" and involved deleting French outdated parts.

Surprisingly, no one ever complained about the "half translated" side of things.
I personally rather have a partially translated version than an outdated one, but I'm quite biased as I'm comfortable with English.

Hosting

Two guys manifested their will to contribute (#5250) but even if I added them as contributors, neither of them submitted a single commit. They never gave any sign of life but I suspect that they were disappointed to be told to contribute on some fork owned by a random student...
So whatever the solution chosen, it would be nice to bring it all back under the Ethereum Organisation umbrella to boost both involvement and visibility.

But as long as there is not enough translators on a version, we would either need a good version management system to accommodate inconsistent updates, or keep such initiatives separated/isolated as long as they can't keep up.

Workflow

Readthedocs is unclear about this, but for what I understand, the two ways of managing localization are mutually exclusive.

Keeping things on multiple repos

  • The actual process with one repository by langage seems to fit the How to localize content way of doing things and seems to be the simpler way.
  • For what I understand, it allows for independent versioning between langages, for better or for worse, depending of what we're after, while still having them in the bottom flyout menu.
  • A good way of simplifying things would be to have the documentation in it's own repository for clarity, as forking the whole shebang feels messy.
  • Once everything is translated once, merge conflicts should be sufficient to keep an eye of what's to do relative to last updates in the main repo.

The centralized way of things

  • For what I read in How to manage translations in RtD, the "manual" way would involve basic Sphinx commands that could be automated using git hooks or Github Actions.
  • The .po file structure adds a layer of complexity over the reST notation and seems to involve a fresh start, but other than that this should come down to the same kind of grunt work that we're actually doing, with the benefit of having the English reference text included in the files.
  • This looks like "the good way".

Crowdin

  • Their interface looks very user friendly and they seem to offer a free plan to open-source projects
  • It allows several people to collaborate on the same document without stepping on each other's toes and to keep an overview of the status of the translation.
  • The overhead involved seems manageable on either side (maintainers/contributors) and might allow a couple of the 600+ ethereum.org translation contributors to give a hand (let me dream).
  • As it's only manually synced with the repo, this solution could allow to keep all translations in one place, even the ongoing/partial ones that we don't necessarily wish to include in the main Solidity repo.

New translations

  • We need a section (or at the very least a Readme) on how to start a new translation.
  • In a mono-repo situation, new versions could start as independent forks as the existing ones, including one more langage that would be set as default under readthedocs. That way, when/if it finally meet our standards, it's just about adding the folder to the main repo.
  • I really see it as conter-productive to restrict the langages to the most used ones, as long as some people are ready to jump on it. Even if they're not included directly, we should link to it somehow, with their version and a vague indication of their completeness.

@chriseth
Copy link
Contributor

chriseth commented Nov 11, 2020

Thanks a lot for your insights, @Karocyt ! You are probably right that incomplete translations are not a problem. I'm currently leaning towards having translations in their own repository. Those could probably be forks of the main repository and are linked via github actions or other automation tools. I'm not sure if the proposed way through .po files is really useful - a first trial showed that solidity source examples are not exported, but that may just be a setting.

Do you think that the following approach could work:

  • for each translation, fork the repo into a new repo (maybe restrict to the 'docs' subdirectory, if that is possible)
  • this means it is initially english
  • translators translate page by page or even just paragraph by paragraph
  • set up automation to track the develop and the breaking branch on the main repo
  • on a change to main develop or main breaking, merge in the change, force-accepting the remote change over the local one.
  • this should have the effect that paragraphs or lines that were changed in the main repo revert back to english for each change, are hopefully noticed and then have to be translated

This workflow might even be compatible with using .po files or other means to do the translation in the future (machine translation etc).

if the auto-merged version cannot be compiled, this would maybe create a pull request instead.

@franzihei
Copy link
Member Author

Thanks for your comments and insights @Karocyt! Please note that we're still in the exploration phase on how to improve the translation process. We'll try out different approaches and once we're settled on a rough process we'll update the community about it. :)

With these efforts I'm trying to streamline the translation work done, ensure high quality and consistency while also being able to provide more guidelines and structure to the community translators.


Process Update

I've been trying out transifex. This is how the interface looks like:

image

Translation is done step by step / paragraph by paragraph. This is handy for actual human translators, but not so much if we want to create a first draft using machine translations.

I'd be curious how to translate a different version using .pot files instead of .po files. Afaik those should highlight changes done between the versions?

@Karocyt
Copy link
Contributor

Karocyt commented Nov 12, 2020

@chriseth : The thing with force-accepting the remote changes is that you loose a line on each typo/rephrasing, that don't necessarily needs to be corrected on the translated version.
There is also the problem of sections permutations: when I did my big version bump, the merge was a huge mess but most of it was just about sections moving around.
Both issues could be mitigated by requiring a review/correction of the merge commit by the involved team.

If there is no way to export them, the unexported examples might be a feature more than a problem. It might be better to cut the full contracts examples (Solidity by example) into smaller snippets interspersed with the current comment text. Comments are not pleasant to translate and the long/insightful ones could be better off as their own note and/or warning insert (see the Enums one for instance).
Short informational comments could stay in English by convention, as I don't think you end up reading the Solidity documentation as a complete neophyte developer.

Manual machine translation

I leaned heavily on DeepL translator for my first draft, really amazing with contextual translations and in many cases better than what I could have done on my own.
It internally looses context on every line break, so I had to get rid of those everywhere (as in here, but the .po export seems to fix it), keep only the paragraphs and fix the reST notation and examples manually afterwards.
Most of it was proofreading then.

Transifex

To try the interface in a similar case, they auto-accept contributors on the readthedocs project.
However, I'm not a huge fan of the sentence by sentence basis. It allows to translate a sentence occurring multiple times in the project only once but doesn't seems to allow to work one file at a time (in the case of a partial/ongoing translation, I tend to focus on basics first).

@axic
Copy link
Member

axic commented Nov 15, 2020

Screenshot

On readthedocs we also have a translation feature turned on. Clicking on zh above (for "Simplified Chinese) goes to https://solidity.readthedocs.io/zh/latest/, which does not seem to have any translation. It is apparently this repository: https://readthedocs.org/projects/solidity-zh/

I think we should remove it from the readthedocs settings:
Screenshot

@franzihei
Copy link
Member Author

Dropping new learnings here so we have them all in one place:

Example for a purely Github-driven workflow (ReactJS)

React, as in the React website and docs, are translated into several languages by community contributors. To streamline the process and ensure quality and to solve the versioning/updating issue they came up with the following workflow. (Read the full story here.)

  • They create a fork of the repo for each language
  • They have a prioritisation and progress checklist for how to tackle the translation step-by-step
  • They defined maintainer responsibilities
  • They use a bot, which creates PRs for new content to be translated each time the English version is updated. The bot can be found here and this is how a PR from the bot looks like

Process followed by React (copied from this issue):

  • Pick a "partner team" (or two) of translators for a proof-of-concept. (they had Simplified Chinese, Spanish and Japanese to choose from)
  • Prepare reactjs.org for forking/translation
  • Create the bot and/or git hooks to create issues/cherry-picks when the original English React documentation changes.
  • Create a guide for starting a new translation project and integrating the bot.
  • Make a site to keep track of translation progress for various languages (https://www.isreacttranslatedyet.com) (comment: does not seem to work currently)

Example for including machine translations into the workflow (CPP Reference)

cppreference.com uses Machine Translations for their localized content, which can then be proof-read / corrected by community contributors in a second step: https://en.cppreference.com/w/Cppreference:MachineTranslations.

Conclusions

I've been playing with transifex and I don't think it's a good fit, it feels like overhead to me. In the optimal scenario, we could set up a purely Github-driven semi-automated process.

Could anybody check the open-source React bot and evaluate if it makes sense to re-use some functionality?

@franzihei
Copy link
Member Author

franzihei commented Nov 23, 2020

🤖

@chriseth I think the scripts for the bot can be found here: https://github.com/reactjs/reactjs.org-translation/tree/master/scripts

The React Translation bot is based on this bot: https://github.com/vuejs-jp/che-tsumi/tree/master/lib

@benjioh5
Copy link

Korean translation is not updated after 2019-01-11 and no response at solidity-korea/solidity-docs-kr#72

Could you check or ping to @solidity-korea and org members?

@franzihei
Copy link
Member Author

Hey @benjioh5 - We are currently re-organising the translation process and unfortunately most of the currently existing translations are very outdated. We will create a new process for community translations and will let you know as soon as there is an update.

@franzihei
Copy link
Member Author

Kicking off the new Translation Workflow! 🎉

Hello everybody!

After quite some time I'm happy to be kicking off the new translation workflow for the Solidity documentation with you!

I'm tagging all the people that seem to have been involved in translations in the past here: @kcyang @dongsam @PhyrexTsai @GoodVincentTu @bartkim0426 @fkysly @zhuangjinxin @hongbinzuo @PhyrexTsai @mrblocktw @altuntasfatih @cagataycali @mevlanaayas @msusur @onursabitsalman @denizozzgur @Karocyt @clemlak @damianoazzolini @V0nMis3s @AdrianClv @christina938 @akira-19, @noctrlz, @taizo-kato

It would be amazing if some of you were still interested in contributing. :)

So where do we go from here?

Updates

  • I've set up a dedicated GitHub org to be the home of all community translations. This org can be found here: https://github.com/solidity-docs.
  • I created a rough process overview, maintainer guidelines and a preliminary progress checklist. All of those documents can be found here: https://github.com/solidity-docs/translation-guide.
  • We decided to not go for a translation tool but give every contributor the freedom to decide how they want to do the translations and keep a purely GitHub based process.
  • Community translations will be added to the official Solidity docs in the flyout languages menu once they reached a certain state (see progress checklist).

Automation - Help Needed!

Optimally, we would like to have a bot similar to the reactjs-translation-bot, which would create PRs with new content that needs to be translated every time the original documentation is updated.

Currently we have nobody who can set this up, so if you have experience in setting something like this up and want to help, please reach out! :)

Your Input

If you have any questions or input, feel free to let me know by creating a topic in the Documentation category of the Solidity forum or by opening an issue in the new solidity-docs organization.

Looking forward to kicking this off with many of you, starting new translations and revamping the existing ones! 🚀 🤗

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants