
Translation Workflow

Ruairi Douglas edited this page Aug 6, 2021 · 15 revisions

Translation Vendor (TV)

High-level overview:

In order to make sure we are keeping track of all the files that are edited in our repository, we keep a database record of all files that are marked for translation. On every merge into the main branch (i.e. every time we deploy to production or run a release), any files that are of type .mdx and have the translate property in the frontmatter are added to the database.

Every two weeks, a GitHub Action runs to send these files to our vendor and create a job. A unique identifier for this job is then stored in a separate table. While the vendor is translating the job, another GitHub Action runs daily to check whether the job has been completed. If it has, the GitHub Action takes the newly translated content, replaces the old translated content with it, and creates a pull request. This pull request can then be reviewed before being merged in.

<iframe style="border: 1px solid rgba(0, 0, 0, 0.1);" width="800" height="450" src="https://www.figma.com/embed?embed_host=share&url=https%3A%2F%2Fwww.figma.com%2Ffile%2FPh8QXJlXm2Xxu2As27HBqx%2FSend-Translation%3Fnode-id%3D0%253A1" allowfullscreen></iframe>

Automatic workflow:

This section contains a lot of diagrams explaining each part of the workflow. To become more familiar with the GitHub-related terminology or any of the technologies used in the workflow, visit the key concepts section.

Throughout the week, content contributors add changes to our site by creating pull requests off of the develop branch. This happens dozens of times per day. Less frequently, we release those changes (once to multiple times a day) to production through a pull request to our main branch (i.e. production environment.)

When we merge into our main branch, we execute the first of the localization GitHub Actions:

Add slugs to translation table

This action looks through all of the files that are being merged into the main branch and finds the files that are of the MDX type and have the ‘translate’ frontmatter. It then adds the file names (i.e. the slugs) to a data store in DynamoDB (referred to on this page as a table or a queue), keyed by the language that they are to be translated into.

Over the next two weeks, all of the changed files will be added to this queue continuously on every merge to main.

Summary:

  • Executes: on every merge to the main branch
  • Can be manually triggered? No
  • Steps:
    • Looks at the files that have been changed on the merge
    • Saves the filenames (or slugs) of the files that have the translate property in the frontmatter
    • For each file with the translate property, will save the filename for each language that is listed under the translate property.
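
The steps above can be sketched roughly as follows. This is a minimal illustration and not the code of the actual action: the function names, the hand-rolled frontmatter parsing, and the `jp` / `kr` locale codes are all assumptions made for the example.

```python
import re

# Frontmatter is assumed to be a YAML block between two `---` lines at the top of the file.
FRONTMATTER_RE = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def translate_locales(mdx_text):
    """Return the locales listed under `translate:` in the frontmatter (hypothetical helper)."""
    match = FRONTMATTER_RE.match(mdx_text)
    if not match:
        return []
    locales = []
    in_translate = False
    for line in match.group(1).splitlines():
        if line.startswith("translate:"):
            in_translate = True
        elif in_translate and line.strip().startswith("- "):
            locales.append(line.strip()[2:].strip())
        elif line.strip():
            # Any other non-empty line ends the translate list.
            in_translate = False
    return locales


def queue_entries(changed_files):
    """Map {path: file text} to one (slug, locale) pair per file and language."""
    entries = []
    for path, text in changed_files.items():
        if not path.endswith(".mdx"):
            continue  # only MDX files are considered
        for locale in translate_locales(text):
            entries.append((path, locale))
    return entries
```

Each returned pair corresponds to one record that would be saved to the translation table for that language.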

Send content to be translated

This action takes all of the saved file names from the translation table in PostgreSQL and serializes the content into a form that the vendor can consume. It sends the visual context for each page, based on what is live on docs.newrelic.com. It then creates a job for each language specified in the queue, as well as a batch for each job. It saves those IDs to separate tables, which are used later to check on the status of the job.

Relationship Ratios

  • Locales - Translations (pages): 1:Many
  • Jobs - Locales: 1:1
  • Jobs - Batches: 1:1

Summary:

  • Executes: every two weeks
  • Can be manually triggered? Yes
  • Steps:
    • Get files from Translation table with PENDING status
    • Get page from filenames and serialize the content
    • Send serialized content to TV and create job
    • Create batch for each job
    • Sends visual context by fetching the latest version of page from docs.newrelic.com
    • Add jobs in the Job table with job_uid, batch_uid, status
    • Add entries to the TranslationsJobs table with translation_id, job_id
    • Update translations in the Translation table with status IN_PROGRESS or ERRORED on upload failure
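
As a rough sketch of how these steps fit together, the example below groups pending translations by locale, creates one job and one batch per locale, and records the resulting statuses. Everything in it is hypothetical: `FakeVendor` stands in for the real TV API (whose methods are not described here), the `serialize` callback is supplied by the caller, the job status value is an assumption, and the visual context step is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Translation:
    id: int
    slug: str
    locale: str
    status: str = "PENDING"


class FakeVendor:
    """Stand-in for the vendor API; every method here is hypothetical."""

    def __init__(self, failing_slugs=()):
        self.failing_slugs = set(failing_slugs)
        self.counter = 0

    def create_job(self, locale):
        self.counter += 1
        return f"job-{self.counter}"

    def create_batch(self, job_uid):
        return f"batch-for-{job_uid}"

    def upload(self, batch_uid, slug, html):
        if slug in self.failing_slugs:
            raise RuntimeError(f"upload failed for {slug}")


def send_pending(translations, vendor, serialize):
    """Create one job and batch per locale and upload every PENDING translation."""
    by_locale = defaultdict(list)
    for t in translations:
        if t.status == "PENDING":
            by_locale[t.locale].append(t)

    jobs, translation_jobs = [], []
    for locale, items in by_locale.items():
        job_uid = vendor.create_job(locale)        # Jobs - Locales: 1:1
        batch_uid = vendor.create_batch(job_uid)   # Jobs - Batches: 1:1
        # The status value stored for a job is an assumption.
        jobs.append({"job_uid": job_uid, "batch_uid": batch_uid, "status": "IN_PROGRESS"})
        for t in items:
            try:
                vendor.upload(batch_uid, t.slug, serialize(t.slug))
                t.status = "IN_PROGRESS"
                translation_jobs.append({"translation_id": t.id, "job_id": job_uid})
            except RuntimeError:
                t.status = "ERRORED"               # upload failure
    return jobs, translation_jobs
```

The two returned lists mirror the rows that would be written to the Job and TranslationsJobs tables.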

Check status of translation job

This action executes every day at a specific time (time TBD). It first checks whether any of the saved jobs are completed. If a job is complete, it downloads the translated files, deserializes them back into the mdx format, and creates a pull request into the ‘develop’ branch with the translated files. After the pull request is successfully created, it removes the job id from the table and removes the context for that id from the vendor’s store of contexts.

Summary:

  • Executes: daily
  • Can be manually triggered? Yes
  • Steps:
    • Get saved job ids from job id table
    • Asks TV whether each job is completed, by job id
    • If the job is not completed, nothing happens and the workflow stops at the “Fetch translated content and deserialize” step.
    • If completed:
      • Downloads translated files
      • Saves translated files
      • Creates a pull request to develop with the newly translated files
      • Deletes job id from job id table
      • Deletes context
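
The control flow of this workflow can be sketched as below. This is only an illustration under stated assumptions: `StubVendor` and its methods are made up for the example and do not reflect the real TV API, and the `deserialize` and `open_pull_request` callbacks are supplied by the caller.

```python
class StubVendor:
    """Hypothetical stand-in for the vendor API."""

    def __init__(self, completed):
        self.completed = completed          # {job_id: {slug: translated html}}
        self.deleted_contexts = []

    def is_completed(self, job_id):
        return job_id in self.completed

    def download(self, job_id):
        return self.completed[job_id]

    def delete_context(self, job_id):
        self.deleted_contexts.append(job_id)


def check_jobs(job_ids, vendor, deserialize, open_pull_request):
    """Process completed jobs and return the job ids that are still in progress."""
    remaining = []
    for job_id in job_ids:
        if not vendor.is_completed(job_id):
            remaining.append(job_id)        # nothing happens; checked again on the next run
            continue
        translated = vendor.download(job_id)
        files = {slug: deserialize(html) for slug, html in translated.items()}
        open_pull_request(base="develop", files=files)
        vendor.delete_context(job_id)       # avoid matching against outdated context
    return remaining
```

Returning only the unfinished ids models deleting completed job ids from the job id table.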

And there you have it! The files will be fully translated and added to our staging environment. In order to deploy it to production, another PR from develop to main needs to be created (i.e. a release).

Manually triggering workflow:

If there are time-sensitive jobs that you would like to have translated, then a manual trigger might be necessary! To manually trigger any workflow, you must go to the repository: https://github.com/newrelic/docs-website

Go to the “Actions” tab from the homepage of the repository.

You can then see all of the workflows that we have available. There are three specific to the localization workflow: Add Slugs to Translation Queue, Check status of translation jobs, and Send content to be translated.

Select the workflow that you would like to run manually.

Click on run workflow, which will open a dropdown.

This should always be run from the branch develop. That will be the default branch that pops up, so there are no changes that need to be made from the drop down. Just click Run workflow!

Clicking on Run workflow will start the workflow. You will see it pop up in the workflow history. It might take a few seconds to pop up. If you see a yellow dot appear next to it, that means it is being queued.

Once you see a spinning yellow circle appear around the yellow dot, it means that the workflow has begun executing. When it has started executing, you can then click on it to see the progress of the workflow.

It will open up the menu below. To see the progress of the content, you will need to click on the box that says “Send content” or has some other name. This, in the context of GitHub Actions, is a job. All of the localization workflows only have one job. Click on the job to see the workflow executing.

In the job, you will see the steps of the workflow. You can click on the steps to see the logging output.

When the job is complete, the yellow mark will turn to a green checkmark. If the result is not what you expected (there isn’t a PR made, the content didn’t get sent to the vendor) you can look through the logging to see what happened.

Scenarios

Some scenarios for running a manual workflow.

There’s a bunch of content that has been deployed to production (i.e. merged into the main branch) and you want to send those files for translation sooner than the two week interval.

If you only want to send everything that is in the translation queue earlier, then you will just need to trigger the Send content to be translated workflow. Just follow the steps above to trigger the workflow. The result should be a job in TV. If there is no job, this could be for a couple of reasons:

  • There are no files in the translation table.
  • The job didn’t execute correctly but didn’t fail. Reach out to an engineer on the Developer Enablement team to find out what went wrong (maybe some edge case we missed).

There’s a job completed, and you want those changes in a pull request right away

If you would like to get content from a completed job into GitHub sooner than the daily execution of the workflow, you will need to trigger the Check status of translation jobs workflow. You can do this by following the guidelines above for triggering a manual workflow.

You have some files that you want to add to the translations queue.

There are two cases to consider when adding files to the translation queue.

  1. There is a file that hasn’t been translated before into that language, and you want to set it up to be translated into that language.
  2. There is a file that hasn’t been edited recently, but you want to re-run its translation.

There is a file that hasn’t been translated before into that language

If this is the case, this will require you to make a PR to configure that file to be translated. Specifically, you will want to add that language to the translate property in the frontmatter. If the translate property does not exist, you will want to add the property and then list the language underneath.

To make a PR from the GitHub UI, you will need to navigate to the page that you want to translate in the GitHub repo. If you don’t know where it resides in the repo but do know where it resides on the site, you can click on the “edit this page” button in the right navigation of the site, which will take you to the file in GitHub.

Once you are in the file, you may click the little pen icon in the right hand corner to edit the page. Then you can use the in-browser code editor to make changes and create a new branch. For example, if you would like to have the content translated into Japanese, you will need to add the front matter for that. For more information on creating a PR, reach out to the documentation team! They are leading workshops on contributing to the docs site.
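
To make the frontmatter change concrete, here is a small sketch of what the edit does to a file. The `add_translate_locale` helper is hypothetical, and the `jp` locale code and the indented list layout are assumptions for illustration; check an already translated file in the repo for the exact values.

```python
def add_translate_locale(mdx_text, locale):
    """Return mdx_text with `locale` listed under `translate:` in the frontmatter."""
    lines = mdx_text.split("\n")
    if lines[0] != "---":
        raise ValueError("file has no frontmatter")
    end = lines.index("---", 1)              # closing frontmatter delimiter
    front = lines[1:end]
    if "translate:" in front:
        start = front.index("translate:") + 1
        stop = start
        while stop < len(front) and front[stop].lstrip().startswith("- "):
            stop += 1
        existing = [item.strip()[2:].strip() for item in front[start:stop]]
        if locale not in existing:
            front.insert(stop, f"  - {locale}")   # list the language underneath
    else:
        front += ["translate:", f"  - {locale}"]  # add the property, then the language
    return "\n".join(lines[:1] + front + lines[end:])


before = "---\ntitle: Intro\n---\n\nBody\n"
after = add_translate_locale(before, "jp")
print(after)
```

In the GitHub editor you would make the same change by hand: add a `translate:` line to the frontmatter if it is missing, then list the language on the next line.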

There is a file that hasn’t been edited recently, but you want to re-run its translation

This is a case we hopefully wouldn’t run into, since we are constantly keeping track of all the files that have been edited and regularly sending them to be translated. If, however, something happens and there is some need to run a translation on a file that hasn’t been edited recently (no changes to frontmatter, no editing of the content), then reach out to someone on the Developer Enablement team. We may suggest running the translation separately, outside of the workflow (i.e. uploading the files manually, downloading them and then creating a PR). We can also add the filenames manually to the translation queue, but this is not recommended.

Key concepts / Glossary:

GitHub Branches and Pull Requests

A branch holds the source code for a specific version of the codebase. You usually branch off of the default branch, which is the source of truth for your codebase. When you branch off of the default branch and make some changes, you can create a pull request to add those changes back in. In a pull request, you are requesting that the changes you made in your branch get merged into the default or base branch.

On our site, we have a little bit of a different set up. We have two branches that are the source of truth for our codebase. We have the develop branch which holds the code for our “staging environment” (meaning everything that we want to change before publishing it to the whole world) and then the main branch which holds the code for our “production environment.”

When we make changes to our codebase, we branch off the develop branch and then merge pull requests into the develop branch. When we want to release all these changes to production, we then create a pull request to merge develop into the main branch. We call the pull requests from the develop branch into the main branch “releases” since they are releasing the content to production.

GitHub Actions:

GitHub Actions are scripts that are run in one of three ways:

  • By some other interaction on GitHub (creating a pull request, merging one branch into another, etc.).
  • On a time interval (cron job: every day at 6pm, every two weeks at 1am, etc.).
  • By manually triggering the action (or “workflow” or “script”) from the GitHub UI.

GitHub Actions are also called workflows, and inside those workflows are jobs (separate scripts that run to complete the workflow). Most of our workflows only have one job, which contains a series of steps.

DynamoDB tables

We have two tables in DynamoDB for keeping track of our localization workflow. The first table is the to_translate table or the translation table. It keeps track of all the filenames that have been changed since the last job created and the languages that the files need to be translated to. This is an example of what that table looks like:

The other table that we have is for keeping track of the job ids. The ids are stored once the job is created and are used to check for completion. In DynamoDB, it is named being_translated. This is an example of what that table looks like:

Serialization / Deserialization

In order for TV to be able to process our files, we need to convert them into a format that they accept. For that reason, we serialize our mdx files into html. What this means is that we convert most of our custom components into something that looks like this:

In this, we take our mdx components and convert them into plain html. We specifically turn our components into divs and props into serialized strings as either attributes of the div or as children of the div. Therefore, only the content that is relevant to the translator is provided while the rest is hidden as attributes of html elements.

When we get this content back from TV, it is in the same format that we sent, with the original strings replaced by the translated strings. Then we take all the “serialized” components and convert them back into mdx, and save the file as mdx.
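
The round trip can be illustrated with a deliberately simplified sketch. The `data-component` and `data-props` attribute names, the `Callout` component, and both function names are assumptions for the example and not the serializer's real output; this version also only stores props as attributes, never as children.

```python
import html
import json
import re


def serialize(name, props, children):
    """Turn one component into a div, hiding its props in an attribute."""
    attrs = html.escape(json.dumps(props, sort_keys=True), quote=True)
    return f'<div data-component="{name}" data-props="{attrs}">{children}</div>'


DIV_RE = re.compile(
    r'<div data-component="(\w+)" data-props="([^"]*)">(.*)</div>', re.DOTALL
)


def deserialize(markup):
    """Rebuild the mdx component from a div produced by serialize()."""
    name, attrs, children = DIV_RE.fullmatch(markup).groups()
    props = json.loads(html.unescape(attrs))
    prop_text = "".join(f' {key}="{value}"' for key, value in props.items())
    return f"<{name}{prop_text}>{children}</{name}>"


sent = serialize("Callout", {"variant": "tip"}, "Save your changes.")
# The translator only changes the visible text; the attributes come back untouched.
received = sent.replace("Save your changes.", "Guarde sus cambios.")
print(deserialize(received))
```

Keeping the props in an escaped attribute is what lets the translated text come back wrapped in exactly the same component.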

Gatsby Cloud

Although not listed on the diagrams above, Gatsby Cloud is where we host and deploy our site. Every time we merge into the main branch, Gatsby Cloud is triggered to build the latest updates from GitHub. It fetches the code from the main branch and builds the site. Once the site has passed the checks, it deploys the built site to docs.newrelic.com. If the site fails to build, then the new version is not deployed and the engineers are notified.

Context

Although also not described in detail in the Automatic workflow section, we also send visual context to TV during the Send content to be translated workflow. This context is the html straight from the current version of the docs site available on docs.newrelic.com. When we send the context to TV, they run automatic matching to pair the files that we have uploaded with the contexts that we have uploaded. In the Check status of translation jobs workflow, we delete the contexts for the job that we have created. The reason for this is to ensure that no outdated contexts are being matched. Read more about visual context and context matching here.
