Update the French documentation #163

Merged 16 commits on Jul 18, 2024
1 change: 1 addition & 0 deletions CHANGELOG.rst
@@ -13,6 +13,7 @@ New features and enhancements
Internal changes
^^^^^^^^^^^^^^^^
* `numpy` has been pinned below v2.0.0 until `xclim` and other dependencies are updated to support it. (:pull:`161`).
* A helper script has been added in the `CI` directory to facilitate the translation of the `xhydro` documentation. (:issue:`63`, :pull:`163`).

v0.3.6 (2024-06-10)
-------------------
141 changes: 141 additions & 0 deletions CI/translator.py
@@ -0,0 +1,141 @@
"""Translate missing msgstr entries in .po files using the specified translator."""

import logging
import re
import time
from glob import glob
from pathlib import Path

import deep_translator

logger = logging.getLogger(__name__)


def translate_missing_po_entries(  # noqa: C901
    dir_path: str,
    translator: str = "GoogleTranslator",
    source_lang: str = "en",
    target_lang: str = "fr",
    clean_old_entries: bool = True,
    overwrite_fuzzy: bool = True,
    **kwargs,
):
    r"""
    Translate missing msgstr entries in .po files using the specified translator.

    Parameters
    ----------
    dir_path : str
        The path to the directory containing the .po files.
    translator : str
        The translator to use. Uses GoogleTranslator by default, but can be changed to any other translator supported by `deep_translator`.
    source_lang : str
        The source language of the .po files. Defaults to "en".
    target_lang : str
        The target language of the .po files. Defaults to "fr".
    clean_old_entries : bool
        Whether to clean old entries in the .po files. Defaults to True.
    overwrite_fuzzy : bool
        Whether to overwrite fuzzy entries in the .po files. Defaults to True.
    \*\*kwargs : dict
        Additional keyword arguments to pass to the translator.
    """
    msg_pattern = re.compile(r"msgid (.*?)(?=(#~|#:|$))", re.DOTALL)
    fuzzy_pattern = re.compile(r"#, fuzzy(.*?)\nmsgid (.*?)(?=(#~|#:|$))", re.DOTALL)

    # Initialize the translator
    translator = getattr(deep_translator, translator)(
        source=source_lang, target=target_lang, **kwargs
    )

    # Get all .po files
    files = glob(f"{dir_path}/**/*.po", recursive=True)

    number_of_calls = 0
    for file_path in files:
        if not any(
            dont_translate in file_path for dont_translate in ["changelog", "apidoc"]
        ):
            with open(file_path, "r+", encoding="utf-8") as file:
                content = file.read()

                # Find all fuzzy entries
                fuzzy_entries = fuzzy_pattern.findall(str(content))
                if len(fuzzy_entries) > 0 and overwrite_fuzzy:
                    logger.info(
                        f"Found {len(fuzzy_entries)} fuzzy entries in {file_path}"
                    )
                    for i in fuzzy_entries:
                        entry = i[1].split("\nmsgstr ")
                        # Remove the fuzzy entry
                        content = content.replace(entry[1], '""\n\n')
                    # Since we can't guarantee the exact way the fuzzy entry was written, we remove the fuzzy tag in 2 steps
                    content = content.replace(", fuzzy", "")
                    content = content.replace("#\nmsgid", "msgid")

                # Find all msgid and msgstr pairs
                msgids = []
                msgstrs = []
                for i in msg_pattern.findall(str(content)):
                    ids, strs = i[0].split("\nmsgstr ")
                    ids = ids if ids != '""' else ""
                    strs = strs.replace('\\"', "'").replace('"', "").replace("\n", "")
                    msgids.extend([ids])
                    msgstrs.extend([strs])

                # Track if the file was modified
                modified = False

                for msgid, msgstr in zip(msgids, msgstrs):
                    # Check if translation is missing
                    if msgid and not msgstr:
                        # Translate the missing string
                        translated_text = translator.translate(
                            msgid.replace('\\"', "'").replace('"', "").replace("\n", "")
                        )

                        # Split the translated text into lines of max 60 characters
                        if len(translated_text) > 70:  # 70 to include the spaces
                            words = translated_text.split()
                            length = 0
                            words[0] = '"\n"' + words[0]
                            for i in range(len(words)):
                                length += len(words[i])
                                if length > 60:
                                    words[i] = '"\n"' + words[i]
                                    length = 0
                            translated_text = " ".join(words)

                        # Replace the empty msgstr with the translated text
                        content = content.replace(
                            f'msgid {msgid}\nmsgstr ""',
                            f'msgid {msgid}\nmsgstr "{translated_text}"',
                            1,
                        )
                        modified = True

                        # Sleep to avoid rate limiting
                        number_of_calls += 1
                        if number_of_calls % 100 == 0:
                            time.sleep(60)
                        else:
                            time.sleep(1)

                if clean_old_entries:
                    is_old = str(content).split("#~")
                    if len(is_old) > 1:
                        content = is_old[0]
                        modified = True

                # If modifications were made, write them back to the file
                if modified:
                    logger.info(f"Updating translations in {file_path}")
                    file.seek(0)
                    file.write(content)
                    file.truncate()


# FIXME: Add argparse to make it a command-line tool
if __name__ == "__main__":
    dir_path = Path(__file__).parents[1] / "docs" / "locales" / "fr" / "LC_MESSAGES"
    translate_missing_po_entries(dir_path)
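
The `FIXME` at the end of `CI/translator.py` asks for a command-line interface. One way this might look with `argparse` is sketched below. The `build_parser` helper, the flag names (`--translator`, `--source-lang`, `--target-lang`, `--keep-old-entries`, `--keep-fuzzy`) and the relative default path are assumptions made for illustration; none of them are part of this PR.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI parser mirroring translate_missing_po_entries()."""
    parser = argparse.ArgumentParser(
        description="Translate missing msgstr entries in .po files."
    )
    # Assumed default: the French locale directory, relative to the repository root.
    parser.add_argument(
        "dir_path",
        nargs="?",
        type=Path,
        default=Path("docs") / "locales" / "fr" / "LC_MESSAGES",
        help="Directory containing the .po files.",
    )
    parser.add_argument("--translator", default="GoogleTranslator")
    parser.add_argument("--source-lang", default="en")
    parser.add_argument("--target-lang", default="fr")
    # store_false flags leave the function's defaults (True) in place unless given.
    parser.add_argument(
        "--keep-old-entries", dest="clean_old_entries", action="store_false"
    )
    parser.add_argument("--keep-fuzzy", dest="overwrite_fuzzy", action="store_false")
    return parser


# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["docs/locales/fr/LC_MESSAGES", "--keep-fuzzy"])
call_kwargs = vars(args)
```

Because each `dest` matches a parameter name of `translate_missing_po_entries`, the parsed namespace could then be forwarded as `translate_missing_po_entries(**call_kwargs)` inside the script.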
17 changes: 17 additions & 0 deletions CONTRIBUTING.rst
@@ -233,6 +233,23 @@ To run specific code style checks:

To get ``black``, ``isort``, ``blackdoc``, ``ruff``, and ``flake8`` (with plugins ``flake8-alphabetize`` and ``flake8-rst-docstrings``) simply install them with ``pip`` (or ``conda``) into your environment.

Translations
------------

If you would like to contribute to the French translation of the documentation, you can do so by running the following command:

.. code-block:: console

    make initialize-translations

This will create or update the French translation files in the `docs/locales/fr/LC_MESSAGES` directory. You can then edit the `.po` files in this directory to provide translations for the documentation. You can also use the `translator.py` script located in the `CI` directory to machine-translate missing entries from English to French; it uses Google Translate by default. Note that this script requires the `deep-translator` package to be installed in your environment:

.. code-block:: console

    pip install deep-translator

We aim to automate this process eventually. Until then, we want to bring the French translation up to date with the English documentation at least each time a new release is made.

Code of Conduct
---------------

2 changes: 1 addition & 1 deletion Makefile
@@ -88,7 +88,7 @@ coverage: ## check code coverage quickly with the default Python
autodoc: clean-docs ## create sphinx-apidoc files:
	sphinx-apidoc -o docs/apidoc --private --module-first src/xhydro

-initialize-translations: clean-docs ## initialize translations, ignoring autodoc-generated files
+initialize-translations: clean-docs autodoc ## initialize translations, including autodoc-generated files
	${MAKE} -C docs gettext
	sphinx-intl update -p docs/_build/gettext -d docs/locales -l fr

24 changes: 24 additions & 0 deletions docs/locales/fr/LC_MESSAGES/apidoc/modules.po
@@ -0,0 +1,24 @@
# SOME DESCRIPTIVE TITLE.
# Copyright (C) 2023, Thomas-Charles Fortier Filion
# This file is distributed under the same license as the xHydro package.
# FIRST AUTHOR <EMAIL@ADDRESS>, 2024.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: xHydro 0.3.6\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2024-07-11 16:20-0400\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language: fr\n"
"Language-Team: fr <LL@li.org>\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Generated-By: Babel 2.14.0\n"

#: ../../apidoc/modules.rst:2
msgid "xhydro"
msgstr ""
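
The entry above has an empty `msgstr`, which is what an untranslated string looks like in a `.po` file. As a rough sketch, entries of this kind could be listed with the standard library alone. The `find_untranslated` helper is hypothetical, the second entry in the sample text is made up, and the pattern assumes single-line `msgid` values, so multi-line entries would be missed.

```python
import re

# A single-line msgid followed by an empty msgstr that has no continuation line.
# Assumption: multi-line msgid entries (msgid "" plus quoted lines) are not handled.
UNTRANSLATED = re.compile(
    r'^msgid "(?P<msgid>.+)"\nmsgstr ""$(?!\n")', re.MULTILINE
)


def find_untranslated(po_text: str) -> list[str]:
    """Return the msgid of every single-line entry whose msgstr is empty."""
    return [match.group("msgid") for match in UNTRANSLATED.finditer(po_text)]


# Sample text: a header, the entry from this diff, and one made-up translated entry.
sample = '''msgid ""
msgstr ""
"Language: fr\\n"

#: ../../apidoc/modules.rst:2
msgid "xhydro"
msgstr ""

#: ../../index.rst:4
msgid "Installation"
msgstr "Installation"
'''

print(find_untranslated(sample))  # → ['xhydro']
```

The header is skipped because its `msgid` is empty, and the negative lookahead keeps an `msgstr ""` that is followed by quoted continuation lines from being reported as missing.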