|
5 | 5 | - Discussion: [#56301](https://github.com/pandas-dev/pandas/issues/56301)
|
6 | 6 | [#57204](https://github.com/pandas-dev/pandas/pull/57204)
|
7 | 7 | - Author: [Albert Steppi](https://github.com/steppi),
|
8 |
| -- Revision: 1 |
| 8 | +- Revision: 2 |
9 | 9 |
|
10 | 10 | ## Abstract
|
11 | 11 |
|
12 | 12 | The suggestion is to have official translations made for content of the core
|
13 |
| -project website [pandas.pydata.org](https://pandas.pydata.org) and provide a |
14 |
| -language drop-down selector on [pandas.pydata.org](https://pandas.pydata.org) |
15 |
| -similar to what currently exists at [numpy.org](https://numpy.org). |
| 13 | +project website [pandas.pydata.org](https://pandas.pydata.org) and offer |
| 14 | +a low friction way for users to access these translations on the core |
| 15 | +project website. |
16 | 16 |
|
| 17 | +## Motivation, Scope, Usage, and Impact |
17 | 18 |
|
18 |
| -## Motivation and Scope |
| 19 | +There are many potential users with no or a low level of English proficiency |
| 20 | +who could benefit from quality official translations of the Pandas website |
| 21 | +content. Though translations for all documentation would be valuable, |
| 22 | +producing and maintaining translations for such a large and oft-changing |
| 23 | +collection of text would take an immense and sustained effort which may |
| 24 | +be infeasible. The suggestion is instead to have translations made for only |
| 25 | +a key set of pages from the core project website. |
19 | 26 |
|
20 |
| -Pandas is a foundational package in the Scientific Python ecosystem and there |
21 |
| -are many potential users with no or low English proficiency who would benefit |
22 |
| -from having high quality information about Pandas available in their native |
23 |
| -language. |
24 |
| - |
25 |
| -Translation of all content presents considerable challenge due to its sheer |
26 |
| -volume and due to the tendency for technical documentation to exist in a state |
27 |
| -of flux. The suggestion is to have translations for a targeted subset, selected: |
28 |
| - |
29 |
| -- from things which are relatively stable to reduce the ongoing burden of |
30 |
| - keeping translations up to date. |
31 |
| -- to maximize the benefit to users and potential users who currently have no or |
32 |
| - a low level of English proficiency, given the person-hours and resources that |
33 |
| - are likely to be available now and into the future. |
34 |
| - |
35 |
| -Consideration of what subset of content would be most useful for users with |
36 |
| -no or a low level of English proficiency could be a guiding principle to help |
37 |
| -select what information should be available on the core project website, outside |
38 |
| -of the technical documentation. |
39 |
| - |
40 |
| -## Detailed Description |
41 |
| - |
42 |
| -The following is a list of all pages on the core project website which are sourced |
43 |
| -from markdown files at https://github.com/pandas-dev/pandas/tree/main/web/pandas. |
44 |
| - |
45 |
| -- Landing page: https://pandas.pydata.org |
46 |
| -- About pandas: https://pandas.pydata.org/about |
47 |
| -- Project roadmap: https://pandas.pydata.org/about/roadmap.html |
48 |
| -- Governance: https://pandas.pydata.org/about/governance.html |
49 |
| -- Team: https://pandas.pydata.org/about/team.html |
50 |
| -- Sponsors: https://pandas.pydata.org/about/sponsors.html |
51 |
| -- Citing and logo: https://pandas.pydata.org/about/citing.html |
52 |
| -- Getting started: https://pandas.pydata.org/getting_started.html |
53 |
| -- Code of conduct: https://pandas.pydata.org/community/coc.html |
54 |
| -- Ecosystem: https://pandas.pydata.org/community/ecosystem.html |
55 |
| -- Contribute: https://pandas.pydata.org/contribute.html |
56 |
| - |
57 |
| -Provisionally, the suggestion is for all of this content to be translated with |
58 |
| -the possible exception of the "Project roadmap", which may be of limited |
59 |
| -interest to new users. Currently the "Getting started" section may be of |
60 |
| -limited utility to users unable to engage with the externally linked content. In |
61 |
| -the "Project roadmap" within the subsection labeled "Documentation improvements" |
62 |
| -there is a stated goal to: |
63 |
| - |
64 |
| -*Improve the "Getting Started" documentation, designing and writing learning |
65 |
| - paths for users different backgrounds (e.g. brand new to programming, familiar |
66 |
| - with other languages like R, already familiar with Python).* |
67 |
| - |
68 |
| -It is recommended that this goal be accomplished alongside translation work in |
69 |
| -order to make this page more useful to those with no or low English proficiency. |
70 |
| -This would also prevent the need for retranslation if this goal were to be |
71 |
| -accomplished after the original translation work is completed. |
72 |
| - |
73 |
| -A language selection drop-down should be added to the navigation-bar similar to |
74 |
| -what exists at https://numpy.org. |
75 |
| - |
76 |
| - |
77 |
| -## Usage and Impact |
78 |
| - |
79 |
| -The primary impact would be lowering the barrier to entry for non-English |
80 |
| -speakers to get started using Pandas and moving along the path towards learning |
81 |
| -to use it skillfully. |
82 |
| - |
83 |
| -In 2022 it was estimated that there were approximately 400 million native |
84 |
| -speakers of English and between 1.5 - 2 billion people who speak English as a |
85 |
| -second language worldwide |
86 |
| -[Wikipedia](https://web.archive.org/web/20240129080609/https://en.wikipedia.org/wiki/English-speaking_world). |
87 |
| -With an estimated world population of over 8 billion people, this leaves many |
88 |
| -for whom the Pandas core website is not directly accessible. Pandas is an |
89 |
| -important piece of software infrastructure for data manipulation and analysis |
90 |
| -with utility beyond the English speaking world. There is a vast population of |
91 |
| -users and potential users who could benefit from having official information |
92 |
| -about Pandas published in their native language. |
93 |
| - |
94 |
| -Although automated translation tools can help those with no or low English |
95 |
| -proficiency access the content of the Pandas website, these tools often still |
96 |
| -struggle with the technical and jargon-laden language of scientific |
97 |
| -software. This was evinced during the translation of https://numpy.org. |
98 |
| -Automatic translation tools are invaluable as a starting point for human |
99 |
| -translators, but human translators remain important to ensure accuracy. |
100 |
| - |
101 |
| -## Implementation |
| 27 | +## Detailed Description and Implementation |
102 | 28 |
|
103 | 29 | The bulk of the work for setting up translation infrastructure, finding and
|
104 | 30 | vetting translators, and working out how to publish translations, will fall
|
105 | 31 | upon a cross-functional team funded by the [Scientific Python Community & Communications
|
106 | 32 | Infrastructure grant](https://scientific-python.org/doc/scientific-python-community-and-communications-infrastructure-2022.pdf)
|
107 | 33 | to work on adding translations for the main websites of all
|
108 | 34 | [Scientific Python core projects](https://scientific-python.org/specs/core-projects/).
|
109 |
| -The goal is to minimize the burden on the core Pandas maintainers. |
110 |
| - |
111 |
| -A GitHub repository should be set up to mirror content from the core webpage |
112 |
| -which is selected for translation. A GitHub action should be set up to keep |
113 |
| -the mirrored repository up-to-date. Either an action within the main Pandas |
114 |
| -repo which pushes updates to the mirror, or a cron in the mirror which polls |
115 |
| -for relevant updates in Pandas repo and pulls them when necessary. |
| 35 | +The hope is to minimize the burden on the core Pandas maintainers. |
116 | 36 |
|
117 |
| -The mirrored repository would then be synced to the Crowdin localization |
118 |
| -management platform as described in |
| 37 | +No translated content would be hosted within the Pandas repository itself. |
| 38 | +Instead a separate GitHub repository could be set up containing the content |
| 39 | +selected for translation. This repository could then be synced to the Crowdin |
| 40 | +localization management platform as described in |
119 | 41 | [Crowdin's documentation](https://support.crowdin.com/github-integration/).
|
120 |
| -There would be separate folders within the mirror repository, one for each target |
121 |
| -language, with the content initially untranslated. |
122 |
| -Crowdin would then provide a user interface for translators, and updates |
123 |
| -to translations would be pushed to the branch `l10n_main` on the mirrored |
124 |
| -repository. Periodically, manual pull requests would be made to the main Pandas |
125 |
| -repo, adding translated content within folders alongside of the English content. |
126 |
| - |
127 |
| -Translations will be managed within an enterprise Crowdin organization created for |
128 |
| -Scientific Python localization projects. Access to this organization is |
129 |
| -invite-only, and translators will be vetted to help safe-guard against the |
130 |
| -spamming of low quality or inflammatory translations. Approval from a trusted |
131 |
| -admin would be required before translations are merged into the main Pandas |
132 |
| -repo. |
133 |
| - |
134 |
| -A language drop-down selector will need to be added to the navigation-bar of |
135 |
| -the Pandas website. The plan is for development of a generic solution that |
136 |
| -can be reused for all Scientific Python website translations. |
| 42 | +Crowdin would then provide a user interface for translators, and updates to |
| 43 | +translations would be pushed to a feature branch, with completed translations |
| 44 | +periodically merged into `main` after approval by trusted |
| 45 | +language-specific admins working across the Scientific Python core projects |
| 46 | +participating in the translation program. There will be no need for Pandas |
| 47 | +maintainers to verify the quality of translations. |
| 48 | + |
| 49 | +The result would be a repository containing parallel versions of content from |
| 50 | +pandas.pydata.org, translated into various languages. Translated content could |
| 51 | +then be pulled from this repository during generation of the Pandas website. A |
| 52 | +low friction means of choosing between languages could then be added, possibly a |
| 53 | +drop-down language selector similar to what now exists for https://numpy.org, or |
| 54 | +simple links similar to what now exists for https://www.sympy.org/en/index.html. |
| 55 | +A developer supported by the "Scientific Python Community & Communications |
| 56 | +Infrastructure grant" could assist with making the changes necessary for the |
| 57 | +Pandas website to support publication of translations. |
| 58 | + |
| 59 | +If desired, a cron job could be set up on the repository containing translated |
| 60 | +content to check for relevant changes or updates to the Pandas website's content |
| 61 | +and pull them if necessary. Translators could then receive a notification from |
| 62 | +Crowdin that there are new strings to translate. This could help with the |
| 63 | +process of keeping translations up to date. |
137 | 64 |
|
138 | 65 |
|
139 | 66 | ### PDEP History
|
140 | 67 |
|
141 | 68 | - 01 February 2024: Initial draft
|
| 69 | +- 02 February 2024: First revision |