Commit cc907de

PDEP-14: First revision
1 parent 38ae16e commit cc907de

1 file changed: +41 -113 lines changed

web/pandas/pdeps/0014-translate-website-content.md

@@ -5,137 +5,65 @@
 - Discussion: [#56301](https://github.com/pandas-dev/pandas/issues/56301)
 [#57204](https://github.com/pandas-dev/pandas/pull/57204)
 - Author: [Albert Steppi](https://github.com/steppi),
-- Revision: 1
+- Revision: 2

 ## Abstract

 The suggestion is to have official translations made for content of the core
-project website [pandas.pydata.org](https://pandas.pydata.org) and provide a
-language drop-down selector on [pandas.pydata.org](https://pandas.pydata.org)
-similar to what currently exists at [numpy.org](https://numpy.org).
+project website [pandas.pydata.org](https://pandas.pydata.org) and offer
+a low friction way for users to access these translations on the core
+project website.

+## Motivation, Scope, Usage, and Impact

-## Motivation and Scope
+There are many potential users with no or a low level of English proficiency
+who could benefit from quality official translations of the Pandas website
+content. Though translations for all documentation would be valuable,
+producing and maintaining translations for such a large and oft-changing
+collection of text would take an immense and sustained effort which may
+be infeasible. The suggestion is instead to have translations made for only
+a key set of pages from the core project website.

-Pandas is a foundational package in the Scientific Python ecosystem and there
-are many potential users with no or low English proficiency who would benefit
-from having high quality information about Pandas available in their native
-language.
-
-Translation of all content presents considerable challenge due to its sheer
-volume and due to the tendency for technical documentation to exist in a state
-of flux. The suggestion is to have translations for a targeted subset, selected:
-
-- from things which are relatively stable to reduce the ongoing burden of
-keeping translations up to date.
-- to maximize the benefit to users and potential users who currently have no or
-a low level of English proficiency, given the person-hours and resources that
-are likely to be available now and into the future.
-
-Consideration of what subset of content would be most useful for users with
-no or a low level of English proficiency could be a guiding principal to help
-select what information should be available on the core project website, outside
-of the technical documentation.
-
-## Detailed Description
-
-The following is a list of all pages on the core project website which are sourced
-from markdown files at https://github.com/pandas-dev/pandas/tree/main/web/pandas.
-
-- Landing page: https://pandas.pydata.org
-- About pandas: https://pandas.pydata.org/about
-- Project roadmap: https://pandas.pydata.org/about/roadmap.html
-- Governance: https://pandas.pydata.org/about/governance.html
-- Team: https://pandas.pydata.org/about/team.html
-- Sponsors: https://pandas.pydata.org/about/sponsors.html
-- Citing and logo: https://pandas.pydata.org/about/citing.html
-- Getting started: https://pandas.pydata.org/getting_started.html
-- Code of conduct: https://pandas.pydata.org/community/coc.html
-- Ecosystem: https://pandas.pydata.org/community/ecosystem.html
-- Contribute: https://pandas.pydata.org/contribute.html
-
-Provisionally, the suggestion is for all of this content to be translated with
-the possible exception of the "Project roadmap", which may be of limited
-interest to new users. Currently the "Getting started" section may be of
-limited utility to users unable to engage with the externally linked content. In
-the "Project roadmap" within the subsection labeled "Documentation improvements"
-there is a stated goal to:
-
-*Improve the "Getting Started" documentation, designing and writing learning
-paths for users different backgrounds (e.g. brand new to programming, familiar
-with other languages like R, already familiar with Python).*
-
-It is recommended that this goal be accomplished alongside translation work in
-order to make this page more useful to those with no or low English proficiency.
-This would also prevent the need for retranslation if this goal were to be
-accomplished after the original translation work is completed.
-
-A language selection drop-down should be added to the navigation-bar similar to
-what exists at https://numpy.org.
-
-
-## Usage and Impact
-
-The primary impact would be lowering the barrier to entry for non-English
-speakers to get started using Pandas and moving along the path towards learning
-to use it skillfully.
-
-In 2022 it was estimated that there were approximately 400 million native
-speakers of English and between 1.5 - 2 billion people who speak English as a
-second language worldwide
-[Wikipedia](https://web.archive.org/web/20240129080609/https://en.wikipedia.org/wiki/English-speaking_world).
-With an estimated world population of over 8 billion people, this leaves many
-for whom the Pandas core website is not directly accessible. Pandas is an
-important piece of software infrastructure for data manipulation and analysis
-with utility beyond the English speaking world. There is a vast population of
-users and potential users who could benefit from having official information
-about Pandas published in their native language.
-
-Although automated translation tools can help those with no or low English
-proficiency access the content of the Pandas website, these tools often still
-struggle with the technical and jargon-laden language of scientific
-software. This was evinced during the translation of https://numpy.org.
-Automatic translation tools are invaluable as a starting point for human
-translators, but human translators remain important to ensure accuracy.
-
-## Implementation
+## Detailed Description and Implementation

 The bulk of the work for setting up translation infrastructure, finding and
 vetting translators, and working out how to publish translations, will fall
 upon a cross-functional team funded by the [Scientific Python Community & Communications
 Infrastructure grant](https://scientific-python.org/doc/scientific-python-community-and-communications-infrastructure-2022.pdf)
 to work on adding translations for the main websites of all
 [Scientific Python core projects](https://scientific-python.org/specs/core-projects/).
-The goal is to minimize the burden on the core Pandas maintainers.
-
-A GitHub repository should be set up to mirror content from the core webpage
-which is selected for translation. A GitHub action should be set up to keep
-the mirrored repository up-to-date. Either an action within the main Pandas
-repo which pushes updates to the mirror, or a cron in the mirror which polls
-for relevant updates in Pandas repo and pulls them when necessary.
+The hope is to minimize the burden on the core Pandas maintainers.

-The mirrored repository would then be synced to the Crowdin localization
-management platform as described in
+No translated content would be hosted within the Pandas repository itself.
+Instead a separate GitHub repository could be set up containing the content
+selected for translation. This repository could then be synced to the Crowdin
+localization management platform as described in
 [Crowdin's documentation](https://support.crowdin.com/github-integration/).
-There would be separate folders within the mirror repository, one for each target
-language, with the content initially untranslated.
-Crowdin would then provide a user interface for translators, and updates
-to translations would be pushed to the branch `l10n_main` on the mirrored
-repository. Periodically, manual pull requests would be made to the main Pandas
-repo, adding translated content within folders alongside of the English content.
-
-Translations will be managed within an enterprise Crowdin organization created for
-Scientific Python localization projects. Access to this organization is
-invite-only, and translators will be vetted to help safe-guard against the
-spamming of low quality or inflammatory translations. Approval from a trusted
-admin would be required before translations are merged into the main Pandas
-repo.
-
-A language drop-down selector will need to be added to the navigation-bar of
-the Pandas website. The plan is for development of a generic solution that
-can be reused for all Scientific Python website translations.
+Crowdin would then provide a user interface for translators, and updates to
+translations would be pushed to a feature branch, with completed translations
+periodically merged into `main` after approval by trusted
+language-specific admins working across the Scientific Python core projects
+participating in the translation program. There will be no need for Pandas
+maintainers to verify the quality of translations.
+
+The result would be a repository containing parallel versions of content from
+pandas.pydata.org, translated into various languages. Translated content could
+then be pulled from this repository during generation of the Pandas website. A
+low friction means of choosing between languages could then be added. Possibly a
+drop-down language selector similar to what now exists for https://numpy.org, or
+simple links similar to what now exists for https://www.sympy.org/en/index.html.
+A developer supported by the "Scientific Python Community & Communications
+Infrastructure grant" could assist with making the changes necessary for the
+Pandas website to support publication of translations.
+
+If desired, a cron job could be set up on the repository containing translated
+content to check for relevant changes or updates to the Pandas website's content
+and pull them if necessary. Translators could then receive a notification from
+Crowdin that there are new strings to translate. This could help with the
+process of keeping translations up to date.


 ### PDEP History

 - 01 February 2024: Initial draft
+- 02 February 2024: First revision
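
Editorial sketch, not part of the commit: the revised text mentions an optional cron job that checks the Pandas website content for relevant changes. The PDEP does not say how that check would work, so the following is only one possible approach, assuming the selected pages are available as markdown files in a local checkout. The function names, the `*.md` pattern, and the JSON manifest file are all hypothetical and chosen for illustration.

```python
import hashlib
import json
from pathlib import Path


def content_hashes(source_dir: Path, pattern: str = "*.md") -> dict[str, str]:
    """Map each matching file's relative path to the SHA-256 of its bytes."""
    return {
        path.relative_to(source_dir).as_posix(): hashlib.sha256(
            path.read_bytes()
        ).hexdigest()
        for path in sorted(source_dir.rglob(pattern))
    }


def changed_pages(source_dir: Path, manifest_path: Path) -> list[str]:
    """Return pages that are new or modified since the last recorded run.

    The manifest (a hypothetical JSON file) stores the hashes seen on the
    previous run and is overwritten with the current hashes on every call.
    Deleted pages are not reported by this sketch.
    """
    previous = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    current = content_hashes(source_dir)
    changed = [name for name, digest in current.items() if previous.get(name) != digest]
    manifest_path.write_text(json.dumps(current, indent=2, sort_keys=True))
    return changed
```

A scheduled job could call `changed_pages` against a fresh checkout and copy only the reported files into the translation repository, so that Crowdin sees new strings only when the English source actually changed.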
