PDEP-14: Publish translations of pandas.pydata.org #57204

Closed
wants to merge 4 commits into from

@steppi steppi commented Feb 2, 2024

Following up on the discussion in #56301, this PR adds a first draft for a PDEP proposing creation and publication of translations of the core website content.

To summarize, the Scientific Python community and communications infrastructure grant is supporting work to facilitate official translation of content from the websites of Scientific Python core projects and publication of these translations. This PDEP proposes that Pandas participates in these efforts. The hope is that the bulk of the work would be taken on by myself and/or a colleague at Quansight, and the work required by Pandas maintainers would be minimal.

There's more information in the PDEP below. Thank you. Looking forward to hearing your feedback.

@mroeschke mroeschke added the PDEP pandas enhancement proposal label Feb 2, 2024
@mroeschke
Member

cc @pandas-dev/pandas-core

@datapythonista
Member

Thanks for putting this together @steppi, nice start. I'm fine with the general idea, but being honest, not a fan of the PDEP as it is now.

I may be wrong, but I don't think we need so much context on why translations are useful. This is going to be reviewed by many people, and to me feels like everything in the PDEP except the implementation part could be summarized in a paragraph, making the review of the PDEP much faster.

Also, I don't think the PDEP should be opinionated on what exact pages should be translated. In my opinion we want to define what's made available to be translated, and that's it. Then if you work on some of the translations as part of the grant, surely feel free to skip the roadmap or whatever doesn't make sense.

I don't fully understand about the mirror, maybe you can expand on why a mirror is useful.

Finally, feels like your idea is to create files such as web/pandas/it/getting_started.md or web/pandas/zh/about/team.md in this same repo with the content translated. I'm a strong -1 on this. We can synchronize the translations in our production repository directly from somewhere else. I'm -1 on anything that involves having translated content in this repo.

Not sure if the idea is to generate .po / .mo files or to have a full copy of the markdown files for each translation, but whatever it is, it can be kept separately, and we can fetch it from somewhere else when we build our website, so we have the translated files published. I don't think there is added value in moving translated content to this repo.

@steppi
Author

steppi commented Feb 2, 2024

Thanks for the pointed criticism @datapythonista. I’m just trying to get the ball rolling and would have been surprised to produce a good proposal on the first go honestly. I’ll take your feedback into account and make revisions.

@simonjayhawkins
Member

Finally, feels like your idea is to create files such as web/pandas/it/getting_started.md or web/pandas/zh/about/team.md in this same repo with the content translated. I'm a strong -1 on this. We can synchronize the translations in our production repository directly from somewhere else. I'm -1 on anything that involves having translated content in this repo.

yep, at first glance, this appears to be a major stumbling block.

I see many web sites that warn of external content, so I guess that any links from the pandas web site to translated content would need something similar.

I don't think it is realistic to expect pandas core maintainers to approve changes/ merge PRs they do not understand. Even if we have some maintainers that are comfortable with some other languages, this still reduces the bandwidth available for approval and discussion.

@datapythonista
Member

datapythonista commented Feb 2, 2024

To be clear, I'm fine to have the translations in for example https://pandas.pydata.org/it/ for Italian.

What I think is a bad idea, one that adds zero value and a significant maintenance cost, is having translated texts in this repo. The CI process can download translations from .po files, from translated markdown files, or from whatever format is used, render the website with them, and publish the website with the translations. I don't see how having translated content in this repo is helpful in any way if the pandas maintainers and the translators are different groups of people.
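
A minimal sketch of the kind of build step described here, assuming the translated Markdown files have already been fetched into a separate checkout laid out as `<lang>/<relative path>`. The function name `overlay_translations` and the directory layout are hypothetical; this is not how the pandas website build currently works, only an illustration of keeping translations out of the main repo and falling back to English where a page has no translation yet:

```python
import shutil
from pathlib import Path


def overlay_translations(
    english_dir: Path,
    translations_dir: Path,
    output_dir: Path,
    languages: list[str],
) -> dict[str, list[str]]:
    """Build one content tree per language, falling back to English pages.

    Returns, for each language, the relative paths that were translated.
    All directory names are assumptions made for this sketch.
    """
    translated: dict[str, list[str]] = {}
    for lang in languages:
        translated[lang] = []
        for source in sorted(english_dir.rglob("*.md")):
            relative = source.relative_to(english_dir)
            candidate = translations_dir / lang / relative
            target = output_dir / lang / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            if candidate.is_file():
                # A translation exists in the external checkout: use it.
                shutil.copyfile(candidate, target)
                translated[lang].append(relative.as_posix())
            else:
                # No translation yet: publish the English page instead.
                shutil.copyfile(source, target)
    return translated
```

The per-language output directories could then be rendered by the normal website build, so the main repo only ever contains the English sources.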

Member

@simonjayhawkins simonjayhawkins left a comment

To be clear, I'm fine to have the translations in for example https://pandas.pydata.org/it/ for Italian.

sounds good.

I don't see how having translated content in this repo if the pandas maintainers and the translators are different groups of people is helpful in any way.

agree. There are two sentences in this current proposal which I see as problematic.

Comment on lines 124 to 125
repository. Periodically, manual pull requests would be made to the main Pandas
repo, adding translated content within folders alongside of the English content.
Member

this

Comment on lines 130 to 132
spamming of low quality or inflammatory translations. Approval from a trusted
admin would be required before translations are merged into the main Pandas
repo.
Member

and this

@steppi
Author

steppi commented Feb 3, 2024

Thank you @datapythonista and @simonjayhawkins for your feedback. I've produced an updated version of the proposal incorporating your suggestions. I've removed all of the extra fluff and replaced it with a one-paragraph summary that I hope hits upon the key points, removed the opinions on what should be translated, removed the suggestions to host translated content in the main Pandas repo itself, and removed everything else that would be onerous for Pandas maintainers.

@steppi steppi force-pushed the localization-pdep branch from cc907de to 2610360 Compare February 3, 2024 03:09
@datapythonista
Member

Thanks for the update and the flexibility @steppi, this looks much better.

Not sure if you agree, but it seems like the proposal goes into many technical details on some things (like already deciding to use Crowdin over other options, the need to clone the repo...), and at the same time the process is very unclear. I tried to watch a video on how the workflow works in Crowdin, but on their website they make me register, and on YouTube I couldn't find a short video to understand the process. I watched a few, but they all cover the Crowdin dashboard.

Do you think in this PDEP we want to already agree on the technical solution, or do we just want to agree on moving forward with the project? Feels like it'd be good to either remove the implementation details, or add a much clearer explanation of what exactly is going to be implemented.

Other than that, my personal preference is to leave the translators parts more open. If there is abuse or problems with the translations, I would look for solutions, but so far I'd start with letting anyone translate anything. I assume the translation software will let us block or revert if ever needed.

Member

@simonjayhawkins simonjayhawkins left a comment

Thanks @steppi for the changes.

My current thinking is if this is a funded project with limited scope and duration that maybe a PDEP is not needed after all.

PDEPs are for enhancements to the pandas project and processes. It appears that most of this work will be outside of pandas.

Comment on lines +13 to +15
project website [pandas.pydata.org](https://pandas.pydata.org) and offer
a low friction way for users to access these translations on the core
project website.
Member

I infer from @datapythonista's comments that ideally the proposal should be low friction for contributors as well as the users, i.e. pandas users and volunteers can contribute alongside grant-funded Quansight staff.

Author

Yes, that's right. Grant funded Quansight staff will work mostly on setting up infrastructure, and helping to coordinate and facilitate. The hope is that most of the translators will be volunteers, or will be supported by small grants we could potentially help find for them.

Comment on lines +21 to +25
content. Though translations for all documentation would be valuable,
producing and maintaining translations for such a large and oft-changing
collection of text would take an immense and sustained effort which may
be infeasible. The suggestion is instead to have translations made for only
a key set of pages from the core project website.
Member

Am I correct in thinking that anyone would be able to open a PR on the mirror site (if this would be the solution)?

Documentation changes are often good first issues.

I assume it's the approval process that "would take an immense and sustained effort which may be infeasible."

Otherwise do we have an off ramp for when the translator funding ends?

Author

Yes, anyone could open a PR, and anyone could contribute translations on Crowdin after asking for an invite. Our hope is to set up a compounding snowball type effect, where we can help build a community of volunteer translators who can help keep translations up to date.

Member

Great. pandas is foremost a community driven volunteer project so this aligns well with the project values.

@steppi
Author

steppi commented Feb 3, 2024

Thanks @datapythonista

Not sure if you agree, but it seems like the proposal goes into many technical details on some things (like already deciding to use Crowdin over other options, the need to clone the repo...), and at the same time the process is very unclear. I tried to watch a video on how the workflow works in Crowdin, but on their website they make me register, and on YouTube I couldn't find a short video to understand the process. I watched a few, but they all cover the Crowdin dashboard.

Understood. I just saw that the template section heading is Detailed Implementation and thought the expectation was to give a high level of detail. If we don't need to settle the technical details, I think that would be better. The strong bias towards choosing Crowdin is motivated by wanting to have a uniform platform among the different core projects, and by the quality of the free support they've been offering. Since it's only been used for NumPy so far, there's still an opportunity to change platform if there's something that would be significantly better. I didn't really understand how Crowdin works until I set up a test project and played around with it. If you'd like, I could do a short call to run through the workflow, or make a short screencast video.

Other than that, my personal preference is to leave the translators parts more open. If there is abuse or problems with the translations, I would look for solutions, but so far I'd start with letting anyone translate anything. I assume the translation software will let us block or revert if ever needed.

I could look into how that would work. I know that it's possible to suggest translations without overriding existing ones. I think having things invite-only doesn't have to add that much friction if approval is prompt and guaranteed.

@steppi
Author

steppi commented Feb 3, 2024

Thanks @steppi for the changes.

My current thinking is if this is a funded project with limited scope and duration that maybe a PDEP is not needed after all.

PDEPs are for enhancements to the pandas project and processes. It appears that most of this work will be outside of pandas.

Thanks @simonjayhawkins. That would be nice if we could forgo the PDEP process. Yes, the plan would be for most of the work to be outside of Pandas.

@simonjayhawkins
Member

That would be nice if we could forgo the PDEP process.

hopefully others will give some input on this. If the pandas core maintainers are only required to create some sort of synchronization to the pandas web site then IMHO a PDEP is not needed.

Another concern I have that I don't think has yet been mentioned is ensuring that users of the translated documentation are aware where to report issues/suggestions. I assume this is not going to be the main pandas repo.

@steppi
Author

steppi commented Feb 3, 2024

Another concern I have that I don't think has yet been mentioned is ensuring that users of the translated documentation are aware where to report issues/suggestions. I assume this is not going to be the main pandas repo

Good question. I think such issues/suggestions should be made in the repo hosting the translations. There could be a link to create an issue in that repo among the things listed when someone clicks new issue in pandas.

@simonjayhawkins
Member

There could be a link to create an issue in that repo among the things listed when someone clicks new issue in pandas.

sgtm

Contributor

@Dr-Irv Dr-Irv left a comment

I'm not sure that a PDEP is needed here. Having said that, I think it is important to specifically say what work is required of the pandas core team to support this PDEP. There are a few things to consider:

  1. What work by the pandas team is required up front (i.e., just one time) to support these translations? For example, changes to how the web site is built, or an additional button/menu, etc.
  2. What ongoing support, if any, is required by the pandas team to support these translations?
  3. If someone updates the English version of a page on the pandas web site, how would the translations get updated?
  4. Will the translations clearly indicate that issues with the translations should be reported elsewhere?
  5. Suppose at some point the pandas team decides to reorganize the pandas documentation? Does this need to be coordinated with the translations project?

For me, if the "cost" of doing this is just the "up front" cost of setting things up, I don't think a PDEP is needed. But if doing this will create an ongoing burden of support from the pandas team, then a PDEP is most likely warranted.

@steppi
Author

steppi commented Feb 7, 2024

Thanks @Dr-Irv.

  1. What work by the pandas team is required up front (i.e., just one time) to support these translations? For example, changes to how the web site is built, or an additional button/menu, etc.

Yes, I think you have this right. It would require changes to how the web site is built and addition of some kind of UI element to switch languages.

2. What ongoing support, if any, is required by the pandas team to support these translations?

At the moment I can't think of anything that goes beyond my answers to the following questions.

3. If someone updates the English version of a page on the pandas web site, how would the translations get updated?

As I envision it, the repo hosting the translated content will have a GitHub Action polling for changes to the English content, and merging them. These changes would then be synced to Crowdin and translators would receive a notification that there are new strings to translate. Pandas maintainers should not be responsible for finding or communicating with translators. We're going to try to build an active translation community that works across the Scientific Python projects.
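
As a rough illustration of the change-detection part of that polling job, here is a sketch that compares content hashes of the English Markdown files between two runs. It assumes the job keeps a snapshot from the previous run; the helper names `snapshot` and `diff_snapshots` are hypothetical, and the sync to Crowdin itself is not shown, since that would go through Crowdin's own tooling:

```python
import hashlib
from pathlib import Path


def snapshot(content_dir: Path) -> dict[str, str]:
    """Map each Markdown file's relative path to the SHA-256 of its bytes."""
    return {
        path.relative_to(content_dir).as_posix(): hashlib.sha256(
            path.read_bytes()
        ).hexdigest()
        for path in sorted(content_dir.rglob("*.md"))
    }


def diff_snapshots(
    old: dict[str, str], new: dict[str, str]
) -> dict[str, list[str]]:
    """Classify paths as added, changed, or removed between two snapshots."""
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "changed": sorted(path for path in shared if old[path] != new[path]),
        "removed": sorted(old.keys() - new.keys()),
    }
```

Only the `added` and `changed` paths would need new translation work; `removed` paths could be dropped from the translation project.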

4. Will the translations clearly indicate that issues with the translations should be reported elsewhere?

Yes, I think that's doable. Also, one of the preset options when trying to open a new issue on Pandas could be for issues with translations, and it would link elsewhere.

5. Suppose at some point the pandas team decides to reorganize the pandas documentation? Does this need to be coordinated with the translations project?

Potentially, but I'd hope to make the need for coordination minimal. If it's just a reorganization, without significant changes to any of the content selected for translation, this could be sorted out on the translation project side with a few hours' work (provided that the content selected for translation is kept reasonably small). It would be good to get a heads up that you're planning a reorganization, but I think I could just set up email filters to notify me about issues or pull requests related to website updates.

I think you could potentially feel constrained with respect to major or complete changes to the content that's selected for translation though, in that all translations would be lost and it may take time for new ones to be made.

@steppi
Author

steppi commented Feb 25, 2024

Hi everyone. While this is still up in the air, is it OK if I get started setting up a mirrored repo with the content and syncing it with Crowdin? There's an open source license request form, https://crowdin.com/page/open-source-project-setup-request, that they'll let me fill out in your place with your permission. The decision for whether or not you want to publish the translations on your website can wait, but in any case I think it would still be valuable for quality translations to exist.

@datapythonista
Member

Thanks for the continued work on this @steppi. This sounds good. I'm personally fine to close this PDEP and have you working on this independently from the main pandas repository. Also, if you want to publish the translated content in, for example, GitHub Pages, I can add a rule to our web server configuration so that, for example, pandas.pydata.org/zh/ serves the Chinese translation you publish there. I guess there may be a few challenges here and there, but if this makes sense to you, I personally don't think we need further discussion. Build whatever you think is best, and when things are ready, just let us know how things work, so we can make sure there are no significant concerns in terms of the translations becoming obsolete, people abusing the translations to spam...

Of course if you prefer to continue iterating on this PDEP you are welcome to, but since my understanding is that this is a grant you want to be working on regardless of the decisions here, I personally think it'll make your life easier to build the first version in the way that works best for you, and we give you feedback once we can comment on the specific implementation.

If you can get the website translated without core devs having to review PRs with translations or have significant extra maintenance work, it'll be amazing, and I don't think there'll be blockers from anyone.

@steppi
Author

steppi commented Feb 27, 2024

Thanks @datapythonista! That sounds good. Feel free to close this issue now. I'll get to work on setting things up for Pandas translations.

If you can get the website translated without core devs having to review PRs with translations or have significant extra maintenance work, it'll be amazing, and I don't think there'll be blockers from anyone.

That's the goal. The discussion with you and other Pandas devs was very valuable in helping me put together a concrete plan for moving this forward. Thank you to everyone who was involved.

Labels
PDEP pandas enhancement proposal

5 participants