Add a "Best Practices" document #27483

Open
TomAugspurger opened this issue Jul 19, 2019 · 4 comments

@TomAugspurger
Contributor

I'd like to have a document that describes how we think people should write pandas code.

This introduces a bit of friction when documenting something, since you'll need to decide "does it go in best practices or the user guide?" But I think the idea of a "best practices" document with opinionated, short examples and prose, linking back to the user guide and API docs, is valuable.

I've started a notebook at https://mybinder.org/v2/gh/TomAugspurger/pandas-best-practices/master?filepath=Best%20Practices.ipynb

Are there any sections you would add / remove?

Would you structure it differently?

(tangentially, I'd like to explore how we can incorporate binder into our documentation).

TomAugspurger added this to the 1.0 milestone Jul 19, 2019
@TomAugspurger
Contributor Author

There's probably a lot of overlap between this and #26831.

@datajanko
Contributor

What about:

  • Avoid chained indexing
  • Avoid iteration
  • Use boolean masks

If you search Stack Overflow, lots of questions still use chained indexing.

Additionally, in lots of questions people want to iterate, which most of the time can be avoided using vectorisation, boolean masks, etc. I would put this under tidy data: since people often come up with awfully formatted data, we could emphasize how easy tasks are when data are well formatted. (Think of lists of strings or tuples in a column.)
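
A minimal sketch of what the three bullets could look like as a short "best practices" example. The frame, column names, and values below are made up for illustration and are not taken from the notebook.

```python
import pandas as pd

# Hypothetical data, invented for this sketch.
df = pd.DataFrame(
    {"airport": ["ORD", "DSM", "ORD", "MSP"], "delay": [12, -3, 45, 0]}
)

# Avoid chained indexing: something like df[df["delay"] > 10]["delay"] = 10
# may assign into a temporary copy, so the original is not reliably updated.
# Use a boolean mask with a single .loc call instead.
late = df["delay"] > 10
df.loc[late, "delay"] = 10  # cap long delays at 10

# Avoid iteration: an explicit row loop ...
total_loop = 0
for _, row in df.iterrows():
    if row["airport"] == "ORD":
        total_loop += row["delay"]

# ... can usually be replaced by a boolean mask and a vectorised reduction.
total_mask = df.loc[df["airport"] == "ORD", "delay"].sum()
```

Both totals are computed from the same capped column, so they should agree; the masked version is shorter and does not build a Series per row.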

@JMBurley
Contributor

+1 for avoiding iteration and using boolean masks. From interviews, I can confirm that a majority of newbies are bad at both.

On a broader point, I think "how you should write pandas code" falls into two buckets:

  • Syntactic sugar: What style do we suggest is the most readable
  • Efficiency: What methods optimise runtime/memory

I think both are valuable, and a good best-practices document would be helpful for the community at large. Syntactic sugar can be addressed by an opinionated doc with short examples like the airport ones in @TomAugspurger's notebook; efficiency is best addressed (IMO) with plots showing the runtime/memory footprint of different methods (see https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas for a good example on iterrows).
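
As a rough sketch of the efficiency side, the numbers behind such a plot could be collected along these lines. The frame size, column names, and helper functions are assumptions made for illustration, and timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: two random float columns.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(2_000), "b": rng.random(2_000)})


def with_iterrows(frame):
    """Sum of a*b computed row by row (the pattern to avoid)."""
    return sum(row["a"] * row["b"] for _, row in frame.iterrows())


def vectorised(frame):
    """Same quantity computed with column arithmetic."""
    return (frame["a"] * frame["b"]).sum()


# Single-run wall-clock timings; repeat and average for a real comparison.
t_loop = timeit.timeit(lambda: with_iterrows(df), number=1)
t_vec = timeit.timeit(lambda: vectorised(df), number=1)
print(f"iterrows: {t_loop:.4f}s  vectorised: {t_vec:.4f}s")

# Memory footprint per column, in bytes.
print(df.memory_usage(deep=True))
```

Repeating this over a range of frame sizes and plotting the timings would give the kind of runtime chart described above.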

I am not sure how to give this document the necessary visibility to make it useful, although that is a problem to be solved after there is a defined document that the community thinks is great.

TomAugspurger modified the milestones: 1.0, Contributions Welcome Dec 31, 2019
@TomAugspurger
Contributor Author

Probably not happening for 1.0.

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022