Add a "Best Practices" document #27483
Comments
There's probably a lot of overlap between this and #26831.
What about:
If you search Stack Overflow, lots of questions still use chained indexing. Additionally, in lots of questions people want to iterate, which most of the time can be avoided using vectorisation, boolean masks, etc. I would put this under tidy data: people often come up with awfully formatted data, and we could emphasize how easy tasks are when the data are well formatted. (Think of lists of strings or tuples in a column.)
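To make the points above concrete, here is a minimal sketch of the kind of short example such a document could contain. The DataFrame, its column names, and its values are invented for illustration; they do not come from the notebook or from any real question.

```python
import pandas as pd

# Hypothetical data: note the list-valued "tags" column, which is not tidy.
df = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Pune"],
        "temp_c": [-3.0, 24.0, 31.0],
        "tags": [["cold"], ["mild", "coastal"], ["hot"]],
    }
)

# Chained indexing such as df[df["temp_c"] > 20]["temp_c"] = ... may assign
# to a temporary copy. A single .loc call with a boolean mask is unambiguous.
warm = df["temp_c"] > 20
df.loc[warm, "temp_c"] = df.loc[warm, "temp_c"] - 1.0

# Vectorised arithmetic on a whole column replaces an explicit loop over rows.
df["temp_f"] = df["temp_c"] * 9 / 5 + 32

# One tag per row is easier to filter and group than a list per cell.
tidy = df.explode("tags").reset_index(drop=True)
coastal_cities = tidy.loc[tidy["tags"] == "coastal", "city"].tolist()
```

Once the list column is exploded, the filter is an ordinary boolean mask again, which is the argument for discussing these topics together under tidy data.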
+1 for avoiding iteration and using boolean masks. From interviews, I can confirm that a majority of newbies are bad at both. On a broader point, I think "how you should write pandas code" falls into two buckets:
I think both are valuable, and a good best-practices document would be helpful for the community at large. Syntactic sugar can be addressed by an opinionated doc with short examples like the airport ones in @TomAugspurger's notebook; efficiency is best addressed (IMO) with plots showing the runtime and memory footprint of different methods (see https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas for a good example on iterrows). I am not sure how to give this document the necessary visibility to make it useful, although that is a problem to be solved once there is a defined document that the community thinks is great.
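As a rough sketch of how the numbers behind such a runtime plot could be collected, the snippet below times a row-by-row `iterrows` loop against the equivalent column operation using `timeit`. The frame size, the column names `a` and `b`, and the two helper functions are assumptions chosen for illustration; actual timings depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: two numeric columns of 2,000 random values each.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.random(2_000), "b": rng.random(2_000)})


def with_iterrows(frame):
    # Row-by-row loop: iterrows builds a Series for every row.
    return [row["a"] * row["b"] for _, row in frame.iterrows()]


def vectorised(frame):
    # A single column-wise multiplication over the whole frame.
    return frame["a"] * frame["b"]


# Time three runs of each approach and check that the results agree.
loop_s = timeit.timeit(lambda: with_iterrows(df), number=3)
vec_s = timeit.timeit(lambda: vectorised(df), number=3)
same = np.allclose(with_iterrows(df), vectorised(df).to_numpy())
print(f"iterrows: {loop_s:.4f}s  vectorised: {vec_s:.4f}s  same result: {same}")
```

Repeating the measurement over several frame sizes would give the data points for a plot; memory footprint would need a separate tool and is not covered here.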
Probably not happening for 1.0.
I'd like to have a document that describes how we think people should write pandas code.
This introduces a bit of friction when documenting something, since you'll need to decide "does it go in best practices or the user guide?" But I think the idea of a "best practices" document with opinionated, short examples and prose, linking back to the user guide and API docs, is valuable.
I've started a notebook at https://mybinder.org/v2/gh/TomAugspurger/pandas-best-practices/master?filepath=Best%20Practices.ipynb
Are there any sections you would add / remove?
Would you structure it differently?
(Tangentially, I'd like to explore how we can incorporate Binder into our documentation.)