[ENH] adorn functions #993
Comments
@thatlittleboy your thoughts on encapsulation to enforce order sound like the right thing to do. I'll admit I'm not so well-versed in the
Great, thanks for the affirmation @ericmjl. I'll have a think about the desired API and propose something in a PR when I'm ready. :)
Hello, thank you!
Hi @Sabrina-Hassaim, welcome! I am going to tag @samukweku, he’s been super active here as a core contributor to pyjanitor and has more context than I. Meanwhile, can I ask, what are your goals for contributing? Want to see how we can best support you as you make your contributions!
Hello @ericmjl, thank you for your response. I’m currently working on an academic project where I need to contribute to open-source projects by resolving issues. Given my background in data analysis and my experience with libraries like Pandas, I found it fitting to contribute to PyJanitor, as it aligns with my skill set.
Hi @Sabrina-Hassaim, please feel free to contribute; I suggest you have a look at the development guide. Looking forward to your PR.
Brief Description
There are a few `adorn_*` functions from R's janitor that are not yet ported over to pyjanitor. Janitor docs here. I'm specifically looking at:

- `adorn_totals`: adds a totals row, a totals column, or both.
- `adorn_percentages`: converts the cell values into percentages, calculated along either axis or over the entire dataframe. In the R formulation these are floats between 0 and 1, not 0 to 100 percentages.
- `adorn_pct_formatting`: formats the 0 to 1 values as 0 to 100 percentages, with rounding/formatting options.
- `adorn_ns`: adds the raw counts back into the cell values (meant to be run after `adorn_percentages`), so each cell has both percentage and count info, e.g. "56 (24.3%)".

I imagine these might be particularly useful for those doing data reporting.
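As a rough illustration of what the last three steps compute, here is the R chain (`adorn_percentages`, then `adorn_pct_formatting`, then `adorn_ns`) spelled out in plain pandas on a made-up table of counts. This is not a pyjanitor API, just a sketch; note that the final step has to reach back to the original `counts`, because the percentage table no longer carries them.

```python
import pandas as pd

# Made-up counts, e.g. what pd.crosstab() might return.
counts = pd.DataFrame({"yes": [56, 30], "no": [174, 70]}, index=["group_a", "group_b"])

# Like R's adorn_percentages(denominator = "row"): divide each cell by its row total.
pcts = counts.div(counts.sum(axis=1), axis=0)

# Like R's adorn_pct_formatting(digits = 1): 0-1 floats become "24.3%"-style strings.
pct_str = pcts.apply(lambda col: col.map(lambda x: f"{x:.1%}"))

# Like R's adorn_ns(): needs the ORIGINAL counts, which pcts/pct_str no longer hold.
combined = counts.astype(str) + " (" + pct_str + ")"
print(combined)
```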
These should go into the `functions` module.

Example API
In pyjanitor, I don't think having four separate functions works: how would we enforce that `adorn_ns` comes after `adorn_percentages`? And where would we get the counts required for `adorn_ns`? Perhaps we could just do an `adorn_totals` and an `adorn_percentages` (which also encapsulates the behaviour of `adorn_pct_formatting` and `adorn_ns`, controlled via function parameters).

adorn_totals
This function should mirror the R function almost 1-1.
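To make the R semantics concrete, here is a minimal pandas sketch of a 1-1 port, assuming a hypothetical signature `adorn_totals(df, where="row", name="Total")` modelled on R's `where` and `name` arguments. The totals label goes into the index (pandas has no direct equivalent of the tabyl's label column), and `"both"` is a stand-in for R's `where = c("row", "col")`.

```python
import pandas as pd


def adorn_totals(df: pd.DataFrame, where: str = "row", name: str = "Total") -> pd.DataFrame:
    """Hypothetical 1-1 port of R's janitor::adorn_totals.

    where="row":  append a row of column sums (R's default)
    where="col":  append a column of row sums
    where="both": both; stands in for R's where = c("row", "col")
    Only numeric columns are summed.
    """
    if where not in {"row", "col", "both"}:
        raise ValueError("where must be 'row', 'col' or 'both'")
    out = df.copy()
    if where in {"col", "both"}:
        numeric = out.select_dtypes("number").columns
        out[name] = out[numeric].sum(axis=1)
    if where in {"row", "both"}:
        # Recompute so a freshly added totals column is summed too (grand total).
        numeric = out.select_dtypes("number").columns
        totals = out[numeric].sum().to_frame().T
        totals.index = [name]
        out = pd.concat([out, totals])
    return out


counts = pd.DataFrame({"yes": [56, 30], "no": [174, 70]}, index=["group_a", "group_b"])
print(adorn_totals(counts, where="both"))
```

Note that `where="row"` here appends a row of column sums: it names where the new totals go, not the direction of the summation.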
A few points where I disagree(?) with the R implementation:

- R's `adorn_totals` has an `na.rm` parameter, but I somehow feel this isn't necessary.
- The `where` parameter, as defined by the R implementation, dictates whether to add a Totals "row" or "col", as opposed to the direction the summation runs over ("row"/"col"). In the latter reading, `where="row"` would add a new column containing the totals across the rows (which to me is more natural). I'm calling this parameter `axis` here, btw.

adorn_percentages
TBD. Let me have a little think about this over the weekend; I decided against my own implementation idea while writing out the example API. ><
Original idea
Parameters:
- `count_position`: whether to put the count at the front ("56 (23.4%)") or the back ("23.4% (56)").
- `count_format` / `percentage_format`: if an int, the number of decimal places to round to; otherwise a format specification string like `':,.2f'`.

I'm not that sold on this API yet. It doesn't look too clean or friendly to use (after all, it is an amalgamation of three different behaviours in one function 😅). Would be happy to hear comments or suggestions for improvement, if any.
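A minimal sketch of the "original idea" parameters, assuming a hypothetical `adorn_percentages(df, denominator="row", percentage_format=None, count_position=None, count_format=0)`. The `denominator` name and the `None` defaults are assumptions, not part of the proposal: each of the three R behaviours only kicks in once its parameter is set. A leading `':'` in a string spec is stripped so that specs like `':,.2f'` work with Python's `format()`.

```python
import pandas as pd


def _formatter(spec, kind):
    """Build a formatting callable from an int (decimal places) or a format-spec string."""
    if isinstance(spec, int):
        # kind is "f" for counts, "%" for 0-1 proportions
        return lambda x: f"{x:.{spec}{kind}}"
    # e.g. ":,.2f" or ".1%"; a leading ":" (f-string style) is tolerated
    return lambda x: format(x, spec.lstrip(":"))


def adorn_percentages(
    df: pd.DataFrame,
    denominator: str = "row",
    percentage_format=None,
    count_position=None,
    count_format=0,
) -> pd.DataFrame:
    """Hypothetical API: percentages, optional formatting, optional counts."""
    # Step 1 (like R's adorn_percentages): proportions between 0 and 1.
    if denominator == "row":
        pcts = df.div(df.sum(axis=1), axis=0)
    elif denominator == "col":
        pcts = df.div(df.sum(axis=0), axis=1)
    elif denominator == "all":
        pcts = df / df.to_numpy().sum()
    else:
        raise ValueError("denominator must be 'row', 'col' or 'all'")
    if percentage_format is None:
        return pcts

    # Step 2 (like R's adorn_pct_formatting): proportions rendered as strings.
    fmt_pct = _formatter(percentage_format, "%")
    pct_str = pcts.apply(lambda col: col.map(fmt_pct))
    if count_position is None:
        return pct_str

    # Step 3 (like R's adorn_ns): the counts are simply the input frame.
    fmt_cnt = _formatter(count_format, "f")
    cnt_str = df.apply(lambda col: col.map(fmt_cnt))
    if count_position == "front":
        return cnt_str + " (" + pct_str + ")"
    if count_position == "back":
        return pct_str + " (" + cnt_str + ")"
    raise ValueError("count_position must be 'front', 'back' or None")


counts = pd.DataFrame({"yes": [56, 30], "no": [174, 70]}, index=["group_a", "group_b"])
print(adorn_percentages(counts, percentage_format=1, count_position="front"))
```

Because the counts are the function's own input, the count step never has to go looking for them, which sidesteps the ordering problem of separate `adorn_percentages` and `adorn_ns` calls.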