[ENH] adorn functions #993

Open
thatlittleboy opened this issue Jan 18, 2022 · 6 comments · May be fixed by #1439

@thatlittleboy
Contributor

thatlittleboy commented Jan 18, 2022

Brief Description

There are a few adorn_* functions from R's janitor that are not yet ported over to pyjanitor. Janitor docs here.

I'm specifically looking at:

  • adorn_totals: adds a totals row, a totals column, or both
  • adorn_percentages: converts the cell values into percentages, calculated along either axis or over the entire dataframe. In the R formulation, these are floats between 0 and 1, not the 0-100 percentages.
  • adorn_pct_formatting: formats the 0 to 1 values into the 0 to 100 percentage values, with rounding/formatting options
  • adorn_ns: adds the raw counts back into the cell values (meant to be run after adorn_percentages), so each cell has both percentage & count info, like "56 (24.3%)" for example.

I imagine these might be particularly useful for those doing data reporting.
These should go into the functions module.

Example API

In pyjanitor, I don't think having four separate functions works (how would we enforce that adorn_ns comes after adorn_percentages? And where would we get the counts required for adorn_ns? etc.).

Perhaps we could just do an adorn_totals and an adorn_percentages (which encapsulates the behaviour of adorn_pct_formatting and adorn_ns as well, controlled via function parameters).

adorn_totals

This function should mirror the R function almost 1-1.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")}); df
     a  b
0  6.0  x
1  NaN  y
2  2.5  z
>>> df.adorn_totals(
...     subset=None,  # or a list of index/column names; ideally also accepts ranges like `slice("col_a", "col_d")`, since `.loc` supports them
...     axis="col",  # "index"/0/"row", "column"/1/"col", or "both"
...     fill_value="-",  # str
...     name="Total",  # str
... )
         a  b
0      6.0  x
1      NaN  y
2      2.5  z
Total  8.5  -

A few points where I disagree(?) with the R implementation:

  • I'm thinking that NaN values will be treated as 0 here by default, so totals won't be affected by presence of NaN -> sum(1, NaN, 2.5) = 3.5. The R janitor function has an na.rm parameter for this, but I somehow feel this isn't necessary.
  • The where parameter, as defined by the R implementation, dictates where the totals are placed (a Totals "row" or "col"), as opposed to the direction of the summation. Under the latter reading, where="row" would add a new column containing the totals across the rows (which to me is more natural). I'm calling this parameter axis here, btw.
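
As a rough illustration of the two points above, here is a minimal sketch in plain pandas. The name `adorn_totals_sketch` and its exact behaviour are assumptions for discussion, not an existing pyjanitor function; `subset` is left out for brevity, and `axis` here follows the example above ("col" sums down each column and appends a row).

```python
import numpy as np
import pandas as pd


def adorn_totals_sketch(df, axis="col", fill_value="-", name="Total"):
    """Hypothetical sketch: add totals to a copy of df, ignoring NaN."""
    if axis not in {"row", "col", "both"}:
        raise ValueError("axis must be 'row', 'col' or 'both'")
    numeric_cols = df.select_dtypes("number").columns
    out = df.copy()
    if axis in {"row", "both"}:
        # Sum across each row into a new column; pandas skips NaN by default.
        out[name] = out[numeric_cols].sum(axis=1)
    if axis in {"col", "both"}:
        # Sum down each numeric column and append the result as a new row.
        totals = out.select_dtypes("number").sum(axis=0)
        new_row = {col: totals.get(col, fill_value) for col in out.columns}
        out = pd.concat([out, pd.DataFrame([new_row], index=[name])])
    return out


df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")})
by_col = adorn_totals_sketch(df)               # appends a "Total" row
by_row = adorn_totals_sketch(df, axis="row")   # appends a "Total" column
print(by_col)
print(by_row)
```

Because `DataFrame.sum` skips NaN by default, no separate `na.rm`-style parameter is needed in this sketch; non-numeric columns receive `fill_value` in the totals row.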

adorn_percentages

TBD. Let me have a little think about this over the weekend, I decided against my own implementation idea while writing out the example API.. ><

Original idea
>>> df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")}); df
     a  b
0  6.0  x
1  NaN  y
2  2.5  z
>>> df.adorn_percentages(
...     subset=None,  # similar to `adorn_totals`
...     axis='col',  # similar to `adorn_totals`
...     adorn_count=True,
...     count_position='front',  # ignored if adorn_count=False
...     count_format=0,  # ignored if adorn_count=False
...     percentage_format=2,
... )
            a  b
0  6 (70.59%)  x
1         nan  y
2  3 (29.41%)  z

Parameters:

  • count_position: where to place the count relative to the percentage: "front" gives "56 (23.4%)", "back" gives "23.4% (56)"
  • count_format / percentage_format: if an int, the number of decimal places to round to; otherwise a string format specification like ':,.2f' or whatever.

I'm not that sold on this API yet. It doesn't look too clean / friendly to use. After all, it is an amalgamation of 3 different behaviours in 1 function 😅. I would be happy to hear comments / suggestions to improve it, if any.
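
For discussion, the parameters described above could behave roughly as in the following sketch. Everything here (`adorn_percentages_sketch`, `_fmt`, `_render_cell`) is a hypothetical helper written for this illustration, not a pyjanitor API; `subset` and `axis` are omitted, and percentages are computed down each numeric column only.

```python
import numpy as np
import pandas as pd


def _fmt(value, spec):
    """Format with an int (decimal places) or a spec such as ':,.2f'."""
    if isinstance(spec, int):
        spec = f".{spec}f"
    # Accept a leading colon, as in str.format-style specs.
    return format(value, spec.lstrip(":"))


def _render_cell(value, total, adorn_count, count_position,
                 count_format, percentage_format):
    """Render a single cell as a percentage string, optionally with its count."""
    if pd.isna(value):
        return "nan"
    pct = _fmt(100 * value / total, percentage_format) + "%"
    if not adorn_count:
        return pct
    count = _fmt(value, count_format)
    if count_position == "front":
        return f"{count} ({pct})"
    return f"{pct} ({count})"


def adorn_percentages_sketch(df, adorn_count=True, count_position="front",
                             count_format=0, percentage_format=2):
    """Hypothetical sketch: column-wise percentages for numeric columns."""
    out = df.copy()
    for col in df.select_dtypes("number").columns:
        total = df[col].sum()  # NaN values are skipped
        out[col] = df[col].map(
            lambda v: _render_cell(v, total, adorn_count, count_position,
                                   count_format, percentage_format)
        )
    return out


df = pd.DataFrame({"a": [6, np.nan, 2.5], "b": list("xyz")})
result = adorn_percentages_sketch(df)
print(result)
```

One detail this surfaces: Python's float formatting rounds half to even, so with `count_format=0` the count 2.5 renders as "2" rather than the "3" in the mock-up output, which suggests the rounding rule would need to be spelled out in the final API.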

@ericmjl
Member

ericmjl commented Jan 22, 2022

@thatlittleboy your thoughts on encapsulation to enforce order sound like the right thing to do.

I'll admit I'm not so well-versed in the adorn_* family of functions in janitor, so I'll hold off on commenting on their specific behaviour. That said, I am in favour of adding janitor functionality to pyjanitor, and I'm also in favour of your way of thinking about how to organize the functions in a sane fashion. 😄

@thatlittleboy
Contributor Author

Great, thanks for the affirmation @ericmjl . I'll have a think about the desired API and propose something in a PR when I'm ready. :)

@Sabrina-Hassaim

Hello,
My name is Sabrina, and I’m excited about the opportunity to contribute to the pyjanitor project. I have been exploring it and found several issues that align with my skills. I would love to be assigned to one or more issues, starting with this one.
Please let me know how I can help.

Thank you

@ericmjl
Member

ericmjl commented Oct 17, 2024

Hi @Sabrina-Hassaim, welcome! I am going to tag @samukweku, he’s been super active here as a core contributor to pyjanitor and has more context than I. Meanwhile, can I ask, what are your goals for contributing? Want to see how we can best support you as you make your contributions!

@Sabrina-Hassaim

Hello @ericmjl, thank you for your response. I’m currently working on an academic project where I need to contribute to open-source projects by resolving issues. Given my background in data analysis and my experience with libraries like Pandas, I found it fitting to contribute to PyJanitor, as it aligns with my skill set.

@samukweku
Collaborator

hi @Sabrina-Hassaim please feel free to contribute; i suggest you have a look at the development guide. looking forward to your PR.

@Sarra99 linked a pull request on Jan 24, 2025 that will close this issue