Add new function for pairwise T-tests between columns of a dataframe (pingouin.ptests) #291

raphaelvallat · 2022-07-15T22:10:47Z

As discussed in #290, this PR adds the ptests (pairwise_ttest) method to pandas.DataFrame to calculate pairwise T-tests between columns of a pandas DataFrame. This can be used as an alternative to the pingouin.pairwise_tests function when the data is in wide-format instead of long-format. Unlike the pairwise_tests function, the ptests function only returns the T-values (lower triangle) and p-values (upper triangle). Please see the examples below.

I'm looking for one reviewer to review the PR. Thanks!

>>> import numpy as np
>>> import pandas as pd
>>> import pingouin as pg
>>> # Load an example dataset of personality dimensions
>>> df = pg.read_dataset('pairwise_corr').iloc[:30, 1:]
>>> df.columns = ["N", "E", "O", "A", "C"]
>>> # Add some missing values
>>> df.iloc[[2, 5, 20], 2] = np.nan
>>> df.iloc[[1, 4, 10], 3] = np.nan
>>> df.head().round(2)
      N     E     O     A     C
0  2.48  4.21  3.94  3.96  3.46
1  2.60  3.19  3.96   NaN  3.23
2  2.81  2.90   NaN  2.75  3.50
3  2.90  3.56  3.52  3.17  2.79
4  3.02  3.33  4.02   NaN  2.85

# Independent pairwise T-tests

>>> df.ptests()
      N       E      O      A    C
N       -     ***    ***    ***  ***
E  -8.397       -                ***
O  -8.332  -0.596      -         ***
A  -8.804    0.12   0.72      -  ***
C  -4.759   3.753  4.074  3.787    -

# Let's compare with SciPy

>>> from scipy.stats import ttest_ind
>>> np.round(ttest_ind(df["N"], df["E"]), 3)
array([-8.397,  0.   ])

# Passing custom parameters to the lower-level :py:func:`scipy.stats.ttest_ind` function

>>> df.ptests(alternative="greater", equal_var=True)
      N       E      O      A    C
N       -
E  -8.397       -                ***
O  -8.332  -0.596      -         ***
A  -8.804    0.12   0.72      -  ***
C  -4.759   3.753  4.074  3.787    -

# Paired T-test, showing the actual p-values instead of stars

>>> df.ptests(paired=True, stars=False, decimals=4)
      N        E       O       A       C
N        -   0.0000  0.0000  0.0000  0.0002
E  -7.0773        -  0.8776  0.7522  0.0012
O  -8.0568  -0.1555       -  0.8137  0.0008
A  -8.3994   0.3191  0.2383       -  0.0009
C  -4.2511   3.5953  3.7849  3.7652       -

# Adjusting for multiple comparisons using the Holm-Bonferroni method

>>> df.ptests(paired=True, stars=False, padjust="holm")
      N       E      O      A      C
N       -   0.000  0.000  0.000  0.001
E  -7.077       -     1.     1.  0.005
O  -8.057  -0.155      -     1.  0.005
A  -8.399   0.319  0.238      -  0.005
C  -4.251   3.595  3.785  3.765      -
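
For readers who want to see how a matrix with this layout might be assembled, here is a minimal sketch. It is not the code in this PR: the helper name `pairwise_t_matrix` is hypothetical, it only covers the independent (unpaired) case, and it drops NaNs column by column before calling scipy rather than relying on `nan_policy`.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import ttest_ind


def pairwise_t_matrix(df, decimals=3):
    """Hypothetical helper: T-values below the diagonal, p-values above."""
    cols = list(df.columns)
    n = len(cols)
    # Object array so that each cell can hold a formatted string.
    mat = np.full((n, n), "-", dtype=object)
    for i, j in combinations(range(n), 2):
        # Drop missing values per column (valid for independent samples only).
        x = df[cols[i]].dropna().to_numpy()
        y = df[cols[j]].dropna().to_numpy()
        t, p = ttest_ind(x, y)
        mat[j, i] = f"{t:.{decimals}f}"  # lower triangle: T-value
        mat[i, j] = f"{p:.{decimals}f}"  # upper triangle: p-value
    return pd.DataFrame(mat, index=cols, columns=cols)


# Synthetic data, used only to exercise the helper.
rng = np.random.default_rng(42)
demo = pd.DataFrame(rng.normal(size=(30, 3)), columns=["N", "E", "O"])
demo.iloc[[2, 5], 1] = np.nan
result = pairwise_t_matrix(demo)
print(result)
```

The T-value for a pair is stored in the row of the second column and the column of the first, which matches the sign convention in the outputs above (row E, column N holds the statistic for N versus E).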

codecov · 2022-07-15T22:13:05Z

Codecov Report

Merging #291 (4a11e6b) into master (dce908b) will increase coverage by 0.01%.
The diff coverage is 100.00%.

❗ Current head 4a11e6b differs from pull request most recent head 65c4da5. Consider uploading reports for the commit 65c4da5 to get more accurate results

@@            Coverage Diff             @@
##           master     #291      +/-   ##
==========================================
+ Coverage   98.75%   98.76%   +0.01%     
==========================================
  Files          19       19              
  Lines        3298     3332      +34     
  Branches      529      536       +7     
==========================================
+ Hits         3257     3291      +34     
  Misses         24       24              
  Partials       17       17

Impacted Files	Coverage Δ
pingouin/pairwise.py	`99.46% <100.00%> (+0.05%)`	⬆️

remrama

Looks great @raphaelvallat , zero problems here.

I was working on something yesterday and realized that I could really use this new feature! I wasn't sure if it was implemented yet, took a look and saw you were still waiting for a review. I hope you don't mind I jumped in 👋

Sidenote, I was hoping there was a bit more flexibility in the upper triangle. I'm sure you already considered that so you probably landed on the current structure for good reason. But just fyi, I was thinking the stars parameter could be replaced with something like upper="pvals" (or "stars", "effsize", ...). I don't wanna get too crazy, but offering a similar flexibility in the lower triangle too (an analogous lower parameter) would allow easy access to non-parametrics etc.

remrama · 2022-08-25T00:53:46Z

pingouin/pairwise.py

+
+    Passing custom parameters to the lower-level :py:func:`scipy.stats.ttest_ind` function
+
+    >>> df.ptests(alternative="greater", equal_var=True)


Heads up, I got an error thrown from ttest_ind when I initially ran this in an environment with scipy version 1.7.3:

ValueError: nan-containing/masked inputs with nan_policy='omit' are currently not supported by permutation tests, one-sided asymptotic tests, or trimmed tests.

I updated straight to 1.9.0 and it worked fine 👍

raphaelvallat · 2022-08-27

Thanks for letting me know! I think I'll keep the requirements of scipy>=1.7 for now, and we'll bump it to 1.9 in a future Pingouin release.
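
One possible workaround on older scipy versions, sketched here as an assumption and not something that was part of this PR, is to remove the missing values before calling `ttest_ind`, so that `nan_policy="omit"` is never combined with a one-sided `alternative`. The wrapper name `ttest_ind_dropna` is hypothetical, and the approach only makes sense for independent samples, since each sample is filtered separately.

```python
import numpy as np
from scipy.stats import ttest_ind


def ttest_ind_dropna(x, y, **kwargs):
    """Hypothetical wrapper: remove NaNs, then call scipy.stats.ttest_ind."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Filter each sample separately; no nan_policy argument is passed on.
    return ttest_ind(x[~np.isnan(x)], y[~np.isnan(y)], **kwargs)


# Synthetic data with a few missing values in the first sample.
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, size=30)
b = rng.normal(loc=1.0, size=30)
a[[2, 5, 20]] = np.nan

t_val, p_val = ttest_ind_dropna(a, b, alternative="greater", equal_var=True)
t_two, p_two = ttest_ind_dropna(a, b, equal_var=True)
print(t_val, p_val)
```

The T-value is the same for the one-sided and two-sided calls; only the p-value changes with `alternative`.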

raphaelvallat · 2022-08-27T00:37:38Z

Thanks so much @remrama — I was desperately waiting for a reviewer :-)

So about being more flexible in the output, I agree that this could be a nice addition in a future PR. My worry — and the reason I did not implement it — is that for increased speed we are using the lower-level scipy functions here and not a call to pg.ttest:

if paired:
    func = ttest_rel
else:
    func = ttest_ind
t, p = func(self[a], self[b], **kwargs, nan_policy="omit")

Unfortunately, however, scipy only returns the T- and p-values, so we'd have to either recalculate the effsize / degrees of freedom manually, or, simpler but probably much slower, do a call to pg.ttest instead.

That said, I'll have to do some benchmarks on how much slower this is going to be. I tend to be very obsessed with code speed, but most of the time the differences are barely visible to the users in real-world data...
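
To give a sense of what "recalculating manually" could involve for the independent case, here is a rough sketch. It assumes Student's T-test with equal variances and uses Cohen's d with a pooled standard deviation as the effect size; the function name `ttest_with_effsize` is hypothetical and this is not how Pingouin computes it internally.

```python
import numpy as np
from scipy.stats import ttest_ind


def ttest_with_effsize(x, y):
    """Hypothetical helper: T, p, degrees of freedom and Cohen's d."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x, y = x[~np.isnan(x)], y[~np.isnan(y)]
    nx, ny = x.size, y.size
    t, p = ttest_ind(x, y)  # equal_var=True by default (Student's T-test)
    dof = nx + ny - 2
    # Pooled standard deviation from the two unbiased sample variances.
    pooled_sd = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1)) / dof)
    d = (x.mean() - y.mean()) / pooled_sd
    return t, p, dof, d


# Synthetic data, used only to exercise the helper.
rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, size=25)
y = rng.normal(loc=0.5, size=30)
t, p, dof, d = ttest_with_effsize(x, y)
print(t, p, dof, d)
```

Under these assumptions the effect size is available almost for free, since d equals t multiplied by sqrt(1/nx + 1/ny); the paired and unequal-variance cases would need their own formulas.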

Thanks!

raphaelvallat added 6 commits July 8, 2022 17:54

First working implementation of the pd.DataFrame.ptest function

e2933b3

Beter implementation + unit testing

e7be2ae

Black formating

25ad407

Updated changelog

94729d2

Add version added

889fc9b

Update notebooks + documentation

536d567

raphaelvallat added the feature request 🚧 New feature or request label Jul 15, 2022

raphaelvallat self-assigned this Jul 15, 2022

Fix typo in changelog

4a11e6b

raphaelvallat mentioned this pull request Jul 17, 2022

Roadmap for release 0.6.0 #279

Open

11 tasks

remrama approved these changes Aug 25, 2022

raphaelvallat added 3 commits August 26, 2022 17:37

Merge branch 'master' into rcorr_ptests

6ccbca0

Fix typos

a73fc7f

Merge branch 'rcorr_ptests' of https://github.com/raphaelvallat/pingouin

65c4da5

remrama approved these changes Aug 27, 2022

raphaelvallat merged commit adf0718 into master Aug 27, 2022

raphaelvallat deleted the rcorr_ptests branch August 27, 2022 14:52
