Description
There have been previous attempts to reduce the length of the Series/DataFrame repr (pandas.options.display.max_rows), e.g. #20514. Related pandas-dev email: https://mail.python.org/pipermail/pandas-dev/2018-March/000732.html
In that discussion, I once made the following proposal to introduce two thresholds:
- We have 2 thresholds instead of 1 (the current 'max_rows'): a number of rows to show in a truncated repr, and a max number of rows to show without truncating.
- For 'big' dataframes, we show a truncated repr. And then I would go even lower than 20 and only show first/last 5 (so like a max_rows of 10).
- For 'small' dataframes, we show the full dataframe without truncating, up to the threshold.
We would still need to define those two thresholds. For example, using the current max_rows of 60: we could show a full repr up to 60 rows, and once the number of rows is > 60, we only show 10 (first/last 5).
You can still set both thresholds to the same number (like 20, as in the linked PR above) to avoid this variable behaviour.
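A minimal sketch of the proposed two-threshold rule, applied to a DataFrame with `head`/`tail`. The helper `truncated_view` and its parameter names are hypothetical and chosen for illustration; it only selects the rows that would be displayed and does not build pandas' actual repr. Giving the head the extra row when `min_rows` is odd is also an assumption.

```python
import pandas as pd


def truncated_view(df: pd.DataFrame, max_rows: int = 60, min_rows: int = 10) -> pd.DataFrame:
    """Hypothetical helper: rows a two-threshold repr would show.

    Assumes 2 <= min_rows <= max_rows.
    """
    if len(df) <= max_rows:
        # 'Small' frame: show every row, no truncation.
        return df
    # 'Big' frame: show only min_rows rows, split between head and tail.
    n_head = (min_rows + 1) // 2
    n_tail = min_rows // 2
    return pd.concat([df.head(n_head), df.tail(n_tail)])


small = pd.DataFrame({"x": range(60)})
big = pd.DataFrame({"x": range(61)})
print(len(truncated_view(small)))        # 60: full repr
print(list(truncated_view(big)["x"]))    # [0, 1, 2, 3, 4, 56, 57, 58, 59, 60]

# Both thresholds equal: same behaviour as a single max_rows of 20.
print(len(truncated_view(pd.DataFrame({"x": range(21)}), max_rows=20, min_rows=20)))  # 20
```

With the 60/10 numbers from the example above, a 60-row frame is shown in full while a 61-row frame collapses to its first and last 5 rows.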
This is actually similar to what numpy arrays do (but with a bigger threshold: e.g. np.random.randn(1000) shows all 1000 elements, while np.random.randn(1001) shows only the first/last 3).
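The numpy behaviour can be checked directly through its print options: with the default settings, `threshold` is 1000 and `edgeitems` is 3, and an array is summarized only when its size exceeds the threshold. `np.arange` is used here instead of `np.random.randn` so that the output is deterministic.

```python
import numpy as np

opts = np.get_printoptions()
print(opts["threshold"], opts["edgeitems"])  # 1000 3 (numpy defaults)

# 1000 elements: size == threshold, so the array is printed in full.
full = np.array2string(np.arange(1000))
# 1001 elements: size > threshold, so only edgeitems on each side are kept.
summarized = np.array2string(np.arange(1001))

print("..." in full)  # False
print(summarized)     # [   0    1    2 ...  998  999 1000]
```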
And it is also very similar to what R tibbles do: they have "print_min" and "print_max" options with exactly this behaviour, only with lower thresholds (10 and 20, respectively):
options(tibble.print_max = n, tibble.print_min = m): if there are more than
n rows, print only the first m rows. Use options(tibble.print_max = Inf)
to always show all rows.
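The tibble rule quoted above can be sketched in Python for comparison. Unlike the proposal, it keeps only the first m rows rather than the first and last. The function name `tibble_rows_shown` is hypothetical, and this is a Python restatement of the documented rule, not tibble's R implementation.

```python
from typing import List


def tibble_rows_shown(n_rows: int, print_max: float = 20, print_min: int = 10) -> List[int]:
    """Hypothetical restatement of the tibble rule: row positions printed."""
    if n_rows > print_max:
        # More than print_max rows: print only the first print_min rows.
        return list(range(print_min))
    return list(range(n_rows))


print(len(tibble_rows_shown(20)))  # 20: at the threshold, all rows are printed
print(len(tibble_rows_shown(21)))  # 10: above it, only the first 10
# Mirrors options(tibble.print_max = Inf): always show all rows.
print(len(tibble_rows_shown(1000, print_max=float("inf"))))  # 1000
```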