@maartenbreddels
Member

This is a start on cleaner expression printing; ideas and improvements are welcome. For now it is just a cleaned-up version of what we had.

image
TODO:

@datascienceit

  1. I think 5-10 numbers are good enough; there is no need for 25 head and 25 tail values.
  2. I would add the dtype, and whether the column is virtual, before the values.
  3. I would add the total count after the values.
  4. I think we should show it as a table, even for expressions: a table with one column rather than a list. It's more readable in my opinion.
name: 'Column A'   dtype: int 
1
2.2
3
44
5
...
Size: 25   virtual:true

Just an idea

@maartenbreddels maartenbreddels force-pushed the str_and_repr_expression branch from 3b2468f to 6ec7189 Compare March 2, 2019 13:49
@JovanVeljanoski
Member

I've given this a lot of thought. In essence I agree with @datascienceit. A column view as he suggested is the most readable, but having to scroll down can be annoying at times. Indeed, showing a head and tail of, e.g., 10 values each should give a good enough overview of the contents.
My update to his suggestion is to put all the meta-information at the top (the virtual and size/length/rows information). That way users do not have to scroll to the bottom every time just to look at it.

Initially I was not convinced that we should show the size/length of a column at all. After all, columns come from a (for now) dense tabular dataset, and they would all have the same length. But on second thought, it would be useful when comparing columns from different tables, so perhaps it is fine to leave it in.

Also, a column view leaves open the option of adding an index, if we decide to go for a fancier way of indexing rows.

My competing idea was row-based printing, something like:

name: 'Column A'   dtype: int   virtual: False   Size: 50
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]

The downside of this idea is that it is hard to display an index here, if that becomes necessary. Also, for values with many decimal places, or for text/string columns with somewhat longer values, readability will go down fast.

So I would vote for the first idea, as suggested by @datascienceit.
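
For comparison, the row-based layout described above could be sketched roughly as follows. This is only an illustration, not the code in this PR; `preview_row` and its parameters are hypothetical names.

```python
def preview_row(values, name, dtype, virtual, n=10):
    """Hypothetical helper: metadata on one line, then the first and last n values."""
    header = f"name: {name!r}   dtype: {dtype}   virtual: {virtual}   Size: {len(values)}"
    if len(values) <= 2 * n:
        # Short columns are shown in full.
        shown = ", ".join(str(v) for v in values)
    else:
        head = ", ".join(str(v) for v in values[:n])
        tail = ", ".join(str(v) for v in values[-n:])
        shown = f"{head}, ... {tail}"
    return f"{header}\n[{shown}]"


text = preview_row(list(range(50)), "Column A", "int", False)
print(text)
```

With long strings or many decimal places, the second line quickly outgrows the screen width, which is the readability problem mentioned above.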

@maartenbreddels
Member Author

Together with Jovan, I've settled on this for the moment:
image

We can do a prettier HTML version later. The idea is just to show some values, the expression (clipped to at most one line), and the dtype and state of the expression (normal column or expression).
We print the meta-information on top, since that is what you see first in the notebook, and when you look at the bottom you see the row number as well.
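
A minimal sketch of that layout, assuming a plain Python sequence as input. The helper names (`clip`, `preview_column`) and the exact header wording are made up for illustration and are not taken from the PR.

```python
def clip(text, width=60):
    """Collapse text to a single line of at most `width` characters."""
    line = " ".join(text.split())
    return line if len(line) <= width else line[: width - 3] + "..."


def preview_column(values, expression, dtype, kind, n=10):
    """Hypothetical helper: metadata first, then head and tail rows with row numbers."""
    lines = [
        f"Expression = {clip(expression)}",
        f"Length: {len(values):,} dtype: {dtype} ({kind})",
        "-" * 30,
    ]
    if len(values) <= 2 * n:
        rows = list(enumerate(values))
    else:
        start = len(values) - n
        head = list(enumerate(values[:n]))
        tail = [(start + i, v) for i, v in enumerate(values[-n:])]
        rows = head + [None] + tail  # None marks the elided middle
    index_width = len(str(len(values) - 1))
    for row in rows:
        if row is None:
            lines.append("...")
        else:
            i, v = row
            lines.append(f"{i:>{index_width}}  {v}")
    return "\n".join(lines)


text = preview_column(list(range(100)), "x + y", "int64", "expression", n=3)
print(text)
```

Because the last printed row carries its row number, the length is visible at both the top and the bottom of the output.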

@xdssio
Collaborator

xdssio commented Mar 8, 2019 via email

@JovanVeljanoski
Member

The reason we decided to put all the 'meta-data' above the table is that the user does not need to scroll all the way to the bottom each time a column is printed, in case we choose to show the first and last 10 entries. At times I find this quite annoying in other libraries...

@maartenbreddels
Member Author

I know it's a bit silly, but can we align the numbers? Those are the most

There are still two issues. If the column contains masked values, they are not aligned with the decimal point (this can be fixed). Also, the 'g' format is used for decimals by default; it does not pad with trailing zeros, which can look unclean but makes it clearer which numbers are round.
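
To make the trade-off concrete, here is a small standalone comparison of Python's `'g'` format against fixed-point formatting, with right alignment applied to both. The sample values and the `right_align` helper are illustrative only.

```python
values = [1.0, 2.5, 0.125, 1234.5678]

# 'g' drops trailing zeros, so round numbers stand out but widths differ.
general = [format(v, "g") for v in values]
# Fixed-point pads with zeros, so every value has the same number of decimals.
fixed = [format(v, ".3f") for v in values]


def right_align(cells):
    """Pad each string on the left to the width of the longest one."""
    width = max(len(c) for c in cells)
    return [c.rjust(width) for c in cells]


print("\n".join(right_align(general)))
print()
print("\n".join(right_align(fixed)))
```

With `'g'`, right alignment lines up the last digit rather than the decimal point; with fixed-point formatting the two coincide.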

@maartenbreddels maartenbreddels force-pushed the str_and_repr_expression branch from e4e601c to 9d7df37 Compare March 18, 2019 13:14
@maartenbreddels
Member Author

Indeed, right align looks cleaner:
image

@maartenbreddels maartenbreddels merged commit 99f0154 into master Mar 18, 2019
@maartenbreddels
Member Author

Thanks for the feedback all!

@maartenbreddels maartenbreddels deleted the str_and_repr_expression branch May 11, 2020 09:07
5 participants