Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

abbreviated output for data.frame is wrong #28

Closed
jankatins opened this issue Feb 25, 2016 · 28 comments
Closed

abbreviated output for data.frame is wrong #28

jankatins opened this issue Feb 25, 2016 · 28 comments
Labels
Milestone

Comments

@jankatins
Copy link
Contributor

I've a data frame (> 100 rows) which is abbreviated when I print it with full_df %>% select(raw_lname, raw_fname). But the index is not showing the right numbers but the numbers in the current data frame:

1
2
[...]
29
30
...
32
33
[...]
60
61

-> It doesn't show the real row number but the one from the printed frame :-(

@karldw
Copy link
Contributor

karldw commented Feb 26, 2016

It's possible I broke this when I was fixing things in the array abbreviating function. Can you tell me more about how this output is wrong? In particular, I'm not sure I understand the phrase "the index is not showing the right numbers but the numbers in the current data frame".

@jankatins
Copy link
Contributor Author

I would expect that the index shows the row number in the old dataframe or none. So

1
2
[...]
29
30
...
150
151
[...]
180
181

@karldw
Copy link
Contributor

karldw commented Mar 7, 2016

I think this was fixed along with #31. @JanSchulz, can you confirm?

@jankatins
Copy link
Contributor Author

Nope, still the same behavior, I think I tracked it down to the use of the readr package:

ensure.package("dplyr")
ensure.package("readr")
df = data.frame(a=c(1:1000, 1111111111111), b=1:1001)
write_csv(df, "test_delete.csv")
df2 = read_csv("test_delete.csv")
#df2 %>% select(a) %>% glimpse()
df %>% select(a) # ok
df2 %>% select(a) # no ok...

@karldw
Copy link
Contributor

karldw commented Mar 7, 2016

Interesting, thanks. I think it's something with the way repr shortens dplyr's tbl_dfs. For instance, this code has the same problem:

library(dplyr)
df <- data.frame(a=c(1:1000, 111111), b=1:1001)
df2 <- tbl_df(df)
df  # ok
df2  # not ok

@karldw
Copy link
Contributor

karldw commented Mar 10, 2016

The problem is that dplyr automatically reassigns row names, which makes a ton of sense, but causes a bit of hassle here. The problem should be fixed whenever my current set of commits is merged in. (#32)

@jankatins
Copy link
Contributor Author

Confirmed as fixed in a dataframe and with dplyr. Closing.

@karldw Thanks!

@jankatins
Copy link
Contributor Author

Ok, I take that back:

library("dplyr")
options(jupyter.log_level = 3)
options(jupyter.rich_display = FALSE)
data.frame(a=1:200, b=201:400) %>% select(a)
[... shortend...]
27        27
28        28
29        29
30        30
<U+22EE>   <U+22EE>
171      171
172      172
173      173
174      174
[...shortend...]

In the html output it's fine...

@jankatins jankatins reopened this Mar 30, 2016
@jankatins jankatins added the bug label Apr 7, 2016
@jankatins jankatins added this to the 0.5 milestone Apr 7, 2016
@jankatins
Copy link
Contributor Author

Is this #21?

@karldw
Copy link
Contributor

karldw commented Apr 11, 2016

Maybe? I don't run into your issue with Firefox or Chromium on Ubuntu.

screenshot from 2016-04-11 08-29-28

@flying-sheep
Copy link
Member

OK, looks like it. thanks!

@jankatins
Copy link
Contributor Author

With me, it's still broken!

@karldw are you on linux? If yes, this is "just" #21 again... If so we should replace the char with something which works on windows as well... At least as long as we don't have a fix for #21...

@jankatins jankatins reopened this Apr 11, 2016
@flying-sheep
Copy link
Member

@karldw are you on linux? If yes, this is "just" #21 again...

this is why i closed it:

I don't run into your issue with Firefox or Chromium on Ubuntu

you have missed that part 😁

@jankatins
Copy link
Contributor Author

Oups, yes... should go home...

@jankatins
Copy link
Contributor Author

Ok, it doesn't look like #21 will be fixed soon, so we should change the char: r-lib/evaluate#66 (comment)

@flying-sheep
Copy link
Member

nooooooooooooooooooooooooooooooooooooooooooooooooooo

@flying-sheep
Copy link
Member

no

@flying-sheep
Copy link
Member

no fucking windows preventing us from having pretty things.

if you want you can create code that detects if the character is supported and replaces it if it isn’t. with your strlen checking, there’s a good avenue for this.

like:

ellip.h <- if (strlen('\u22EF') == 1) '\u22EF' else '...'
ellip.v <- if (strlen('\u22EE') == 1) '\u22EE' else ellip.h
ellip.d <- if (strlen('\u22F1') == 1) '\u22F1' else ''

@jankatins
Copy link
Contributor Author

This check doesn't work, you would have to print it in a capture.output... Is it such a problem to just use three dots?

@flying-sheep
Copy link
Member

yes!

we can encapsulate the correct check in a function:

char_fallback <- function(char, default) {
    real_len <- nchar(char)
    r_len <- nchar(capture.output(cat(char))
    if (real_len == r_len) char else default
}
ellip.h <- char_fallback('\u22EF', '...')
ellip.v <- char_fallback('\u22EE', ellip.h)
ellip.d <- char_fallback('\u22F1', '')

would that be correct?

@jankatins
Copy link
Contributor Author

This works on my windows box:

char_fallback <- function(char, default) {
  real_len <- nchar(char)
  r_len <- nchar(capture.output(cat(char)))
  if (real_len == r_len) char else default
}
ellip.h <- char_fallback('\u22EF', '...')
ellip.v <- char_fallback('\u22EE', ellip.h)
ellip.d <- char_fallback('\u22F1', '')

If you go that way, you could also do something like this in the executer:

if(.Platform$OS.type == "windows") {
  issue_warning_if_bad_unicode <- function(code){
    real_len <- nchar(code)
    r_len <- nchar(capture.output(cat(code)))
    if (real_len != r_len){
       # Whatever is the right method here...
      display_error("Your code contains a unicode char which is not displayable in your current locale. This can lead to subtile errors if you use such chars to do comparisons. Please see ...")
    }
  }
} else {
   # Nothing on other systems
  issue_warning_if_bad_unicode <- function(code) NULL
}
issue_warning_if_bad_unicode(code)

@flying-sheep
Copy link
Member

display_error() would be warning() 😉

and your idea is to call that function every time something is executed?

@jankatins
Copy link
Contributor Author

display_error() would be warning() 😉

I'm not sure if the warning would end up in the right place in the frontend... -> Similar to handle_display_error:

 if (!silent) {
                send_response('stream', request, 'iopub', list(
                    name = 'stderr',
                    text = msg))
            }

and your idea is to call that function every time something is executed?

Yes, only on windows and maybe configurable.

I'm a big fan of "explicit error messages": better do a check too many (or a fallback) than getting obscure error messages (or even silent errors, like here) in places where it's not obvious what went wrong...

@flying-sheep
Copy link
Member

ah, good. we should create a display_error method for the Executor then and call it from handle_display_error and when the code has unicode and people are on shitty OSs 😜

@jankatins
Copy link
Contributor Author

more display_stream(..., stream=c('stdout', 'stderr')), but this could also simple be a private helper in the irkernel package.

@jankatins
Copy link
Contributor Author

If you want, just go ahead, I won't work on this until wednesday evening...

@jankatins
Copy link
Contributor Author

Ok, on it...

@flying-sheep
Copy link
Member

👍 sorry, wasn’t in the mood

jankatins added a commit to jankatins/repr that referenced this issue Apr 13, 2016
On windows, ellipses are usually outside the chars which can be displayed in our
way to display plain text /with  `capture.output(print(...))` in the current
locale. The leads to strangely escaped chars and we would have such a escaped
chars each time we display a dataframe.

This problem is probably not going away soon:
IRkernel#28 (comment)

Work around this by simple using three dots.
jankatins added a commit to jankatins/repr that referenced this issue Apr 13, 2016
On windows, ellipses are usually outside the chars which can be displayed in our
way to display plain text /with  `capture.output(print(...))` in the current
locale. The leads to strangely escaped chars and we would have such a escaped
chars each time we display a dataframe.

This problem is probably not going away soon:
IRkernel#28 (comment)

Work around this by simple using three dots.
jankatins added a commit to jankatins/repr that referenced this issue Apr 21, 2016
On windows, ellipses are usually outside the chars which can be displayed in our
way to display plain text /with  `capture.output(print(...))` in the current
locale. The leads to strangely escaped chars and we would have such a escaped
chars each time we display a dataframe.

This problem is probably not going away soon:
IRkernel#28 (comment)

Work around this by simple using three dots.
jankatins added a commit to jankatins/repr that referenced this issue Apr 21, 2016
On windows, ellipses are usually outside the chars which can be displayed in our
way to display plain text /with  `capture.output(print(...))` in the current
locale. The leads to strangely escaped chars and we would have such a escaped
chars each time we display a dataframe.

This problem is probably not going away soon:
IRkernel#28 (comment)

Work around this by simple using three dots.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants