Do not re-set iconv converter state when the input encoding may use
byte-order from a BOM, because some iconv implementations (including
win_iconv, Apple libiconv) forget the byte order on re-set. Partially
reverts 85476.


git-svn-id: https://svn.r-project.org/R/trunk@87222 00db46b3-68df-0310-9c12-caf00c1e9a41
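
The behaviour described in the commit message can be illustrated with a minimal C sketch. This is not R's actual code: it assumes a POSIX `iconv`, the helpers `may_use_bom` and `to_utf8` are hypothetical, and the list of BOM-sensitive encoding names is only illustrative. The diff in this commit simply comments out the re-set calls, whereas the sketch skips the re-set conditionally.

```c
#include <assert.h>
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical helper (not part of R): returns true when the byte order
   of `enc` may be learned from a BOM instead of the encoding name.
   The name list is an illustrative assumption, not an exhaustive one. */
static bool may_use_bom(const char *enc)
{
    static const char *const names[] = { "UTF-16", "UTF-32", "UCS-2", "UCS-4" };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcasecmp(enc, names[i]) == 0)
            return true;
    return false;
}

/* Hypothetical helper: convert `in` (encoded as `from`) to UTF-8.
   On a conversion error the descriptor is re-set only when the input
   encoding cannot depend on a BOM, so a byte order learned earlier is
   not forgotten. Returns the number of bytes written (excluding the
   terminating NUL), or (size_t)-1 if the converter cannot be opened. */
static size_t to_utf8(const char *from, const char *in, size_t inlen,
                      char *out, size_t outlen)
{
    assert(outlen > 0);
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *ip = (char *)in;      /* iconv() takes a non-const pointer */
    char *op = out;
    size_t inb = inlen;
    size_t onb = outlen - 1;    /* keep room for the terminating NUL */

    size_t res = iconv(cd, &ip, &inb, &op, &onb);
    if (res == (size_t)-1 && !may_use_bom(from)) {
        /* Return to the initial shift state; skipped for BOM-dependent
           encodings because some implementations drop the byte order. */
        iconv(cd, NULL, NULL, &op, &onb);
    }
    *op = '\0';
    iconv_close(cd);
    return (size_t)(op - out);
}
```

A caller would pass the declared input encoding, for example `to_utf8("UTF-16", buf, n, out, sizeof out)`. With an explicit byte order in the name, such as `UTF-16LE`, the sketch still re-sets on error, since no BOM-derived state is at stake.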
kalibera committed Oct 10, 2024
1 parent 9d47208 commit 6840518
Showing 3 changed files with 14 additions and 1 deletion.
4 changes: 4 additions & 0 deletions src/gnuwin32/system.c
@@ -460,8 +460,12 @@ FileReadConsole(const char *prompt, unsigned char *buf, int len, int addhistory)
err = (res == (size_t)(-1));
/* errors lead to part of the input line being ignored */
if(err) {
/* Should re-set with a stateful encoding, but some iconv
implementations forget byte-order learned from BOM.
Riconv(cd, NULL, NULL, &ob, &onb);
*ob = '\0';
*/
printf(_("<ERROR: re-encoding failure from encoding '%s'>\n"),
R_StdinEnc);
}
7 changes: 6 additions & 1 deletion src/main/sysutils.c
@@ -793,11 +793,16 @@ attribute_hidden SEXP do_iconv(SEXP call, SEXP op, SEXP args, SEXP env)
/* it seems this gets thrown for non-convertible input too */
/* EINVAL returned for invalid input on macOS with system
libiconv */
/*
Should re-set with a stateful encoding, but some iconv
implementations forget byte-order learned from BOM.
res = Riconv(obj, NULL, NULL, &outbuf, &outb);
if (res == -1 && errno == E2BIG) {
R_AllocStringBuffer(2*cbuff.bufsize, &cbuff);
goto top_of_loop;
}
}
*/
if(fromUTF8 && streql(sub, "Unicode")) {
if(outb < 13) {
R_AllocStringBuffer(2*cbuff.bufsize, &cbuff);
4 changes: 4 additions & 0 deletions src/unix/sys-std.c
@@ -1025,8 +1025,12 @@ Rstd_ReadConsole(const char *prompt, unsigned char *buf, int len,
err = res == (size_t)(-1);
/* errors lead to part of the input line being ignored */
if(err) {
/* Should re-set with a stateful encoding, but some iconv
implementations forget byte-order learned from BOM.
Riconv(cd, NULL, NULL, &ob, &onb);
*ob = '\0';
*/
printf(_("<ERROR: re-encoding failure from encoding '%s'>\n"),
R_StdinEnc);
strncpy((char *)buf, obuf, len);