Bug: UTF-8 characters across chunk boundaries get corrupted #374

yetzt · 2021-09-08T11:07:59Z

When the bytes of a multi-byte UTF-8 character are split across two chunks in a response that is treated as text and not parsed, StreamDecoder._transform corrupts the character: it calls iconv.decode on each chunk independently, so the incomplete byte sequence at the chunk boundary is replaced with a replacement character (U+FFFD).

To reproduce with needle v3.0.0:

require("needle").get("https://data.interaktiv.cloud.funkedigital.de/wahl/example/nw/erg_05994.xml", { parse: false }, function(err, resp, data){
	if (err) throw err; // surface network errors instead of failing on data.split
	console.log(data.split('\n')[4379]); // line 4380
});
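
The effect can be shown without a network request. The following is a minimal sketch using only Node's built-in Buffer and string_decoder modules; it is not needle's code, and the sample string "Köln" and the split position are assumptions chosen for illustration. It decodes the same two chunks once independently (the behaviour described above) and once with a stateful decoder that buffers incomplete sequences between writes.

```javascript
const { StringDecoder } = require("string_decoder");

// "ö" is encoded as two bytes (0xC3 0xB6), so the buffer is 4B C3 B6 6C 6E.
const whole = Buffer.from("Köln", "utf8");

// Simulate a chunk boundary that falls in the middle of "ö".
const chunks = [whole.subarray(0, 2), whole.subarray(2)];

// Decoding each chunk on its own: the dangling 0xC3 and the orphaned 0xB6
// cannot be decoded, so replacement characters (U+FFFD) appear instead.
const naive = chunks.map((chunk) => chunk.toString("utf8")).join("");

// A stateful decoder holds back the incomplete trailing bytes of one chunk
// and completes the character when the next chunk arrives.
const decoder = new StringDecoder("utf8");
const stateful = chunks.map((chunk) => decoder.write(chunk)).join("") + decoder.end();

console.log(JSON.stringify(naive));
console.log(JSON.stringify(stateful));
```

A fix in StreamDecoder would presumably need the same property: decoder state that survives from one _transform call to the next, rather than a fresh decode per chunk.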
