Use node buffers and decode full lines #198

xPaw · 2019-03-06T19:20:22Z

This is a change I noticed in #142 but I rewrote it myself (after looking at that PR, it doesn't appear to actually implement the buffering correctly)

This change will use buffers and will split it into lines from said buffer. iconv.decode is now called on a single IRC line instead of the whole data buffer passed from the socket (which I only assume could cause troubles on partial data).

I also no longer keep an array of lines and directly emit each split and decoded line. There's no need to keep an array because split is no longer used (it doesn't exist on Buffer either). This should lower memory consumption.

I also reset incoming_buffer when calling connect() instead of constructor, which should clear out any remaining data if the socket dies.

I tested it with ZNC playback where it feels a lot of data, and it works fine.

prawnsalad · 2019-03-06T21:06:17Z

The partial data has been a concern since this was put in place although I've never seen it reported that it's ever an issue. Happy to hear a better to handle it though.

This part is something that's been mentioned by a few people over the years though: this.incoming_buffer.indexOf('\n', startIndex). Some UTF8 (and most likely other encoding) characters do include the \n byte within them which would cause the splitting to occur half way through a character.

xPaw · 2019-03-06T21:09:26Z

do include the \n byte within them

That could be a problem on master as well though, right? EDIT: It probably won't be, as it decodes first.

I could search for \r\n, but that will break on networks that only send \n.

prawnsalad · 2019-03-06T21:21:57Z

Right, decoding happens first so searching \n will always specifically match a \n character. Also correct on point 2.

xPaw · 2019-03-06T21:23:32Z

I'm not sure how to solve this then. Ideas?

prawnsalad · 2019-03-06T21:28:34Z

The potential partial data issue? It's been a concern for several years and I'm not sure of a way around it despite a few more knowledgable encoding people looking over it over the years. Unless someone can mention a work around or reports of the potential issue actually occurring, then I can only leave it as it is.

xPaw · 2019-03-06T21:51:23Z

I asked in #ircv3, people over there just seem to split on \n over the raw buffer as well. I looked at hexchat code and didn't see anything interesting over there either.

Merge then?

prawnsalad · 2019-03-07T09:59:26Z

I really can't see any benefit with this PR, it's just switching a potential partial data issue with a slightly more likely potential issue with character splitting. There hasn't been any reports of the current implementation breaking either.

Am i missing something from the #ircv3 discussion yesterday?

xPaw · 2019-03-07T10:01:53Z

From what I gathered, I don't see the issue you are describing could ever arise.

If a client does happen to send such encoded data that happens to have 0x0A (\n) in there somewhere, the server would already split on that, and such a message would never reach other clients (without the server re-encoding the message at least).

Besides, what multibyte encoding can send 0x0A where it doesn't mean a new line? This can't happen in UTF-8.

xPaw · 2019-03-07T12:21:55Z

For reference why it can't happen in utf8: https://softwareengineering.stackexchange.com/questions/262227/why-does-utf-8-waste-several-bits-in-its-encoding

To add to that, it seems that Kiwi/irc-framework is alone to decode data from socket first. Other clients that I looked at all search for the new line character first.

EDIT: I looked at ZNC's code, and their ReadLine function also searches for new line character before doing any conversions. https://github.com/jimloco/Csocket/blob/e8d9e0bb248c521c2c7fa01e1c6a116d929c41b4/Csocket.cc#L2489-L2536

prawnsalad · 2019-04-03T03:58:54Z

After discussing and getting this explained clearer with some others that know way more about encoding than I, this seems fine to me now. Merging

Use node buffers and decode full lines

81b9be0

xPaw requested a review from prawnsalad March 6, 2019 19:20

Use 0x0A to search raw bytes without decoding internally

6047bff

xPaw mentioned this pull request Mar 7, 2019

Add fallback encoding for non-UTF-8 lines #142

Closed

prawnsalad merged commit 1b9263e into master Apr 3, 2019

xPaw deleted the xpaw/buffers branch July 13, 2019 09:25

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use node buffers and decode full lines #198

Use node buffers and decode full lines #198

xPaw commented Mar 6, 2019 •

edited

Loading

prawnsalad commented Mar 6, 2019

xPaw commented Mar 6, 2019 •

edited

Loading

prawnsalad commented Mar 6, 2019

xPaw commented Mar 6, 2019

prawnsalad commented Mar 6, 2019

xPaw commented Mar 6, 2019

prawnsalad commented Mar 7, 2019

xPaw commented Mar 7, 2019 •

edited

Loading

xPaw commented Mar 7, 2019 •

edited

Loading

prawnsalad commented Apr 3, 2019

Use node buffers and decode full lines #198

Use node buffers and decode full lines #198

Conversation

xPaw commented Mar 6, 2019 • edited Loading

prawnsalad commented Mar 6, 2019

xPaw commented Mar 6, 2019 • edited Loading

prawnsalad commented Mar 6, 2019

xPaw commented Mar 6, 2019

prawnsalad commented Mar 6, 2019

xPaw commented Mar 6, 2019

prawnsalad commented Mar 7, 2019

xPaw commented Mar 7, 2019 • edited Loading

xPaw commented Mar 7, 2019 • edited Loading

prawnsalad commented Apr 3, 2019

xPaw commented Mar 6, 2019 •

edited

Loading

xPaw commented Mar 6, 2019 •

edited

Loading

xPaw commented Mar 7, 2019 •

edited

Loading

xPaw commented Mar 7, 2019 •

edited

Loading