-
Notifications
You must be signed in to change notification settings - Fork 65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use node buffers and decode full lines #198
Conversation
The partial data has been a concern since this was put in place although I've never seen it reported that it's ever an issue. Happy to hear a better to handle it though. This part is something that's been mentioned by a few people over the years though: |
That could be a problem on master as well though, right? EDIT: It probably won't be, as it decodes first. I could search for |
Right, decoding happens first so searching |
I'm not sure how to solve this then. Ideas? |
The potential partial data issue? It's been a concern for several years and I'm not sure of a way around it despite a few more knowledgable encoding people looking over it over the years. Unless someone can mention a work around or reports of the potential issue actually occurring, then I can only leave it as it is. |
I asked in Merge then? |
I really can't see any benefit with this PR, it's just switching a potential partial data issue with a slightly more likely potential issue with character splitting. There hasn't been any reports of the current implementation breaking either. Am i missing something from the #ircv3 discussion yesterday? |
From what I gathered, I don't see the issue you are describing could ever arise. If a client does happen to send such encoded data that happens to have 0x0A (\n) in there somewhere, the server would already split on that, and such a message would never reach other clients (without the server re-encoding the message at least). Besides, what multibyte encoding can send 0x0A where it doesn't mean a new line? This can't happen in UTF-8. |
For reference why it can't happen in utf8: https://softwareengineering.stackexchange.com/questions/262227/why-does-utf-8-waste-several-bits-in-its-encoding To add to that, it seems that Kiwi/irc-framework is alone to decode data from socket first. Other clients that I looked at all search for the new line character first. EDIT: I looked at ZNC's code, and their ReadLine function also searches for new line character before doing any conversions. https://github.com/jimloco/Csocket/blob/e8d9e0bb248c521c2c7fa01e1c6a116d929c41b4/Csocket.cc#L2489-L2536 |
After discussing and getting this explained clearer with some others that know way more about encoding than I, this seems fine to me now. Merging |
This is a change I noticed in #142 but I rewrote it myself (after looking at that PR, it doesn't appear to actually implement the buffering correctly)
This change will use buffers and will split it into lines from said buffer.
iconv.decode
is now called on a single IRC line instead of the whole data buffer passed from the socket (which I only assume could cause troubles on partial data).I also no longer keep an array of lines and directly emit each split and decoded line. There's no need to keep an array because
split
is no longer used (it doesn't exist on Buffer either). This should lower memory consumption.I also reset
incoming_buffer
when callingconnect()
instead of constructor, which should clear out any remaining data if the socket dies.I tested it with ZNC playback where it feels a lot of data, and it works fine.