Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib,parser: optimize parser and socket transport #254

Closed
wants to merge 3 commits into from

Conversation

belochub
Copy link
Member

  • Optimize parser by using Buffers without conversion to string when possible and avoiding calling ToString when parsing strings.
  • Change socket to work with Buffers instead of strings to avoid unneeded multiple conversion.
  • Refactor (reimplement) parseNetworkPackets function to make it more verbose, improving readability, and change it to work on Buffers instead of strings.
  • Avoid excessive copying in cases when messages are divided and being received by parts (experimental).

This PR somewhat experimental in some parts (mostly lib/socket.js changes) and is open to comments with better ideas on how to improve it.

if (this._buffer) {
newPartLength = chunk.indexOf(0);
if (newPartLength === -1) {
this._buffer = Buffer.concat(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it's worth trying to allocate a big buffer and manually append data to it if there is enough space or if there is not enough space then copy both of them into a new bigger buffer till it's size grows to MAX_PACKET_SIZE (same approach may be used in _parseRemains to avoid creating a new buffer 2 more times)? Or maybe there were problems with this approach?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "append"? Isn't it just copying from one buffer into another?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but AFAIK concat always creates a new buffer to copy the ones passed. I propose to avoid copying both buffers if the first one contains enough space to fit the second buffer and just copy the second buffer to the first one, hence 'append'.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size of the buffer will always be equal to the size of the data inside it because it is coming from 'data' event on the socket, so there is never enough space to copy the second buffer inside it. And you can't just resize the buffer because it is just a view over the ArrayBuffer underneath, you can only create a new one.
I still can't understand how you want this to be implemented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant this for a case where we receive consecutive messages without '\0' in them, so when first such message arrives we copy its contents to the buffer in this class that has enough space (with space to spare) or if it doesn't then we replace it with a bigger one and copy, hence on a second and so on half-messages we won't have to copy what we've received before and only append the received data to the buffer we have in class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lundibundi

now when we receive first half-message we concat it with null(initial state of class buffer)

Where does this happen exactly? I don't recall this behavior at all.

upon each next half-message we receive

There literally can only be two half-messages :)

then upon receiving consequent half-messages

Same here, can you please explain what you mean by "consequent half-messages"?

This way we will only copy our accumulated data if there is not enough space in the class buffer (and we will grow it more that just the size of received data), hence avoiding a lot of copying in case of very fragmented data.

Yes and I was talking exactly about the growing part, it will lead to almost the same amount of copying we have now (see first paragraph in #254 (comment)), and will lead to increased memory usage, I don't understand how this will be better for performance overall.

Copy link
Member

@lundibundi lundibundi Jul 21, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@belochub Oh yeah, we doesn't concat with null, we just assign it to _buffer.

If we assume that

There literally can only be two half-messages :)

Then yes, there will be the same amount of copying.
If we don't need to handle the case where there are more that 2 half-messages then my suggestion will only make the code more complex, so we should not implement it in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lundibundi, there is nothing to assume, it is a fact that there can't be more than two halves of a whole message (half-messages).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@belochub I think that @lundibundi meant partial-message not half-message. He used the wrong term for it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nechaido, okay then, to benchmark this approach we will need an option to set the message size in benchmark, can you, please, add this option to the benchmark introduced in #253?

}
}

_parseRemains(newPart, newPartLength) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe name it _parseWith or _parseWithChunk?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about _parseMessageRemains?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand the reason for "Remains", as I see this function actually parses the main part and what remains is the latter part of chunk after slice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Remains" means message part that remains in the buffer from the previous parsing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ain't that function parse data that was accumulated (and including the next message) till we've received message with '\0' in it?

Copy link
Member

@lundibundi lundibundi Jul 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, that's up to you anyway. Also maybe _parseRemainsWith.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this naming, _parseRemainsWith what? There should be a word after "with".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see it, first argument to the function is what with refers to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but when you call this function, you can't tell what was the name of the arguments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, well, I can 😄. Anyway my thought was that you don't need names of the arguments, it's that the arguments that you pass are what with refers to.

Copy link
Member

@nechaido nechaido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nechaido
Copy link
Member

@belochub could you rebase this on master and run benchmark with and without proposed changes and publish results here.

* Slightly improve string parsing performance.
* Highly improve parsing performance for Buffers by
  avoiding toString conversion (leading to copying data).
@belochub
Copy link
Member Author

Closing this, since serde implementation was moved from this package into the separate mdsf package.
Some of the optimizations from here should probably be moved into the mdsf package.

@belochub belochub closed this Aug 22, 2018
belochub added a commit to metarhia/mdsf that referenced this pull request Aug 22, 2018
* Slightly improve string parsing performance.
* Highly improve parsing performance for Buffers by
  avoiding toString conversion (leading to copying data).

Refs: metarhia/jstp#254
belochub added a commit to metarhia/mdsf that referenced this pull request Aug 22, 2018
Avoid using `strlen()` magic and explicitly check for `'\0'`
character.

Refs: metarhia/jstp#254
belochub added a commit to metarhia/mdsf that referenced this pull request Aug 22, 2018
Avoid using `strlen()` magic and explicitly check for `'\0'`
character.

Refs: metarhia/jstp#254
belochub added a commit to metarhia/mdsf that referenced this pull request Aug 27, 2018
* Slightly improve string parsing performance.
* Highly improve parsing performance for Buffers by
  avoiding toString conversion (leading to copying data).

Refs: metarhia/jstp#254
belochub added a commit to metarhia/mdsf that referenced this pull request Aug 27, 2018
* Slightly improve string parsing performance.
* Highly improve parsing performance for Buffers by
  avoiding toString conversion (leading to copying data).
* Replace deprecated Utf8Value constructor usage.

Refs: metarhia/jstp#254
belochub added a commit to metarhia/mdsf that referenced this pull request Aug 27, 2018
* Slightly improve string parsing performance.
* Highly improve parsing performance for Buffers by
  avoiding toString conversion (leading to copying data).

Refs: metarhia/jstp#254
belochub added a commit to metarhia/mdsf that referenced this pull request Aug 28, 2018
* Slightly improve string parsing performance.
* Highly improve parsing performance for Buffers by
  avoiding toString conversion (leading to copying data).

Refs: metarhia/jstp#254
PR-URL: #21
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Reviewed-By: Alexey Orlenko <eaglexrlnk@gmail.com>
belochub added a commit to metarhia/mdsf that referenced this pull request Aug 29, 2018
Avoid using `strlen()` magic and explicitly check for `'\0'`
character.

Refs: metarhia/jstp#254
belochub added a commit to metarhia/mdsf that referenced this pull request Aug 30, 2018
Avoid using `strlen()` magic and explicitly check for `'\0'`
character.

Refs: metarhia/jstp#254
belochub added a commit to metarhia/mdsf that referenced this pull request Aug 30, 2018
Avoid using `strlen()` magic and explicitly check for `'\0'`
character.

Refs: metarhia/jstp#254
PR-URL: #22
Reviewed-By: Alexey Orlenko <eaglexrlnk@gmail.com>
Reviewed-By: Denys Otrishko <shishugi@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants