Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

parser: make parser single-pass #61

Closed
wants to merge 1 commit into from
Closed

Conversation

belochub
Copy link
Member

@belochub belochub commented Feb 8, 2017

  • Eliminate jstp::parser::internal::PrepareString function thus avoiding extra memory copying.
  • Fix deleting comment blocks located inside of strings.
  • Fix key parsing not throwing error when it included spaces but was not enclosed in quotes.

Fixes: #60


Type type;
if (!GetType(to_parse, to_parse + size, &type)) {

size_t start_pos = internal::SkipToNextToken(str, end);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be technically correct, it should probably be ptrdiff_t, not size_t. Though they are the same things on x86/x86_64 and most other platforms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is not the case, because SkipToNextToken returns size_t, which is fine (there is no pointer subtraction to get result in this function), but generally you are right and there are other occurrences where it can be important. I guess, I will still review and check for any issues that may be caused by this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I investigated this topic and found out that the main difference between size_t and ptrdiff_t is that former is unsigned integer, whereas latter is signed.
Also, there are a lot of arguments on this topic and generally it is recommended to use signed types for array indexing as well as pointer arithmetic.
At the same time Google C++ Style Guide says:

When appropriate, you are welcome to use standard types like size_t and ptrdiff_t.

We use int very often, for integers we know are not going to be too big, e.g., loop counters. Use plain old int for such things. You should assume that an int is at least 32 bits, but don't assume that it has more than 32 bits. If you need a 64-bit integer type, use int64_t or uint64_t.

I guess it will be better to substitute almost all size_t uses with ptrdiff_t or plain int, but we need to discuss which one will suit us better.
Also I think we need new issue and corresponding PR for this topic, so that we can talk it over and weigh all the pros and cons.

if (size != parsed_size) {
(kParseFunctions[type])(isolate, str + start_pos, end, &parsed_size);

parsed_size += internal::SkipToNextToken(str + start_pos + parsed_size, end);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not increment str pointer after each SkipToNextToken call? We don't need the initial address anyway.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For two reasons: it is const char* and to enable universal check on line 98.

bool is_single_line;

switch (str[1]) {
case '/': {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for braces.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just followed Google C++ Styleguide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the Google C++ Styleguide states:

case blocks in switch statements can have curly braces or not, depending on your preference.

Then we need to decide whether we use them or don't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I liked it more with braces, what is your opinion on this topic?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK, I would not use them and have never used them before in such cases, but I checked it out in my editor and this code snippet indeed seems to look prettier and more readable with braces.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's stick with braces then. We need to update it in other switch-statements, should I apply it in this pull request, or it is the case for another one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better raise a new PR.

}
case '*': {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

is_single_line = false;
break;
}
default: { // In case it is not a comment start
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. No need for braces.
  2. Two spaces before a single-line comment.

result[j++] = str[i];
}
}
if (is_single_line) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using a ternary operator, though it's up to you.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@aqrln
Copy link
Member

aqrln commented Feb 9, 2017

@belochub I don't feel well now so I will review the rest of the PR tomorrow.

@belochub
Copy link
Member Author

Rebased on master and fixed indentation in single-line comment.

@aqrln
Copy link
Member

aqrln commented Feb 10, 2017

@belochub mind changing Refs to Fixes in the commit metadata?

return String::Empty(isolate);
}
strncpy(result + res_index, symb, out_offset);
delete[] symb;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to get rid of dynamic memory allocation here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can allocate a buffer on stack and pass it to GetControlChar by pointer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to this PR, though.

@@ -524,25 +545,21 @@ Local<String> ParseKeyInObject(Isolate* isolate,
} else {
size_t current_length = 0;
for (size_t i = 0; i < *size; i++) {
if (begin[i] == ':') {
if (begin[i] == '_' || (i != 0 ?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe decouple the check into an inline function?

if (current_length != 0) {
result = String::NewFromUtf8(isolate, begin,
NewStringType::kInternalized,
static_cast<int>(current_length))
.ToLocalChecked();
break;
} else {
THROW_EXCEPTION(SyntaxError, "Unexpected token :");
THROW_EXCEPTION(SyntaxError, "Unexpected identifier");
return Local<String>();
Copy link
Member

@aqrln aqrln Feb 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like it to be consistent. In some cases we use Local<String>(), in some cases — String::Empty(Isolate). Though it is the matter of a different PR since it was not introduced in this one.

break;
}
current_key = ParseKeyInObject(isolate,
begin + i,
end,
&current_length);
i += current_length;
i += SkipToNextToken(begin + i, end);
if (begin[i] != ':') {
THROW_EXCEPTION(SyntaxError, "Unexpected token");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change the message to the more meaningful "Colon expected"?

size_t parsed_size = 0;
const char* source = PrepareString(isolate, *in, in.length(), &total_size);
// const char* source = PrepareString(isolate, *in, in.length(), &total_size);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented-out line should be deleted.

@@ -29,9 +29,10 @@ namespace packet_parser {
Local<String> ParseNetworkPackets(Isolate* isolate,
const String::Utf8Value& in,
Local<Array> out) {
size_t total_size = 0;
size_t total_size = in.length();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can now be const size_t.

@aqrln
Copy link
Member

aqrln commented Feb 10, 2017

LGTM if the commented line is removed. Other points are discussable.

* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: #60
@belochub
Copy link
Member Author

@aqrln updated to resolve some of the issues you addressed.

Copy link
Member

@aqrln aqrln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@aqrln
Copy link
Member

aqrln commented Feb 10, 2017

@belochub other issues mentioned have to be extracted and posted on the issue tracker. Will you do it or I should take care of it?

@belochub
Copy link
Member Author

@aqrln, I already have an idea for a new PR fixing some of the issues, so I guess I will also extract other issues as well.

belochub added a commit that referenced this pull request Feb 10, 2017
* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: #60
PR-URL: #61
@belochub
Copy link
Member Author

Landed in 3b2c1d3.

@belochub belochub closed this Feb 10, 2017
@belochub belochub deleted the parser-single-pass branch February 10, 2017 14:56
aqrln pushed a commit that referenced this pull request Feb 20, 2017
* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: #60
PR-URL: #61
@aqrln aqrln mentioned this pull request Feb 20, 2017
aqrln pushed a commit that referenced this pull request Feb 20, 2017
* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: #60
PR-URL: #61
aqrln pushed a commit that referenced this pull request Apr 2, 2017
* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: #60
PR-URL: #61
aqrln pushed a commit that referenced this pull request Apr 2, 2017
* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: #60
PR-URL: #61
aqrln pushed a commit that referenced this pull request Apr 3, 2017
* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: #60
PR-URL: #61
@aqrln aqrln mentioned this pull request Apr 3, 2017
belochub added a commit that referenced this pull request Jan 22, 2018
* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: #60
PR-URL: #61
belochub added a commit that referenced this pull request Jan 22, 2018
* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: #60
PR-URL: #61
@belochub belochub mentioned this pull request Jan 22, 2018
belochub added a commit that referenced this pull request Jan 23, 2018
This is a new and shiny first major release for `metarhia-jstp`.
Changes include API refactoring and improvements, implementations of
CLI, sessions, and application versions, native addon build optimizations,
lots of bug fixes, test coverage increase, and other, less notable changes.

This release also denotes the bump of the protocol version to v1.0.
The only difference from the previous version of the protocol is that
"old" heartbeat messages (`{}`) are now deprecated and `ping`/`pong`
messages must be used for this purpose instead.

Notable changes:

 * **src,build:** improve the native module subsystem
   *(Alexey Orlenko)*
   [#36](#36)
   **\[semver-minor\]**
 * **build:** compile in ISO C++11 mode
   *(Alexey Orlenko)*
   [#37](#37)
   **\[semver-minor\]**
 * **build:** improve error handling
   *(Alexey Orlenko)*
   [#40](#40)
   **\[semver-minor\]**
 * **lib:** refactor record-serialization.js
   *(Alexey Orlenko)*
   [#41](#41)
 * **parser:** fix a possible memory leak
   *(Alexey Orlenko)*
   [#44](#44)
   **\[semver-minor\]**
 * **protocol:** change the format of handshake packets
   *(Alexey Orlenko)*
   [#54](#54)
   **\[semver-major\]**
 * **parser:** make parser single-pass
   *(Mykola Bilochub)*
   [#61](#61)
 * **parser:** remove special case for '\0' literal
   *(Mykola Bilochub)*
   [#68](#68)
   **\[semver-major\]**
 * **parser:** fix bug causing node to crash
   *(Mykola Bilochub)*
   [#75](#75)
 * **client:** drop redundant callback argument
   *(Alexey Orlenko)*
   [#104](#104)
   **\[semver-major\]**
 * **client:** handle errors in connectAndInspect
   *(Alexey Orlenko)*
   [#105](#105)
   **\[semver-major\]**
 * **socket,ws:** use socket.destroy() properly
   *(Alexey Orlenko)*
   [#84](#84)
   **\[semver-major\]**
 * **cli:** add basic implementation
   *(Mykola Bilochub)*
   [#107](#107)
   **\[semver-minor\]**
 * **connection:** fix error handling in optional cbs
   *(Alexey Orlenko)*
   [#147](#147)
   **\[semver-major\]**
 * **test:** add JSON5 specs test suite
   *(Alexey Orlenko)*
   [#158](#158)
 * **lib:** change event signature
   *(Denys Otrishko)*
   [#187](#187)
   **\[semver-major\]**
 * **lib:** add address method to Server
   *(Denys Otrishko)*
   [#190](#190)
   **\[semver-minor\]**
 * **parser:** implement NaN and Infinity parsing
   *(Mykola Bilochub)*
   [#201](#201)
 * **parser:** improve string parsing performance
   *(Mykola Bilochub)*
   [#220](#220)
 * **lib:** optimize connection events
   *(Denys Otrishko)*
   [#222](#222)
   **\[semver-major\]**
 * **lib:** refactor server and client API
   *(Denys Otrishko)*
   [#209](#209)
   **\[semver-major\]**
 * **lib,src:** rename term packet usages to message
   *(Denys Otrishko)*
   [#270](#270)
   **\[semver-major\]**
 * **lib:** emit events about connection messages
   *(Denys Otrishko)*
   [#252](#252)
   **\[semver-minor\]**
 * **lib:** implement API versioning
   *(Denys Otrishko)*
   [#231](#231)
   **\[semver-minor\]**
 * **lib:** allow to set event handlers in application
   *(Denys Otrishko)*
   [#286](#286)
   **\[semver-minor\]**
 * **lib:** allow to broadcast events from server
   *(Denys Otrishko)*
   [#287](#287)
   **\[semver-minor\]**
 * **connection:** make callback method private
   *(Alexey Orlenko)*
   [#306](#306)
   **\[semver-major\]**
 * **lib:** implement sessions
   *(Mykola Bilochub)*
   [#289](#289)
   **\[semver-major\]**
 * **connection:** use ping-pong instead of heartbeat
   *(Dmytro Nechai)*
   [#303](#303)
   **\[semver-major\]**
belochub added a commit to metarhia/mdsf that referenced this pull request Jul 19, 2018
* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: metarhia/jstp#60
PR-URL: metarhia/jstp#61
belochub added a commit to metarhia/mdsf that referenced this pull request Jul 19, 2018
* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: metarhia/jstp#60
PR-URL: metarhia/jstp#61
belochub added a commit to metarhia/mdsf that referenced this pull request Jul 21, 2018
* Eliminate `jstp::parser::internal::PrepareString` function thus
  avoiding extra memory copying.
* Fix deleting comment blocks located inside of strings.
* Fix key parsing not throwing error when it included spaces
  but was not enclosed in quotes.

Fixes: metarhia/jstp#60
PR-URL: metarhia/jstp#61
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make parser single-pass
2 participants