Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chunked string parsing #67

Merged
merged 2 commits into from
Mar 31, 2017

Conversation

thiagomacieira
Copy link
Member

This is a new v0.5 feature: the ability to parse chunked CBOR strings per chunk. It also adds the ability to do zero-copy parsing of strings. From the docs:

   if (cbor_value_is_text_string(value)) {
       char *ptr;
       size_t len;
       while (1) {
           err = cbor_value_get_text_string(value, &ptr, &len, &value));
           if (err) return err;
           if (ptr == NULL) return CborNoError;
           consume(ptr, len);
       }
   }

There are two commits in this series:

  1. Add support for parsing chunked strings
  2. rewrite the iterate_string_chunks function with new chunk code

@thiagomacieira
Copy link
Member Author

Copy link

@otaviobp otaviobp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall code looks good to me, but I am not working with any projects using cbor, so I didn't test it. Better asking for someone else that is using cbor to check that too.
Nice feature btw. I would definitely use that if I was working with cbor right now.

src/cborparser.c Outdated
* char *ptr;
* size_t len;
* while (1) {
* err = cbor_value_get_text_string(value, &ptr, &len, &value));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no cbor_value_get_text_string defined here. Am I missing something or this should be cbor_value_get_text_string_chunk?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cbor_value_get_xxx_string is an inline function in cbor.h which asserts that the type is correct, then calls _cbor_value_get_string. This documentation (and below) apply to the public function.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inline function is called cbor_value_get_xxxx_string_chunk. I couldn't find a _cbor_value_get_string too. Unless there is a macro somewhere defining them, there is a problem with names in the sample code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right. Thanks for catching this.

src/cborparser.c Outdated
* char *ptr;
* size_t len;
* while (1) {
* err = cbor_value_get_byte_string(value, &ptr, &len, &value));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

This is the last remainng parsing API. It provides a pair of functions
that extract either a chunk from a byte string or from a text
string. They operate also in non-chunked strings and simulates a
one-chunk string, so code can be generic.

With it, it's possible to support zero-copy parsing of both types of
strings with TinyCBOR.

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
The new get_string_chunk function allows us to merge the chunked and
non-chunked versions into one single code path. Not only will this share
code with the chunk functions (if the user is using them), there is an
actual reduction in code size after this change.

Before, the function was 0x1d8 bytes in size on x86-64 (-march=skylake)
and 0x17b bytes for x86 (-march=silvermont -miamcu). After, the
combination of the the new iteration function and get_string_chunk is,
respectively, 0x193 and 0x157 bytes, a reduction of 69 and 36 bytes.

Signed-off-by: Thiago Macieira <thiago.macieira@intel.com>
@thiagomacieira thiagomacieira force-pushed the chunked-string-parsing branch from 864d1a6 to 2dcf42f Compare March 16, 2017 01:33
@thiagomacieira
Copy link
Member Author

With no reply, assuming everything is ok.

@thiagomacieira thiagomacieira merged commit 2dcf42f into intel:dev Mar 31, 2017
@thiagomacieira thiagomacieira deleted the chunked-string-parsing branch March 31, 2017 00:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants