Inline nodes can reference text data of parent block #309
Comments
Sounds good. Do you want to submit a PR?
I wonder what performance impact this would have on normal use. (More allocations.)
Use zero-terminated C strings and a separate length field instead of cmark_chunks. Literal inline text will now be copied from the parent block's content buffer, resulting in a slight overhead. The node struct never references memory of other nodes now, fixing commonmark#309. Node accessors don't have to check for delayed creation of C strings.
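As a rough illustration of the layout this commit message describes, the sketch below stores a literal as an owned, zero-terminated C string plus a separate length. The names `sketch_literal`, `sketch_literal_set`, `sketch_literal_get`, and `sketch_literal_free` are hypothetical and are not taken from the cmark source; this only shows the general idea under those assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical literal storage (not the real cmark struct):
 * the node owns the bytes, and they are always zero-terminated. */
typedef struct {
  unsigned char *data; /* owned by the node, zero-terminated */
  size_t len;          /* length excluding the terminator */
} sketch_literal;

/* Copy len bytes out of a block's content buffer into the node.
 * src must be non-NULL. Returns 1 on success, 0 if allocation fails. */
static int sketch_literal_set(sketch_literal *lit, const unsigned char *src,
                              size_t len) {
  unsigned char *copy = malloc(len + 1);
  if (copy == NULL)
    return 0;
  memcpy(copy, src, len);
  copy[len] = '\0';
  free(lit->data); /* release any previous literal */
  lit->data = copy;
  lit->len = len;
  return 1;
}

/* Accessor: the string is already terminated, so nothing has to be
 * created lazily here. */
static const char *sketch_literal_get(const sketch_literal *lit) {
  return lit->data ? (const char *)lit->data : "";
}

static void sketch_literal_free(sketch_literal *lit) {
  free(lit->data);
  lit->data = NULL;
  lit->len = 0;
}
```

The extra `malloc` and `memcpy` per literal is where the overhead mentioned in the message would come from; in exchange, the accessor becomes a plain pointer return.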
Here's a branch exploring the idea: https://github.com/nwellnhof/cmark/commits/rework-node-struct The additional allocations cause about 10-15% overhead on my machine. Some other improvements bring this down to 5-10%. Note that the slowdown should only be visible with the built-in renderers. Parsing and iterating all literals using the public API should be faster than before. I really like some of the simplifications in the branch. But if we want to avoid the slowdown, another approach is to clone literals when an inline node is unlinked.
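The alternative mentioned at the end of this comment (clone literals only when an inline node is unlinked) might look roughly like the sketch below. The `sketch_node` type, its fields, and `sketch_node_unlink` are all hypothetical and much simpler than cmark's real node struct; the point is only that the copy happens when a node leaves its tree, not at parse time.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily simplified inline node (not cmark's struct). */
typedef struct sketch_node {
  struct sketch_node *parent;
  unsigned char *text; /* may point into the parent's content buffer */
  size_t len;
  int owns_text;       /* 1 once text has been copied for this node */
} sketch_node;

/* Detach a node from its parent. If the text is still borrowed from the
 * parent's buffer, copy it first so it survives the parent being freed.
 * Returns 1 on success, 0 if allocation fails (node is left attached). */
static int sketch_node_unlink(sketch_node *n) {
  if (!n->owns_text && n->text != NULL) {
    unsigned char *copy = malloc(n->len + 1);
    if (copy == NULL)
      return 0;
    memcpy(copy, n->text, n->len);
    copy[n->len] = '\0';
    n->text = copy;
    n->owns_text = 1;
  }
  n->parent = NULL;
  return 1;
}
```

Under this scheme the common parse-and-render path would pay nothing extra, at the cost of keeping an ownership flag and the borrowed-pointer case in the node struct.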
Simpler is good. I'm for it, even if there's a small slowdown.
Can you submit your branch as a PR?
Use zero-terminated C strings and a separate length field instead of cmark_chunks. Literal inline text will now be copied from the parent block's content buffer, slowing the benchmark down by 10-15%. The node struct never references memory of other nodes now, fixing commonmark#309. Node accessors don't have to check for delayed creation of C strings, so parsing and iterating all literals using the public API should actually be faster than before.
Here's what I measured in benchmarks (before this change: 1.33 mean That's about 18% -- were you getting different measurements for the performance impact?
Even with the impact, I think the change is a good idea, but it's more than I'd hoped.
On Linux, I get 1.21 before and 1.29 after.
Under MinGW: 2.22 before, 2.31 after.
On my MacBook, the results of each run vary a lot. But the slowdown doesn't seem higher than 10%.
Strange. My results are pretty consistent with what I reported on my old MacBook Pro.
When parsing inline nodes, some pieces of text are kept in the parent block's content buffer and referenced from inline children as non-allocated `cmark_chunk`s pointing directly into the parent's buffer. If the parent is freed, these pointers become invalid. This can lead to memory corruption, for example when moving inline nodes to another tree and deleting the old parent.
I'd suggest copying all text data to inline children (i.e. use `chunk_clone` instead of `chunk_dup`). Then we could also think about removing the content field for block nodes.
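To make the difference between borrowing and copying concrete, here is a minimal sketch. It does not use cmark's actual `cmark_chunk` functions; `sketch_chunk`, `sketch_chunk_borrow`, and `sketch_chunk_copy` are hypothetical stand-ins that only mimic the two behaviours described above (pointing into the parent's buffer versus owning a copy).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a chunk; field names are illustrative only. */
typedef struct {
  unsigned char *data;
  size_t len;
  int owned; /* 1 if data was allocated for this chunk, 0 if borrowed */
} sketch_chunk;

/* Borrow: the result points into buf and is valid only while buf lives. */
static sketch_chunk sketch_chunk_borrow(unsigned char *buf, size_t pos,
                                        size_t len) {
  sketch_chunk c = {buf + pos, len, 0};
  return c;
}

/* Copy: the result owns a zero-terminated copy and can outlive buf. */
static sketch_chunk sketch_chunk_copy(const unsigned char *buf, size_t pos,
                                      size_t len) {
  sketch_chunk c;
  c.data = malloc(len + 1);
  assert(c.data != NULL); /* a sketch; real code should report the failure */
  memcpy(c.data, buf + pos, len);
  c.data[len] = '\0';
  c.len = len;
  c.owned = 1;
  return c;
}

static void sketch_chunk_free(sketch_chunk *c) {
  if (c->owned)
    free(c->data);
  c->data = NULL;
  c->len = 0;
  c->owned = 0;
}

/* Returns 1 if a copied literal is intact after the parent buffer is freed. */
static int sketch_copy_survives_parent(void) {
  const char *src = "hello *world*";
  size_t n = strlen(src);
  unsigned char *parent_content = malloc(n + 1);
  assert(parent_content != NULL);
  memcpy(parent_content, src, n + 1);

  sketch_chunk borrowed = sketch_chunk_borrow(parent_content, 7, 5);
  sketch_chunk copied = sketch_chunk_copy(parent_content, 7, 5);
  (void)borrowed;
  free(parent_content); /* borrowed.data now dangles and must not be read */

  int ok = copied.len == 5 && memcmp(copied.data, "world", 5) == 0;
  sketch_chunk_free(&copied);
  return ok;
}
```

After `free(parent_content)`, only the copied chunk may still be read; the borrowed one is in the dangling state the issue describes.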