Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Custom CowStr type #23

Open
wants to merge 55 commits into
base: master
Choose a base branch
from
Open

Custom CowStr type #23

wants to merge 55 commits into from

Conversation

kmaasrud
Copy link
Collaborator

@kmaasrud kmaasrud commented Mar 15, 2023

I've started implementing the CowStr type, heavily based on pulldown_cmark's similar type. Most necessary traits should be in place, but I have not yet tried introducing it to the codebase (currently, I am just importing it via a custom name JotdownCowStr.)

src/string.rs Outdated Show resolved Hide resolved
src/string.rs Outdated Show resolved Hide resolved
src/string.rs Outdated Show resolved Hide resolved
src/string.rs Outdated Show resolved Hide resolved
hellux added 4 commits March 20, 2023 23:39
better to provide the original url, the event is already tagged as email

also avoids a string allocation
keep the tag for unresolved links, and allow distinguishing between
`[tag][tag with empty url]` and `[tag][non-existent tag]`.

closes #26
Merge branch 'unresolved_links'

closes #27
ci job still goes green when fuzzing fails, otherwise
@kmaasrud kmaasrud force-pushed the cowstr branch 2 times, most recently from efbc56a to 75dcfb2 Compare March 21, 2023 17:01
hellux added 8 commits March 21, 2023 20:57
Try to make rendering of each event independent.

The only case where we need to peek is when a backref link should be
added to the last paragraph within a footnote.

Before, when exiting a paragraph, we would peek and add the link before
emitting the close tag if the next event is a the footnote end.

Now, the paragraph end event skips emitting a paragraph close tag if it
is within a footnote. The next event will then always close the
paragraph, and if it is a footnote end, it will add the backref link
before closing.
no longer useful as no peeking is needed, use simple early exit instead
separate input/output from rendering state
They apply more to the Render trait now than the implementation in the
html module
derive push/write automatically from these
allow rendering iterators with borrowed events

resolves #24
allow comparing between rendering owned, borrowed or cloned events
@kmaasrud kmaasrud force-pushed the cowstr branch 2 times, most recently from f4b9e8d to 4d4d1c1 Compare March 21, 2023 21:39
@kmaasrud kmaasrud marked this pull request as ready for review March 21, 2023 21:39
@kmaasrud kmaasrud requested a review from hellux March 21, 2023 21:39
@hellux
Copy link
Owner

hellux commented Mar 21, 2023

e.g. readme/all benchmarks seem to suggest no significant benefit currently (4d4d1c1 compared to 16491a4):

block
  Instructions:             1212464 (+0.164150%)
  L1 Accesses:              1607567 (+0.196083%)
  L2 Accesses:                 1810 (-0.330396%)
  RAM Accesses:                2181 (+0.553250%)
  Estimated Cycles:         1692952 (+0.209303%)

block_inline
  Instructions:             5222063 (-0.180426%)
  L1 Accesses:              7517519 (-0.143724%)
  L2 Accesses:                 3109 (-11.39926%)
  RAM Accesses:                2549 (+0.750988%)
  Estimated Cycles:         7622279 (-0.159213%)

full
  Instructions:             5719667 (-0.113706%)
  L1 Accesses:              8098320 (-0.031009%)
  L2 Accesses:                 5213 (-21.37255%)
  RAM Accesses:                3023 (+0.800267%)
  Estimated Cycles:         8230190 (-0.106288%)
full/readme             time:   [226.23 µs 226.28 µs 226.33 µs]
                        thrpt:  [53.096 MiB/s 53.108 MiB/s 53.120 MiB/s]
                 change:
                        time:   [+2.3752% +2.4656% +2.5533%] (p = 0.00 < 0.05)
                        thrpt:  [-2.4897% -2.4063% -2.3201%]
                        Performance has regressed.

having a custom type adds some complexity and inconvenience, so it should have a noticeable performance benefit if we want to go through with it.

might be possible to look at benchmarks where there are more short cowstrings, though.

src/lib.rs Outdated Show resolved Hide resolved
@hellux
Copy link
Owner

hellux commented Mar 21, 2023

should be possible to avoid allocation for concatenation when we have an inline? e.g. in Attributes::insert:

            if key == "class" {
                *prev = format!("{} {}", prev, val).into();
            } else {
                *prev = val;
            }

src/lib.rs Outdated Show resolved Hide resolved
@kmaasrud
Copy link
Collaborator Author

Also, a personal preference: I like to keep commits very atomic. I would perhaps create a no-op CowStr type at first, then add the inline variant, then add each custom function one at a time, e.g. replace.

It is especially useful for benchmarks, easier to detect what affects performance. Also nice when reviewing, to focus on one thing at a time.

Unfortunately, it can take some effort to keep the history tidy if making many changes or rebasing a lot, so it is not really a requirement.

Agreed! I'll make sure to have that in mind for upcoming PRs, but its sadly a bit late for this one, as the whole CowStr with all it's methods would be required to benchmark anything.

I've made some progress on the push_str method, and the Criterion benchmarks look promising actually! I'll look further into the places you mentioned where the advantages of CowStr can be utilized, but am excited for your feedback as well.

src/string.rs Outdated Show resolved Hide resolved
src/string.rs Outdated Show resolved Hide resolved
src/string.rs Outdated Show resolved Hide resolved
src/string.rs Outdated Show resolved Hide resolved
Ordering::Equal => {}
}

inner[start..start + to.len()].copy_from_slice(to.as_bytes());
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be faster to use a new buffer than copying in place for each match? would be good to benchmark to be sure

src/string.rs Outdated Show resolved Hide resolved
src/string.rs Outdated Show resolved Hide resolved
src/string.rs Outdated Show resolved Hide resolved
@hellux
Copy link
Owner

hellux commented Mar 28, 2023

I've made some progress on the push_str method, and the Criterion benchmarks look promising actually!

Did you find a good benchmark that has owned/inlined cowstrs, or have you added one? Otherwise the performance should be unchanged, right?

@hellux
Copy link
Owner

hellux commented Mar 28, 2023

another thought, the owned variant could potentially be a Box<str> instead? It would allow for a smaller cowstr which would make the whole Event smaller which could potentially have a noticeable impact on performance (for all input).

I was initially thinking that Cow<str> would use a Box<str> for its owned variant but it seems to use String because it is 32 bytes. But we could now potentially lower it to 24 bytes if we use Box<str>.

@hellux
Copy link
Owner

hellux commented Mar 28, 2023

I'll make sure to have that in mind for upcoming PRs, but its sadly a bit late for this one, as the whole CowStr with all it's methods would be required to benchmark anything.

I think only the CowStr itself is needed for it to be functional, but we would expect no change in performance then (unless e.g Box<str> is used for owned variant). But adding the Inlined variant might not make any difference unless the push_str is added. replace shouldn't have any impact as it is not used internally.

src/lib.rs Show resolved Hide resolved
@kmaasrud
Copy link
Collaborator Author

I think only the CowStr itself is needed for it to be functional, but we would expect no change in performance then (unless e.g Box is used for owned variant). But adding the Inlined variant might not make any difference unless the push_str is added. replace shouldn't have any impact as it is not used internally.

That's why I was thinking that a complete CowStr (with the Inlined variant and String contained within the Owned variant) would be good to benchmark against our existing Cow. After that, I could add another commit that replaces Owned(String) with Owned(Box) and assess the performance of that change separately.

@kmaasrud kmaasrud force-pushed the cowstr branch 3 times, most recently from ae8d721 to 24418f9 Compare April 1, 2023 20:18
@kmaasrud
Copy link
Collaborator Author

kmaasrud commented Apr 2, 2023

@hellux Here is the full/all benchmark comparing the commit just before and after the introduction of the new CowStr. I have not added any new benchmarks with many short strings, so this seems promising, doesn't it?

Commit Throughput Change
14f6c41 inline: store attributes in vec (squash) 67.958 MiB/s -
18363ea feat: add custom CowStr type 73.363 MiB/s +8.0656%

@hellux
Copy link
Owner

hellux commented Apr 3, 2023

@hellux Here is the full/all benchmark comparing the commit just before and after the introduction of the new CowStr. I have not added any new benchmarks with many short strings, so this seems promising, doesn't it?
Commit Throughput Change
14f6c41 inline: store attributes in vec (squash) 67.958 MiB/s -
18363ea feat: add custom CowStr type 73.363 MiB/s +8.0656%

I'm not able to reproduce similar results on my machines:

AMD 5950X, Zen 3:

commit                                                      cycles                 throughput
14f6c4188 inline: store attributes in vec (squash)          6127389 (-4.265530%)   55.252
c5814c912 feat: add custom CowStr type                      6251397 (+2.023831%)   54.922
200091463 cowstr: add docstrings                            6251397 (No change)    54.820
f067cbd17 cowstr: add tests for push, push_str and replace  6251397 (No change)    55.039

Intel i5-8250U, Coffee Lake:

commit                                                    cycles                 throughput
14f6c41 inline: store attributes in vec (squash)          6564735 (+0.965820%)   34.945
c5814c9 feat: add custom CowStr type                      6526309 (-0.585340%)   34.929
2000914 cowstr: add docstrings                            6526309 (No change)    34.891
f067cbd cowstr: add tests for push, push_str and replace  6526309 (No change)    34.891

I wouldn't expect an improvement to the full/all benchmark unless the cowstr size has been reduced, behavior should be similar in most cases. Should only notice a difference if there are many small cowstrs, which I think we need to add a benchmark for.

Would be interesting to see benchmarks when the cowstr size has been reduced, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CowStr
3 participants