Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement OP_WRITE_RAW on the instructions buffer #86

Merged
merged 8 commits into from
Nov 2, 2020

Conversation

peterzhu2118
Copy link
Member

Implement OP_WRITE_RAW as an immediate instruction. The current implementation stores the size as a 24 byte unsigned integer after the instruction and then size number of bytes following it is the string. I also added a OP_WRITE_RAW_SKIP instruction that skips skips the string. This instruction is used in BlockBody#remove_blank_strings to replace OP_WRITE_RAW instruction. I'm not entirely satisfied in adding an extra instruction to handle this corner case. I've thought of two other ways to implement this:

  1. Store an extra boolean flag to signal whether to skip the instruction or not. The upside to this is that we don't need the special OP_WRITE_RAW_SKIP instruction but will require an extra 8 bytes.
  2. Add a function to the c_buffer that allows deleting sections from the middle. This is probably the cleanest solution but will have performance overhead of memmove.

Also see #77.

Benchmarks

Base branch:

              parse:    156.542  (± 3.2%) i/s -      1.575k in  10.074790s
             render:    222.599  (± 8.1%) i/s -      2.222k in  10.071418s
     parse & render:     85.181  (± 2.3%) i/s -    856.000  in  10.056397s

This branch:

              parse:    158.134  (± 8.9%) i/s -      1.573k in  10.057410s
             render:    230.214  (± 7.8%) i/s -      2.288k in  10.019494s
     parse & render:     87.223  (± 3.4%) i/s -    876.000  in  10.057597s

@dylanahsmith
Copy link
Contributor

The benchmark results don't seem right. I would expect parse time to be significantly slower due to having to copy the immediate argument for OP_WRITE_RAW. Rendering also doesn't seem more efficient, since the string length needs to be decoded. Did you get the benchmark results mixed up?

On master I got this for the benchmark

              parse:    156.458  (± 3.2%) i/s -      1.575k in  10.077279s
             render:    207.469  (± 5.3%) i/s -      2.080k in  10.058117s
     parse & render:     83.494  (± 3.6%) i/s -    840.000  in  10.070280s

and on this branch (rebased) I got

              parse:    147.837  (± 4.1%) i/s -      1.484k in  10.053717s
             render:    203.815  (± 5.4%) i/s -      2.033k in  10.009205s
     parse & render:     80.104  (± 3.7%) i/s -    800.000  in  10.000488s

which seems inline with what I expected.

I was hoping to avoid slowing down rendering with this change. I think we could have a more efficient OP_WRITE_RAW instruction for a byte sized length argument to make rendering more comparable to what we have on master.

Copy link
Contributor

@macournoyer macournoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. I think introducing a wide version of the instructions might solve the performance regression.

break;
}
case OP_WRITE_RAW_SKIP:
Copy link
Contributor

@macournoyer macournoyer Oct 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look specific to write. Could it be renamed to OP_SKIP?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Maybe even a more generic instruction like OP_JUMP? I think a jump instruction will be useful in the future for many other purposes too. But a jump instruction should probably be signed (to be able to jump backwards). WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah jump! Good idea. Yes offset needs to be signed for for/while to work.

Copy link
Member Author

@peterzhu2118 peterzhu2118 Oct 15, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, if it's signed when we can only jump 8KiB relative if we use 24-bits, but OP_WRITE_RAW could contain as much as 16KiB (unsigned 24-bits). Should we fall back to using 32-bits for the jump? Or maybe a different instruction for jumping forward vs. backwards?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I've ever seen a specialized negative jump instruction. But that does sound a lot simpler. 👍

@peterzhu2118 peterzhu2118 force-pushed the pz-immediate-raw branch 3 times, most recently from afae410 to c95ba50 Compare October 15, 2020 18:56
@dylanahsmith
Copy link
Contributor

Have you tried using a shorter OP_WRITE_RAW instruction to fix the performance regression? How does that affect the benchmark results?

@peterzhu2118
Copy link
Member Author

@dylanahsmith I just changed OP_WRITE_RAW to be 1 byte and added a OP_WRITE_RAW_W that is 3 bytes. I think it's reduced the performance impact.

This branch after the change:

              parse:    160.329  (± 6.2%) i/s -      1.605k in  10.049573s
             render:    232.030  (± 7.3%) i/s -      2.318k in  10.050578s
     parse & render:     85.336  (± 5.9%) i/s -    856.000  in  10.070809s

This branch before the change:

              parse:    156.611  (± 6.4%) i/s -      1.560k in  10.006603s
             render:    226.787  (± 7.5%) i/s -      2.266k in  10.054463s
     parse & render:     84.311  (± 7.1%) i/s -    840.000  in  10.008604s

Master:

              parse:    161.068  (± 6.2%) i/s -      1.605k in  10.000922s
             render:    235.662  (± 6.8%) i/s -      2.346k in  10.003183s
     parse & render:     86.112  (± 5.8%) i/s -    864.000  in  10.075406s

Copy link
Contributor

@dylanahsmith dylanahsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation looks good. The performance impact also seems acceptable now. However, I think we now need unit tests to cover the OP_WRITE_RAW_W code paths, which will require raw strings in the template with a length of at least 256 bytes.

@macournoyer
Copy link
Contributor

Rendering time seems almost unaffected w/ the wide instruction. I think we should be "ignoring" the parse speed when benchmarking since the goal is to cache the serialized template.

Moving to wide instructions is effectively moving an operation from rendering to the parsing phase. So parse time is affected. But that will be gone when compiled templates are serialized. So we should be moving as much as possible to the parsing phase if it improves rendering time.

@peterzhu2118 peterzhu2118 merged commit ac408fe into master Nov 2, 2020
@peterzhu2118 peterzhu2118 deleted the pz-immediate-raw branch November 2, 2020 14:38
@peterzhu2118 peterzhu2118 mentioned this pull request Nov 2, 2020
dylanahsmith pushed a commit that referenced this pull request Feb 11, 2021
Implement OP_WRITE_RAW on the instructions buffer

(cherry picked from commit ac408fe)
dylanahsmith pushed a commit that referenced this pull request Feb 11, 2021
Implement OP_WRITE_RAW on the instructions buffer

(cherry picked from commit ac408fe)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants