Implement OP_WRITE_RAW on the instructions buffer #86

peterzhu2118 · 2020-10-14T14:28:47Z

Implement OP_WRITE_RAW as an immediate instruction. The current implementation stores the size as a 24-bit unsigned integer after the instruction, followed by that many bytes of string data. I also added an OP_WRITE_RAW_SKIP instruction that skips the string. It is used in BlockBody#remove_blank_strings to replace an OP_WRITE_RAW instruction. I'm not entirely satisfied with adding an extra instruction to handle this corner case. I've thought of two other ways to implement this:

Store an extra boolean flag to signal whether to skip the instruction or not. The upside is that we don't need the special OP_WRITE_RAW_SKIP instruction, but it will require an extra 8 bytes.
Add a function to the c_buffer that allows deleting sections from the middle. This is probably the cleanest solution, but it will have the performance overhead of a memmove.
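
For illustration, the layout described above (opcode, 24-bit size, then the string bytes) and the skip variant can be sketched roughly as follows. This is a minimal sketch and not the code from this PR: the opcode values, the big-endian byte order, and the names `emit_write_raw`, `read_raw_size`, and `run` are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode values, chosen only for this sketch. */
enum { OP_WRITE_RAW = 1, OP_WRITE_RAW_SKIP = 2 };

/* Append [opcode][24-bit size, big-endian (assumed)][size bytes] to buf.
 * Returns the number of bytes written. The caller must make sure buf has
 * room for 4 + len bytes. */
static size_t emit_write_raw(uint8_t *buf, const char *str, size_t len)
{
    assert(len <= 0xFFFFFF); /* the size must fit in 24 bits */
    buf[0] = OP_WRITE_RAW;
    buf[1] = (uint8_t)(len >> 16);
    buf[2] = (uint8_t)(len >> 8);
    buf[3] = (uint8_t)len;
    memcpy(buf + 4, str, len);
    return 4 + len;
}

/* Decode the 24-bit size stored right after the opcode at ip. */
static size_t read_raw_size(const uint8_t *ip)
{
    return ((size_t)ip[1] << 16) | ((size_t)ip[2] << 8) | (size_t)ip[3];
}

/* Tiny interpreter loop that only knows the two opcodes above.
 * OP_WRITE_RAW copies its string to out; OP_WRITE_RAW_SKIP steps over it.
 * Returns the number of bytes written to out. */
static size_t run(const uint8_t *code, size_t code_len, char *out)
{
    size_t written = 0;
    const uint8_t *ip = code;
    const uint8_t *end = code + code_len;

    while (ip < end) {
        size_t size = read_raw_size(ip);
        if (*ip == OP_WRITE_RAW) {
            memcpy(out + written, ip + 4, size);
            written += size;
        }
        /* Both opcodes advance past the header and the inline string. */
        ip += 4 + size;
    }
    return written;
}
```

With this layout, turning a blank string into a no-op is a one-byte change (overwriting the opcode), which is what makes the skip instruction cheaper than removing bytes from the middle of the buffer.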

Also see #77.

Benchmarks

Base branch:

              parse:    156.542  (± 3.2%) i/s -      1.575k in  10.074790s
             render:    222.599  (± 8.1%) i/s -      2.222k in  10.071418s
     parse & render:     85.181  (± 2.3%) i/s -    856.000  in  10.056397s

This branch:

              parse:    158.134  (± 8.9%) i/s -      1.573k in  10.057410s
             render:    230.214  (± 7.8%) i/s -      2.288k in  10.019494s
     parse & render:     87.223  (± 3.4%) i/s -    876.000  in  10.057597s

ext/liquid_c/vm_assembler.c

dylanahsmith · 2020-10-14T20:22:42Z

The benchmark results don't seem right. I would expect parse time to be significantly slower due to having to copy the immediate argument for OP_WRITE_RAW. Rendering also doesn't seem more efficient, since the string length needs to be decoded. Did you get the benchmark results mixed up?

On master I got this for the benchmark

              parse:    156.458  (± 3.2%) i/s -      1.575k in  10.077279s
             render:    207.469  (± 5.3%) i/s -      2.080k in  10.058117s
     parse & render:     83.494  (± 3.6%) i/s -    840.000  in  10.070280s

and on this branch (rebased) I got

              parse:    147.837  (± 4.1%) i/s -      1.484k in  10.053717s
             render:    203.815  (± 5.4%) i/s -      2.033k in  10.009205s
     parse & render:     80.104  (± 3.7%) i/s -    800.000  in  10.000488s

which seems in line with what I expected.

I was hoping to avoid slowing down rendering with this change. I think we could have a more efficient OP_WRITE_RAW instruction with a byte-sized length argument to make rendering more comparable to what we have on master.

macournoyer

Code LGTM. I think introducing a wide version of the instructions might solve the performance regression.

macournoyer · 2020-10-14T21:07:35Z

ext/liquid_c/vm.c

                break;
            }
+            case OP_WRITE_RAW_SKIP:


Doesn't look specific to write. Could it be renamed to OP_SKIP?

peterzhu2118 · 2020-10-15

Yeah. Maybe even a more generic instruction like OP_JUMP? I think a jump instruction will be useful in the future for many other purposes too. But a jump instruction should probably be signed (to be able to jump backwards). WDYT?

macournoyer · 2020-10-15

Ah yeah jump! Good idea. Yes, the offset needs to be signed for for/while to work.

peterzhu2118 · 2020-10-15

Hmmm, if it's signed then we can only jump 8 MiB in either direction with 24 bits, but OP_WRITE_RAW could contain as much as 16 MiB (unsigned 24 bits). Should we fall back to using 32 bits for the jump? Or maybe use a different instruction for jumping forward vs. backwards?

macournoyer · 2020-10-15

I don't think I've ever seen a specialized negative jump instruction. But that does sound a lot simpler. 👍
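
The range trade-off in this thread comes down to simple arithmetic: a 24-bit operand covers 2^24 bytes (16 MiB) when unsigned, but only 2^23 bytes (8 MiB) in each direction once a bit is spent on the sign. A forward-only jump might look roughly like the sketch below. The names `emit_jump_fwd` and `jump_fwd`, the big-endian operand, and the choice to measure the offset from the end of the jump instruction are assumptions for illustration, not details confirmed by this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OPERAND_BITS = 24 };

/* Largest forward distance with an unsigned 24-bit operand (16 MiB - 1). */
static const uint32_t MAX_UNSIGNED_OFFSET = (1u << OPERAND_BITS) - 1;
/* Largest forward distance if one bit is used for the sign (8 MiB - 1). */
static const uint32_t MAX_SIGNED_OFFSET = (1u << (OPERAND_BITS - 1)) - 1;

/* Write [opcode][24-bit unsigned offset, big-endian (assumed)] to buf. */
static void emit_jump_fwd(uint8_t *buf, uint8_t opcode, uint32_t offset)
{
    assert(offset <= MAX_UNSIGNED_OFFSET);
    buf[0] = opcode;
    buf[1] = (uint8_t)(offset >> 16);
    buf[2] = (uint8_t)(offset >> 8);
    buf[3] = (uint8_t)offset;
}

/* ip points at the jump opcode. Returns the address of the next
 * instruction to execute, assuming the offset is counted from the end
 * of the 4-byte jump instruction. */
static const uint8_t *jump_fwd(const uint8_t *ip)
{
    uint32_t offset = ((uint32_t)ip[1] << 16) | ((uint32_t)ip[2] << 8) | ip[3];
    return ip + 4 + offset;
}
```

Because the operand is never negative, a forward-only jump can step over the largest string that a 24-bit length can describe, which a signed 24-bit offset could not.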

dylanahsmith · 2020-10-16T17:07:43Z

Have you tried using a shorter OP_WRITE_RAW instruction to fix the performance regression? How does that affect the benchmark results?

peterzhu2118 · 2020-10-16T18:40:12Z

@dylanahsmith I just changed OP_WRITE_RAW to use a 1-byte length and added an OP_WRITE_RAW_W that uses a 3-byte length. I think it's reduced the performance impact.
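
As a rough illustration of the narrow/wide split, an assembler could pick the instruction from the string length, and the VM could decode whichever form it finds. This is a sketch under assumptions rather than the PR's code: the opcode values, the big-endian wide length, and the names `emit_raw` and `decode_raw` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical opcode values, chosen only for this sketch. */
enum { OP_WRITE_RAW = 1, OP_WRITE_RAW_W = 2 };

/* Emit the narrow form (1-byte length) when the string fits in a byte,
 * otherwise the wide form (3-byte length). Returns bytes written. */
static size_t emit_raw(uint8_t *buf, const char *str, size_t len)
{
    size_t header;

    if (len <= UINT8_MAX) {
        buf[0] = OP_WRITE_RAW;
        buf[1] = (uint8_t)len;
        header = 2;
    } else {
        assert(len <= 0xFFFFFF); /* wide length must fit in 24 bits */
        buf[0] = OP_WRITE_RAW_W;
        buf[1] = (uint8_t)(len >> 16);
        buf[2] = (uint8_t)(len >> 8);
        buf[3] = (uint8_t)len;
        header = 4;
    }
    memcpy(buf + header, str, len);
    return header + len;
}

/* Decode either form at ip. Stores the string pointer and length, and
 * returns the total instruction size so the caller can advance ip. */
static size_t decode_raw(const uint8_t *ip, const uint8_t **str, size_t *len)
{
    size_t header;

    if (ip[0] == OP_WRITE_RAW) {
        *len = ip[1]; /* single byte: no shifting or combining needed */
        header = 2;
    } else {
        assert(ip[0] == OP_WRITE_RAW_W);
        *len = ((size_t)ip[1] << 16) | ((size_t)ip[2] << 8) | (size_t)ip[3];
        header = 4;
    }
    *str = ip + header;
    return header + *len;
}
```

A raw string of 255 bytes still takes the narrow form; 256 bytes is the first length that needs the wide one.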

This branch after the change:

              parse:    160.329  (± 6.2%) i/s -      1.605k in  10.049573s
             render:    232.030  (± 7.3%) i/s -      2.318k in  10.050578s
     parse & render:     85.336  (± 5.9%) i/s -    856.000  in  10.070809s

This branch before the change:

              parse:    156.611  (± 6.4%) i/s -      1.560k in  10.006603s
             render:    226.787  (± 7.5%) i/s -      2.266k in  10.054463s
     parse & render:     84.311  (± 7.1%) i/s -    840.000  in  10.008604s

Master:

              parse:    161.068  (± 6.2%) i/s -      1.605k in  10.000922s
             render:    235.662  (± 6.8%) i/s -      2.346k in  10.003183s
     parse & render:     86.112  (± 5.8%) i/s -    864.000  in  10.075406s

dylanahsmith

The implementation looks good. The performance impact also seems acceptable now. However, I think we now need unit tests to cover the OP_WRITE_RAW_W code paths, which will require raw strings in the template with a length of at least 256 bytes.

macournoyer · 2020-10-16T20:18:00Z

Rendering time seems almost unaffected w/ the wide instruction. I think we should be "ignoring" the parse speed when benchmarking since the goal is to cache the serialized template.

Moving to wide instructions is effectively moving an operation from rendering to the parsing phase. So parse time is affected. But that will be gone when compiled templates are serialized. So we should be moving as much as possible to the parsing phase if it improves rendering time.

Implement OP_WRITE_RAW on the instructions buffer (cherry picked from commit ac408fe)

peterzhu2118 requested a review from dylanahsmith October 14, 2020 14:28

dylanahsmith requested a review from macournoyer October 14, 2020 19:55

dylanahsmith reviewed Oct 14, 2020

macournoyer approved these changes Oct 15, 2020

peterzhu2118 force-pushed the pz-immediate-raw branch 3 times, most recently from afae410 to c95ba50 Compare October 15, 2020 18:56

dylanahsmith reviewed Oct 16, 2020

peterzhu2118 force-pushed the pz-immediate-raw branch from e85d5eb to eae0382 Compare October 26, 2020 14:41

This was referenced Oct 26, 2020

Write block body to a shared buffer after compilation #102

Merged

Freeze block body after parsing completes Shopify/liquid#1331

Merged

peterzhu2118 force-pushed the pz-immediate-raw branch 3 times, most recently from 933afdb to 56e8d00 Compare October 29, 2020 18:07

dylanahsmith approved these changes Oct 30, 2020

peterzhu2118 added 8 commits October 30, 2020 17:29

Implement immediate raw instruction

d8fcda5

Mark all constants directly

0461aa8

Implement OP_JUMP_FWD to jump forward

08840d7

Implement single byte OP_WRITE_RAW instruction

b70e49d

Remove source from block_body_t

1aafc9e

Add test for OP_WRITE_RAW_W

d80c840

Add support for disassemble

95553b1

Add test for disassembly for OP_WRITE_RAW_W

971611f

peterzhu2118 force-pushed the pz-immediate-raw branch from 4b8f0e5 to 971611f Compare October 30, 2020 21:34

peterzhu2118 merged commit ac408fe into master Nov 2, 2020

peterzhu2118 deleted the pz-immediate-raw branch November 2, 2020 14:38

peterzhu2118 mentioned this pull request Nov 2, 2020

Fix flaky test #113

Closed

dylanahsmith pushed a commit that referenced this pull request Feb 11, 2021

Merge pull request #86 from Shopify/pz-immediate-raw

dc15b50

Implement OP_WRITE_RAW on the instructions buffer (cherry picked from commit ac408fe)

dylanahsmith pushed a commit that referenced this pull request Feb 11, 2021

Merge pull request #86 from Shopify/pz-immediate-raw

ef242dd

Implement OP_WRITE_RAW on the instructions buffer (cherry picked from commit ac408fe)
