-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Liquid::C::Expression to optimize expression evaluation #60
Conversation
df377cf
to
afae259
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! Just some questions.
ext/liquid_c/vm.c
Outdated
break; | ||
case OP_RENDER_VARIABLE_RESCUE: | ||
args->node_line_number = (unsigned int)*const_ptr++; | ||
args->ip = ip; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ip
and const_ptr
are saved because next instruction is assumed to be calling a Ruby function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR just moves this case to keep the rendering instruction handling together, since they shouldn't be used in the Liquid::C::Expression code.
This was added in the last PR (#59) and described as follows
Another design choice I made for this PR is to do error handling for the whole block body render, rather than for each variable render. This way we can reduce the state saving cost of
rb_rescue
from liquid code that doesn't encounter variable render errors. In order to recover from the exception, we just restore the stack size to what it was at the start of the block body render and iterate the instructions to jump just past the variable write.
In order to iterate the instructions to the end of the variable render on an exception, we need to save the instruction position at the start of variable rendering.
I'll add some comments to the code to make this clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll also move code around in the last PR so that the diff is easier to read in this PR.
d7d9ee0
to
900c227
Compare
ea00f8b
to
5a5b34a
Compare
900c227
to
23b8e12
Compare
5a5b34a
to
9d8a3d7
Compare
fe51bc8
to
5dc1ca1
Compare
9d8a3d7
to
239b210
Compare
5dc1ca1
to
5cee268
Compare
239b210
to
eeae3a1
Compare
5cee268
to
56096af
Compare
eeae3a1
to
a2c628d
Compare
a2c628d
to
25be6e4
Compare
56096af
to
1fef4af
Compare
818b8a0
to
6f561d6
Compare
1fef4af
to
2855e4a
Compare
@@ -25,12 +26,23 @@ void vm_assembler_gc_mark(vm_assembler_t *code) | |||
switch (*ip++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we mark the ops that need to be marked and call liquid_vm_next_instruction
in vm.c
to move to the next instruction?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to explicitly handle all cases here to make sure we actually mark all ruby constants. Otherwise, it would be easy to accidentally miss a new instruction that has a ruby constant argument, where failing to GC mark that constant could end up being a subtle and hard to debug bug.
Perhaps we could use an assertion to ensure that liquid_vm_next_instruction
moves the instruction and constant pointers consistently with this function. However, that brings the annoyance #ifndef NDEBUG
specific state.
Instead, what I want to do in the near future is get rid of all non-ruby constants from the constants buffer. That will simplify GC marking, serialization and deserialization. However, that would add copying overhead for OP_WRITE_RAW
, which makes sense for serialization & deserialization but is unnecessary overhead without serialization & deserialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get rid of all non-ruby constants from the constants buffer
Where do you plan to put them? It looks like probably interleaved into the code buffer given the implementation of PUSH_INT*, which sounds good to me overall. It does make it more difficult to include medium size non-ruby constants inline. I mentioned in another comment that perhaps we could skip the conversion through RString in some cases, but I also wouldn't want to have strings in the code buffer. (in the OP_LOOKUP_COMMAND we can just use an enum, but I can imagine potentially wanting to skip RString for some drop methods and filter names).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wanted to add them to the instructions buffer if they can be used in their serialized form.
I was planning on using an immediate string argument for OP_WRITE_RAW
, since it seems rare that we would re-use the string and we won't want the deserialization overhead.
For filter, variable and key names, it is actually quite likely that we will re-use the object, so I think a table lookup would make sense.
Longer term, I'm more interested in what is most efficient, where we can re-use code to make things easier.
2e17c30
to
14e0e9a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very cool stuff!
if (rstring_eq(key, "size") || rstring_eq(key, "first") || rstring_eq(key, "last")) | ||
vm_assembler_add_lookup_command(code, key); | ||
else | ||
vm_assembler_add_lookup_const_key(code, key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like we could eventually avoid RString here entirely, but it's pretty minor.
result = empty_string; | ||
break; | ||
} | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very ugly and fast approach 💯
ext/liquid_c/parser.c
Outdated
rb_enc_raise(utf8_encoding, cLiquidSyntaxError, "[:%s] is not a valid expression", symbol_names[p.cur.type]); | ||
|
||
return expr; | ||
vm_assembler_add_push_nil(code); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there cases that could hit this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The above rb_enc_raise
calls are NO_RETURN
, so I guess this is just dead code. I'll remove this to avoid confusion.
} | ||
if (is_command) { | ||
Check_Type(key, T_STRING); | ||
ID intern_key = rb_intern(RSTRING_PTR(key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it seems like we could avoid the conversion through RString here as mentioned in another comment. Definitely not a blocker, tiny optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could optimize commands for this case, but we would still need to pass a string as the key to rb_funcall(object, id_aref, 1, key)
if the object is a hash.
ext/liquid_c/variable_lookup.c
Outdated
Check_Type(key, T_STRING); | ||
ID intern_key = rb_intern(RSTRING_PTR(key)); | ||
if (rb_respond_to(object, intern_key)) { | ||
VALUE next_object = rb_funcall(object, rb_intern(RSTRING_PTR(key)), 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate rb_intern
here (was in the original code, probably my fault)
@@ -25,12 +26,23 @@ void vm_assembler_gc_mark(vm_assembler_t *code) | |||
switch (*ip++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get rid of all non-ruby constants from the constants buffer
Where do you plan to put them? It looks like probably interleaved into the code buffer given the implementation of PUSH_INT*, which sounds good to me overall. It does make it more difficult to include medium size non-ruby constants inline. I mentioned in another comment that perhaps we could skip the conversion through RString in some cases, but I also wouldn't want to have strings in the code buffer. (in the OP_LOOKUP_COMMAND we can just use an enum, but I can imagine potentially wanting to skip RString for some drop methods and filter names).
|
||
def test_find_dynamic_variable | ||
context = Liquid::Context.new({"x" => "y", "y" => 42}) | ||
expr = Liquid::C::Expression.strict_parse('[x]') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought [x]
on its own was supposed to be a syntax error, and that it required an identifier in front. It looks like this will make {{ [x] }}
possible, which is currently disallowed
Liquid::Template.parse("{{ [x] }}", "x" => "y", "y" => 42, error_mode: :strict).render
...
# => Liquid::SyntaxError (Liquid syntax error: [:open_square, "["] is not a valid expression in "{{ [x] }}")
I'm not necessarily opposed to making it allowed, but we should get consensus from @Shopify/guardians-of-the-liquid
and modify this in Ruby too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I was looking at what was possible with Liquid::Expression.parse("[x]")
. I didn't realize we considered a dynamic variable lookup to be a strict parse error. I change it back to being a strict parse error, since I didn't intend to change it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll also note that if you try to parse and render it in lax mode, it returns an empty string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
~/src/liquid(master)$ bundle exec irb -Ilib -rliquid
irb> Liquid::Template.parse('{{ [x] }}').render({ 'x' => 'y', 'y' => 42 })
=> "42"
irb> context = Liquid::Context.new({ 'x' => 'y', 'y' => 42 })
irb> expression = Liquid::Expression.parse('[x]')
irb> context.evaluate(expression)
=> 42
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, it was returning an empty string for you since you were passing the variable values as an argument to parse, when they are an argument to render
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, obviously 🤦
Well, maybe we should make this work on the strict parser then. Because it works in prod with the lax fallback.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it looks like this has been the behaviour strict parse behaviour for variables in liquid-c for a long time now. I think since variable parsing was added to liquid-c (see how TOKEN_OPEN_SQUARE
is allowed at the start of parse_variable
in the PR that introduced the function https://github.com/Shopify/liquid-c/pull/13/files#diff-307861feddcb5da5f0af73741d9fdea2R93)
So I'm hesitant to change it. Especially since it could be a breaking change for strictly parsed code. I'm also unsure if it was really the intention of the strict parser to remove any features, so we could consider whether this should be considered a strict parser bug in the liquid gem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, in the end it seems we should add support for this to the Ruby strict parser.
ext/liquid_c/variable.c
Outdated
vm_assembler_add_push_const(code, keyword_args); | ||
|
||
if (const_keyword_args != Qnil) { | ||
rb_hash_freeze(const_keyword_args); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we are going to have to always freeze or always dup the hash. Otherwise, it is too easy to test a filter using dynamic keyword arguments and then get errors in production when a frozen hash is passed in for constant keyword arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've opened #88 to remove the constant keyword arguments optimization on master. We can always add it back later.
6e7d0d2
to
6525790
Compare
6525790
to
e5c3c4a
Compare
Depends on #59 and Shopify/liquid#1300Problem
In #59, expressions were compiled to ruby constants or lookup ruby objects (Liquid::VariableLookup & Liquid::RangeLookup) that are evaluated with
context_evaluate
. This was done in that PR for simplicity, but they should really be compiled to more granular VM instructions for rendering performance and for serializability (for future caching of VM compiled templates).Solution
I replaced the previous code for strict parsing expressions and variable lookups to instead emit VM code, which is now compiled directly into Liquid::C::BlockBody objects when compiling variables. Expressions are also often parsed and evaluated independent of Liquid::Variable in tags, so I introduced a
Liquid::C::Expression
object that gets returned fromLiquid::Expression.parse
if the expression includes a variable lookup and strict parsing of it succeeds. I ended up doing both of these in the same commit, since I also leverage theLiquid::C::Expression
object for filter keyword argument parsing to handle automatic cleanup on a parse exception.Note that a couple of direct uses of
Liquid::Variable.new
(in assign and echo tags) no longer leverage liquid-c for parsing, which slightly impacts parse speed. I'll embed those tags in the liquid VM code in a follow-up PR rather than introducing a Liquid::C::Variable class. This PR significantly speeds up rendering, as shown in the benchmarks, which more than makes up for this parsing regression.Since
Liquid::Variable.new
is no longer monkey patched andLiquid::Expression.parse
no longer turns a comparable object, I had to update the liquid-c variable unit tests so they are tested through normal template parsing and rendering like an integration test. I also stopped testing against the liquid gem's variable unit tests for the same reasons.To help with detecting and debugging VM stack underflows or overflows, I added some assertions that are enabled by default for CI and gem development but are disabled by default for gem installation.
Benchmarks
Before this PR (on the #59 branch)
after