Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

included contexts should still be able to pop using backrefs #104

Closed
keith-hall opened this issue Sep 6, 2017 · 12 comments · Fixed by #288
Closed

included contexts should still be able to pop using backrefs #104

keith-hall opened this issue Sep 6, 2017 · 12 comments · Fixed by #288

Comments

@keith-hall
Copy link
Collaborator

keith-hall commented Sep 6, 2017

Now that #101 is fixed, I was able to get the ASP syntax tests to pass by changing the syntax definition to not include a with_prototype conflict - namely by changing [1] and [2] from scope:text.html.asp#html to scope:text.html.basic:

loading syntax definitions from testdata/Packages/
Testing file testdata/Packages/ASP/syntax_test_asp.asp
The test file references syntax definition file: Packages/ASP/HTML-ASP.sublime-syntax
Ok(Success(4071))
Testing file testdata/Packages/ASP/syntax_test_asp2.asp2
The test file references syntax definition file: Packages/ASP/HTML-ASP2.sublime-syntax
Ok(Success(16))
exiting with code 0

Emboldened by that, I decided to try some more syntaxes out (SublimeText/PackageDev#98), but found a bug. In ST, it is always possible for a match pattern that will pop to use back references captured from the original push pattern, regardless of whether that pattern is a direct child of the context that was pushed, or was included from it at any nesting. In syntect, it only seems to allow back references that refer to the original push pattern if the pop pattern is a direct child of the context that was pushed onto the stack.

In syntect, an included context trying to access the back references currently causes a panic (please expand the arrow to the left of this paragraph for the backtrace).
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/option.rs:323
stack backtrace:
   1:       0x2d4be997fa - std::sys::imp::backtrace::tracing::imp::write::h3188f035833a2635
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:42
   2:       0x2d4be9c4df - std::panicking::default_hook::{{closure}}::h6385b6959a2dd25b
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:349
   3:       0x2d4be9c0de - std::panicking::default_hook::he4f3b61755d7fa95
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:365
   4:       0x2d4be9c927 - std::panicking::rust_panic_with_hook::hf00b8130f73095ec
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:553
   5:       0x2d4be9c764 - std::panicking::begin_panic::h6227f62cb2cdaeb4
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:515
   6:       0x2d4be9c6d9 - std::panicking::begin_panic_fmt::h173eadd80ae64bec
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:499
   7:       0x2d4be9c667 - rust_begin_unwind
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:475
   8:       0x2d4bec898d - core::panicking::panic_fmt::h3b2d1e30090844ff
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/panicking.rs:69
   9:       0x2d4bec88c4 - core::panicking::panic::h2596388ccef1871c
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/panicking.rs:49
  10:       0x2d4bc27856 - <core::option::Option<T>>::unwrap::hcbcf6267c1bc19a7
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/macros.rs:21
  11:       0x2d4bca0728 - syntect::parsing::parser::ParseState::parse_next_token::h0647d274e681b640
                        at /home/keith/Documents/syntect-master/src/parsing/parser.rs:201
  12:       0x2d4bc9f74a - syntect::parsing::parser::ParseState::parse_line::h89d6548f1cea9c1e
                        at /home/keith/Documents/syntect-master/src/parsing/parser.rs:103
  13:       0x2d4bbff152 - syntest::test_file::hf2ec9e02ad06e0c2
                        at /home/keith/Documents/syntect-master/examples/syntest.rs:191
  14:       0x2d4bc00314 - syntest::recursive_walk::he846690fb982efe6
                        at /home/keith/Documents/syntect-master/examples/syntest.rs:270
  15:       0x2d4bbffde8 - syntest::main::h78e5d2a4f07ea011
                        at /home/keith/Documents/syntect-master/examples/syntest.rs:256
  16:       0x2d4bea37ba - __rust_maybe_catch_panic
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libpanic_unwind/lib.rs:98
  17:       0x2d4be9d066 - std::rt::lang_start::h65647f6e36cffdae
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panicking.rs:434
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/panic.rs:351
                        at /buildslave/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libstd/rt.rs:57
  18:       0x2d4bc00bf2 - main
  19:     0x7fd5e385782f - __libc_start_main
  20:       0x2d4bbd8ad8 - _start
  21:                0x0 - <unknown>

I've managed to put a small example syntax together that exhibits this behavior:

%YAML 1.2
---
name: Include Backref Pop Test
scope: text.includebackrefpop.syntect
contexts:
  main:
    - match: '(<)(string)\s*(>)'
      scope: meta.tag.xml
      captures:
        1: punctuation.definition.tag.begin.xml
        2: entity.name.tag.localname.xml
        3: punctuation.definition.tag.end.xml
      set:
        - meta_content_scope: meta.inside-value.string.plist
        - include: pop-tag-no-children

  pop-tag-no-children:
    - match: '(</)(\2)\s*(>)'
      scope: meta.tag.xml
      captures:
        1: punctuation.definition.tag.begin.xml
        2: entity.name.tag.localname.xml
        3: punctuation.definition.tag.end.xml
      pop: true
    # - match: '<(?!!)'
    #   scope: punctuation.definition.tag.begin.xml
    #   push: unknown-tag
    #- include: scope:text.xml

  # unknown-tag:
  #   - meta_scope: meta.tag.xml
  #   - meta_content_scope: invalid.illegal.unknown-or-unexpected-tag.plist
  #   - match: '>'
  #     scope: punctuation.definition.tag.end.xml
  #     pop: true

which can be used with the following syntax test:

# SYNTAX TEST "include_backref_pop.sublime-syntax"
<string>
</string>
# ^^^^^^ entity.name.tag.localname.xml
@trishume
Copy link
Owner

trishume commented Sep 6, 2017

I was able to get the ASP syntax tests to pass by changing the syntax definition to not include a with_prototype conflict

What do you mean by "with_prototype conflict"? Do you know what differing behaviour is avoided by this change? I aim for compatibility even in edge cases and would like to at least know, in case it is an easy fix.

it is always possible for a match pattern that will pop to use back references captured from the original push pattern

shiiiiiit. If I remember correctly, I look for pops with back-references as an optimization so that I don't have to save the regions when that isn't necessary. I'll probably have to fix this to look deeper using the normal context iteration logic. I'm not sure how tricky this will be but it might be tricky.

@keith-hall
Copy link
Collaborator Author

Do you know what differing behaviour is avoided by this change?

I've made some tests and it seems that in ST, with_prototypes are applied in FIFO order, but syntect appears to be applying them in LIFO order. This is why the unmodified ASP syntax definition works in ST, but in syntect the match rules from the wrong with_prototype are used instead of [1] and [2], and thus explains why removing the nested with_prototypes enables it to work in syntect.

This can be observed in ST with the following syntax:

%YAML 1.2
---
# See http://www.sublimetext.com/docs/3/syntax.html
scope: test.with_prototype
contexts:
  main:
    - match: 'hello'
      scope: string
      push: other1
      with_prototype:
        - match: 'other1?'
          scope: comment
        - match: '\d+'
          scope: constant.numeric
    - match: 'bar'
      scope: punctuation.separator
  other1:
    - match: 'world'
      scope: string
      push: other2
      with_prototype:
        - match: '(an)?other2?'
          scope: storage.type
        - match: '\d+'
          scope: keyword
        - match: 'something_else'
          scope: entity.name.prototypes
    - match: '(?=bar)'
      pop: true
  other2:
    - match: 'foo'
      scope: invalid.deprecated
      pop: true

the idea behind this example syntax is that it should push two contexts - both using a different with_prototype - onto the stack at the same time, and both with_prototypes try to match the same text, and scope them differently so we can see which with_prototype wins.

So if we have sample text hello other 123 world other2 123 something_else another2 foo in ST:

  • none of the numbers are scoped as keyword.
  • other in other2 is scoped as comment
  • something_else is scoped as expected
  • another2 is scoped as storage.type

This shows that the patterns from the first pushed with_prototype (pushed with the other1 context) are taking precedence over the patterns from the second with_prototype (which was pushed together with the other2 context), and the only way the patterns from the latter with_prototype get matched is if the former with_prototype patterns don't match at that position in the document.

@trishume
Copy link
Owner

trishume commented Sep 8, 2017

@keith-hall I thought I fixed that discrepancy in #103 with d0b253c. Which version are you testing with?

@keith-hall
Copy link
Collaborator Author

I'm testing with the latest version, 1.7.2. I created a syntax test file for my example above and tested and it does indeed pass in syntect, so maybe I'm wrong about the problem. (Sorry, should have checked that earlier but didn't get chance.)
# SYNTAX TEST "with_proto.sublime-syntax"
hello other 123 world other2 123 something_else another2 foo
# ^^^ string
#     ^^^^^ comment
#           ^^^ constant.numeric
#               ^^^^^ string
#                     ^^^^^ comment
#                          ^ constant.numeric
#                            ^^^ constant.numeric
#                                ^^^^^^^^^^^^^^ entity.name.prototypes
#                                               ^^^^^^^^ storage.type
#                                                        ^^^ invalid.deprecated
loading syntax definitions from testdata/Packages/
Testing file testdata/Packages/with_proto_test/syntax_test_with_proto.txt
The test file references syntax definition file: with_proto.sublime-syntax
Ok(Success(50))
exiting with code 0

But I'm confused because the failures for ASP show that it is scoping <% as punctuation.section.embedded.begin.asp when the test is expecting punctuation.section.embedded.begin.inside-block.asp, and removing the nested with_prototype from the syntax definition fixes it. This indicates to me that it is a problem with the order in which the nested with_prototypes are being matched.

@keith-hall
Copy link
Collaborator Author

The issue of popping with backrefs also applies to with_prototype patterns - the captures from the match at the time the with_prototype was pushed should apply to the patterns in the with_prototype.

trishume added a commit that referenced this issue Dec 9, 2017
Fails at runtime due to RefCell already mutably borrowed errors
@trishume
Copy link
Owner

trishume commented Dec 9, 2017

So I figured out why this happens, but it's tricky to fix, and even trickier to fix properly. Basically the actual regex compiling logic handles this case fine, but the captures don't get saved since the context that only indirectly includes patterns with backrefs isn't marked as needing backrefs.

The proper way to fix this is after linking, walk over the tree and mark contexts as needing backrefs if any context they include needs backrefs, but I don't want to write another tree pass right now.

I attempted a fix in https://github.com/trishume/syntect/tree/fix-104 but it fails with RefCell runtime errors.

Do you know if any syntax actually uses this behaviour?

@keith-hall
Copy link
Collaborator Author

The PackageDev ST package uses it in it's TextMate Preferences syntax definition, but it might not be used in the official ST Packages repo atm.

@keith-hall
Copy link
Collaborator Author

It might be useful to benchmark how much difference it makes whether we just always store the captures vs performing the context uses backref check (i.e. https://github.com/forkeith/syntect/tree/104_include_backref_pop_fix)

@keith-hall
Copy link
Collaborator Author

keith-hall commented Dec 10, 2017

EDIT: sorry, didn't look closely enough, the `heredocs-body` contexts with the pop backref patterns don't look like they are included from other contexts

@Keats
Copy link
Contributor

Keats commented Dec 12, 2017

I ran the benches on https://github.com/forkeith/syntect/tree/104_include_backref_pop_fix and I don't see any significant differences with the current master

@trishume
Copy link
Owner

I ran the new fancy Criterion benchmarks on the change of removing the optimization and there's a substantial performance regression:

highlight/"jquery.js"   time:   [582.92 ms 589.41 ms 600.91 ms]
                        change: [+5.7030% +7.6302% +9.9087%] (p = 0.00 < 0.05)
                        Performance has regressed.
parse/"jquery.js"       time:   [625.20 ms 648.15 ms 664.82 ms]
                        change: [+6.2669% +9.8446% +13.594%] (p = 0.00 < 0.05)
                        Performance has regressed.

And with #167 I note that at least with the current master Packages there's no problems caused by this optimization. So for now I'll leave this open and not remove the optimization, pending the fix I mention in a comment above involving a new tree walk.

@keith-hall
Copy link
Collaborator Author

Now that all the contexts are in a flat arena, I wonder if this would be easier to achieve.

A test case that could be used in the parser (maybe it would also make sense to have a test in the syntax_set, directly checking the uses_backrefs field on the context1 context?):

    #[test]
    fn can_include_backrefs() {
        let syntax = SyntaxDefinition::load_from_str(r#"
                name: Backref Include Test
                scope: source.backrefinc
                contexts:
                  main:
                    - match: (a)
                      scope: a
                      push: context1
                  context1:
                    - include: context2
                  context2:
                    - match: \1
                      scope: b
                      pop: true
                "#, true, None).unwrap();

        expect_scope_stacks_with_syntax(&"aa", &["<a>", "<b>"], syntax);
    }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants