Add code for analyzing dependencies between syntaxes #1762

Merged
Enselic merged 1 commit into sharkdp:master from syntax-dependency-analysis
Aug 7, 2021

Conversation

Collaborator

@Enselic Enselic commented Jul 30, 2021

This will eventually allow us to improve the startup speed of bat. See #951.

It adds code to analyze dependencies between SyntaxDefinitions. It currently comes up with the following list of independent syntaxes.

One thing I'm curious about is if you think this list, and the code I use to produce it, looks reasonable? Or can you spot any obvious dependencies that it fails to find?

I'm sure we're going to have to tweak the code that does the analysis, but it would be nice to get the basics of it in place so we can iterate on it.

List of independent syntaxes

Independent SyntaxSets:
["Plain Text"]
["ASP", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "HTML (ASP)", "ASP"]
["HTML (ASP)", "ASP", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "HTML (ASP)"]
["ActionScript"]
["AppleScript"]
["Batch File"]
["NAnt Build File"]
["C#"]
["C++", "C"]
["C"]
["CSS"]
["Clojure", "Regular Expression"]
["D"]
["DMD Output"]
["Diff"]
["Erlang"]
["HTML (Erlang)", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Erlang"]
["Git Attributes", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Commit", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Common", "Git Config", "Git Common", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Config", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Ignore", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Link", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Log", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "Git Commit"]
["Git Mailmap", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Rebase Todo", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Go"]
["Graphviz (DOT)", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS"]
["Groovy", "Javadoc", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS"]
["HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS"]
["Haskell"]
["Literate Haskell", "LaTeX", "TeX", "YAML", "XML", "SQL", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "Ruby", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "HTML", "R", "Python", "Regular Expressions (Python)", "PHP Source", "Regular Expressions (PHP)", "JSON", "Perl", "Regular Expression", "Lua", "Lisp", "Java", "Javadoc", "Haskell", "C++", "C", "Objective-C++", "Objective-C", "Go", "Diff"]
["JSON"]
["Java Server Page (JSP)", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Java", "Javadoc", "Java Server Page (JSP)"]
["Java", "Javadoc", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS"]
["Javadoc", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS"]
["Java Properties"]
["JavaScript", "Regular Expressions (Javascript)"]
["Regular Expressions (Javascript)"]
["BibTeX"]
["LaTeX Log"]
["LaTeX", "TeX", "YAML", "XML", "SQL", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "Ruby", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "HTML", "R", "Python", "Regular Expressions (Python)", "PHP Source", "Regular Expressions (PHP)", "JSON", "Perl", "Regular Expression", "Lua", "Lisp", "LaTeX", "Java", "Javadoc", "Haskell", "C++", "C", "Objective-C++", "Objective-C", "Go", "Diff"]
["TeX"]
["Lisp"]
["Lua"]
["Make Output"]
["Makefile", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Markdown", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Regular Expression", "C++", "C", "Objective-C++", "Objective-C", "Ruby", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SQL", "Go", "R", "PHP", "PHP Source", "Regular Expressions (PHP)", "JSON", "XML", "Rust", "C#", "Java", "Javadoc", "Graphviz (DOT)", "Python", "Regular Expressions (Python)"]
["MultiMarkdown", "Markdown", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Regular Expression", "C++", "C", "Objective-C++", "Objective-C", "Ruby", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SQL", "Go", "R", "PHP", "PHP Source", "Regular Expressions (PHP)", "JSON", "XML", "Rust", "C#", "Java", "Javadoc", "Graphviz (DOT)", "Python", "Regular Expressions (Python)"]
["MATLAB"]
["OCaml", "camlp4", "OCaml"]
["OCamllex", "OCaml", "camlp4"]
["OCamlyacc", "OCaml", "camlp4"]
["camlp4", "OCaml"]
["Objective-C++", "C", "C++", "Objective-C"]
["Objective-C", "C"]
["PHP Source", "Regular Expressions (PHP)", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "JSON", "SQL", "XML"]
["PHP", "PHP Source", "Regular Expressions (PHP)", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "JSON", "SQL", "XML"]
["Regular Expressions (PHP)"]
["Pascal"]
["Perl", "XML", "Regular Expression", "JSON", "SQL", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "HTML", "CSS"]
["Python", "Regular Expressions (Python)", "SQL"]
["Regular Expressions (Python)"]
["R Console", "R"]
["R"]
["Rd (R Documentation)", "R", "LaTeX", "TeX", "YAML", "XML", "SQL", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "Ruby", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "HTML", "Python", "Regular Expressions (Python)", "PHP Source", "Regular Expressions (PHP)", "JSON", "Perl", "Regular Expression", "Lua", "Lisp", "Java", "Javadoc", "Haskell", "C++", "C", "Objective-C++", "Objective-C", "Go", "Diff"]
["HTML (Rails)", "Ruby on Rails", "Ruby", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SQL", "HTML"]
["JavaScript (Rails)", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "Ruby on Rails", "Ruby", "CSS", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SQL", "HTML"]
["Ruby Haml", "Ruby on Rails", "Ruby", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SQL", "HTML", "Ruby Haml"]
["Ruby on Rails", "Ruby", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SQL", "HTML"]
["SQL (Rails)", "Ruby on Rails", "Ruby", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SQL", "HTML"]
["Regular Expression"]
["reStructuredText", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS"]
["Ruby", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SQL", "HTML"]
["Cargo Build Results"]
["Rust"]
["SQL"]
["Scala"]
["Bourne Again Shell (bash)", "commands-builtin-shell-bash", "Bourne Again Shell (bash)"]
["Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["commands-builtin-shell-bash", "Bourne Again Shell (bash)"]
["HTML (Tcl)", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Tcl"]
["Tcl"]
["Textile", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS"]
["XML"]
["YAML"]
["AWK", "Regular Expression"]
["Apache Conf"]
["AsciiDoc (Asciidoctor)", "XML"]
["ARM Assembly"]
["Assembly (x86_64)"]
["CMake C Header", "C"]
["CMake C++ Header", "C++", "C"]
["CMake", "CMakeCommands", "CMake", "Regular Expression"]
["CMakeCache"]
["CMakeCommands", "CMake", "CMakeCommands", "Regular Expression"]
["Comma Separated Values"]
["Cabal"]
["CoffeeScript"]
["CpuInfo"]
["Crystal", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Crystal", "C++", "C", "SQL", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Dart Analysis Output"]
["Dart"]
["Dockerfile"]
["DotENV"]
["Elixir", "Regular Expressions (Elixir)"]
["HTML (EEx)", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Elixir", "Regular Expressions (Elixir)"]
["Regular Expressions (Elixir)"]
["Elm Compile Messages", "Elm", "GLSL"]
["Elm Documentation", "Elm", "GLSL"]
["Elm", "GLSL"]
["Email", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "XML"]
["F#"]
["Friendly Interactive Shell (fish)"]
["Fortran (Fixed Form)"]
["Fortran (Modern)"]
["Fortran Namelist"]
["GFortran Build Results"]
["OpenMP (Fortran)"]
["fstab"]
["GLSL"]
["GraphQL"]
["Man Page (groff/troff)"]
["group"]
["HTML (Twig)", "Ruby", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SQL", "HTML", "Python", "Regular Expressions (Python)"]
["hosts"]
["INI"]
["JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)"]
["HTML (Jinja2)", "HTML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Jinja2"]
["Jinja2"]
["jsonnet"]
["Julia", "Regular Expressions (Python)"]
["Kotlin"]
["Less"]
["Lean"]
["Manpage", "C"]
["MemInfo"]
["nginx", "Lua"]
["Nim", "Nim"]
["Ninja"]
["Nix"]
["orgmode", "Ruby", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SQL", "HTML", "Python", "Regular Expressions (Python)", "Lisp", "LaTeX", "TeX", "YAML", "XML", "R", "PHP Source", "Regular Expressions (PHP)", "JSON", "Perl", "Regular Expression", "Lua", "Java", "Javadoc", "Haskell", "C++", "C", "Objective-C++", "Objective-C", "Go", "Diff"]
["passwd"]
["PowerShell"]
["Protocol Buffer", "Protocol Buffer (TEXT)"]
["Protocol Buffer (TEXT)"]
["Puppet"]
["PureScript", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)"]
["QML", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)"]
["Rego"]
["resolv"]
["Robot Framework"]
["SCSS", "Sass", "YAML"]
["Sass", "YAML"]
["Salt State (SLS)", "YAML", "Jinja2"]
["SML"]
["Strace"]
["Stylus"]
["Solidity"]
["Vyper"]
["Svelte", "CSS", "Stylus", "CoffeeScript", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "TypeScript", "JavaScript", "Less", "Sass", "YAML", "SCSS"]
["Swift", "Swift"]
["SystemVerilog"]
["Navigational Bar SV"]
["TOML"]
["JSON (Terraform)", "JSON"]
["Terraform"]
["TypeScript"]
["TypeScriptReact"]
["Verilog"]
["VimL"]
["Vue Component", "Stylus", "SCSS", "Sass", "YAML", "TypeScript", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CoffeeScript", "CSS", "Less"]
["Zig"]
["gnuplot"]
["HTTP Request and Response", "JavaScript (Babel)", "GraphQL", "Regular Expressions (Javascript)", "CSS", "Plain Text", "JSON", "HTML", "XML"]
["log"]
["requirements.txt"]
["Highlight non-printables"]
["Authorized Keys", "SSH Common", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SSH Crypto"]
["Known Hosts", "SSH Crypto", "SSH Common"]
["Private Key"]
["SSH Common"]
["SSH Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SSH Common", "SSH Crypto"]
["SSH Crypto", "SSH Common"]
["SSHD Config", "SSH Common", "SSH Crypto", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["syslog", "log", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["varlink"]

@keith-hall
Collaborator

Some syntaxes (I'm specifically thinking of Git Common) may be marked as hidden and thus perhaps shouldn't need an independent syntax set. I noticed that some syntaxes also list themselves as a dependency. Also I'm not entirely sure I understand what the end result will be - literally a syntax set for every "root" syntax as shown in the list? In some cases it may make sense to group them, especially if they differ by only one dependency. Again I'm thinking of all the Git-related syntaxes here (and Markdown + MultiMarkdown): they could be bundled into one syntax set, unless it's not worth it and the time saved by loading one or two fewer syntaxes outweighs the benefit of deduplication?

I'm not seeing any obvious dependencies missing, so that's good 👍

@Enselic Enselic force-pushed the syntax-dependency-analysis branch 2 times, most recently from 9d4e6a9 to a8f1409 Compare August 1, 2021 17:33
@Enselic
Collaborator Author

Enselic commented Aug 1, 2021

Thanks a lot for taking a look. That was very valuable input, especially considering it comes from such a prominent syntax expert as yourself :)

I have now updated the commit with the following changes, based on your input:

  • Skip generating independent syntax sets with a hidden syntax as a base. Hidden syntaxes are still pulled in via a dependency though, of course.
  • Fix the bug where the same syntax appeared twice in the same SyntaxSet.
  • Clean up the code a lot and add some clarifying docs.
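
The three changes above can be illustrated with a minimal sketch. It is not the code from this PR: the `Syntax` struct, its fields, and the `independent_sets` function are hypothetical stand-ins, and dependencies are assumed to be plain syntax names. It skips hidden syntaxes as a base, still pulls them in as dependencies, and never lists a syntax twice.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified stand-in for a syntax definition:
/// a name, a hidden flag, and the names of its direct dependencies.
struct Syntax {
    name: String,
    hidden: bool,
    dependencies: Vec<String>,
}

/// Builds one independent set per non-hidden syntax. Each set starts with
/// the base syntax, followed by its transitive dependencies, without duplicates.
fn independent_sets(syntaxes: &[Syntax]) -> Vec<Vec<String>> {
    let by_name: HashMap<&str, &Syntax> =
        syntaxes.iter().map(|s| (s.name.as_str(), s)).collect();

    let mut result = Vec::new();
    // Hidden syntaxes are never used as a base...
    for base in syntaxes.iter().filter(|s| !s.hidden) {
        let mut seen: HashSet<&str> = HashSet::new();
        let mut ordered: Vec<String> = Vec::new();
        let mut stack: Vec<&str> = vec![base.name.as_str()];

        while let Some(name) = stack.pop() {
            // `insert` returns false for an already visited name, which
            // guards against cycles and self-references.
            if !seen.insert(name) {
                continue;
            }
            ordered.push(name.to_string());
            // ...but they are still followed when reached as a dependency.
            if let Some(syntax) = by_name.get(name) {
                // Push in reverse so dependencies are visited in declared order.
                for dep in syntax.dependencies.iter().rev() {
                    stack.push(dep.as_str());
                }
            }
        }
        result.push(ordered);
    }
    result
}
```

With a hidden "Git Common" that depends on "Git Config" and vice versa, this sketch yields a single set for "Git Config" that contains "Git Common" exactly once.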

> Also I'm not entirely sure I understand what the end result will be - literally a syntax set for every "root" syntax as shown in the list?

Yes. But in the latest code I skip the hidden syntaxes.

It is a good point that we might not need both the SyntaxSets ["C"] and ["C++", "C"], for example, since the latter can be used to highlight pure C. On the other hand, as you point out, loading only C for C is going to be faster than also loading C++. But that will result in a larger binary... So we will have to experiment a bit before we know how to handle that.
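
To get a feel for how many sets are affected by this trade-off, one could list the sets that are fully contained in a larger one. The following is only a sketch for such an experiment; `redundant_sets` is a hypothetical helper and not part of this PR.

```rust
use std::collections::HashSet;

/// Returns the indices of sets whose members are all contained in some
/// other, strictly larger set (e.g. ["C"] inside ["C++", "C"]).
fn redundant_sets(sets: &[Vec<&str>]) -> Vec<usize> {
    let as_sets: Vec<HashSet<&str>> = sets
        .iter()
        .map(|set| set.iter().copied().collect())
        .collect();

    (0..as_sets.len())
        .filter(|&i| {
            as_sets.iter().enumerate().any(|(j, other)| {
                i != j && as_sets[i].len() < other.len() && as_sets[i].is_subset(other)
            })
        })
        .collect()
}
```

Whether dropping such sets is worthwhile still depends on the load-time versus binary-size measurements mentioned above.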

But, as outlined in my plan, I am aiming for an MVP where we only improve startup time when highlighting source files without dependencies. And there are quite a few (expand the list below to see).

So it is fine to merge the code as is in that regard. Then we can investigate and experiment with the right way forward for syntaxes with dependencies at a later point.

Anyway, I consider this code to be ready for a "real" code review. We are still blocked on a new release of syntect, but I think there is no need to wait on doing a proper code review because of that. (We need a new release since we use the new SyntaxSetBuilder::syntaxes() method that currently only exists in syntect git master.)

Here is what the new full list of independent syntax sets looks like:

Independent syntaxes
["ActionScript"]
["Apache Conf"]
["AppleScript"]
["ARM Assembly"]
["AsciiDoc (Asciidoctor)", "XML"]
["ASP", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "HTML (ASP)"]
["Assembly (x86_64)"]
["Authorized Keys", "SSH Common", "SSH Crypto", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["AWK", "Regular Expression"]
["Batch File"]
["BibTeX"]
["Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["C"]
["C#"]
["C++", "C"]
["Cabal"]
["camlp4", "OCaml"]
["Clojure", "Regular Expression"]
["CMake C Header", "C"]
["CMake C++ Header", "C++", "C"]
["CMake", "Regular Expression", "CMakeCommands"]
["CMakeCache"]
["CoffeeScript"]
["Comma Separated Values"]
["CpuInfo"]
["Crystal", "SQL", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "C++", "C"]
["CSS"]
["D"]
["Dart Analysis Output"]
["Dart"]
["Diff"]
["Dockerfile"]
["DotENV"]
["Elixir", "Regular Expressions (Elixir)"]
["Elm", "GLSL"]
["Email", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "XML"]
["Erlang"]
["F#"]
["Fortran (Fixed Form)"]
["Fortran (Modern)"]
["Fortran Namelist"]
["Friendly Interactive Shell (fish)"]
["fstab"]
["Git Attributes", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Commit", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Config", "Git Common", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Ignore", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Link", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Log", "Git Commit", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Mailmap", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Git Rebase Todo", "Git Common", "Git Config", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["GLSL"]
["gnuplot"]
["Go"]
["GraphQL"]
["Graphviz (DOT)", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS"]
["Groovy", "Javadoc", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS"]
["group"]
["Haskell"]
["Highlight non-printables"]
["hosts"]
["HTML (ASP)", "ASP", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS"]
["HTML (EEx)", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "Elixir", "Regular Expressions (Elixir)"]
["HTML (Erlang)", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "Erlang"]
["HTML (Jinja2)", "Jinja2", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS"]
["HTML (Rails)", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "Ruby on Rails", "Ruby", "SQL", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["HTML (Tcl)", "Tcl", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS"]
["HTML (Twig)", "Ruby", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "SQL", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "Python", "Regular Expressions (Python)"]
["HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS"]
["HTTP Request and Response", "JSON", "XML", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "Plain Text"]
["INI"]
["Java Properties"]
["Java Server Page (JSP)", "Java", "Javadoc", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS"]
["Java", "Javadoc", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS"]
["JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL"]
["JavaScript (Rails)", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "Ruby on Rails", "Ruby", "HTML", "CSS", "SQL", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["JavaScript", "Regular Expressions (Javascript)"]
["Jinja2"]
["JSON"]
["jsonnet"]
["Julia", "Regular Expressions (Python)"]
["Known Hosts", "SSH Common", "SSH Crypto"]
["Kotlin"]
["LaTeX Log"]
["LaTeX", "TeX", "C", "C++", "Diff", "Go", "Haskell", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "Java", "Javadoc", "JSON", "Lisp", "Lua", "Objective-C", "Objective-C++", "Perl", "Regular Expression", "SQL", "XML", "PHP Source", "Regular Expressions (PHP)", "Python", "Regular Expressions (Python)", "R", "Ruby", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "YAML"]
["Lean"]
["Less"]
["Lisp"]
["Literate Haskell", "Haskell", "LaTeX", "TeX", "C", "C++", "Diff", "Go", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "Java", "Javadoc", "JSON", "Lisp", "Lua", "Objective-C", "Objective-C++", "Perl", "Regular Expression", "SQL", "XML", "PHP Source", "Regular Expressions (PHP)", "Python", "Regular Expressions (Python)", "R", "Ruby", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "YAML"]
["log"]
["Lua"]
["Makefile", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Man Page (groff/troff)"]
["Manpage", "C"]
["Markdown", "XML", "SQL", "Python", "Regular Expressions (Python)", "Graphviz (DOT)", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "JSON", "Java", "Javadoc", "C#", "Rust", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "PHP Source", "Regular Expressions (PHP)", "PHP", "R", "Go", "Ruby", "Shell-Unix-Generic", "Objective-C", "C", "Objective-C++", "C++", "Regular Expression"]
["MATLAB"]
["MemInfo"]
["MultiMarkdown", "Markdown", "XML", "SQL", "Python", "Regular Expressions (Python)", "Graphviz (DOT)", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "JSON", "Java", "Javadoc", "C#", "Rust", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "PHP Source", "Regular Expressions (PHP)", "PHP", "R", "Go", "Ruby", "Shell-Unix-Generic", "Objective-C", "C", "Objective-C++", "C++", "Regular Expression"]
["NAnt Build File"]
["nginx", "Lua"]
["Nim"]
["Ninja"]
["Nix"]
["Objective-C", "C"]
["Objective-C++", "C++", "C", "Objective-C"]
["OCaml", "camlp4"]
["OCamllex", "OCaml", "camlp4"]
["OCamlyacc", "OCaml", "camlp4"]
["orgmode", "Lisp", "LaTeX", "TeX", "C", "C++", "Diff", "Go", "Haskell", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "Java", "Javadoc", "JSON", "Lua", "Objective-C", "Objective-C++", "Perl", "Regular Expression", "SQL", "XML", "PHP Source", "Regular Expressions (PHP)", "Python", "Regular Expressions (Python)", "R", "Ruby", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "YAML"]
["Pascal"]
["passwd"]
["Perl", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "Regular Expression", "CSS", "SQL", "XML", "HTML", "JSON"]
["PHP", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "PHP Source", "SQL", "XML", "JSON", "Regular Expressions (PHP)"]
["Plain Text"]
["PowerShell"]
["Private Key"]
["Protocol Buffer (TEXT)"]
["Protocol Buffer", "Protocol Buffer (TEXT)"]
["Puppet"]
["PureScript", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL"]
["Python", "SQL", "Regular Expressions (Python)"]
["QML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL"]
["R Console", "R"]
["R"]
["Rd (R Documentation)", "LaTeX", "TeX", "C", "C++", "Diff", "Go", "Haskell", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "Java", "Javadoc", "JSON", "Lisp", "Lua", "Objective-C", "Objective-C++", "Perl", "Regular Expression", "SQL", "XML", "PHP Source", "Regular Expressions (PHP)", "Python", "Regular Expressions (Python)", "R", "Ruby", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "YAML"]
["Rego"]
["Regular Expression"]
["requirements.txt"]
["resolv"]
["reStructuredText", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS"]
["Robot Framework"]
["Ruby Haml", "Ruby on Rails", "Ruby", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "SQL", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Ruby on Rails", "Ruby", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "SQL", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Ruby", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "SQL", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Rust"]
["Salt State (SLS)", "Jinja2", "YAML"]
["Sass", "YAML"]
["Scala"]
["SCSS", "Sass", "YAML"]
["SML"]
["Solidity"]
["SQL (Rails)", "Ruby on Rails", "Ruby", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS", "SQL", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["SQL"]
["SSH Config", "SSH Common", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "SSH Crypto"]
["SSHD Config", "SSH Common", "SSH Crypto", "Shell-Unix-Generic", "Bourne Again Shell (bash)", "commands-builtin-shell-bash"]
["Strace"]
["Stylus"]
["Svelte", "CSS", "SCSS", "Sass", "YAML", "Stylus", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "Less", "JavaScript", "CoffeeScript", "TypeScript"]
["Swift"]
["syslog", "Bourne Again Shell (bash)", "commands-builtin-shell-bash", "log"]
["SystemVerilog"]
["Tcl"]
["Terraform"]
["TeX"]
["Textile", "HTML", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "CSS"]
["TOML"]
["TypeScript"]
["TypeScriptReact"]
["varlink"]
["Verilog"]
["VimL"]
["Vue Component", "JavaScript (Babel)", "Regular Expressions (Javascript)", "GraphQL", "SCSS", "Sass", "YAML", "CoffeeScript", "CSS", "TypeScript", "Stylus", "Less"]
["Vyper"]
["XML"]
["YAML"]
["Zig"]

@Enselic Enselic changed the title Draft: Analyze dependencies between SyntaxDefinitions, and print result Add code for analyzing dependencies between syntaxes Aug 1, 2021
src/assets.rs Outdated
Comment on lines 115 to 118
if false {
    // To trigger this code, run:
    // cargo run -- cache --build --source assets --blank --target /tmp
    crate::syntax_analysis::print_syntax_dependencies(&syntax_set_builder);
}
Owner

Do we want to merge this as-is? (with the if false?)

Other ideas:

  • hide behind a cargo-feature (so we can trigger it with something like cargo run --features print_syntax_dependencies -- cache --build …)
  • enable it with an environment variable: if std::env::var("BAT_PRINT_SYNTAX_DEPENDENCIES").is_ok() { …. Contrary to the other two options, this would always include the necessary code in the binary. The env-var check wouldn't hurt at this place though.
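
A minimal sketch of the environment-variable option, assuming a small hypothetical helper named `env_flag_is_set` (the variable name is the one proposed above):

```rust
use std::env;

/// Returns true if the given environment variable is set to any valid
/// Unicode value, mirroring the `std::env::var(..).is_ok()` check.
/// Note that even an empty value or "0" counts as "set" here.
fn env_flag_is_set(name: &str) -> bool {
    env::var(name).is_ok()
}
```

The call site would then guard the analysis with something like `if env_flag_is_set("BAT_PRINT_SYNTAX_DEPENDENCIES") { … }` in place of `if false { … }`.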

Collaborator Author

@Enselic Enselic Aug 3, 2021

Thanks for the detailed code review!

I considered adding a cargo-feature for it, but since those are practically part of the formal public API, and since the if false is a temporary thing, I decided against it.

I do however really like the idea of triggering the code with an env var. Will fix!

src/lib.rs Outdated
@@ -40,6 +40,7 @@ mod preprocessor;
mod pretty_printer;
pub(crate) mod printer;
pub mod style;
mod syntax_analysis;
Owner

Bikeshedding, but maybe use another name here? It sounds a bit like we are actually performing syntax analysis (like on an AST after parsing) here. Maybe syntax_set_analysis? Or syntax_dependencies.

Collaborator Author

Let's go with syntax_dependencies 👍

/// Used to look up (by name) what dependencies a given [SyntaxDefinition] has
type SyntaxToDependencies = HashMap<String, Vec<Dependency>>;

/// Used to look up what [SyntaxDefinition] that corresponds to a given [Dependency]
Owner

Suggested change
- /// Used to look up what [SyntaxDefinition] that corresponds to a given [Dependency]
+ /// Used to look up which [SyntaxDefinition] corresponds to a given [Dependency]

(if I'm interpreting the sentence correctly)

Comment on lines 7 to 8
/// Used to look up (by name) what dependencies a given [SyntaxDefinition] has
type SyntaxToDependencies = HashMap<String, Vec<Dependency>>;
Owner

@sharkdp sharkdp Aug 2, 2021

I'm okay with the doc comment. An alternative route that I also like is to use type aliases to express the intent:

type SyntaxName = String;
type SyntaxToDependencies = HashMap<SyntaxName, Vec<Dependency>>;

which might not need a doc comment at all.
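
As a sketch of how the aliases read in use: the `Dependency` variants below are inferred from the `ByName(..)` and `ByScope(..)` warnings quoted later in this thread, the `String` payload for scopes is an assumption, and `direct_dependencies` is a hypothetical helper.

```rust
use std::collections::HashMap;

type SyntaxName = String;

/// Assumed shape of a dependency: referenced either by syntax name or by scope.
#[derive(Debug, Clone, PartialEq)]
enum Dependency {
    ByName(SyntaxName),
    // Assumption: the real code likely uses a dedicated scope type, not a String.
    ByScope(String),
}

type SyntaxToDependencies = HashMap<SyntaxName, Vec<Dependency>>;

/// Hypothetical lookup helper; the signature documents itself via the aliases.
fn direct_dependencies<'a>(map: &'a SyntaxToDependencies, syntax: &str) -> &'a [Dependency] {
    map.get(syntax).map(Vec::as_slice).unwrap_or(&[])
}
```

Note that a type alias is only a second name for `String`, so the compiler will not catch a scope string passed where a syntax name is expected.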

Comment on lines 23 to 24
/// Change to true to make syntax dependency analysis print more details of what it is seeing
const VERBOSE: bool = false;
Owner

I think I would prefer a runtime switch here (a verbose: bool member in SyntaxSetDependencyBuilder).

Or to get rid of it completely by either removing the verbose=true path or by always being verbose. But only if the verbose-mode output is still readable (haven't tested it yet)
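
A runtime switch along those lines could look roughly like this. The struct is a hypothetical, trimmed-down version of the builder; collecting messages in a `Vec` instead of printing them is an assumption made to keep the sketch easy to test.

```rust
/// Hypothetical, trimmed-down builder that only models the `verbose` switch.
struct SyntaxSetDependencyBuilder {
    verbose: bool,
    /// Diagnostics collected so far (an assumption for this sketch;
    /// real code might print with `eprintln!` instead).
    diagnostics: Vec<String>,
}

impl SyntaxSetDependencyBuilder {
    fn new(verbose: bool) -> Self {
        Self {
            verbose,
            diagnostics: Vec::new(),
        }
    }

    /// Records a detail message only when verbose mode was requested.
    fn detail(&mut self, message: &str) {
        if self.verbose {
            self.diagnostics.push(message.to_string());
        }
    }
}
```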

Collaborator Author

@Enselic Enselic Aug 3, 2021

It turns out that the "unlinked context" code from Keith prints mostly the same thing as my verbose output, but in a nicer format, because my output is repeated a lot. So I'll just remove it from my code. And from what I understand, when the two differ, it is because a non-fatal dependency is missing.

For the record, my code warned about these extra missing dependencies that Keith's code did not:

WARNING: No syntax found ByName("OpenMP")
WARNING: No syntax found ByScope(<source.js.css>)
WARNING: No syntax found ByScope(<source.lean.markdown>)
WARNING: No syntax found ByScope(<text.dart-doccomments>)
WARNING: No syntax found ByScope(<text.html.php>)

I'll unconditionally keep

eprintln!("WARNING: Unknown dependencies for {}", name);

though, because that should never happen. I'll change it into an ERROR: ....

.syntaxes()
.iter()
.map(|syntax| &syntax.name)
.collect::<Vec<&String>>();
Owner

For .collect, it's often enough to specify the container and let the inner type be inferred:

  .collect::<Vec<_>>();
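
For illustration, a self-contained sketch of the same pattern; the `Syntax` struct and `sorted_names` function are hypothetical stand-ins for the code under review.

```rust
/// Hypothetical stand-in for a syntax definition with just a name.
struct Syntax {
    name: String,
}

/// Collects the names of all syntaxes and sorts them.
fn sorted_names(syntaxes: &[Syntax]) -> Vec<&String> {
    // Only the container is named; the element type `&String` is inferred.
    let mut names = syntaxes
        .iter()
        .map(|syntax| &syntax.name)
        .collect::<Vec<_>>();
    names.sort();
    names
}
```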

Comment on lines 95 to 96
.map(|context| &context.patterns)
.flatten()
Owner

You can use flat_map here and below.
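
A small sketch of the equivalence, using a hypothetical `Context` struct with string patterns in place of the real types:

```rust
/// Hypothetical stand-in for a context holding a list of patterns.
struct Context {
    patterns: Vec<String>,
}

/// The original form: map to the inner collection, then flatten.
fn patterns_with_flatten(contexts: &[Context]) -> Vec<&String> {
    contexts
        .iter()
        .map(|context| &context.patterns)
        .flatten()
        .collect()
}

/// The suggested form: `flat_map` does both steps in one call.
fn patterns_with_flat_map(contexts: &[Context]) -> Vec<&String> {
    contexts
        .iter()
        .flat_map(|context| &context.patterns)
        .collect()
}
```

Both functions yield the patterns of all contexts in order.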

Collaborator Author

Good catch. That's a very nice simplification.

Owner

@sharkdp sharkdp left a comment

It was a pleasure to review this, thank you very much! A module with non-trivial functionality structured in a way that makes it easy to understand and read 👍

@Enselic Enselic force-pushed the syntax-dependency-analysis branch 2 times, most recently from 082408c to a3b178c Compare August 3, 2021 07:37
@Enselic
Collaborator Author

Enselic commented Aug 3, 2021

Again, big thanks for the detailed code review 🙏

All comments should be addressed now, so feel free to take a second look when you get time.

@Enselic Enselic requested a review from sharkdp August 3, 2021 07:59
@Enselic
Collaborator Author

Enselic commented Aug 5, 2021

(The new commits are just a rebase on top of origin/master, mainly to pick up the cargo fmt commit.)

And also to generate independent SyntaxSets. This will later be used
to improve bat startup time.
@Enselic Enselic merged commit 47d955a into sharkdp:master Aug 7, 2021
@Enselic Enselic deleted the syntax-dependency-analysis branch August 20, 2021 18:35
3 participants