How to upgrade the grammar for a language

Let's call our language "X".

Here are the main components:

  • the OCaml code generator ocaml-tree-sitter: generates OCaml parsing code from tree-sitter grammars extended with ellipses (...) and similar semgrep constructs. It publishes the generated code into git repositories named semgrep-X.
  • tree-sitter-X, e.g., tree-sitter-ruby: the original tree-sitter grammar for the language. This is the git submodule lang/semgrep-grammars/src/tree-sitter-X in ocaml-tree-sitter. It is installed in node_modules at the project's root by invoking npm install.
  • the syntax extensions that support semgrep patterns, such as ellipses (...) and metavariables ($FOO). These live in lang/semgrep-grammars/src/semgrep-X and can be tested from that folder with make && make test.
  • an automatically modified grammar for language X in lang/X, adapted to the requirements of the ocaml-tree-sitter code generator. lang/X/src and lang/X/ocaml-src contain the C/C++/OCaml code that will be published into semgrep-X (e.g., semgrep-ruby) and used by semgrep.
  • semgrep-X: provides the generated OCaml/C parsers as a dune project. It is a submodule of semgrep.
  • semgrep: uses the parsers provided by semgrep-X, which produce a CST. The CST of a program or of a pattern is then transformed into an AST suitable for pattern matching.
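As a quick reference, the locations named in the list above can be printed for a given language with a small shell function. component_paths is a hypothetical helper, not part of ocaml-tree-sitter; it only echoes the paths given above, relative to the root of an ocaml-tree-sitter checkout, and changes nothing.

```shell
# Hypothetical helper (not part of ocaml-tree-sitter): print where each
# component for language $1 lives, relative to the ocaml-tree-sitter root.
component_paths() {
  lang="$1"
  printf '%s\n' \
    "original grammar (submodule): lang/semgrep-grammars/src/tree-sitter-$lang" \
    "semgrep syntax extensions:    lang/semgrep-grammars/src/semgrep-$lang" \
    "modified grammar:             lang/$lang" \
    "C/C++ code to publish:        lang/$lang/src" \
    "OCaml code to publish:        lang/$lang/ocaml-src"
}
```

For example, component_paths ruby prints the five locations for Ruby.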

Make sure the above is clear in your mind before proceeding further. If you have questions, the best option is to reach out on our community Slack channel.

Before upgrading

Make sure the grammar.js file, or the equivalent source files defining the grammar, are listed in the fyi.list file in ocaml-tree-sitter/lang/X.

Why: this makes it possible to track and understand the changes made to the original grammar source.

How: See How to add support for a new language.
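A quick way to confirm this before upgrading is to search lang/X/fyi.list for a grammar.js entry. The sketch below assumes fyi.list holds one file path per line (compare with an existing language's fyi.list to confirm); check_fyi_list is a hypothetical name. Languages whose grammar is split across other source files would need a different pattern.

```shell
# Hypothetical check: succeed if lang/$1/fyi.list lists a grammar.js file.
# Assumes one path per line; run from the ocaml-tree-sitter root.
check_fyi_list() {
  fyi="lang/$1/fyi.list"
  if [ ! -f "$fyi" ]; then
    echo "missing $fyi" >&2
    return 1
  fi
  if grep -q 'grammar\.js$' "$fyi"; then
    echo "ok: $fyi lists grammar.js"
  else
    echo "warning: no grammar.js entry in $fyi" >&2
    return 1
  fi
}
```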

Upgrade the tree-sitter-X submodule

Say you want to upgrade (or downgrade) tree-sitter-X from some old commit to commit 602f12b. This follows the standard git submodule workflow. The commands look something like this:

git submodule update --init --recursive --depth 1
git checkout -b upgrade-X
cd lang/semgrep-grammars/src/tree-sitter-X
  git fetch origin --unshallow
  git checkout 602f12b
  cd ..
npm install
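Before testing, it can be worth confirming that the submodule really has the intended commit checked out. A minimal sketch using only standard git commands: git rev-parse resolves the abbreviated hash (such as 602f12b) to a full one and compares it with HEAD. check_submodule_commit is a hypothetical name.

```shell
# Hypothetical check: succeed if the git repository in $1 has commit $2
# checked out. $2 may be an abbreviated hash.
check_submodule_commit() {
  dir="$1"
  want=$(git -C "$dir" rev-parse --verify "$2^{commit}") || return 1
  have=$(git -C "$dir" rev-parse HEAD) || return 1
  if [ "$have" = "$want" ]; then
    echo "ok: $dir is at $have"
  else
    echo "mismatch: $dir is at $have, expected $want" >&2
    return 1
  fi
}
```

For example, from the ocaml-tree-sitter root: check_submodule_commit lang/semgrep-grammars/src/tree-sitter-X 602f12b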

Testing

First, build and install ocaml-tree-sitter as usual, following the instructions in the main README.

./configure
make setup
make
make install

Then, build support for the languages in lang/. The following commands will build and test all languages at once:

cd lang
  make
  make test

If this works, we're all set. Commit the updated tree-sitter-X submodule pointer:

git status
git commit -a
git push origin upgrade-X
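To double-check what was committed, git ls-tree shows the commit that the superproject records for a submodule path (a gitlink entry, listed with type "commit"). recorded_submodule_commit is a hypothetical helper; run it from the ocaml-tree-sitter root after committing.

```shell
# Hypothetical helper: print the submodule commit recorded in HEAD for path $1.
recorded_submodule_commit() {
  # ls-tree prints "<mode> <type> <hash>\t<path>"; submodules have type "commit".
  git ls-tree HEAD -- "$1" | awk '$2 == "commit" { print $3 }'
}
```

For example, recorded_submodule_commit lang/semgrep-grammars/src/tree-sitter-X should print a full hash starting with 602f12b.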

We can now consider publishing the code to semgrep-X.

Publishing

From the lang folder of ocaml-tree-sitter, we'll perform the release. This step redoes some of the work that was done earlier and checks that everything is clean before committing and pushing the changes to semgrep-X.

cd lang
  ./release --dry-run X  # dry-run release
  ...                    # inspect things
  ./release X            # commit and push to semgrep-X
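The dry-run-then-release sequence can be wrapped so that the real release only runs after a successful dry run and an explicit confirmation. release_checked is a hypothetical convenience function, not part of ocaml-tree-sitter; it only invokes ./release in the two forms shown above and must be run from the lang folder.

```shell
# Hypothetical wrapper around ./release: dry run first, then ask before publishing.
release_checked() {
  lang="$1"
  if ! ./release --dry-run "$lang"; then
    echo "dry run failed for $lang; not releasing" >&2
    return 1
  fi
  printf 'Dry run for %s succeeded. Publish to semgrep-%s? [y/N] ' "$lang" "$lang"
  read -r answer
  case "$answer" in
    y|Y) ./release "$lang" ;;
    *) echo "release of $lang cancelled"; return 1 ;;
  esac
}
```

The confirmation step leaves room for the manual inspection shown above; answering anything other than y stops before anything is pushed.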

Using the parsers

In the semgrep repository, update the semgrep-X submodule to point to the latest commit and see what changes. If the source grammar.js was listed in fyi.list, git diff should help identify the changes since the last version.
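One way to do this is to diff the published grammar between the commit semgrep currently uses and the new one, from inside the semgrep-X submodule. This sketch does not assume where grammar.js sits inside semgrep-X: it uses the git pathspec '*grammar.js', which matches a grammar.js file at any depth. diff_grammar is a hypothetical name.

```shell
# Hypothetical helper: in the git repository $1, show the changes to any
# grammar.js file between commit $2 (old) and commit $3 (new).
diff_grammar() {
  dir="$1"
  old="$2"
  new="$3"
  # In a git pathspec, '*' also matches '/', so this covers subdirectories.
  git -C "$dir" diff "$old" "$new" -- '*grammar.js'
}
```

For example: diff_grammar path/to/semgrep-X <old-commit> <new-commit>, where the path depends on where semgrep keeps the submodule.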

Conclusion

The main difficulty is understanding how the different git projects interact and avoiding mistakes when dealing with git submodules, which takes a bit of practice.

See also

How to add support for a new language