Skip to content

Commit

Permalink
docs: publish blog post about new scanner (#168)
Browse files Browse the repository at this point in the history
  • Loading branch information
plusvic authored Jul 31, 2024
1 parent 9ebb0e5 commit 4a9f9e1
Showing 1 changed file with 70 additions and 0 deletions.
70 changes: 70 additions & 0 deletions site/content/blog/a-new-parser-for-yara/index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
---
title: "An new parser for YARA"
description: ""
summary: ""
date: 2024-07-31T00:00:00+01:00
lastmod: 2024-07-31T00:00:00+01:00
draft: true
weight: 50
categories: [ ]
tags: [ ]
contributors: [ "Victor M. Alvarez" ]
pinned: false
homepage: false
seo:
title: "" # custom title (optional)
description: "" # custom description (recommended)
canonical: "" # custom canonical URL (optional)
noindex: false # false (default) or true
---

One of the design goals for YARA-X was to create a parser that could be reused
in various tools like code formatters, linters, automatic rule generators. In
YARA, the parser is so tightly coupled with the code generator that it cannot
be repurposed. This forced developers to write their own parser for YARA rules,
leading to many unofficial parsers that often fell behind the official version.

From the outset, YARA-X aimed to address this by providing a reusable parser
that produced both an Abstract Syntax Tree (AST), and a Concrete Syntax Tree
(CST), also known as a lossless syntax tree. The CST retains all source code
details, like comments, newlines, and spacing, which are crucial for tools like
code formatters. This parser was initially based in the
excellent [Pest](https://pest.rs/) library.

However, starting with version 0.6.0, we decided to replace Pest with our own
custom-made parser. The reasons are twofold: first, the Pest parser is not
error-tolerant and aborts parsing at the first syntax error; second, the
produced CST is not modifiable, making it impractical for use cases like
automated code refactoring.

This issue was highlighted by [Tomáš Ďuriš](https://github.com/TommYDeeee)
and [Marek Milkovič](https://github.com/metthal) from Gen Digital. At Gen
Digital they are heavy users of YARA and were excited about the YARA-X project.
They contacted me early on, offering their help and many interesting ideas. One
of areas in which they wanted to contribute was in creating a Language Server
for Visual Studio Code.

A Visual Studio Code Language Server implements the Language Server Protocol
(LSP), which allows for features such as code completion, error checking,
navigation, and refactoring. It enhances the coding experience by providing
real-time feedback and intelligent code editing features. However, while the
Pest-based parser was an improvement over the legacy YARA parser, it was still
insufficient for implementing an LSP.

With the help of Tomáš Ďuriš, who conducted the initial research and
prototyping, I embarked on a major refactoring effort. This resulted in the
complete removal of the Pest-based parser and the creation of a new parser that
addresses all the previously mentioned shortcomings.

The new parser is error-resilient, and in the future it will be capable of
producing a modifiable CST. Additionally, it is faster for certain rules that
were pathologically bad cases for the Pest-based parser. For instance, this
seemingly simple YARA rule fails to compile with YARA-X 0.5.0 but works
perfectly with version 0.6.0.

```yara
rule bad { condition: (((((((((( true )))))))))) }
```

With these changes, the groundwork has been laid for developing more advanced
and powerful tools that can leverage the improved parsing capabilities.

0 comments on commit 4a9f9e1

Please sign in to comment.