223 changes: 213 additions & 10 deletions Cargo.lock


6 changes: 6 additions & 0 deletions Cargo.toml
@@ -2,12 +2,18 @@
members = [
"crates/djc-core",
"crates/djc-html-transformer",
"crates/djc-template-parser",
]
resolver = "2"

[workspace.dependencies]
pyo3 = { version = "0.27.1", features = ["extension-module"] }
quick-xml = "0.38.3"
pest = "2.8.3"
pest_derive = "2.8.3"
@JuroOravec (Contributor Author), Oct 23, 2025:

The template syntax parsing is implemented with Pest. The implementation has 3 parts:

  1. Grammar rules: the definitions of the patterns (the syntax) that the template language supports.

    Pest has its own language for writing these rules; see djc-template-parser/src/grammar.pest.

    This is similar to Backus–Naur Form, e.g.

    ```
    <postal-address> ::= <name-part> <street-address> <zip-part>
    <name-part> ::= <personal-part> <last-name> <opt-suffix-part> <EOL> | <personal-part> <name-part>
    <street-address> ::= <house-num> <street-name> <opt-apt-num> <EOL>
    <zip-part> ::= <town-name> "," <state-code> <ZIP-code> <EOL>
    ```

    Or to the formal syntax MDN uses for CSS properties, e.g. for border-left-width:

    ```
    border-left-width =
      <line-width>

    <line-width> =
      <length [0,∞]>  |
      thin            |
      medium          |
      thick
    ```

    Here `<length>` is the CSS [length](https://developer.mozilla.org/en-US/docs/Web/CSS/length) type and `|` separates alternatives, exactly one of which must be used (see MDN's [value definition syntax](https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_values_and_units/Value_definition_syntax#single_bar)).

    The Pest grammar is where all the permissible patterns are defined. For example, here is a high-level (and outdated) version of the rules for a {% ... %} template tag:

    ```pest
    // The full tag: the whole input must be exactly one tag
    // E.g. `{% slot key=val key2=val2 %}`
    tag_wrapper = { SOI ~ django_tag ~ EOI }

    django_tag = { "{%" ~ tag_content ~ "%}" }

    // The contents of a tag, without the delimiters
    tag_content = ${
        spacing*                             // Optional leading whitespace/comments
        ~ tag_name                           // The tag name must come first, MAY be preceded by whitespace
        ~ (spacing+ ~ attribute)*            // Then zero or more attributes, MUST be separated by whitespace/comments
        ~ spacing*                           // Optional trailing whitespace/comments
        ~ self_closing_slash?                // Optional self-closing slash
        ~ spacing*                           // More optional trailing whitespace
    }
    ```
  2. Parsing the input and handling the matched grammar rules.

    Each defined rule has its own name, e.g. django_tag.

    When text is parsed with Pest in Rust, `Parser::parse` returns `Pairs<Rule>`: an iterator over the top-level matches, where each `Pair` records which rule matched and holds its nested child pairs (reachable via `into_inner()`).

    Since the grammar specifies the entire {% ... %} template tag, and we pass in a string that starts with {% and ends with %}, the result should be exactly one top-level tag_wrapper pair.

    If we match anything else in its place, we raise an error.

    Once we have tag_wrapper, we walk down its child pairs, rule by rule, constructing the AST from the patterns we come across.

  3. Constructing the AST.

    The AST consists of these nodes: Tag, TagAttr, TagToken, TagValue, TagValueFilter.

    • Tag - the entire {% ... %}, e.g. {% my_tag x ...[1, 2, 3] key=val / %}

    • The first word inside a Tag is the tag_name, e.g. my_tag.

    • After the tag name, there are zero or more TagAttrs. These are ALL inputs, both positional and keyword.

      • In the example above, the tag attrs are x, ...[1, 2, 3] and key=val.
      • If a tag attribute has a key, the key is stored on its TagAttr.
      • But ALL TagAttrs MUST have a value.
    • TagValue holds a single value, which may have filters, e.g. "cool"|upper

      • TagValue may be of different kinds, e.g. string, int, float, literal list, literal dict, variable, translation _('mystr'), etc. The specific kind is determined by which rules matched, and the resulting TagValue nodes are distinguished by ValueKind, an enum with values like "string", "float", etc.
      • Since a TagValue can also be a literal list or dict, TagValues may contain other TagValues. This implies that:
        1. Lists and dicts themselves can have filters applied to them, e.g. [1, 2, 3]|append:4
        2. Items inside lists and dicts can also have filters applied to them, e.g. [1|add:1, 2|add:2]
    • Any TagValue can have 0 or more filters applied to it. Filters have a name and an optional argument, e.g. in 3|add:2 the filter name is add and the argument is 2. These filters are held by TagValueFilter.

      • While the filter name is a plain identifier, the argument can be yet another TagValue, so even literal lists and dicts are permitted as filter arguments, e.g. [1]|extend:[2, 3]
    • Lastly, TagToken is a secondary object used by the nodes above. It contains info about the original raw string, and the line / col where the string was found.
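
To make part 1 concrete without pulling in Pest, here is a minimal hand-written recognizer that mirrors the simplified tag_wrapper / tag_content rules above. It is only a sketch, not the crate's implementation: `RawTag` and `recognize_tag` are hypothetical names, `{# #}` comments and escaped quotes are not handled, and the self-closing `/` is assumed to be separated from the last attribute by whitespace.

```rust
/// Result of recognizing `{% name attr attr ... [/] %}` (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct RawTag<'a> {
    pub name: &'a str,
    pub attrs: Vec<&'a str>,
    pub is_self_closing: bool,
}

/// Hand-written stand-in for the simplified `tag_wrapper` / `tag_content` rules:
/// `{%`, optional spacing, tag name, whitespace-separated attributes,
/// optional `/`, optional spacing, `%}`.
pub fn recognize_tag(input: &str) -> Result<RawTag<'_>, String> {
    let inner = input
        .strip_prefix("{%")
        .and_then(|s| s.strip_suffix("%}"))
        .ok_or("input must start with `{%` and end with `%}`")?;

    // Split on whitespace, but not inside quotes or brackets,
    // so that `...[1, 2, 3]` stays a single attribute.
    let mut words = Vec::new();
    let mut depth = 0i32;
    let mut quote: Option<char> = None;
    let mut start: Option<usize> = None;
    for (i, c) in inner.char_indices() {
        if quote.is_none() && depth == 0 && c.is_whitespace() {
            // Word boundary: flush the current word, if any.
            if let Some(s) = start.take() {
                words.push(&inner[s..i]);
            }
            continue;
        }
        if start.is_none() {
            start = Some(i);
        }
        if let Some(q) = quote {
            if c == q {
                quote = None; // closing quote
            }
            continue;
        }
        match c {
            '"' | '\'' => quote = Some(c),
            '[' | '{' | '(' => depth += 1,
            ']' | '}' | ')' => depth -= 1,
            _ => {}
        }
    }
    if let Some(s) = start {
        words.push(&inner[s..]);
    }

    // A trailing standalone `/` marks the tag as self-closing.
    let is_self_closing = words.last() == Some(&"/");
    if is_self_closing {
        words.pop();
    }
    if words.is_empty() {
        return Err("missing tag name".to_string());
    }
    let name = words.remove(0);
    Ok(RawTag { name, attrs: words, is_self_closing })
}
```

For example, `recognize_tag("{% my_tag x ...[1, 2, 3] key=val / %}")` yields the name my_tag, the attrs x, ...[1, 2, 3] and key=val, and `is_self_closing: true`. A grammar-driven parser like Pest gives the same split, plus error positions and comment handling, without this manual bookkeeping.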
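
The walk in part 2 (check that the root is tag_wrapper, then descend rule by rule) can be sketched with a std-only stand-in for Pest's parse tree. In the real crate the walk would go over `pest::iterators::Pair<Rule>` via `into_inner()`; here `Rule`, `Node`, `node`, `SimpleTag` and `build_tag` are hypothetical, and attributes are kept as raw strings instead of being turned into TagAttr nodes.

```rust
// Hypothetical stand-in for Pest's generated `Rule` enum
// (variant names mirror the grammar rules above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rule {
    TagWrapper,
    DjangoTag,
    TagContent,
    TagName,
    Attribute,
    SelfClosingSlash,
}

// Hypothetical stand-in for a Pest `Pair`: which rule matched,
// the text it matched, and the nested matches.
#[derive(Debug)]
pub struct Node {
    pub rule: Rule,
    pub text: String,
    pub children: Vec<Node>,
}

pub fn node(rule: Rule, text: &str, children: Vec<Node>) -> Node {
    Node { rule, text: text.to_string(), children }
}

#[derive(Debug, PartialEq)]
pub struct SimpleTag {
    pub name: String,
    pub attrs: Vec<String>,
    pub is_self_closing: bool,
}

/// Returns the first child matching `rule`, or an error naming what was expected.
fn child(parent: &Node, rule: Rule) -> Result<&Node, String> {
    parent
        .children
        .iter()
        .find(|n| n.rule == rule)
        .ok_or_else(|| format!("expected {:?} inside {:?}", rule, parent.rule))
}

/// Walks tag_wrapper -> django_tag -> tag_content and collects the parts.
pub fn build_tag(root: &Node) -> Result<SimpleTag, String> {
    // Anything other than the top-level rule is an error.
    if root.rule != Rule::TagWrapper {
        return Err(format!("expected TagWrapper, got {:?}", root.rule));
    }
    let content = child(child(root, Rule::DjangoTag)?, Rule::TagContent)?;

    let mut name = None;
    let mut attrs = Vec::new();
    let mut is_self_closing = false;
    for n in &content.children {
        match n.rule {
            Rule::TagName => name = Some(n.text.clone()),
            Rule::Attribute => attrs.push(n.text.clone()),
            Rule::SelfClosingSlash => is_self_closing = true,
            other => return Err(format!("unexpected {:?} in tag_content", other)),
        }
    }
    let name = name.ok_or("missing tag_name")?;
    Ok(SimpleTag { name, attrs, is_self_closing })
}
```

Matching on the rule of each child keeps the walk in lockstep with the grammar: any rule the grammar does not allow at that position surfaces as an explicit error rather than being silently skipped.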
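
The filter rules in part 3 can be illustrated with a hand-rolled splitter. This is a swapped-in technique, not the Pest grammar: it splits on `|` only outside quotes and brackets, so in [1|add:1, 2|add:2]|append:4 the inner filters stay attached to the list items. `Filter`, `split_top_level` and `parse_filters` are hypothetical; arguments stay as raw strings rather than nested TagValues, and escaped quotes are not handled.

```rust
/// One filter in a chain, e.g. `add:2` (hypothetical type; the argument is
/// kept as raw text, whereas the real AST holds it as a TagValue).
#[derive(Debug, PartialEq)]
pub struct Filter<'a> {
    pub name: &'a str,
    pub arg: Option<&'a str>,
}

/// Splits `s` on `sep`, ignoring separators inside quotes or brackets.
fn split_top_level(s: &str, sep: char) -> Vec<&str> {
    let mut parts = Vec::new();
    let mut depth = 0i32;
    let mut quote: Option<char> = None;
    let mut start = 0;
    for (i, c) in s.char_indices() {
        if let Some(q) = quote {
            if c == q {
                quote = None; // closing quote
            }
            continue;
        }
        match c {
            '"' | '\'' => quote = Some(c),
            '[' | '{' | '(' => depth += 1,
            ']' | '}' | ')' => depth -= 1,
            _ if c == sep && depth == 0 => {
                parts.push(&s[start..i]);
                start = i + c.len_utf8();
            }
            _ => {}
        }
    }
    parts.push(&s[start..]);
    parts
}

/// Splits `value|f1|f2:arg` into the base value and its filters.
pub fn parse_filters(expr: &str) -> (&str, Vec<Filter<'_>>) {
    let mut parts = split_top_level(expr, '|').into_iter();
    let base = parts.next().unwrap_or("").trim();
    let filters = parts
        .map(|p| match p.split_once(':') {
            // The filter name is a plain identifier, so the first `:` ends it
            // and everything after it (even `[2, 3]`) is the argument.
            Some((name, arg)) => Filter { name: name.trim(), arg: Some(arg.trim()) },
            None => Filter { name: p.trim(), arg: None },
        })
        .collect();
    (base, filters)
}
```

For instance, `parse_filters("[1]|extend:[2, 3]")` returns the base [1] and a single extend filter whose argument is [2, 3]. In the real AST that argument would be parsed again into a TagValue, which is what makes filter arguments recursive.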

The final AST can look like this:

INPUT:

```
{% my_tag value|lower %}
```

AST:

```rust
Tag {
    name: TagToken {
        token: "my_tag".to_string(),
        start_index: 3,
        end_index: 9,
        line_col: (1, 4),
    },
    attrs: vec![TagAttr {
        key: None,
        value: TagValue {
            token: TagToken {
                token: "value".to_string(),
                start_index: 10,
                end_index: 15,
                line_col: (1, 11),
            },
            children: vec![],
            spread: None,
            filters: vec![TagValueFilter {
                arg: None,
                token: TagToken {
                    token: "lower".to_string(),
                    start_index: 16,
                    end_index: 21,
                    line_col: (1, 17),
                },
                start_index: 15,
                end_index: 21,
                line_col: (1, 16),
            }],
            kind: ValueKind::Variable,
            start_index: 10,
            end_index: 21,
            line_col: (1, 11),
        },
        is_flag: false,
        start_index: 10,
        end_index: 21,
        line_col: (1, 11),
    }],
    is_self_closing: false,
    syntax: TagSyntax::Django,
    start_index: 0,
    end_index: 24,
    line_col: (1, 4),
}
```
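
For reference, here are hypothetical Rust definitions under which the AST literal above would compile. The field names come from the dump; the field types (`usize` offsets, `Option<TagToken>` for key, `Option<String>` for spread, the ValueKind variant names) are assumptions, not the crate's actual definitions. `filter_names` shows how the recursive structure, where list items and filter arguments are themselves TagValues, can be walked.

```rust
// Hypothetical definitions inferred from the AST dump; field types are assumptions.

/// A slice of the original template plus where it was found.
#[derive(Debug, Clone, PartialEq)]
pub struct TagToken {
    pub token: String,
    pub start_index: usize,
    pub end_index: usize,
    pub line_col: (usize, usize), // (line, column), 1-based in the example
}

/// Value kinds listed in the description (variant names assumed).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ValueKind {
    String,
    Int,
    Float,
    List,
    Dict,
    Variable,
    Translation,
}

/// Only the variant seen in the example is shown.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TagSyntax {
    Django,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TagValueFilter {
    pub arg: Option<TagValue>, // the argument is itself a TagValue
    pub token: TagToken,       // the filter name
    pub start_index: usize,
    pub end_index: usize,
    pub line_col: (usize, usize),
}

#[derive(Debug, Clone, PartialEq)]
pub struct TagValue {
    pub token: TagToken,
    pub children: Vec<TagValue>, // items of literal lists / dicts
    pub spread: Option<String>,  // assumed: the spread operator, e.g. "..."
    pub filters: Vec<TagValueFilter>,
    pub kind: ValueKind,
    pub start_index: usize,
    pub end_index: usize,
    pub line_col: (usize, usize),
}

#[derive(Debug, Clone, PartialEq)]
pub struct TagAttr {
    pub key: Option<TagToken>, // None for positional attributes
    pub value: TagValue,
    pub is_flag: bool,
    pub start_index: usize,
    pub end_index: usize,
    pub line_col: (usize, usize),
}

#[derive(Debug, Clone, PartialEq)]
pub struct Tag {
    pub name: TagToken,
    pub attrs: Vec<TagAttr>,
    pub is_self_closing: bool,
    pub syntax: TagSyntax,
    pub start_index: usize,
    pub end_index: usize,
    pub line_col: (usize, usize),
}

/// Collects every filter name in a value, recursing into list / dict items
/// and into filter arguments.
pub fn filter_names(value: &TagValue) -> Vec<String> {
    let mut names = Vec::new();
    for item in &value.children {
        names.extend(filter_names(item));
    }
    for filter in &value.filters {
        names.push(filter.token.token.clone());
        if let Some(arg) = &filter.arg {
            names.extend(filter_names(arg));
        }
    }
    names
}
```

Because `Vec` stores its elements on the heap, TagValue can contain TagValueFilter, which in turn contains an `Option<TagValue>`, without needing an explicit `Box`.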
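
The start_index and line_col values in the dump can be cross-checked against the input string. Below is a minimal sketch of deriving a (line, column) pair from a byte offset, assuming both are 1-based and that columns count characters rather than bytes; `line_col` is a hypothetical helper, not necessarily how the crate computes TagToken positions. It reproduces the token positions above, e.g. start_index 3 (my_tag) maps to (1, 4).

```rust
/// Converts a byte offset into a 1-based (line, column) pair.
/// Assumption: columns count characters, not bytes.
/// Panics if `index` is past the end or not on a char boundary.
pub fn line_col(source: &str, index: usize) -> (usize, usize) {
    let before = &source[..index];
    // Lines are counted by the newlines before the offset.
    let line = before.matches('\n').count() + 1;
    // The current line starts right after the last newline (or at 0).
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = before[line_start..].chars().count() + 1;
    (line, col)
}
```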

thiserror = "2.0.17"
regex = "1.12.2"
lazy_static = "1.5.0"

# https://ohadravid.github.io/posts/2023-03-rusty-python
[profile.release]
1 change: 1 addition & 0 deletions crates/djc-core/Cargo.toml
@@ -11,5 +11,6 @@ crate-type = ["cdylib"]

[dependencies]
djc-html-transformer = { path = "../djc-html-transformer" }
djc-template-parser = { path = "../djc-template-parser" }
pyo3 = { workspace = true }
quick-xml = { workspace = true }