
Practice 4 ‐ Textual editors

Oszkár Semeráth edited this page Oct 2, 2024 · 8 revisions

The goal of this laboratory session is to gain practical experience with grammars, parsers, and textual editors. For this practice, we will use the online editors of the Langium framework.

Running example

In this lab session, we will use the same running example as in the previous session: the DFD specification of the Document Similarity Estimation algorithm, shown below.

Data-flow Diagram of the Document Similarity Estimation

A DFD consists of the following elements:

  • Workers transform inputs into an output. For example, the Tokenize process transforms Strings into Lists of Strings. Note that a Worker type may have multiple instances in a diagram; for example, there are two Tokenizer node instances in the process.
  • Each node may have one or more uniquely identified input pins, each of which consumes input values of a specific type. For example, the Scalar Product node has two input pins, "1" and "2", each accepting Vectors.
  • Input pins are connected to node outputs by dedicated channels, which forward the output of a node to an input pin of another node. The output of a node can be used by multiple input pins, in which case each of these input pins receives the output. For example, the shingles of a document are processed by two different Scalar Product nodes.
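Before writing a grammar, it can help to pin down these concepts as plain data types. The TypeScript sketch below is our own illustration and not part of the lab material. The names `WorkerType`, `WorkerInstance`, `Channel`, `Diagram`, `inputCount`, `pin`, and `consumersOf` are assumptions, and pins are numbered from 1, as in the Scalar Product example.

```typescript
// Hypothetical in-memory model of a data-flow diagram (DFD).

// A worker type describes a kind of node and how many input pins it has.
interface WorkerType {
  name: string;
  inputCount: number;
}

// A worker instance is one node of the diagram; several instances
// (e.g. two tokenizers) may share the same worker type.
interface WorkerInstance {
  name: string;
  type: WorkerType;
}

// A channel forwards the output of `from` to one input pin of `to`.
interface Channel {
  from: WorkerInstance;
  to: WorkerInstance;
  pin: number; // 1-based index of the input pin of `to`
}

interface Diagram {
  workers: WorkerInstance[];
  channels: Channel[];
}

// Returns the input pins ("worker/pin") that receive the output of `source`:
// one entry per channel, so an output feeding two pins appears twice.
function consumersOf(diagram: Diagram, source: WorkerInstance): string[] {
  return diagram.channels
    .filter((c) => c.from === source)
    .map((c) => c.to.name + "/" + c.pin);
}

const tokenizerType: WorkerType = { name: "tokenizer", inputCount: 1 };
const shinglerType: WorkerType = { name: "shingler", inputCount: 1 };
const productType: WorkerType = { name: "product", inputCount: 2 };

const t: WorkerInstance = { name: "t", type: tokenizerType };
const s: WorkerInstance = { name: "s", type: shinglerType };
const p: WorkerInstance = { name: "p", type: productType };

// The shingles produced by s feed both input pins of the product p.
const diagram: Diagram = {
  workers: [t, s, p],
  channels: [
    { from: t, to: s, pin: 1 },
    { from: s, to: p, pin: 1 },
    { from: s, to: p, pin: 2 },
  ],
};

console.log(consumersOf(diagram, s).join(", ")); // p/1, p/2
```

The worker type and the worker instance are deliberately separate records here. This makes it explicit that several nodes can share one type, which the grammar versions below also have to express.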

In this lab session, we will design a grammar and a textual editor for this problem.

Grammars

In the lab session, there will be a short introduction to grammars.

Environment

For this lab session, we will use the Langium framework.

  • Open the Langium Playground online editor at https://langium.org/playground/.
  • Start with the running example, and describe it as a list of workers (and the channels connecting them). One possible solution looks like this:
grammar TextProcessing

entry Process:
        (workers += Worker | channels += Channel)*;

Worker: "worker" name = ID;
Channel: "channel" from=[Worker] "->" to=[Worker];

// do not modify the terminals

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;

hidden terminal ML_COMMENT: /\/\*[\s\S]*?\*\//;
hidden terminal SL_COMMENT: /\/\/[^\n\r]*/;
  • A possible model in the Content window then looks like this:
worker t
worker s
worker p

channel t -> s
channel s -> p
  • Extend the example with a named process block that contains the workers first and then the channels, and with a fixed set of worker types. Write the grammar specification in the Grammar window:
grammar TextProcessing

entry Process:
    "process" name = ID "{"
        (workers += Worker)*
        (channels += Channel)*
    "}";

Worker: "worker" type = ("tokenizer" | "shingler" | "product") name = ID;
Channel: "channel" from=[Worker] "->" to=[Worker];

// do not modify the terminals

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;

hidden terminal ML_COMMENT: /\/\*[\s\S]*?\*\//;
hidden terminal SL_COMMENT: /\/\/[^\n\r]*/;
  • An example model can be tried in the Content window.
process selfSimilarity {
    worker tokenizer t
    worker shingler s
    worker product p

    channel t -> s
    channel s -> p
}
  • Observe the syntax tree by clicking the tree view button.
  • You can try the current stage of the lab via this link: ▷Langium
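To see what the terminal rules at the end of the grammar do, here is a hand-written TypeScript sketch of a lexer for the grammar version with the `process` block. It reuses the grammar's regular expressions and is an illustration only: Langium generates its lexer from the grammar (on top of Chevrotain). The `tokenize` function and the `Token` shape are our own assumptions. The keyword handling approximates, as far as we know, Langium's behaviour that `workers` is lexed as an ID rather than as the keyword `worker` followed by `s`.

```typescript
// Hand-written lexer that mimics the terminal rules of the TextProcessing
// grammar with the `process` block. Illustrative only, not Langium's lexer.

interface Token {
  kind: "keyword" | "ID";
  text: string;
}

// Keywords of this grammar version that look like identifiers.
const KEYWORDS = ["process", "worker", "channel", "tokenizer", "shingler", "product"];

// Terminal rules copied from the grammar, anchored with ^ so that they only
// match at the start of the remaining input.
const HIDDEN_RULES = [/^\s+/, /^\/\*[\s\S]*?\*\//, /^\/\/[^\n\r]*/]; // WS, ML_COMMENT, SL_COMMENT
const PUNCTUATION_RULE = /^(->|[{}])/;
const ID_RULE = /^[_a-zA-Z][\w_]*/;

function tokenize(input: string): Token[] {
  const tokens: Token[] = [];
  let rest = input;
  while (rest.length > 0) {
    // Hidden terminals are consumed but never handed to the parser.
    let skipped = false;
    for (const rule of HIDDEN_RULES) {
      const hidden = rule.exec(rest);
      if (hidden) {
        rest = rest.slice(hidden[0].length);
        skipped = true;
        break;
      }
    }
    if (skipped) {
      continue;
    }
    const match = PUNCTUATION_RULE.exec(rest) || ID_RULE.exec(rest);
    if (!match) {
      throw new Error("Unexpected character '" + rest[0] + "'");
    }
    const text = match[0];
    // Punctuation is always a keyword. An identifier that equals a keyword as
    // a whole becomes that keyword: "worker" is a keyword, "workers" an ID.
    const isKeyword = PUNCTUATION_RULE.test(text) || KEYWORDS.indexOf(text) >= 0;
    tokens.push({ kind: isKeyword ? "keyword" : "ID", text: text });
    rest = rest.slice(text.length);
  }
  return tokens;
}

const sample = "process demo {\n  worker tokenizer t // splits the text\n  worker shingler s\n  channel t -> s\n}";
console.log(tokenize(sample).map((tok) => tok.kind + ":" + tok.text).join(" "));
// keyword:process ID:demo keyword:{ keyword:worker keyword:tokenizer ID:t keyword:worker keyword:shingler ID:s keyword:channel ID:t keyword:-> ID:s keyword:}
```

The sketch also shows one consequence of turning the type names into keywords: `tokenizer` reaches the parser as a keyword, so in this grammar version it presumably cannot be used as a worker name.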

Types and type system

  • Currently, the types are fixed in the language. Extend the grammar with type declarations (WorkerTypeDeclaration):
grammar TextProcessing

entry Processes: (types += WorkerTypeDeclaration | processes += Process)+;

WorkerTypeDeclaration: "type" name = ID ";";

Process:
    "process" name = ID "{"
        (workers += Worker)*
        (channels += Channel)*
    "}";

Worker: "worker" type = [WorkerTypeDeclaration] name = ID;
Channel: "channel" from=[Worker] "->" to=[Worker];

// do not modify the terminals

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;

hidden terminal ML_COMMENT: /\/\*[\s\S]*?\*\//;
hidden terminal SL_COMMENT: /\/\/[^\n\r]*/;
  • An example code fragment is shown here:
type tokenizer;
type shingler;
type product;

process selfSimilarity {
    worker tokenizer t
    worker shingler s
    worker product p

    channel t -> s
    channel s -> p
}
  • We will discuss the differences between the various concepts of types.
  • The current stage of the laboratory session is available via this link: ▷Langium.
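With `type = [WorkerTypeDeclaration]`, the worker type becomes a cross-reference, just like `from=[Worker]` in channels. The parser only records the referenced name, and a later linking step resolves it to the declaration. The following TypeScript sketch mimics that step on a hypothetical, simplified AST. Several parts are our own assumptions:

- the node shapes;
- the `findByName` and `link` functions;
- the scoping rule: types are looked up among the top-level declarations, and workers only within their own process.

Langium's actual linker and scope provider are more general.

```typescript
// Hypothetical, simplified AST of the grammar with type declarations.
// After parsing, a cross-reference only holds the referenced name (`...Name`);
// link() fills in the resolved node.
interface TypeNode {
  name: string;
}

interface WorkerNode {
  name: string;
  typeName: string; // text of the [WorkerTypeDeclaration] reference
  type?: TypeNode; // resolved by link()
}

interface ChannelNode {
  fromName: string; // texts of the two [Worker] references
  toName: string;
  from?: WorkerNode; // resolved by link()
  to?: WorkerNode;
}

interface ProcessNode {
  name: string;
  workers: WorkerNode[];
  channels: ChannelNode[];
}

interface ProcessesNode {
  types: TypeNode[];
  processes: ProcessNode[];
}

// Returns the first candidate with the given name, or undefined.
function findByName<T extends { name: string }>(candidates: T[], name: string): T | undefined {
  return candidates.filter((candidate) => candidate.name === name)[0];
}

// Assumed scoping: worker types are looked up among the top-level type
// declarations, channel endpoints among the workers of the same process.
// Returns one message per reference that cannot be resolved.
function link(root: ProcessesNode): string[] {
  const errors: string[] = [];
  for (const proc of root.processes) {
    for (const worker of proc.workers) {
      worker.type = findByName(root.types, worker.typeName);
      if (!worker.type) {
        errors.push(proc.name + ": unknown worker type '" + worker.typeName + "'");
      }
    }
    for (const channel of proc.channels) {
      channel.from = findByName(proc.workers, channel.fromName);
      channel.to = findByName(proc.workers, channel.toName);
      if (!channel.from) {
        errors.push(proc.name + ": unknown worker '" + channel.fromName + "'");
      }
      if (!channel.to) {
        errors.push(proc.name + ": unknown worker '" + channel.toName + "'");
      }
    }
  }
  return errors;
}

const model: ProcessesNode = {
  types: [{ name: "tokenizer" }, { name: "shingler" }, { name: "product" }],
  processes: [
    {
      name: "selfSimilarity",
      workers: [
        { name: "t", typeName: "tokenizer" },
        { name: "s", typeName: "shingler" },
        { name: "h", typeName: "hasher" }, // no "type hasher;" declaration
      ],
      channels: [
        { fromName: "t", toName: "s" },
        { fromName: "s", toName: "p" }, // no worker p in this process
      ],
    },
  ],
};

console.log(link(model).join("\n"));
// selfSimilarity: unknown worker type 'hasher'
// selfSimilarity: unknown worker 'p'
```

Since `hasher` is now ordinary model data rather than a keyword, adding `type hasher;` to the model would remove the first error without changing the grammar.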

Terminals, Numbers

  • We can introduce numbers with the following rule:
terminal INT returns number: /[1-9][0-9]*/;
  • Alternative number notations are also possible; the following two rules allow negative numbers and any sequence of digits (including leading zeros), respectively:
terminal INT returns number: /-?[0-9]+/;
terminal INT returns number: /[0-9]+/;
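  • With numbers available, input pins can be introduced with a simple / notation: a worker type declares its number of inputs (e.g. product/2), and a channel selects an input pin of its target worker (e.g. p/1):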
grammar TextProcessing

entry Processes: (types += WorkerTypeDeclaration | processes += Process)+;

WorkerTypeDeclaration: "type" name = ID ("/" inputs = INT)? ";";

Process:
    "process" name = ID "{"
        (workers += Worker)*
        (channels += Channel)*
    "}";

Worker: "worker" type = [WorkerTypeDeclaration] name = ID;
Channel: "channel" from=[Worker] "->" to=[Worker] ("/" inputs = INT)?;

hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;

hidden terminal ML_COMMENT: /\/\*[\s\S]*?\*\//;
hidden terminal SL_COMMENT: /\/\/[^\n\r]*/;

terminal INT returns number: /[0-9]+/;
  • The following model can be used to try it out:
type tokenizer/1;
type shingler;
type product/2;

process selfSimilarity {
    worker tokenizer t
    worker shingler s
    worker product p

    channel t -> s
    channel s -> p/1
    channel s -> p/2
}
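The grammar only constrains the shape of a pin reference such as `p/1`. It does not know that `product/2` has exactly two inputs, and the INT alternatives above accept different strings. The TypeScript sketch below illustrates both points. It first compares the three INT regular expressions on a few strings, then checks pin indices against the number of inputs declared by the target worker type. The AST types (`TypeDecl`, `WorkerDecl`, `ChannelDecl`), the function `checkChannel`, and its two rules are our own assumptions, not part of the lab material. In Langium, a check like this would typically be written as a validation check of the language.

```typescript
// Part 1: the three INT alternatives from above. ^ and $ are added so that
// each regex must match the whole string (the lexer itself does not anchor).
const INT_POSITIVE = /^[1-9][0-9]*$/;
const INT_SIGNED = /^-?[0-9]+$/;
const INT_DIGITS = /^[0-9]+$/;

for (const text of ["42", "0", "007", "-3"]) {
  console.log(text, INT_POSITIVE.test(text), INT_SIGNED.test(text), INT_DIGITS.test(text));
}
// 42 true true true
// 0 false true true
// 007 false true true
// -3 false true false

// Part 2: checking pins on a hypothetical, simplified AST of the grammar.
interface TypeDecl {
  name: string;
  inputs?: number; // from the optional ("/" inputs = INT) part
}

interface WorkerDecl {
  name: string;
  type: TypeDecl;
}

interface ChannelDecl {
  from: WorkerDecl;
  to: WorkerDecl;
  inputs?: number; // the selected input pin of `to`, if any
}

// Assumed rules, not part of the lab grammar:
//  - a given pin must be between 1 and the declared inputs of the target type;
//  - a pin may be omitted only if the target type has at most one input.
function checkChannel(c: ChannelDecl): string[] {
  const declared = c.to.type.inputs;
  const label = c.from.name + " -> " + c.to.name;
  if (c.inputs !== undefined) {
    if (declared === undefined || c.inputs < 1 || c.inputs > declared) {
      return [label + "/" + c.inputs + ": type '" + c.to.type.name + "' has no input pin " + c.inputs];
    }
  } else if (declared !== undefined && declared > 1) {
    return [label + ": type '" + c.to.type.name + "' has " + declared + " inputs, select one with /"];
  }
  return [];
}

// The model from above, plus one deliberately wrong channel.
const tokenizerType: TypeDecl = { name: "tokenizer", inputs: 1 };
const shinglerType: TypeDecl = { name: "shingler" };
const productType: TypeDecl = { name: "product", inputs: 2 };
const t: WorkerDecl = { name: "t", type: tokenizerType };
const s: WorkerDecl = { name: "s", type: shinglerType };
const p: WorkerDecl = { name: "p", type: productType };

const channels: ChannelDecl[] = [
  { from: t, to: s },
  { from: s, to: p, inputs: 1 },
  { from: s, to: p, inputs: 2 },
  { from: s, to: p, inputs: 3 }, // product declares only 2 inputs
];

for (const c of channels) {
  for (const message of checkChannel(c)) {
    console.log(message);
  }
}
// s -> p/3: type 'product' has no input pin 3
```

Inside the lexer the rules are not anchored. With the `/[1-9][0-9]*/` rule, an input such as `p/0` would therefore presumably fail already at lexing. With `/[0-9]+/` it lexes fine, and only a semantic check such as `checkChannel` can reject it.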