Practice 4 ‐ Textual editors
The goal of this laboratory session is to gain practical experience with grammars, parsers, and textual editors. For this practice, we will use the online editors of the Langium framework.
In this lab session, we will use the same running example as in the previous one; the example DFD specification of the Document Similarity Estimation algorithm is shown below.
A DFD consists of the following elements:
- Workers transform inputs to outputs. For example, the Tokenize process transforms Strings to Lists of Strings. It is important to note that a Worker type may have multiple instances in a diagram; for example, there are two Tokenizer node instances in the process.
- Each node may have one or more unique input pins, which consume input values of a specific type. For example, the Scalar Product node has two input pins "1" and "2", each accepting Vectors.
- The input pins and the outputs are connected by dedicated channels, which forward the output of a node to the input of another node. The output of a node can be used by multiple input pins; in this case, each input pin gets the output. For example, the shingles of a document are processed by two different Scalar Product nodes.
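The three DFD concepts above can be sketched as a small in-memory model. This is only an illustration: the names `WorkerTypeDecl`, `WorkerNode`, `ChannelNode`, and `consumersOf` are hypothetical and do not come from the lab material, and the Shingler having a single input pin is an assumption.

```typescript
// Hypothetical in-memory model of the DFD concepts (not part of Langium).
interface WorkerTypeDecl {
  name: string;
  inputPins: string[]; // unique pin names, e.g. "1" and "2"
}

interface WorkerNode {
  name: string;
  type: WorkerTypeDecl; // several nodes may share one type
}

interface ChannelNode {
  from: WorkerNode; // the node whose output is forwarded
  to: WorkerNode;   // the node that consumes it
  pin: string;      // the input pin on the consuming node
}

// Assumption: the Shingler has one input pin; the Scalar Product has two.
const shinglerType: WorkerTypeDecl = { name: "Shingler", inputPins: ["1"] };
const productType: WorkerTypeDecl = { name: "ScalarProduct", inputPins: ["1", "2"] };

const shingler: WorkerNode = { name: "s", type: shinglerType };
const product1: WorkerNode = { name: "p1", type: productType };
const product2: WorkerNode = { name: "p2", type: productType };

// One output feeds two input pins: each pin receives the same value.
const dfdChannels: ChannelNode[] = [
  { from: shingler, to: product1, pin: "1" },
  { from: shingler, to: product2, pin: "1" },
];

// Lists every "node/pin" that consumes the output of the given node.
function consumersOf(node: WorkerNode, all: ChannelNode[]): string[] {
  return all.filter(c => c.from === node).map(c => `${c.to.name}/${c.pin}`);
}

const shinglerConsumers = consumersOf(shingler, dfdChannels);
```

Here `shinglerConsumers` contains both Scalar Product nodes, which mirrors the case where the shingles of a document are processed by two different nodes.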
In this lab session, we will design a grammar and a textual editor for this problem.
In the lab session there will be a short grammar introduction.
For this lab session, we will use the Langium framework.
- Open the online editor of Langium Playground at https://langium.org/playground/.
- Start the running example, and change it to a list of workers. One potential solution could look like this:
grammar TextProcessing
entry Process:
(workers += Worker | channels += Channel)*;
Worker: "worker" name = ID;
Channel: "channel" from=[Worker] "->" to=[Worker];
// do not hurt the terminals
hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;
hidden terminal ML_COMMENT: /\/\*[\s\S]*?\*\//;
hidden terminal SL_COMMENT: /\/\/[^\n\r]*/;
- A possible content could then look like this:
worker t
worker s
worker p
channel t -> s
channel s -> p
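- Extend the example so that the workers and channels are enclosed in a named process and each worker has one of the fixed types tokenizer, shingler, or product. Write the grammar specification in the Grammar window: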
grammar TextProcessing
entry Process:
"process" name = ID "{"
(workers += Worker)*
(channels += Channel)*
"}";
Worker: "worker" type =("tokenizer" | "shingler" | "product") name = ID;
Channel: "channel" from=[Worker] "->" to=[Worker];
// do not hurt the terminals
hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;
hidden terminal ML_COMMENT: /\/\*[\s\S]*?\*\//;
hidden terminal SL_COMMENT: /\/\/[^\n\r]*/;
- An example model can be tried in the Content window.
process selfSimilarity {
worker tokenizer t
worker shingler s
worker product p
channel t -> s
channel s -> p
}
- Observe the Syntax tree by pushing the tree view button.
- You can try the current stage of the lab via this link: ▷Langium
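To relate the grammar to the tree view, the sketch below writes out by hand roughly what the syntax tree of the `selfSimilarity` model could look like. It is a simplified, assumed shape: the interfaces `WorkerAst`, `ChannelAst`, and `ProcessAst` are hypothetical stand-ins, not Langium's generated types, and the actual tree view may show additional fields.

```typescript
// Simplified, assumed shape of the syntax tree; not Langium's generated types.
interface WorkerAst {
  $type: "Worker";
  type: "tokenizer" | "shingler" | "product"; // keyword alternatives become a string
  name: string;                               // name = ID
}

interface ChannelAst {
  $type: "Channel";
  from: { $refText: string }; // from=[Worker]: a cross-reference, shown by its text
  to: { $refText: string };   // to=[Worker]
}

interface ProcessAst {
  $type: "Process";
  name: string;
  workers: WorkerAst[];   // filled by workers += Worker
  channels: ChannelAst[]; // filled by channels += Channel
}

const selfSimilarityTree: ProcessAst = {
  $type: "Process",
  name: "selfSimilarity",
  workers: [
    { $type: "Worker", type: "tokenizer", name: "t" },
    { $type: "Worker", type: "shingler", name: "s" },
    { $type: "Worker", type: "product", name: "p" },
  ],
  channels: [
    { $type: "Channel", from: { $refText: "t" }, to: { $refText: "s" } },
    { $type: "Channel", from: { $refText: "s" }, to: { $refText: "p" } },
  ],
};
```

Each `+=` assignment in the grammar corresponds to an array in the tree, while a plain `=` assignment corresponds to a single value.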
- Currently, the types are fixed in the language. Extend the grammar with type declarations (WorkerTypeDeclaration):
grammar TextProcessing
entry Processes: (types += WorkerTypeDeclaration | processes += Process)+;
WorkerTypeDeclaration: "type" name = ID ";";
Process:
"process" name = ID "{"
(workers += Worker)*
(channels += Channel)*
"}";
Worker: "worker" type = [WorkerTypeDeclaration] name = ID;
Channel: "channel" from=[Worker] "->" to=[Worker];
// do not hurt the terminals
hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;
hidden terminal ML_COMMENT: /\/\*[\s\S]*?\*\//;
hidden terminal SL_COMMENT: /\/\/[^\n\r]*/;
- An example code fragment is shown here:
type tokenizer;
type shingler;
type product;
process selfSimilarity {
worker tokenizer t
worker shingler s
worker product p
channel t -> s
channel s -> p
}
- We will explain the differences between the different concepts of types.
- The current stage of the laboratory session is available via this link: ▷Langium.
- We can introduce numbers with the following rule, which accepts positive integers without leading zeros:
terminal INT returns number: /[1-9][0-9]*/;
- There are alternative number notations. The following two options allow negative numbers or any sequence of digits, respectively:
terminal INT returns number: /-?[0-9]+/;
terminal INT returns number: /[0-9]+/;
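The difference between the three INT variants can be checked directly, since the terminal bodies are regular expressions. The sketch below is a simplification: it tests whether a whole sample string matches each pattern, which is not how the generated lexer works internally. The helper `acceptsToken` and the sample list are hypothetical.

```typescript
// The three INT patterns from the rules above.
const intVariants: { label: string; pattern: RegExp }[] = [
  { label: "positive, no leading zero", pattern: /[1-9][0-9]*/ },
  { label: "optionally negative", pattern: /-?[0-9]+/ },
  { label: "any digit sequence", pattern: /[0-9]+/ },
];

// Simplification: a sample is accepted if the pattern matches it completely.
function acceptsToken(pattern: RegExp, text: string): boolean {
  return new RegExp(`^(?:${pattern.source})$`).test(text);
}

const intSamples = ["0", "42", "007", "-5"];

// For each variant, collect the samples it accepts.
const acceptedByVariant = intVariants.map(v => ({
  label: v.label,
  accepted: intSamples.filter(s => acceptsToken(v.pattern, s)),
}));
// acceptedByVariant[0].accepted is ["42"]
// acceptedByVariant[1].accepted is ["0", "42", "007", "-5"]
// acceptedByVariant[2].accepted is ["0", "42", "007"]
```

The first variant rejects `0` and `007`, which matters if pin numbering is ever allowed to start at zero.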
- With the introduction of numbers, pins can be introduced with a simple / notation:
grammar TextProcessing
entry Processes: (types += WorkerTypeDeclaration | processes += Process)+;
WorkerTypeDeclaration: "type" name = ID ("/" inputs = INT)? ";";
Process:
"process" name = ID "{"
(workers += Worker)*
(channels += Channel)*
"}";
Worker: "worker" type = [WorkerTypeDeclaration] name = ID;
Channel: "channel" from=[Worker] "->" to=[Worker] ("/" inputs = INT)?;
hidden terminal WS: /\s+/;
terminal ID: /[_a-zA-Z][\w_]*/;
hidden terminal ML_COMMENT: /\/\*[\s\S]*?\*\//;
hidden terminal SL_COMMENT: /\/\/[^\n\r]*/;
terminal INT returns number: /[0-9]+/;
- The following model can then be used:
type tokenizer/1;
type shingler;
type product/2;
process selfSimilarity {
worker tokenizer t
worker shingler s
worker product p
channel t -> s
channel s -> p/1
channel s -> p/2
}
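The grammar accepts any pin number after a channel target, so a model such as `channel s -> p/3` parses even though `product` declares only two inputs. A check of this kind would have to be written separately from the grammar. The sketch below illustrates the idea on plain objects; the interfaces and `checkChannel` are hypothetical, it does not use Langium's validation API, and treating an omitted number as 1 is an assumption.

```typescript
// Hypothetical plain-object view of the parsed model (not Langium types).
interface TypeDeclModel { name: string; inputs?: number; }
interface WorkerModel { name: string; type: TypeDeclModel; }
interface ChannelModel { from: WorkerModel; to: WorkerModel; inputs?: number; }

// Returns an error message if the channel targets a pin the type does not have.
// Assumption: an omitted pin count or pin index means 1.
function checkChannel(c: ChannelModel): string | undefined {
  const declared = c.to.type.inputs ?? 1;
  const pin = c.inputs ?? 1;
  if (pin < 1 || pin > declared) {
    return `channel ${c.from.name} -> ${c.to.name}/${pin}: ` +
      `type ${c.to.type.name} declares ${declared} input pin(s)`;
  }
  return undefined;
}

const shinglerDecl: TypeDeclModel = { name: "shingler" };
const productDecl: TypeDeclModel = { name: "product", inputs: 2 };
const workerS: WorkerModel = { name: "s", type: shinglerDecl };
const workerP: WorkerModel = { name: "p", type: productDecl };

const validPin = checkChannel({ from: workerS, to: workerP, inputs: 2 });
const invalidPin = checkChannel({ from: workerS, to: workerP, inputs: 3 });
```

Here `validPin` is `undefined` because pin 2 exists on `product`, while `invalidPin` holds a message describing the out-of-range pin.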
- The final version of this lab session can be accessed via the following link: ▷Langium.
- In this lab session, we skip scope providers, which are typically essential for complex grammars but require programming: https://langium.org/docs/learn/workflow/resolve_cross_references/.
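To give an intuition for what resolving a cross-reference such as `from=[Worker]` involves, the following sketch links channel endpoints to workers by name. It is a plain illustration under simplifying assumptions (a single flat scope, unique worker names) and does not use Langium's scoping API; `NamedWorker`, `ChannelText`, and `linkChannels` are hypothetical names.

```typescript
// Hypothetical inputs: declared workers and channels with textual endpoints.
interface NamedWorker { name: string; }
interface ChannelText { from: string; to: string; }
interface LinkedChannel { from: NamedWorker; to: NamedWorker; }

// Resolves each endpoint name against a single flat scope of workers.
function linkChannels(
  workers: NamedWorker[],
  channelTexts: ChannelText[],
): { links: LinkedChannel[]; errors: string[] } {
  const scope = new Map<string, NamedWorker>();
  for (const w of workers) {
    scope.set(w.name, w);
  }

  const links: LinkedChannel[] = [];
  const errors: string[] = [];
  for (const c of channelTexts) {
    const from = scope.get(c.from);
    const to = scope.get(c.to);
    if (from === undefined) errors.push(`Could not resolve reference to Worker named '${c.from}'`);
    if (to === undefined) errors.push(`Could not resolve reference to Worker named '${c.to}'`);
    if (from !== undefined && to !== undefined) links.push({ from, to });
  }
  return { links, errors };
}

const linkResult = linkChannels(
  [{ name: "t" }, { name: "s" }, { name: "p" }],
  [{ from: "t", to: "s" }, { from: "s", to: "q" }],
);
```

In this example `linkResult.links` contains the `t -> s` channel and `linkResult.errors` reports the unknown name `q`. A scope provider becomes necessary once a name should only be visible in some contexts, for example workers that belong to a different process.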