The Parser

How Parse() works?

The Parse() method manages tokens and the input token stack, and it does so by alternating between two different tasks. I will call those tasks strokes.

Before anything check if grammar tables have been loaded, because if not, there is nothing to parse from and error message should be returned, indicating that no gramar has been loaded.
We ask if there are any tokens left in the input stack.
If no, it is stroke one - make the lexer produce a token, store it in the input stack, and return massage indicating that we have done a TokenRead.
If yes, it is stroke two - get the token from the top of the input stack - that is the last that was added, without removing it, and decide if it is Ok to be send for parsing.

3.1. If it is an error token, then obviously the lexer had encountered an error, so return the proper error message - GroupError or LexicalError and leave the token on the input stack, so that if Parse() is called again, it will return the same error message.

3.2. If it is noise, discard it (Pop()) from the input stack and start over from 1. Otherwise proceed to parsing the token by calling ParseLALR(token) and look for the result.

3.2.1. If it is "ShiftResult" this means that the token is now in the work stack, so discard it (Pop()) from the input stack and start over from 1.

3.2.2. If it is "ReduceResult" then this means that top N tokens on the work stack have been reduced to a single new token and the token we tried to parse have not been worked in - it served only as an indicator that a reduce is to be performed, and thus needs to be parsed again. That's why we don't remove it from the input stack, but only return massage indicating that a reduction have been done.

3.3.3. If the result is an error, the logic is the same as with the above errors 3.1) - we leave the token on the input stack and return the appropriate message - "InternalError" or "SyntaxError".

3.3.4. And the same with "AcceptResult" - we do nothing and return "Accept".

1) This is an abstract code:

if (grammarTablesNotLoaded) return NotLoadedError;
start: if (inputStack.Count == 0)
{	
	//stroke 1
	inputStack.Push(ProduceToken()); 
	return TokenRead;
}
else
{	
	//stroke 2
	Token t = inputStack.Peek();
	if (t is Error) return AppropriateError;
	if (t is Noise) inputStack.Pop(); goto start;
	switch(ParseLALR(token))
	{
		case Shift: inputStack.Pop(); goto start;
		case Accept: return AppropriateMessage;
		default: return AppropriateError;
	}
}

How does ParseLALR(Token) works?

The ParseLALR(Token) method does the actual parsing - that is, push tokens on the work stack, does reductions on the work stack, and change the parser state - which is an integer variable.

Get the action that need to be performed for the token - "state = LRStateList[curentState]; action = state[token.Symbol];". If there is no action (thus we get null), this means that token of this symbol was not expected at all, thus we have a SyntaxError

1.1 In case of a syntax error, we loop through every action for this state, and add the symbols to a list, called "ExpectedSymbols". It is useful to know the symbols that were expected, and be able to show a message hint to the user - something like "Syntax error on line 10, col 22 - expected integer_value, string_value or semicolon" for example. Then we return a message indicating a SyntaxError.

Else, depending on the Type of the action to be performed, we:

2.1 If Accept, set the reduction flag to true "bool HasReduction = true" (this indicates that the top token on the work stack is a reduction token), and return Accept.

2.2. If Shift, curentState is changed to a new state, and that state is saved in our token, too. Our token is puched on the work stack, and Shift is returned.

2.3. If Reduce, we are going to take last N symbols from the work stack, stick them in a reduction, stick that reduction in a token, push it on the work stack and finally change the state of the parser.

2.3.1. We get a production, for the current reduce action - the reduce action value is the production index - "ProductionList[action.Value];"

2.3.2. Procuce new token.

2.3.2.1. If we allow trimming of reductions, and our production contains only one non-terminal, we Pop a token from the work stack, and change it's symbol to be the head of the production - this is what trimming is. We set the result to ReduceEliminated message. Note that we do not return anything yet.

2.3.2.2. Else, set the reduction flag to true "bool HasReduction = true" (this indicates that the top token on the work stack is a reduction token), then create new reduction object accrdingly, and populate it fron the tokens on the work stack, while popping those tokens. Then, create a new token with that reduction object we just created as data, and production head as symbol, and set the result to ReduceNormal. Note that we do not return anything yet.

2.3.3. Retrieve previous state index from the token now on top of the stack, as we have popped all the tokens that go in the reduction, and get an action for that state and the head symbol of the production - "production.Head". If there is no action, return a message indicating InternalError, else curentState is changed to a new state, and that state is saved in our new token (the one with the reduction). That token is pushed on the work stack, and the result is returned, be it ReduceEliminated or ReduceNormal.

2.4. If we have any other action type - that is "LRActionType.Error" or "LRActionType.Goto", we return InternalError.

1) This is an abstract code:

LRState state = stateList[curentState]; LRAction action = state[token.Symbol];

if (action == null)
	for (int i = 0; i < state.Count(); i++)
		LRAction lra = lRState[i]; ExpectedSymbols.Add(lra.Symbol);

haveReduction = false;
switch (action.Type)
{
	case LRActionType.Accept:
		_haveReduction = true; return ParseResult.Accept;

	case LRActionType.Shift:
		curentState = action.Value;
		token.State = curentState;
		workStack.Push(token);
		return ParseResult.Shift;

	case LRActionType.Reduce:

		Production production = ProductionList[action.Value];
		ParseResult result; Token t;
		if (trimReductions && production.ContainsOneNonTerminal) {
		
			t = _Stack.Pop(); 
			t.Symbol = production.Head; 
			result = ParseResult.ReduceEliminated;
		} else {
			
			haveReduction = true;
			Reduction reduction = new Reduction();
			reduction.Production = production;
			for (int i = production.Count() - 1; i >= 0; i -= 1) reduction[i] = _Stack.Pop();
			t = new Token(production.Head, reduction);
			result = ParseResult.ReduceNormal;
		}

		int stateindex = workStack.Top().State;
		LRState state = LRStateList[stateindex];
		int actionindex = state.IndexOf(production.Head);
		if (actionindex == -1) { return ParseResult.InternalError; }

		LRAction gotoAction = state[actionindex];
		curentState = gotoAction.Value;
		t.State = curentState;
		workStack.Push(t);
		return result;

	default:
		return ParseResult.InternalError;
}

Home
1. Introduction
2. Use the Engine
3. The Parse tree
4. The Grammar tables
5. The Egt file format
6. The Parser
7. The Lexer

Links
1. Home
2. Getting started
3. About
4. Engines
5. Documentation
6. Articles
7. GOLD Meta Language

Pseudo Code

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

The Parser

How Parse() works?

1) This is an abstract code:

How does ParseLALR(Token) works?

1) This is an abstract code:

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally