agreppin edited this page Sep 29, 2021 · 5 revisions

The Read part of the REPL.

GetToken()

First version:

void GetToken(void) {
  char *t;
  unsigned char b, x;
  b = q->look;
  t = q->token;
  for (;;) {
    x = XlatSyntax(b);
    if (x != ' ') break;
    b = GetChar();
  }
  if (x) {
    STOS(t, b);
    b = GetChar();
  } else {
    while (b && !x) {
      STOS(t, b);
      b = GetChar();
      x = XlatSyntax(b);
    }
  }
  STOS(t, 0);
  q->look = b;
}

The second version uses the fact that GetChar(), which reads the keyboard via BIOS int 0x16/0x00, is assumed to never return '\0', so the test on b in the inner loop can be dropped. The goal of this modification is to reduce the number of jumps at the assembler level.

...
  for (;;) {
    x = XlatSyntax(b);
    if (x == ' ') {
      b = GetChar();
      continue;
    }

    do {
      STOS(t, b);
      b = GetChar();
      if (x) break;
      x = XlatSyntax(b);
    } while (!x);

    STOS(t, 0);
    q->look = b;
    return;
  }

The third version reduces the number of call sites, because a near call takes 3 bytes while a short jump takes only 2:

void GetToken(void) {
  char *t;
  unsigned char b, x;
  b = q->look;
  t = q->token;
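L0:
    x = XlatSyntax(b);
    if (x == ' ') goto L2;      /* whitespace: skip without storing */
L1: STOS(t, b);                 /* store a delimiter or an ATOM character */
L2: b = GetChar();              /* the only GetChar() call site */
    if (x == ' ') goto L0;      /* still skipping whitespace */
    if (x) goto L3;             /* a delimiter is a one-character token */
    x = XlatSyntax(b);
    if (!x) goto L1;            /* next character continues the ATOM */
L3: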
  STOS(t, 0);
  q->look = b;
  return;
}

Removing XlatSyntax()

XlatSyntax() can be dropped for the following reasons:

  • it accounts for a good amount of initialization code (the q.syntax table setup)
  • '\r' is replaced by '\n' in GetChar() anyway
  • '.' is not absolutely required to parse S-expressions

To recap what GetToken() is required to do:

  1. skip whitespace
  2. return '(' or ')', the S-expression delimiters
  3. or return an ATOM in the q.token buffer

Then, minimal parsing rules can be stated as:

  1. anything below or equal to 0x20 ' ' is treated as whitespace
  2. everything else below or equal to 0x29 ')' is a reserved/special character

The resulting code can be seen in the repo.
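
As a rough illustration of the two rules above (this is not the repository code), here is a minimal C sketch. The globals input, look and token are hypothetical stand-ins for the reader state, and this GetChar() reads from a C string instead of the BIOS, so unlike the real one it returns 0 at end of input.

```c
/* Hypothetical stand-ins for q->look and q->token; input replaces the keyboard. */
const char *input = "";
unsigned char look = ' ';   /* one-character lookahead */
char token[64];             /* current token, NUL-terminated */

/* Assumption of this sketch: returns 0 at end of input (the BIOS version never does). */
unsigned char GetChar(void) {
  return *input ? (unsigned char)*input++ : 0;
}

/* Rule 1: anything <= 0x20 ' ' is whitespace.
   Rule 2: anything else <= 0x29 ')' is a reserved/special character. */
void GetToken(void) {
  char *t = token;
  unsigned char b = look;
  while (b && b <= ' ') b = GetChar();   /* skip whitespace */
  if (b && b <= ')') {                   /* special: one-character token */
    *t++ = (char)b;
    b = GetChar();
  } else {
    /* ATOM: collect until whitespace, a special character or end of input.
       The bound is an addition of this sketch; an overlong ATOM is split. */
    while (b > ')' && t < token + sizeof token - 1) {
      *t++ = (char)b;
      b = GetChar();
    }
  }
  *t = 0;
  look = b;                              /* keep the lookahead for the next call */
}
```

With input pointing at "(CONS A)" and look set to ' ', successive calls leave "(", "CONS", "A" and ")" in token, then an empty string once the input is exhausted. Note that rule 2 also makes characters such as '!' or '#' one-character tokens, which is the price of dropping the syntax table.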

GetObject()

The important thing here is that it is called just after GetToken() ...

Intern()

This function finds or stores an ATOM (in q.token) in the q.str region. It does so using the short-opcode x86 string instructions wherever possible:

  • cmps -> strcmp()
  • scas -> strchr() / strlen()
  • lods + stos -> stpcpy()
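
The find-or-store logic can be sketched in portable C using the library functions named in the list above. This is only an approximation under assumptions: the str array stands in for the q.str region, names are assumed to be stored back to back with an empty string marking the end, and the return value (an offset, or -1 when full) is a choice made for this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the q.str region: NUL-terminated names stored
   back to back; an empty string marks the end of the used part. */
char str[256];

/* Returns the offset of name inside str, storing it first if needed.
   Returns -1 for an empty name or when the region is full (sketch-only rule). */
int Intern(const char *name) {
  char *p = str;
  size_t n, room;
  while (*p) {                         /* walk the stored names */
    if (strcmp(p, name) == 0)          /* cmps: compare with the candidate */
      return (int)(p - str);
    p += strlen(p) + 1;                /* scas: skip to the next name */
  }
  n = strlen(name) + 1;                /* bytes to copy, including the NUL */
  room = (size_t)(str + sizeof str - p);
  if (n == 1 || n >= room) return -1;  /* need n bytes plus the end marker */
  memcpy(p, name, n);                  /* lods + stos: append the new name */
  p[n] = 0;                            /* keep the end-of-region marker */
  return (int)(p - str);
}
```

Interning "NIL" and then "CONS" on an empty region yields offsets 0 and 4; interning either name again returns the same offset, which is what makes ATOM comparison a plain integer comparison.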

GetList()

This is just the C version, without parsing '.' or the quote (') syntactic sugar.

As lists are constructed, it calls Cons(), described in the Eval() page.
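
To show how GetToken(), GetObject(), Intern() and GetList() fit together, here is a small self-contained C model. Everything in it is an assumption made for illustration: tokens come from a C string (src), ATOMs are positive indexes into a names table with 0 reserved for NIL, cons cells are negative integers, and the fixed-size pools have no overflow checks. The real encoding and the real Cons() are described elsewhere and may differ.

```c
#include <string.h>

const char *src = "";      /* hypothetical input string (ASCII assumed) */
char tok[32];              /* current token */
char names[16][32];        /* ATOM names; index 0 is reserved for NIL */
int n_names = 1;
int car_[64], cdr_[64];    /* cons cell pool (no overflow checks: sketch only) */
int n_cells = 0;

void GetToken(void) {
  char *t = tok;
  while (*src && *src <= ' ') src++;          /* skip whitespace */
  if (*src && *src <= ')') *t++ = *src++;     /* '(' or ')' */
  else while (*src > ')' && t < tok + sizeof tok - 1) *t++ = *src++;
  *t = 0;
}

int Intern(void) {                            /* find or store tok */
  int i;
  if (strcmp(tok, "NIL") == 0) return 0;
  for (i = 1; i < n_names; i++)
    if (strcmp(names[i], tok) == 0) return i;
  strcpy(names[n_names], tok);                /* tok and names[i] have equal size */
  return n_names++;
}

int Cons(int a, int d) {                      /* cells are numbered -1, -2, ... */
  car_[n_cells] = a;
  cdr_[n_cells] = d;
  return -(++n_cells);
}
int Car(int x) { return car_[-x - 1]; }
int Cdr(int x) { return cdr_[-x - 1]; }

int GetList(void);

int GetObject(void) {                         /* called just after GetToken() */
  if (tok[0] == '(') return GetList();
  return Intern();
}

int GetList(void) {
  int head;
  GetToken();
  if (tok[0] == ')' || tok[0] == 0) return 0; /* end of list: NIL */
  head = GetObject();                         /* read the element first... */
  return Cons(head, GetList());               /* ...then the rest of the list */
}
```

Reading one object is then GetToken() followed by GetObject(). The element is stored in head before the recursive call because C does not specify the evaluation order of function arguments, so writing Cons(GetObject(), GetList()) could consume the tokens in the wrong order.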
