Skip to content

Latest commit

 

History

History
273 lines (266 loc) · 9.7 KB

presentation.org

File metadata and controls

273 lines (266 loc) · 9.7 KB

Proof of Concept: An implementation of a reader for extensible syntaxes for Emacs Lisp.

Lisp Introduction

Emacs History

  • Originated at MIT.
  • Used the interpreter from TECO.
  • TECO must have been a horrible command language.
  • Later implementations on Lisp Machines (i.e. using Lisp as extension commands.
  • 1984: Rewritten in C and Emacs Lisp for the GNU project (GNU Emacs).
  • Low-level code written in C, often exposed to Lisp.
  • Higher level functionality written in Lisp.
  • Thus, Emacs is not just an editor, it is a Lisp VM.

Lisp History

  • 1960: McCarthy devised a representation of functions based on lambda calculus.
  • Programs as lists with symbols.
  • Prefix notation.
  • Syntax on top was planned, never implemented.

Lisp Types

  • Many types exist.
  • Two are of supreme importance:
    • List
      • Singly-linked
      • Made of cons-cells (ordered pairs)
      • Heterogeneous
      • They have syntax!
      • When evaluated, represents an operation.
    • Symbol
      • Needed in a compiler anyway
      • Exposed to the user (because … why not?)
      • They have syntax too!
      • When evaluated, returns the appropriate slot.

What makes Lisp special?

Macros

  • Many dynamic languages (perl, python, ruby) can create code at runtime and run it.
  • Most cannot parse code at runtime.
  • This makes code manipulation/rewriting very difficult (embedded DSLs).
  • In Lisp, code is represented by Lisp datastructures, thus can be walked.
  • This makes macros possible and powerful.
  • Macros recieve their arguments unevalled, i.e. as code objects, not as the result of running the code.

Example

(dolist (i (number-sequence 0 10))
  (insert (format "%s\n" i)))
  • The list (i (number-sequence 0 10)) is not evaluated.
  • The macro dolist evaluates the second list element once.
  • All but the first argument is repeatedly evaluated.
  • An element from the sequence is bound to the symbol i for each invocation.

Editing Tools

  • Source code in any language must be well-formed.
  • Editing tools should try to enforce this.
  • Should allow editing of structure, not of characters.
  • Lispy and Paredit do so.

Example

Task: Place `cl-values’ around the call to `apply’.

(defun el-reader//read-hash-table (stream _char)
  (cl-values
   (let ((k-v (el-reader/read-delimited-list ?\} stream t)))
     (if (= (mod (length k-v) 2) 1)
         (error "Invalid syntax: {}")
       (apply #'el-reader//ht k-v)))))

Customizable Syntax

  • What this is all about!
  • Lisp enables extensible readers.
  • Syntax is for data, not language constructs:
    ()
    Lists
    []
    Vectors
    ””
    Strings
  • Reader consumes characters, produces Lisp objects.
  • The compiler consumes Lisp objects, returns (byte-)code.

The Reader

Reader vs Parser

  • Why not use the word “parser”?
  • Parsers presume a lexer.
  • Lisp does both at the same time, but exposes the reader to the user of the language.
  • A user can also manipulate where the lexer separates tokens—we’ll see an example of that later with hashtables.

Replacing the built in Reader

  • Need to replace the built-in function read altogether.
  • Yet still want to keep it around.
  • Advice to the rescue!
(define-advice read
    (:around (oldfun &optional stream)
             el-reader//replace-read)
  (if use-el-reader
      (el-reader/read stream)
    (funcall oldfun stream)))

Compatibility

  • Elisp’s read:
(read &optional stream)
  • CL’s read:
(read &optional input-stream eof-error-p eof-value recursive-p)
  • el-reader’s read:
(cl-defun el-reader/read (&optional input-stream
                                    (eof-error-p t)
                                    eof-value
                                    recursive-p
                                    keys))

How does the Reader work?

Readers Digest Version

  • read reads one expression from the stream.
  • If the first character encountered is a macro character, execute that function and use the result.
  • If not, read characters into a token (symbol or number)
    • End the token when whitespace or a macro character is encountered.
  • Escape characters may be used to prevent macro execution or token termination (may include whitespace in a symbol name).

Terminology

General Terms

Terminating macro character
Calls user-supplied function if first char in token, ends read otherwise.
Non-terminating macro character
Calls user-supplied function if first char in token, reads itself otherwise.
Read macro
A pair of a macro character and a function to be called when this character is encountered.
Syntax type
Every instance of every character has exactly one syntax type. Terminating and non-terminating macro characters are syntax types.
Token
An atomic unit of text. Reads as a symbol or number.

Character Syntax Types

Constituent
Part of a token (symbol or number).
Macro character
Can be terminating or non-terminating.
Single escape character
Causes the next character to be treated as a constituent (even if it was a macro character).
'foo\(bar
;; => A symbol with the name "foo(bar"
    
Multiple escape character
Also escapes characters to be constituent, but does so for a stretch of characters until another multiple escape character is encountered.
Whitespace
Characters which end the accumulation of a token, but are otherwise skipped.
Invalid
Characters which may not occur (unused by el-reader).

Character Traits

  • alphabetic
  • digit
  • plus sign
  • minus sign
  • dot
  • decimal point
  • ratio marker
  • exponent marker
  • invalid (unused)

Reader Algorithm (WARNING: very technical!)

Link to the code.

Additional algorithms

read-delimited-list

Link to the code.

Reading lists and dotted pair notation.

Link to the code. Link to the code in =read=.

Interpreting Numbers.

Link to the code.

Data Structures

(with-current-buffer (get-buffer "el-reader.el")
  (occur "(\\(?:defclass\\|cl-defgeneric\\)"))

API Overview

Activation

  • Use buffer local variables.
  • Add the following to the beginning of a file:
(eval-and-compile
  (setf use-el-reader t))
  • Sets the variable use-el-reader to be true, but only for the current buffer (i.e. file).
  • The advice around read honors this variable.

Example

(defun el-reader//ht (&rest args)
  "Create and return a hashtable.

Keys and values are given alternating in args."
  (let ((h (make-hash-table)))
    (cl-loop for (key value) on args by #'cddr
             do (if (and key value) (puthash key value h)
                  (error "Odd number of arguments passed")))
    h))

(defun el-reader//read-hash-table (stream _char)
  (cl-values
   (let ((k-v (el-reader/read-delimited-list ?\} stream t)))
     (if (= (mod (length k-v) 2) 1)
         (error "Invalid syntax: {}")
       (apply #'el-reader//ht k-v)))))

(el-reader/set-macro-character ?\{ #'el-reader//read-hash-table)

(cl-multiple-value-bind (fun _term)
    (el-reader/get-macro-character ?\))
  (el-reader/set-macro-character ?\} fun))

{:foo "foo" :bar 5}

Functions

set-macro-character
Link
get-macro-character
Link
make-dispatch-macro-character
Link
set-dispatch-macro-character
Link
get-dispatch-macro-character
Link
copy-readtable
Link
getch
Link
peek-char
Link
read
Link
read-preserving-whitespace
Link

Variables

*readtable*
Link
*read-base*
Link
*preserve-whitespace*
Link

Differences to Common Lisp

Improvements

  • The Syntax type of a character is now directly settable.
  • Same for traits.
  • Mapping from characters to numbers can be manipulated.

Due to deficiencies in elisp

  • Read macro procedures must manually wrap the return value in a list.
  • No package support.
  • No support for fractions.

Idiosyncrasies

  • Non-terminating dispatching macro character (‘#’)
  • No case conversion.
  • Slight name and signature differences
    • getch
    • peek-char

Future work

What is missing?

  • Not all constructs can be read yet.
  • Because of this, no compilation.

What can be improved?

  • Rewrite in C as part of Emacs.
  • Write a C module.

Editing Tools

  • As editing tools are not aware of el-reader, operations on custom syntaxes often does not work. This is already true of Common Lisp code.