Add lexer development documentation #1111

Merged
merged 4 commits into from
May 31, 2019
2 changes: 2 additions & 0 deletions .yardopts
@@ -1,3 +1,5 @@
--no-private
--protected
--markup-provider=redcarpet
--markup=markdown
- docs/LexerDevelopment.md
87 changes: 6 additions & 81 deletions README.md
@@ -9,7 +9,7 @@

If you'd like to help out with this project, assign yourself something from the [issues][] page, and send me a pull request (even if it's not done yet!). Bonus points for feature branches.

[issues]: https://github.com/jneen/rouge/issues "Help Out"
[issues]: https://github.com/rouge-ruby/rouge/issues "Help Out"
[pygments]: http://pygments.org/ "Pygments"

## Usage
@@ -86,7 +86,7 @@ $ rougify style monokai.sublime > syntax.css

### Advantages to pygments.rb
* No need to [spawn Python processes](https://github.com/tmm1/pygments.rb).
* We're faster in [almost every measure](https://github.com/jneen/rouge/pull/41#issuecomment-223751572)
* We're faster in [almost every measure](https://github.com/rouge-ruby/rouge/pull/41#issuecomment-223751572)

### Advantages to CodeRay
* The HTML output from Rouge is fully compatible with stylesheets designed for pygments.
@@ -126,7 +126,7 @@ Rouge is only for UTF-8 strings. If you'd like to highlight a string with a dif

### Installing Ruby

If you're here to implement a lexer for your awesome language, there's a good chance you don't already have a ruby development environment set up. Follow the [instructions on the wiki](https://github.com/jneen/rouge/wiki/Setting-up-Ruby) to get up and running. If you have trouble getting set up, let me know - I'm always happy to help.
If you're here to implement a lexer for your awesome language, there's a good chance you don't already have a ruby development environment set up. Follow the [instructions on the wiki](https://github.com/rouge-ruby/rouge/wiki/Setting-up-Ruby) to get up and running. If you have trouble getting set up, let me know - I'm always happy to help.

### Run the tests

@@ -141,86 +141,11 @@ To test a lexer visually, run `rackup` from the root and go to `localhost:9292/#

is at http://rubydoc.info/gems/rouge/frames.

### Using the lexer DSL
### Developing lexers

You can probably learn a lot just by reading through the existing lexers. Basically, a lexer consists of a collection of states, each of which has several rules. A rule consists of a regular expression and an action, which yields tokens and manipulates the state stack. Each rule in the state on top of the stack is tried *in order* until a match is found, at which point the action is run, the match is consumed from the stream, and the process repeats with the new state on top of the stack. Each lexer has a special state called `:root`, and the initial state stack consists of just this state.
We have [a guide][lexer-dev-doc] on lexer development in the documentation but you'll also learn a lot by reading through the existing lexers.

Here's how you might use it:

``` ruby
class MyLexer < Rouge::RegexLexer
  state :root do
    # the "easy way"

    # simple rules
    rule /0x[0-9a-f]+/, Num::Hex

    # simple state stack manipulation
    rule /{-/, Comment, :next_state
    rule /-}/, Comment, :pop!

    # the "flexible way"
    rule /abc/ do |m|
      # m is the match, for accessing match groups manually

      # you can do the following things:
      pop!
      push :another_state
      push # assumed to be the current state
      state? :some_state # check if the current state is :some_state
      in_state? :some_state # check if :some_state is in the state stack

      # yield a token. if no second argument is supplied, the value is
      # taken to be the whole match.
      # The sum of all the tokens yielded must be equivalent to the whole
      # match - otherwise characters will go missing from the user's input.
      token Generic::Output, m[0]

      # calls SomeOtherLexer.lex(str) and yields its output. See the
      # HTML lexer for a nice example of this.
      # if no second argument is supplied, it is assumed to be the whole
      # match string.
      delegate SomeOtherLexer, str

      # the context object is the lexer itself, so you can stash state here
      @count ||= 0
      @count += 1

      # advanced: push a dynamically created anonymous state
      push do
        rule /.../, Generic::Output
      end
    end

    rule /(\w+)(:)/ do
      # "groups" yields the matched groups in order
      groups Name::Label, Punctuation
    end
  end

  start do
    # this is run whenever a fresh lex is started
  end
end
```
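
The matching loop that the paragraph above describes (try each rule of the top state in order, emit a token, then adjust the state stack) can be sketched in plain Ruby without Rouge. This is only an illustration and not how Rouge itself is implemented: `ToyLexer`, its `RULES` table, and the token names are all hypothetical, and only `StringScanner` from the standard library is assumed.

``` ruby
require 'strscan'

# Hypothetical toy lexer (not Rouge's implementation). Each state maps to an
# ordered list of [regex, token, action] rules; action is nil, :pop!, or the
# name of a state to push.
class ToyLexer
  RULES = {
    root: [
      [/0x[0-9a-f]+/, :hex, nil],
      [/\{-/, :comment, :comment],  # push the :comment state
      [/\s+/, :text, nil],
      [/\w+/, :name, nil],
    ],
    comment: [
      [/-\}/, :comment, :pop!],     # return to the previous state
      [/[^-]+|-/, :comment, nil],
    ],
  }.freeze

  def lex(input)
    scanner = StringScanner.new(input)
    stack = [:root]                 # the initial stack holds only :root
    tokens = []
    until scanner.eos?
      # Try each rule of the state on top of the stack, in order; the first
      # regex that matches at the current position consumes its match.
      rule = RULES.fetch(stack.last).find { |regex, _, _| scanner.scan(regex) }
      if rule.nil?
        # No rule matched: consume one character so the loop always advances.
        tokens << [:error, scanner.getch]
        next
      end
      _, token, action = rule
      tokens << [token, scanner.matched]
      case action
      when :pop! then stack.pop
      when Symbol then stack.push(action)
      end
    end
    tokens
  end
end
```

Calling `ToyLexer.new.lex("0x1f {- hi -} x")` returns an array of `[token, text]` pairs whose text fragments concatenate back to the input, mirroring the requirement that the yielded tokens cover the whole match.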

If you're creating a lexer that's very similar to a different lexer, you can use subclassing (see C/C++/ObjC and also QML/Javascript for examples):

``` ruby
class MyLexer < OtherLexer
  # independent states
  state :my_state do ... end

  # override states
  state :your_state do ... end

  # prepend rules to states
  prepend :parent_state do ... end

  # append rules to states
  append :parent_state do ... end
end
```
[lexer-dev-doc]: https://www.rubydoc.info/github/rouge-ruby/rouge/file/docs/LexerDevelopment.md

Please don't submit lexers that are largely copy-pasted from other files.
