This repository has been archived by the owner on Aug 11, 2024. It is now read-only.

Subtext recursive descent parser #3

Merged 7 commits on Nov 1, 2021


gordonbrander
Collaborator

Re-implementing Subtext via recursive descent.

  • Two passes instead of four passes with regular expressions
  • Gives us a DOM to work with so we can extract titles, etc

@gordonbrander gordonbrander changed the title WIP 2021 10 25 subtext tape WIP 2021-10-25 Subtext recursive descent parser Oct 26, 2021
@gordonbrander gordonbrander changed the title WIP 2021-10-25 Subtext recursive descent parser Subtext recursive descent parser Nov 1, 2021
Collaborator

@cdata cdata left a comment

Awesome work 💯

markup: markup,
range: selection
).renderMarkup(url: Slashlink.slashlinkToURLString)
Subtext(markup: markup)
Collaborator

Nice, no longer versioned 📝

var span: Substring
}

struct Bracketlink {
Collaborator

I only know about the hyperlink and slashlink link forms - what is the format/semantics of bracketlink?

Collaborator Author

@gordonbrander gordonbrander Nov 1, 2021

@cdata Covered here in the specification: https://github.com/gordonbrander/subtext/blob/main/specification.md#bracketed-urls. TLDR, they are a syntax form that allows linking to non-http/https protocols.

This won't link:
ipfs://asdfasdfasfasdf
dat://asdfasdfasfasdf

This will link:
<ipfs://asdfasdfasfasdf>
<dat://asdfasdfasfasdf>

You can write HTTP urls either way:
This will link: https://example.com
So will this: <https://example.com>

URL syntax is very very open-ended, so it is difficult to auto-link the general form of URLs. Bracket links mean we can autolink http urls, and support other protocols without forcing parsers to maintain a whitelist of exotic protocols. Easy things are easy, difficult things are possible.
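
A minimal sketch of the linking rule described above, assuming the text has already been split into whitespace-delimited words. `linkTarget(for:)` is a hypothetical helper written for illustration, not the parser from this PR, and the `://` check is a simplification of what the specification requires.

```swift
import Foundation

/// Hypothetical helper: returns the URL text a word should link to,
/// or nil when the word should be left as plain text.
func linkTarget(for word: Substring) -> Substring? {
    // Bracketed form: <scheme://...> links regardless of protocol.
    if word.hasPrefix("<") && word.hasSuffix(">") && word.count > 2 {
        let inner = word.dropFirst().dropLast()
        // Simplifying assumption: anything containing "://" counts as a URL.
        return inner.contains("://") ? inner : nil
    }
    // Bare form: only http and https are auto-linked.
    if word.hasPrefix("http://") || word.hasPrefix("https://") {
        return word
    }
    return nil
}
```

Under these assumptions a bare `ipfs://...` word yields nil, while the same URL wrapped in angle brackets yields the inner URL, so the parser never needs a list of known protocols.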


import Foundation

struct Tape<T>
Collaborator

🎗️ I like this metaphor

}

/// Move forward one element
@discardableResult mutating func consume() -> T.SubSequence {
Collaborator

What does @discardableResult imply?

(my intuition is that this has something to do with reference counting)

Collaborator Author

@cdata It isn't related to reference counting. Ordinarily the Swift compiler (and so Xcode) warns if you ignore a function's return value, reasoning that this is probably a mistake. The language provides this attribute to say "it's OK to discard the return value; sometimes I just call this function for its side effects".
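
To illustrate the point, here is a small sketch using a hypothetical `Counter` type (not from this PR). The two methods differ only in the attribute.

```swift
struct Counter {
    private(set) var count = 0

    /// Without the attribute, ignoring the result produces an
    /// "unused result" compiler warning at the call site.
    mutating func increment() -> Int {
        count += 1
        return count
    }

    /// With the attribute, callers may ignore the result silently.
    @discardableResult
    mutating func incrementQuietly() -> Int {
        count += 1
        return count
    }
}

var counter = Counter()
_ = counter.increment()                  // explicit discard avoids the warning
counter.incrementQuietly()               // no warning; called for its side effect
let latest = counter.incrementQuietly()  // the result is still there when wanted
```

This is the same shape as `consume()` on the tape: most callers only want the cursor to advance, and a few want the consumed slice.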


/// Move forward one element
@discardableResult mutating func consume() -> T.SubSequence {
let subsequence = collection[currentIndex...currentIndex]
Collaborator

Is a range operator (...) required here?

Collaborator Author

@gordonbrander gordonbrander Nov 1, 2021

@cdata It is! It causes us to get a T.SubSequence, instead of a T.Element. We use SubSequences everywhere in Tape because they maintain index references to the underlying collection.
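
A short sketch of the difference, using `String` as the collection (an assumption for illustration; `Tape` is generic over any collection):

```swift
let text = "abc"
let index = text.index(after: text.startIndex)

// Subscripting with a single index yields one element.
let element: Character = text[index]

// Subscripting with a one-element closed range yields a SubSequence.
let slice: Substring = text[index...index]

// The slice shares indices with the original string, so its bounds
// can be used to address `text` directly.
let rest = text[slice.endIndex...]
```

Because `slice.startIndex` is the same index as in `text`, slices taken at different points can later be joined or extended without any offset bookkeeping.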

}

/// Get current subsequence
var subsequence: T.SubSequence {
Collaborator

IIUC this is the form of an accessor (a "getter" in this case) in Swift 📝

Collaborator Author

Yes, this is shorthand for a get-only accessor.
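
For comparison, a sketch of the shorthand next to the explicit form. `Span` and its fields are hypothetical names chosen for this example, not types from the PR.

```swift
struct Span {
    let text: String
    var start: String.Index
    var end: String.Index

    // Shorthand: a read-only computed property.
    var current: Substring {
        text[start..<end]
    }

    // Equivalent longhand with an explicit getter.
    var currentExplicit: Substring {
        get {
            return text[start..<end]
        }
    }
}

let source = "hello world"
let span = Span(
    text: source,
    start: source.startIndex,
    end: source.index(source.startIndex, offsetBy: 5)
)
```

Both properties are recomputed on every access and cannot be assigned to, since neither declares a setter.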

}

/// Peek forward, and consume if match
mutating func consumeMatch(_ subsequence: T.SubSequence) -> Bool {
Collaborator

_ 📝

Collaborator Author

@gordonbrander gordonbrander Nov 1, 2021

@cdata Yeah, _ here means "don't use an argument label at the call site".

Swift is strange in that arguments are both positional at the call site AND labeled. I think this is an Objective-C holdover? Anyway, you can avoid having to write the argument label at the call site this way. I am of two minds about this. Some Swift code uses this form for the first argument, and labels for the remaining arguments. In more recent code, I've defaulted to keeping labels more of the time, even though it is verbose.
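
A small sketch of the three styles mentioned above. The function names are hypothetical and exist only to show the call sites.

```swift
// Labeled: the caller must write the label.
func consumeLabeled(match: String) -> Bool {
    !match.isEmpty
}

// Unlabeled: `_` suppresses the label at the call site.
func consumeUnlabeled(_ match: String) -> Bool {
    !match.isEmpty
}

// Common mixed style: unlabeled first argument, labeled remainder.
// `to` is the external label, `destination` the internal name.
func move(_ item: String, to destination: String) -> String {
    "\(item) -> \(destination)"
}

let a = consumeLabeled(match: "#")
let b = consumeUnlabeled("#")
let c = move("note", to: "archive")
```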

return self.subsequence
}

/// Get a single-item SubSequence offset by `forward` of the `currentStartIndex`.
Collaborator

Maybe this comment is out of date - forward does not appear in the method implementation (perhaps you meant offset).

Collaborator Author

Thanks!


/// Capture word-boundary-delimited forms at beginning of line.
tape.start()
if let inline = consumeInlineWordBoundaryForm(tape: &tape) {
Collaborator

It may be possible to satisfy this condition as part of the loop to reduce the method's complexity a little.

@gordonbrander gordonbrander merged commit 6da9c52 into main Nov 1, 2021
bfollington added a commit that referenced this pull request Feb 8, 2024
)

Yet another riff on #1104

## Design

This PR changes the design by removing the concept of an
orchestrator. Instead, we have

- Classifiers
  - May be composed together into a single classifier
- Routes
  - Receives a request struct which contains the input and
    classifications, and a `process(:)` function.
  - Returns a string or nil.
  - Routes may safely recursively call the parent router using the
    `process(:)` function.
- Router
  - Has a classifier, which it runs for each request. Classifier may be
    composed of multiple classifiers.
  - Has an array of routes. Routes are run top-to-bottom for each
    request.
  - Router exits on the first route match.

This gives routes a high degree of expressivity, since a route may
rewrite the input using information from the classifications, and then
recurse back into the router. Routes may also call out to specialized
sub-routers, allowing us to construct trees of routing, with each router
able to recurse on itself. This is similar in principle to many
rule-based NLP systems such as AIML or ChatScript that use hierarchy and
recursion to pick apart an input and dispatch parts of it to different
subsystems, before returning a result.

The previous orchestrator model only had weights to work with to make a
choice between results. This approach has much more nuanced control over
the result, since the result can be returned from specific and
specialized branches within the tree of routers.
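
The design above can be sketched roughly as follows. This is a simplified, synchronous illustration and not the code from the commit: `Classification`, `RouteRequest`, `Classifier`, `Route`, `compose`, and `Router` are hypothetical stand-ins for the `Prompt*` types, and the example classifiers and routes are invented.

```swift
struct Classification {
    let tag: String
    let weight: Double
}

struct RouteRequest {
    let input: String
    let classifications: [Classification]
    /// Calls back into the parent router.
    let process: (String) -> String?
}

typealias Classifier = (String) -> [Classification]
typealias Route = (RouteRequest) -> String?

/// Compose several classifiers into one by concatenating their results.
func compose(_ classifiers: [Classifier]) -> Classifier {
    return { input in classifiers.flatMap { $0(input) } }
}

struct Router {
    let classifier: Classifier
    let routes: [Route]

    func process(_ input: String) -> String? {
        let request = RouteRequest(
            input: input,
            classifications: classifier(input),
            process: { self.process($0) }
        )
        // Run routes top-to-bottom; the first non-nil result wins.
        for route in routes {
            if let result = route(request) {
                return result
            }
        }
        return nil
    }
}

let isQuestion: Classifier = { input in
    input.hasSuffix("?") ? [Classification(tag: "question", weight: 1.0)] : []
}
let isShout: Classifier = { input in
    let shouting = input == input.uppercased() && input != input.lowercased()
    return shouting ? [Classification(tag: "shout", weight: 0.5)] : []
}

// Rewrites a shouted input, then recurses into the router.
let calmDown: Route = { request in
    guard request.classifications.contains(where: { $0.tag == "shout" }) else { return nil }
    return request.process(request.input.lowercased())
}
let answerQuestion: Route = { request in
    guard request.classifications.contains(where: { $0.tag == "question" }) else { return nil }
    return "question: \(request.input)"
}

let router = Router(
    classifier: compose([isQuestion, isShout]),
    routes: [calmDown, answerQuestion]
)
let shouted = router.process("WHO ARE YOU?")
let plain = router.process("hello")
```

Here the first route does not answer anything itself. It normalizes the input and hands it back to the router, which is the kind of rewrite-and-recurse step the description refers to. This sketch has no guard against unbounded recursion.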

## Concepts

- `PromptClassification`: an individual classification with weight
- `PromptClassifierProtocol`: given an input, produces classifications
- `PromptClassifier`: composes classifiers together
- `PromptRouteRequest`: a request struct containing the context for a
given route request, including original input, classifications, and a
function to recursively call into the route's router.
- `PromptRouteProtocol`: a route
- `PromptRoute`: a concrete implementation of `PromptRouteProtocol`
taking a closure as a definition
- `PromptRouter`: given an array of routes and a classifier, produces a
result with `process(:) -> String?`
- `PromptRouterRequest`: an ephemeral actor that exists for the duration
of a `PromptRouter.process(:)` request. This actor tracks the recursion
depth and makes sure requests don't recurse too deeply.
  - Aside: I was pleased to learn that actors do not cost any more than
    classes
- `PromptService`: configures classifiers, routes, and routers.
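
One way the recursion-depth tracking could look, as a hedged sketch. The commit describes an actor; a plain `final class` is used here instead, purely to keep the example synchronous. `RequestContext`, `descend`, and the runaway `process(_:in:)` function are all hypothetical names.

```swift
/// Hypothetical per-request state that lives for one top-level request.
final class RequestContext {
    let maxDepth: Int
    private(set) var depth = 0

    init(maxDepth: Int) {
        self.maxDepth = maxDepth
    }

    /// Run `body` one level deeper, or return nil if the limit is reached.
    func descend<T>(_ body: () -> T?) -> T? {
        guard depth < maxDepth else { return nil }
        depth += 1
        defer { depth -= 1 }
        return body()
    }
}

/// A deliberately runaway handler: it always recurses with a longer input
/// and falls back to its own input once the recursion is refused.
func process(_ input: String, in context: RequestContext) -> String? {
    return context.descend { () -> String? in
        return process(input + "!", in: context) ?? input
    }
}

let context = RequestContext(maxDepth: 3)
let result = process("a", in: context)
```

With a limit of 3, the fourth nested call is refused, so the deepest permitted call returns its own input and the recursion unwinds instead of running forever.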

---------

Co-authored-by: Ben Follington <5009316+bfollington@users.noreply.github.com>