Addresses issues #310 and #311

Merged
merged 4 commits into from
Jun 6, 2024
87 changes: 84 additions & 3 deletions CONTRIBUTING.md
@@ -2,6 +2,12 @@

## Basics

Most of this document is concerned with the mechanics of raising issues
and posting Pull Requests to offer improvements to Quamina. Following
this, there is a section entitled **Developing** that describes
technology issues that potential contributors will face
and tools that might be helpful.

Quamina is hosted in this GitHub repository
at `github.com/timbray/quamina` and welcomes
contributions.
@@ -12,7 +18,7 @@ This is important because possibly Quamina already
does what you want, in which case perhaps what’s
needed is a documentation fix. Possibly the idea
has been raised before but failed to convince Quamina’s
-maintainers. (Doesnt mean it won’t find favor now;
+maintainers. (Doesn't mean it won’t find favor now;
times change.)

Assuming there is agreement that a change in Quamina
@@ -27,7 +33,7 @@ The coding style suggested by the Go community is
used in Quamina. See the
[style doc](https://github.com/golang/go/wiki/CodeReviewComments) for details.

-Try to limit column width to 120 characters for both code and markdown documents
+Try to limit column width to 120 characters for both code and Markdown documents
such as this one.

### Format of the Commit Message
@@ -64,7 +70,7 @@ is recommended to break up your commits using distinct prefixes.

### Signing commits

-Commits should be signed (not just the `-s` “signd off on”) with
+Commits should be signed (not just the `-s` “signed off on”) with
any of the [styles GitHub supports](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits).
Note that you can use `git config` to arrange that your commits are
automatically signed with the right key.
@@ -99,3 +105,78 @@ instructions for installing it.

When opening a new issue, try to roughly follow the commit message format
conventions above.

## Developing
Collaborator

Great section! Since we talked about OS env vars, how would a dev enable the pretty printer easily? Would be nice to explain this here too.

Owner Author

Good idea. CONTRIBUTING now contains a mini developer's user guide.


### Automata

Quamina works by compiling the Patterns together into a Nondeterministic
Finite Automaton (NFA), which proceeds byte-at-a-time through the UTF-8-encoded
fields and values. An NFA is nondeterministic in the sense that a single byte value
may cause transitions to several different states.
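As an illustration of that nondeterminism, here is a minimal sketch of byte-at-a-time NFA simulation. The `state` type, `run()`, and `buildExample()` are hypothetical and far simpler than Quamina's own `faState` and `smallTable`; the merged patterns `"ab"` and `"ac"` are deliberately left nondeterministic so that the byte `a` leads to two states at once.

```go
package main

import "fmt"

// state is a hypothetical NFA state. For a given input byte it may list
// several next states, which is what makes the automaton nondeterministic.
type state struct {
	next  map[byte][]*state
	final bool
}

// run feeds val through the NFA one byte at a time, tracking every state
// that is active at once, and reports whether a final state is active when
// the input runs out.
func run(start *state, val []byte) bool {
	active := []*state{start}
	for _, b := range val {
		seen := map[*state]bool{}
		var next []*state
		for _, s := range active {
			for _, n := range s.next[b] {
				if !seen[n] {
					seen[n] = true
					next = append(next, n)
				}
			}
		}
		if len(next) == 0 {
			return false // no active state can consume b
		}
		active = next
	}
	for _, s := range active {
		if s.final {
			return true
		}
	}
	return false
}

// buildExample merges the patterns "ab" and "ac": after 'a' the NFA is in
// two states at once, one waiting for 'b' and one waiting for 'c'.
func buildExample() *state {
	ab := &state{final: true}
	ac := &state{final: true}
	waitB := &state{next: map[byte][]*state{'b': {ab}}}
	waitC := &state{next: map[byte][]*state{'c': {ac}}}
	return &state{next: map[byte][]*state{'a': {waitB, waitC}}}
}

func main() {
	start := buildExample()
	for _, in := range []string{"ab", "ac", "ad"} {
		fmt.Println(in, run(start, []byte(in)))
	}
}
```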

The general workflow for supporting a new pattern type is to write code that builds
an automaton matching that type; examples are `makeStringFA()` in `value_matcher.go`
and `makeShellStyleAutomaton()` in `shell_style.go`. Then insert calls to your
automaton builder in `value_matcher.go`. That code is reasonably straightforward,
and it takes care of merging new automata with existing ones as required.
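A minimal sketch of the two halves of that workflow, assuming simplified hypothetical types: `makeLiteralFA()` plays the role of a builder such as `makeStringFA()`, and `union()` stands in for the merging that `value_matcher.go` handles. The union shown here is the textbook shared-start-state construction, not Quamina's `mergeFAStates()`, and it is only valid when no transition leads back into either original start state.

```go
package main

import "fmt"

// faState is a hypothetical, simplified NFA state; Quamina's real types
// (faState, smallTable) are more compact.
type faState struct {
	next  map[byte][]*faState
	final bool
}

func newState() *faState { return &faState{next: map[byte][]*faState{}} }

// makeLiteralFA builds a chain of states that matches exactly val.
func makeLiteralFA(val []byte) *faState {
	start := newState()
	cur := start
	for _, b := range val {
		n := newState()
		cur.next[b] = append(cur.next[b], n)
		cur = n
	}
	cur.final = true
	return start
}

// union gives two automata a shared start state whose transitions are the
// union of both original start states' transitions.
func union(a, b *faState) *faState {
	start := newState()
	start.final = a.final || b.final
	for _, src := range []*faState{a, b} {
		for by, targets := range src.next {
			start.next[by] = append(start.next[by], targets...)
		}
	}
	return start
}

// matches walks the input one byte at a time over the set of active states.
func matches(start *faState, val []byte) bool {
	active := []*faState{start}
	for _, b := range val {
		var next []*faState
		for _, s := range active {
			next = append(next, s.next[b]...)
		}
		active = next
	}
	for _, s := range active {
		if s.final {
			return true
		}
	}
	return false
}

func main() {
	fa := union(makeLiteralFA([]byte("cat")), makeLiteralFA([]byte("car")))
	for _, in := range []string{"cat", "car", "cab"} {
		fmt.Println(in, matches(fa, []byte(in)))
	}
}
```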

### Testing

A straightforward way to test a new feature is exemplified by `TestLongCase()` in
`shell_style_test.go`:

1. Make a `coreMatcher` by calling `newCoreMatcher()`
2. Add patterns to it by calling `addPattern()`
3. Make test data and examine matching behavior by calling `matchesForJSONEvent()`

### Prettyprinting NFAs

NFAs can be difficult to build and to debug. For this reason, code
is provided in `prettyprinter.go` which produces human-readable NFA
representations.

To use the prettyprinter, make an instance with `newPrettyPrinter()`; its only
argument is a seed used to generate state numbers. Then, instead of calling
`addPattern()`, call `addPatternWithPrinter()`, which passes your prettyprinter through
to the automaton-building code. New automata are created by `valueMatcher` calls
(see `value_matcher.go`). Ensure that the prettyprinter reaches your automaton-building
function; the `makeShellStyleAutomaton()` function shows how. In that function, use
`prettyprinter.labelTable()` to attach meaningful labels to the states of your automaton.
Then, at some convenient point, call `prettyprinter.printNFA()` to generate the NFA printout;
real programmers debug with Print statements.

### Prettyprinter output

The `makeShellStyleAutomaton()` code contains `prettyprinter` calls that
label the states and transitions it creates, and the `TestPP()` test in
`prettyprinter_test.go` exercises them. The pattern being matched is `"x*9"`, and
the prettyprinter output is:

```
758 [START HERE] '"' → [910 on " at 0]
910 [on " at 0] 'x' → [821 gS at 2]
821 [gS at 2] '9' → [551 gX on 9 at 3] / ★ → [821 gS at 2]
551 [gX on 9 at 3] '"' → [937 on " at 4] / '9' → [551 gX on 9 at 3] / ★ → [821 gS at 2]
937 [on " at 4] '9' → [551 gX on 9 at 3] / 'ℵ' → [820 last step at 5] / ★ → [821 gS at 2]
820 [last step at 5] [1 transition(s)]
```

Each line represents one state.

Each state gets a 3-digit number and a text description. The construct `★ →` represents
a default transition, which is taken when none of the other transitions match. The
symbol `ℵ` represents the end of the input value.

In this particular NFA, the `makeShellStyleAutomaton` code labels states corresponding to
the `*` "glob" character with text including `gS` for "glob spin" and states that escape the
"glob spin" state with `gX` for "glob exit".
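To make the printout concrete, the sketch below transcribes its six states into a Go table and runs some values through it. It assumes, as described above, that ★ is taken only when no explicit transition matches, treats state 820 (the last step) as the accepting state, and uses byte `0xF5` as a stand-in for ℵ because that byte never occurs in valid UTF-8. The `step` type and `matchValue()` are illustrative, not Quamina code.

```go
package main

import "fmt"

// endOfValue stands in for ℵ in this sketch (an assumption, not Quamina's API).
const endOfValue byte = 0xF5

// step is one printed state: explicit byte transitions, an optional default
// (★) target, and whether reaching it means the pattern matched.
type step struct {
	on    map[byte]int // byte -> next state number
	other int          // ★ target; 0 means "none"
	final bool
}

// xStar9 transcribes the printout for the shell-style pattern "x*9".
var xStar9 = map[int]step{
	758: {on: map[byte]int{'"': 910}},
	910: {on: map[byte]int{'x': 821}},
	821: {on: map[byte]int{'9': 551}, other: 821},
	551: {on: map[byte]int{'"': 937, '9': 551}, other: 821},
	937: {on: map[byte]int{'9': 551, endOfValue: 820}, other: 821},
	820: {final: true},
}

// matchValue runs a JSON string value (quotes included) plus the end marker
// through the table, taking ★ only when no explicit transition matches.
func matchValue(nfa map[int]step, start int, value string) bool {
	states := map[int]bool{start: true}
	input := append([]byte(value), endOfValue)
	for _, b := range input {
		next := map[int]bool{}
		for s := range states {
			st := nfa[s]
			if n, ok := st.on[b]; ok {
				next[n] = true
			} else if st.other != 0 {
				next[st.other] = true
			}
		}
		states = next
	}
	for s := range states {
		if nfa[s].final {
			return true
		}
	}
	return false
}

func main() {
	for _, v := range []string{`"x9"`, `"xyz9"`, `"x99"`, `"x"`, `"y9"`, `"x9a"`} {
		fmt.Printf("%-8s %v\n", v, matchValue(xStar9, 758, v))
	}
}
```

With this table, `"x9"`, `"xyz9"`, and `"x99"` reach state 820, while `"x"`, `"y9"`, and `"x9a"` do not. State 551 loops on `9` and falls back to the glob-spin state 821 on any other byte, which is what lets `"x99"` match and makes `"x9a"` fail.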

Most of the NFA-building code does not call the prettyprinter. Normally, you would insert
such calls while debugging a particular builder and remove them once it works. Because the
shell-style builder is unusually complex, its prettyprinting code is retained in anticipation
of future issues and of eventual progress toward full regular-expression NFAs.


8 changes: 7 additions & 1 deletion core_matcher.go
@@ -52,6 +52,12 @@ func (m *coreMatcher) fields() *coreFields {
// addPattern - the patternBytes is a JSON text which must be an object. The X is what the matcher returns to indicate
// that the provided pattern has been matched. In many applications it might be a string which is the pattern's name.
func (m *coreMatcher) addPattern(x X, patternJSON string) error {
return m.addPatternWithPrinter(x, patternJSON, sharedNullPrinter)
}

// addPatternWithPrinter can be called from debugging and under-development code to allow viewing pretty-printed
// NFAs
func (m *coreMatcher) addPatternWithPrinter(x X, patternJSON string, printer printer) error {
patternFields, err := patternFromJSON([]byte(patternJSON))
if err != nil {
return err
@@ -97,7 +103,7 @@ func (m *coreMatcher) addPattern(x X, patternJSON string) error {
case existsFalseType:
ns = state.addExists(false, field)
default:
-ns = state.addTransition(field)
+ns = state.addTransition(field, printer)
}

nextStates = append(nextStates, ns...)
4 changes: 2 additions & 2 deletions field_matcher.go
@@ -93,7 +93,7 @@ func (m *fieldMatcher) addExists(exists bool, field *patternField) []*fieldMatch
return []*fieldMatcher{trans}
}

-func (m *fieldMatcher) addTransition(field *patternField) []*fieldMatcher {
+func (m *fieldMatcher) addTransition(field *patternField, printer printer) []*fieldMatcher {
// we build the new updateable state in freshStart so that we can blast it in atomically once computed
current := m.fields()
freshStart := &fmFields{
@@ -119,7 +119,7 @@ func (m *fieldMatcher) addTransition(field *patternField) []*fieldMatcher {
// cases where this doesn't happen and reduce the number of fieldMatchStates
var nextFieldMatchers []*fieldMatcher
for _, val := range field.vals {
-nextFieldMatchers = append(nextFieldMatchers, vm.addTransition(val))
+nextFieldMatchers = append(nextFieldMatchers, vm.addTransition(val, printer))

// if the val is a number, let's add a transition on the canonicalized number
// TODO: Only do this if asked
67 changes: 4 additions & 63 deletions nfa.go
@@ -21,6 +21,10 @@ func traverseOneFAStep(table *smallTable, index int, val []byte, transitions []*
return transitions
}
index++
// 1. Note no effort to traverse multiple next-steps in parallel. The traversal compute is tiny and the
// necessary concurrency apparatus would almost certainly outweigh it
// 2. TODO: It would probably be better to implement this iteratively rather than recursively.
// The recursion will potentially go as deep as the val argument is long.
for _, nextStep := range nextSteps.steps {
transitions = append(transitions, nextStep.fieldTransitions...)
transitions = traverseOneFAStep(nextStep.table, index, val, transitions)
@@ -101,66 +105,3 @@ func mergeFAStates(state1, state2 *faState, keyMemo map[faStepKey]*faState) *faS

return combined
}

/**************************************/
/* debugging apparatus from here down */
/**************************************/
/*
func (t *smallTable) dump() string {
return dump1(&faState{table: t}, 0, make(map[*smallTable]bool))
}
func dump1(fas *faState, indent int, already map[*smallTable]bool) string {
t := fas.table
s := " " + st2(t) + "\n"
for _, step := range t.steps {
if step != nil {
for _, state := range step.steps {
_, ok := already[state.table]
if !ok {
already[state.table] = true
s += dump1(state, indent+1, already)
}
}
}
}
return s
}
func (t *smallTable) shortDump() string {
return fmt.Sprintf("%d-%s", t.serial, t.label)
}

func (n *faNext) String() string {
var snames []string
for _, step := range n.steps {
snames = append(snames, fmt.Sprintf("%d %s", step.table.serial, step.table.label))
}
return "[" + strings.Join(snames, " · ") + "]"
}

func stString(t *smallTable) string {
var rows []string

for i := range t.ceilings {
c := t.ceilings[i]
if i == 0 {
c = 0
} else {
if c != valueTerminator && c != byte(byteCeiling) {
c = t.ceilings[i-1]
}
}
var trailer string
if i == len(t.ceilings)-1 && c != valueTerminator && c != byte(byteCeiling) {
trailer = "…"
} else {
trailer = ""
}
if t.steps[i] != nil {
rows = append(rows, fmt.Sprintf("%s%s:%s ", branchChar(c), trailer, t.steps[i].String()))
} else {
rows = append(rows, fmt.Sprintf("%s%s:nil ", branchChar(c), trailer))
}
}
return fmt.Sprintf("s%d [%s] ", t.serial, t.label) + strings.Join(rows, "/ ")
}
*/
4 changes: 2 additions & 2 deletions nfa_test.go
@@ -58,7 +58,7 @@ func TestFocusedMerge(t *testing.T) {

for _, shellStyle := range shellStyles {
str := `"` + shellStyle + `"`
-automaton, matcher := makeShellStyleAutomaton([]byte(str))
+automaton, matcher := makeShellStyleAutomaton([]byte(str), &nullPrinter{})
automata = append(automata, automaton)
matchers = append(matchers, matcher)
}
@@ -76,7 +76,7 @@ func TestFocusedMerge(t *testing.T) {
s := statsAccum{
fmVisited: make(map[*fieldMatcher]bool),
vmVisited: make(map[*valueMatcher]bool),
-stVisited: make(map[any]bool),
+stVisited: make(map[*smallTable]bool),
}
faStats(merged, &s)
fmt.Println(s.stStats())