Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/stanza] Fully decompose the tokenize package #26241

Conversation

djaglowski
Copy link
Member

@djaglowski djaglowski commented Aug 28, 2023

Major additional refactoring, following #26040.

I will peel off several smaller PRs while working towards this overall set of changes.

  • Breaks apart the tokenize package:
    • Deletes SplitterConfig and instead makes its fields more composable.
    • Pulls all flushing concerns into a new flush package which can apply a stateful time-based flush behavior to any bufio.SplitFunc.
      • Adds dedicated test coverage for this package.
    • This leaves only the Multline struct in the tokenize package, so renames the package to split and the config to split.Config.
      • Enhances test coverage for this package.

@djaglowski djaglowski changed the title Pkg stanza tokenize simplify 2 [pkg/stanza] Fully decompose the tokenize package Aug 28, 2023
@djaglowski djaglowski force-pushed the pkg-stanza-tokenize-simplify-2 branch from 570c03e to cc8d5a6 Compare August 28, 2023 20:02
@github-actions github-actions bot requested a review from atoulme August 28, 2023 20:03
@djaglowski djaglowski force-pushed the pkg-stanza-tokenize-simplify-2 branch from cc8d5a6 to b7c82bc Compare August 30, 2023 20:12
djaglowski added a commit that referenced this pull request Aug 31, 2023
This just changes a very recently added package name. I think it matches
typical naming conventions better, especially when viewed as part of
#26241.
@djaglowski djaglowski force-pushed the pkg-stanza-tokenize-simplify-2 branch 2 times, most recently from f7be577 to fa3b87e Compare September 7, 2023 18:00
djaglowski added a commit that referenced this pull request Sep 8, 2023
Subset of #26241

This finally removes the `SplitterConfig` struct, which at this point is
only a wrapper around `MultilineConfig`.
djaglowski added a commit that referenced this pull request Sep 11, 2023
Subset of #26241 

Follows #26540 

- Rename `MultilineConfig` to `split.Config`
- Remove `Multiline`, previously a struct representation that only
wrapped a split func
- Remove `NewMultilineConfig`, because `split.Config` is just two simple
fields with "" defaults.
- Condense references in tests
- Substantially increate test coverage in `split` package.
djaglowski added a commit that referenced this pull request Sep 12, 2023
Another few remaining parts of #26241. This basically follows up on
renaming the `multiline` package by renaming a few remaining to
"multiline" throughout the codebase.
@djaglowski
Copy link
Member Author

This has been broken down into several other PRs. What little remains may be picked up in future PRs.

@djaglowski djaglowski closed this Sep 12, 2023
djaglowski added a commit that referenced this pull request Sep 13, 2023
Follows
#26241

Previously, split funcs were responsible for applying trim funcs. This
PR increases composability by applying trim funcs as a wrapper around
split funcs.

One nuance that was surfaced here is that the newline split func was not
handling the case where a line starts with a newline. When this happens,
we need to tell the scanner to advance, but we still want to return a
`""` token, rather than nil. This is covered by existing tests, but
previously it was "fixed" by the trim func which would return an empty
slice when the token was nil. Now, the newline split func will
explicitly handle this case, while the trim func will return the
original value if it is nil or empty.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant