Export library code in pkg/ (#1391)
* Export library code in `pkg/`

* new doc page
johnkerl authored Sep 10, 2023
1 parent 93b7c8e commit 268a96d
Showing 358 changed files with 1,076 additions and 693 deletions.
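
Go refuses to compile an import of a package under an `internal/` directory unless the importing code lives in the tree rooted at the parent of that directory. That is why moving the packages from `internal/pkg/` to `pkg/` makes them usable as a library. The diffs below all follow one pattern: the import-path prefix changes and nothing else does. A minimal sketch of that rewrite follows; `rewriteImportPath` is a hypothetical helper written for illustration and is not part of Miller or of this commit.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// Prefix used before this commit (importable only from inside the Miller module).
	oldPrefix = "github.com/johnkerl/miller/internal/pkg/"
	// Prefix used after this commit (importable from other modules).
	newPrefix = "github.com/johnkerl/miller/pkg/"
)

// rewriteImportPath is a hypothetical helper: it maps an old import path to
// its new location and leaves unrelated paths untouched.
func rewriteImportPath(path string) string {
	if strings.HasPrefix(path, oldPrefix) {
		return newPrefix + strings.TrimPrefix(path, oldPrefix)
	}
	return path
}

func main() {
	paths := []string{
		"github.com/johnkerl/miller/internal/pkg/mlrval",
		"github.com/johnkerl/miller/internal/pkg/entrypoint",
		"github.com/pkg/profile", // third-party import: unchanged
	}
	for _, p := range paths {
		fmt.Printf("%s -> %s\n", p, rewriteImportPath(p))
	}
}
```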
22 changes: 11 additions & 11 deletions Makefile
@@ -30,33 +30,33 @@ install: build
# ----------------------------------------------------------------
# Unit tests (small number)
unit-test ut: build
go test github.com/johnkerl/miller/internal/pkg/...
go test github.com/johnkerl/miller/pkg/...

ut-lib:build
go test github.com/johnkerl/miller/internal/pkg/lib...
go test github.com/johnkerl/miller/pkg/lib...
ut-scan:build
go test github.com/johnkerl/miller/internal/pkg/scan/...
go test github.com/johnkerl/miller/pkg/scan/...
ut-mlv:build
go test github.com/johnkerl/miller/internal/pkg/mlrval/...
go test github.com/johnkerl/miller/pkg/mlrval/...
ut-bifs:build
go test github.com/johnkerl/miller/internal/pkg/bifs/...
go test github.com/johnkerl/miller/pkg/bifs/...
ut-input:build
go test github.com/johnkerl/miller/internal/pkg/input/...
go test github.com/johnkerl/miller/pkg/input/...

bench:build
go test -run=nonesuch -bench=. github.com/johnkerl/miller/internal/pkg/...
go test -run=nonesuch -bench=. github.com/johnkerl/miller/pkg/...
bench-mlv:build
go test -run=nonesuch -bench=. github.com/johnkerl/miller/internal/pkg/mlrval/...
go test -run=nonesuch -bench=. github.com/johnkerl/miller/pkg/mlrval/...
bench-input:build
go test -run=nonesuch -bench=. github.com/johnkerl/miller/internal/pkg/input/...
go test -run=nonesuch -bench=. github.com/johnkerl/miller/pkg/input/...

# ----------------------------------------------------------------
# Regression tests (large number)
#
# See ./regression_test.go for information on how to get more details
# for debugging. TL;DR is for CI jobs, we have 'go test -v'; for
# interactive use, instead of 'go test -v' simply use 'mlr regtest
# -vvv' or 'mlr regtest -s 20'. See also internal/pkg/terminals/regtest.
# -vvv' or 'mlr regtest -s 20'. See also pkg/terminals/regtest.
regression-test: build
go test -v regression_test.go

@@ -65,7 +65,7 @@ regression-test: build
# go fmt ./... finds experimental C files which we want to ignore.
fmt format:
-go fmt ./cmd/...
-go fmt ./internal/pkg/...
-go fmt ./pkg/...
-go fmt ./regression_test.go

# ----------------------------------------------------------------
50 changes: 25 additions & 25 deletions README-dev.md
@@ -61,10 +61,10 @@ During the coding of Miller, I've been guided by the following:
* Names of files, variables, functions, etc. should be fully spelled out (e.g. `NewEvaluableLeafNode`), except for a small number of most-used names where a longer name would cause unnecessary line-wraps (e.g. `Mlrval` instead of `MillerValue` since this appears very very often).
* Code should not be too clever. This includes some reasonable amounts of code duplication from time to time, to keep things inline, rather than lasagna code.
* Things should be transparent. For example, the `-v` in `mlr -n put -v '$y = 3 + 0.1 * $x'` shows you the abstract syntax tree derived from the DSL expression.
* Comments should be robust with respect to reasonably anticipated changes. For example, one package should cross-link to another in its comments, but I try to avoid mentioning specific filenames too much in the comments and README files since these may change over time. I make an exception for stable points such as [cmd/mlr/main.go](./cmd/mlr/main.go), [mlr.bnf](./internal/pkg/parsing/mlr.bnf), [stream.go](./internal/pkg/stream/stream.go), etc.
* Comments should be robust with respect to reasonably anticipated changes. For example, one package should cross-link to another in its comments, but I try to avoid mentioning specific filenames too much in the comments and README files since these may change over time. I make an exception for stable points such as [cmd/mlr/main.go](./cmd/mlr/main.go), [mlr.bnf](./pkg/parsing/mlr.bnf), [stream.go](./pkg/stream/stream.go), etc.
* *Miller should be pleasant to write.*
* It should be quick to answer the question *Did I just break anything?* -- hence `mlr regtest` functionality.
* It should be quick to find out what to do next as you iteratively develop -- see for example [cst/README.md](./internal/pkg/dsl/cst/README.md).
* It should be quick to find out what to do next as you iteratively develop -- see for example [cst/README.md](./pkg/dsl/cst/README.md).
* *The language should be an asset, not a liability.*
* One of the reasons I chose Go is that (personally anyway) I find it to be reasonably efficient, well-supported with standard libraries, straightforward, and fun. I hope you enjoy it as much as I have.

@@ -83,10 +83,10 @@ sequence of key-value pairs. The basic **stream** operation is:

So, in broad overview, the key packages are:

* [internal/pkg/stream](./internal/pkg/stream) -- connect input -> transforms -> output via Go channels
* [internal/pkg/input](./internal/pkg/input) -- read input records
* [internal/pkg/transformers](./internal/pkg/transformers) -- transform input records to output records
* [internal/pkg/output](./internal/pkg/output) -- write output records
* [pkg/stream](./pkg/stream) -- connect input -> transforms -> output via Go channels
* [pkg/input](./pkg/input) -- read input records
* [pkg/transformers](./pkg/transformers) -- transform input records to output records
* [pkg/output](./pkg/output) -- write output records
* The rest are details to support this.
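
The input -> transforms -> output flow connected by Go channels can be sketched as below. This is an illustrative toy and not Miller's code: the `record` type, the function names, and the `seen` field are all assumptions made for the example, and a plain Go map does not preserve the key order that a real Miller record keeps.

```go
package main

import (
	"fmt"
	"strings"
)

// record is a hypothetical stand-in for a Miller record. Unlike a real
// Miller record, a Go map does not preserve key order.
type record map[string]string

// read parses "k=v,k=v" lines into records and sends them downstream.
func read(lines []string, out chan<- record) {
	defer close(out) // closing the channel signals end of stream
	for _, line := range lines {
		rec := record{}
		for _, pair := range strings.Split(line, ",") {
			if k, v, ok := strings.Cut(pair, "="); ok {
				rec[k] = v
			}
		}
		out <- rec
	}
}

// transform adds one field to every record it receives.
func transform(in <-chan record, out chan<- record) {
	defer close(out)
	for rec := range in {
		rec["seen"] = "true"
		out <- rec
	}
}

// write formats two chosen fields of each record as an output line.
func write(in <-chan record) []string {
	var lines []string
	for rec := range in {
		lines = append(lines, fmt.Sprintf("a=%s,seen=%s", rec["a"], rec["seen"]))
	}
	return lines
}

// runPipeline wires reader -> transformer -> writer with channels.
func runPipeline(lines []string) []string {
	readerOut := make(chan record)
	transformerOut := make(chan record)
	go read(lines, readerOut)
	go transform(readerOut, transformerOut)
	return write(transformerOut)
}

func main() {
	for _, line := range runPipeline([]string{"a=1,b=2", "a=3,b=4"}) {
		fmt.Println(line)
	}
}
```

Each stage closes its output channel when it is done, so the next stage's `range` loop ends without any extra signalling.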

## Directory-structure details
@@ -98,30 +98,30 @@ So, in broad overview, the key packages are:
* This package defines the grammar for Miller's domain-specific language (DSL) for the Miller `put` and `filter` verbs. And, GOCC is a joy to use. :)
* It is used on the terms of its open-source license.
* [golang.org/x/term](https://pkg.go.dev/golang.org/x/term):
* Just a one-line Miller callsite for is-a-terminal checking for the [Miller REPL](./internal/pkg/terminals/repl/README.md).
* Just a one-line Miller callsite for is-a-terminal checking for the [Miller REPL](./pkg/terminals/repl/README.md).
* It is used on the terms of its open-source license.
* See also [./go.mod](go.mod). Setup:
* `go get github.com/goccmack/gocc`
* `go get golang.org/x/term`

### Miller per se

* The main entry point is [cmd/mlr/main.go](./cmd/mlr/main.go); everything else in [internal/pkg](./internal/pkg).
* [internal/pkg/entrypoint](./internal/pkg/entrypoint): All the usual contents of `main()` are here, for ease of testing.
* [internal/pkg/platform](./internal/pkg/platform): Platform-dependent code, which as of early 2021 is the command-line parser. Handling single quotes and double quotes is different on Windows unless particular care is taken, which is what this package does.
* [internal/pkg/lib](./internal/pkg/lib):
* Implementation of the [`Mlrval`](./internal/pkg/types/mlrval.go) datatype which includes string/int/float/boolean/void/absent/error types. These are used for record values, as well as expression/variable values in the Miller `put`/`filter` DSL. See also below for more details.
* [`Mlrmap`](./internal/pkg/types/mlrmap.go) is the sequence of key-value pairs which represents a Miller record. The key-lookup mechanism is optimized for Miller read/write usage patterns -- please see [mlrmap.go](./internal/pkg/types/mlrmap.go) for more details.
* [`context`](./internal/pkg/types/context.go) supports AWK-like variables such as `FILENAME`, `NF`, `NR`, and so on.
* [internal/pkg/cli](./internal/pkg/cli) is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer-chain of `put` then `filter`, and a JSON record-writer.
* [internal/pkg/climain](./internal/pkg/climain) contains a layer which invokes `internal/pkg/cli`, which was split out to avoid a Go package-import cycle.
* [internal/pkg/stream](./internal/pkg/stream) is as above -- it uses Go channels to pipe together file-reads, to record-reading/parsing, to a chain of record-transformers, to record-writing/formatting, to terminal standard output.
* [internal/pkg/input](./internal/pkg/input) is as above -- one record-reader type per supported input file format, and a factory method.
* [internal/pkg/output](./internal/pkg/output) is as above -- one record-writer type per supported output file format, and a factory method.
* [internal/pkg/transformers](./internal/pkg/transformers) contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next. It also contains all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on.
* [internal/pkg/parsing](./internal/pkg/parsing) contains a single source file, `mlr.bnf`, which is the lexical/semantic grammar file for the Miller `put`/`filter` DSL using the GOCC framework. All subdirectories of `internal/pkg/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`. If you need to edit `mlr.bnf`, please use [tools/build-dsl](./tools/build-dsl) to autogenerate Go code from it (using the GOCC tool). (This takes several minutes to run.)
* [internal/pkg/dsl](./internal/pkg/dsl) contains [`ast_types.go`](internal/pkg/dsl/ast_types.go) which is the abstract syntax tree datatype shared between GOCC and Miller. I didn't use a `internal/pkg/dsl/ast` naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
* [internal/pkg/dsl/cst](./internal/pkg/dsl/cst) is the concrete syntax tree, constructed from an AST produced by GOCC. The CST is what is actually executed on every input record when you do things like `$z = $x * 0.3 * $y`. Please see the [internal/pkg/dsl/cst/README.md](./internal/pkg/dsl/cst/README.md) for more information.
* The main entry point is [cmd/mlr/main.go](./cmd/mlr/main.go); everything else in [pkg](./pkg).
* [pkg/entrypoint](./pkg/entrypoint): All the usual contents of `main()` are here, for ease of testing.
* [pkg/platform](./pkg/platform): Platform-dependent code, which as of early 2021 is the command-line parser. Handling single quotes and double quotes is different on Windows unless particular care is taken, which is what this package does.
* [pkg/lib](./pkg/lib):
* Implementation of the [`Mlrval`](./pkg/types/mlrval.go) datatype which includes string/int/float/boolean/void/absent/error types. These are used for record values, as well as expression/variable values in the Miller `put`/`filter` DSL. See also below for more details.
* [`Mlrmap`](./pkg/types/mlrmap.go) is the sequence of key-value pairs which represents a Miller record. The key-lookup mechanism is optimized for Miller read/write usage patterns -- please see [mlrmap.go](./pkg/types/mlrmap.go) for more details.
* [`context`](./pkg/types/context.go) supports AWK-like variables such as `FILENAME`, `NF`, `NR`, and so on.
* [pkg/cli](./pkg/cli) is the flag-parsing logic for supporting Miller's command-line interface. When you type something like `mlr --icsv --ojson put '$sum = $a + $b' then filter '$sum > 1000' myfile.csv`, it's the CLI parser which makes it possible for Miller to construct a CSV record-reader, a transformer-chain of `put` then `filter`, and a JSON record-writer.
* [pkg/climain](./pkg/climain) contains a layer which invokes `pkg/cli`, which was split out to avoid a Go package-import cycle.
* [pkg/stream](./pkg/stream) is as above -- it uses Go channels to pipe together file-reads, to record-reading/parsing, to a chain of record-transformers, to record-writing/formatting, to terminal standard output.
* [pkg/input](./pkg/input) is as above -- one record-reader type per supported input file format, and a factory method.
* [pkg/output](./pkg/output) is as above -- one record-writer type per supported output file format, and a factory method.
* [pkg/transformers](./pkg/transformers) contains the abstract record-transformer interface datatype, as well as the Go-channel chaining mechanism for piping one transformer into the next. It also contains all the concrete record-transformers such as `cat`, `tac`, `sort`, `put`, and so on.
* [pkg/parsing](./pkg/parsing) contains a single source file, `mlr.bnf`, which is the lexical/semantic grammar file for the Miller `put`/`filter` DSL using the GOCC framework. All subdirectories of `pkg/parsing/` are autogen code created by GOCC's processing of `mlr.bnf`. If you need to edit `mlr.bnf`, please use [tools/build-dsl](./tools/build-dsl) to autogenerate Go code from it (using the GOCC tool). (This takes several minutes to run.)
* [pkg/dsl](./pkg/dsl) contains [`ast_types.go`](pkg/dsl/ast_types.go) which is the abstract syntax tree datatype shared between GOCC and Miller. I didn't use a `pkg/dsl/ast` naming convention, although that would have been nice, in order to avoid a Go package-dependency cycle.
* [pkg/dsl/cst](./pkg/dsl/cst) is the concrete syntax tree, constructed from an AST produced by GOCC. The CST is what is actually executed on every input record when you do things like `$z = $x * 0.3 * $y`. Please see the [pkg/dsl/cst/README.md](./pkg/dsl/cst/README.md) for more information.

## Nil-record conventions

@@ -153,15 +153,15 @@ nil through the reader/transformer/writer sequence.

## More about mlrvals

[`Mlrval`](./internal/pkg/types/mlrval.go) is the datatype of record values, as well as expression/variable values in the Miller `put`/`filter` DSL. It includes string/int/float/boolean/void/absent/error types, not unlike PHP's `zval`.
[`Mlrval`](./pkg/types/mlrval.go) is the datatype of record values, as well as expression/variable values in the Miller `put`/`filter` DSL. It includes string/int/float/boolean/void/absent/error types, not unlike PHP's `zval`.

* Miller's `absent` type is like Javascript's `undefined` -- it's for times when there is no such key, as in a DSL expression `$out = $foo` when the input record is `$x=3,y=4` -- there is no `$foo` so `$foo` has `absent` type. Nothing is written to the `$out` field in this case. See also [here](https://miller.readthedocs.io/en/latest/reference-main-null-data) for more information.
* Miller's `void` type is like Javascript's `null` -- it's for times when there is a key with no value, as in `$out = $x` when the input record is `$x=,$y=4`. This is an overlap with `string` type, since a void value looks like an empty string. I've gone back and forth on this (including when I was writing the C implementation) -- whether to retain `void` as a distinct type from empty-string, or not. I ended up keeping it as it made the `Mlrval` logic easier to understand.
* Miller's `error` type is for things like doing type-uncoerced addition of strings. Data-dependent errors are intended to result in `(error)`-valued output, rather than crashing Miller. See also [here](https://miller.readthedocs.io/en/latest/reference-main-data-types) for more information.
* Miller's number handling makes auto-overflow from int to float transparent, while preserving the possibility of 64-bit bitwise arithmetic.
* This is different from JavaScript, which has only double-precision floats and thus no support for 64-bit numbers (note however that there is now [`BigInt`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt)).
* This is also different from C and Go, wherein casts are necessary -- without which int arithmetic overflows.
* See also [here](https://miller.readthedocs.io/en/latest/reference-main-arithmetic) for the semantics of Miller arithmetic, which the [`Mlrval`](./internal/pkg/types/mlrval.go) class implements.
* See also [here](https://miller.readthedocs.io/en/latest/reference-main-arithmetic) for the semantics of Miller arithmetic, which the [`Mlrval`](./pkg/types/mlrval.go) class implements.

## Performance optimizations

2 changes: 1 addition & 1 deletion README.md
@@ -110,7 +110,7 @@ See also [building from source](https://miller.readthedocs.io/en/latest/build.ht
* You can do `./configure --prefix=/some/install/path` before `make install` if you want to install somewhere other than `/usr/local`.
* Without `make`:
* To build: `go build github.com/johnkerl/miller/cmd/mlr`.
* To run tests: `go test github.com/johnkerl/miller/internal/pkg/...` and `mlr regtest`.
* To run tests: `go test github.com/johnkerl/miller/pkg/...` and `mlr regtest`.
* To install: `go install github.com/johnkerl/miller/cmd/mlr` will install to _GOPATH_`/bin/mlr`.
* See also the doc page on [building from source](https://miller.readthedocs.io/en/latest/build).
* For more developer information please see [README-dev.md](./README-dev.md).
2 changes: 1 addition & 1 deletion cmd/experiments/colors/main.go
@@ -3,7 +3,7 @@ package main

import (
"fmt"
"github.com/johnkerl/miller/internal/pkg/colorizer"
"github.com/johnkerl/miller/pkg/colorizer"
)

const boldString = "\u001b[1m"
2 changes: 1 addition & 1 deletion cmd/mlr/main.go
@@ -11,7 +11,7 @@ import (
"strings"
"time"

"github.com/johnkerl/miller/internal/pkg/entrypoint"
"github.com/johnkerl/miller/pkg/entrypoint"
"github.com/pkg/profile" // for trace.out
)

2 changes: 1 addition & 1 deletion cmd/scan/main.go
@@ -8,7 +8,7 @@ import (
"fmt"
"os"

"github.com/johnkerl/miller/internal/pkg/scan"
"github.com/johnkerl/miller/pkg/scan"
)

func main() {
2 changes: 1 addition & 1 deletion cmd/sizes/main.go
@@ -11,7 +11,7 @@ package main
import (
"fmt"

"github.com/johnkerl/miller/internal/pkg/mlrval"
"github.com/johnkerl/miller/pkg/mlrval"
)

func main() {
1 change: 1 addition & 0 deletions docs/mkdocs.yml
@@ -109,6 +109,7 @@ nav:
- "Auxiliary commands": "reference-main-auxiliary-commands.md"
- "Manual page": "manpage.md"
- "Building from source": "build.md"
- "Miller as a library": "miller-as-library.md"
- "How to create a new release": "how-to-release.md"
- "Documents for previous releases": "release-docs.md"
- "Glossary": "glossary.md"
2 changes: 1 addition & 1 deletion docs/src/build.md
@@ -33,7 +33,7 @@ Two-clause BSD license [https://github.com/johnkerl/miller/blob/master/LICENSE.t
* `make` creates the `./mlr` (or `.\mlr.exe` on Windows) executable
* Without `make`: `go build github.com/johnkerl/miller/cmd/mlr`
* `make check` runs tests
* Without `make`: `go test github.com/johnkerl/miller/internal/pkg/...` and `mlr regtest`
* Without `make`: `go test github.com/johnkerl/miller/pkg/...` and `mlr regtest`
* `make install` installs the `mlr` executable and the `mlr` manpage
* Without make: `go install github.com/johnkerl/miller/cmd/mlr` will install to _GOPATH_`/bin/mlr`

2 changes: 1 addition & 1 deletion docs/src/build.md.in
@@ -17,7 +17,7 @@ Two-clause BSD license [https://github.com/johnkerl/miller/blob/master/LICENSE.t
* `make` creates the `./mlr` (or `.\mlr.exe` on Windows) executable
* Without `make`: `go build github.com/johnkerl/miller/cmd/mlr`
* `make check` runs tests
* Without `make`: `go test github.com/johnkerl/miller/internal/pkg/...` and `mlr regtest`
* Without `make`: `go test github.com/johnkerl/miller/pkg/...` and `mlr regtest`
* `make install` installs the `mlr` executable and the `mlr` manpage
* Without make: `go install github.com/johnkerl/miller/cmd/mlr` will install to _GOPATH_`/bin/mlr`

4 changes: 2 additions & 2 deletions docs/src/how-to-release.md
@@ -22,7 +22,7 @@ In this example I am using version 6.2.0 to 6.3.0; of course that will change fo

* Update version found in `mlr --version` and `man mlr`:

* Edit `internal/pkg/version/version.go` from `6.2.0-dev` to `6.3.0`.
* Edit `pkg/version/version.go` from `6.2.0-dev` to `6.3.0`.
* Edit `miller.spec`: `Version`, and `changelog` entry
* Run `make dev` in the Miller repo base directory
* The ordering in this makefile rule is important: the first build creates `mlr`; the second runs `mlr` to create `manpage.txt`; the third includes `manpage.txt` into one of its outputs.
@@ -69,6 +69,6 @@ In this example I am using version 6.2.0 to 6.3.0; of course that will change fo

* Afterwork:

* Edit `internal/pkg/version/version.go` to change version from `6.3.0` to `6.3.0-dev`.
* Edit `pkg/version/version.go` to change version from `6.3.0` to `6.3.0-dev`.
* `make dev`
* Commit and push.