Wreck is a binary format for streaming, storing and exchanging "vectors" that maximizes efficiency and minimizes overhead. The format annotates each vector with a unique key and a sort key, making it possible to store and look up vectors without a full deserialization.
Compactness: The format aims to minimize the size of the serialized data by using efficient encoding schemes and keeping metadata overhead low.
Simplicity: The format is easy to parse and generate using streaming codecs. It avoids overly complex structures and uses only byte sequences.
Efficiency: Both serialization and deserialization are meant to be fast. The format uses fixed-size data types where possible to speed up parsing, and minimizes the need for complex computations or lookups.
Cross-platform Compatibility: The format uses only little-endian encoding and octet streams. It avoids any character encodings. Floats are encoded using the IEEE 754 binary representation. This ensures correctness across different platforms and architectures.
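The little-endian IEEE 754 encoding mentioned above can be illustrated with the Go standard library alone. This is a minimal sketch and not the library's own code; the helper names `encodeFloat32s` and `decodeFloat32s` are hypothetical and exist only for this example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// encodeFloat32s packs a vector into little-endian IEEE 754 bytes,
// 4 bytes per element. Hypothetical helper, not part of wreck.
func encodeFloat32s(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, x := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(x))
	}
	return buf
}

// decodeFloat32s reverses encodeFloat32s. Trailing bytes that do not
// form a full 4-byte element are ignored.
func decodeFloat32s(b []byte) []float32 {
	v := make([]float32, len(b)/4)
	for i := range v {
		v[i] = math.Float32frombits(binary.LittleEndian.Uint32(b[4*i:]))
	}
	return v
}

func main() {
	b := encodeFloat32s([]float32{1.0, -2.5})
	fmt.Printf("% x\n", b)         // 00 00 80 3f 00 00 20 c0
	fmt.Println(decodeFloat32s(b)) // [1 -2.5]
}
```

Because the byte order is fixed, the same bytes decode to the same values regardless of the host architecture.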
The format simplifies its implementation by delegating the following aspects to external stream codecs. The limitation is that the codec cannot be used standalone: applications have to negotiate these parameters themselves.
Security: Use external stream ciphers.
Integrity: Use external streaming error detection and error correction schemes.
Compression: Use external compression. The nature of the data does not allow extreme gains from compression; gzip saves only 16% at its best compression level.
//
// 0x00 : 4 byte | Block Size (L)
// 0x04 : 2 byte | Vector Size (V)
// 0x06 : 2 byte | Sort Key Size (S)
// 0x08 : 1 byte * V | Vector
// 0xZZ : 1 byte * S | Sort Key
// 0xXX : 1 byte * | Unique Key
//
The latest version of the module is available at the main branch. All development, including new features and bug fixes, takes place on the main branch using forking and pull requests as described in contribution guidelines. The stable version is available via Golang modules.
Use go get to retrieve the library and add it as a dependency to your application.
go get -u github.com/kshard/wreck
// Create writer for []float32 vector
w := wreck.NewWriter[float32](out)
// Write vector
if err := w.Write(uniqueKey, sortKey, vector); err != nil {
	// ...
}
// Create scanner for []float32 vector
r := wreck.NewScanner[float32](in)
// Scan through stream
for r.Scan() {
	// consume vector
	r.UniqueKey()
	r.Vector()
}
if err := r.Err(); err != nil {
	// ...
}
- Large vector streams
  - Output Writer[T any]
  - Input Scanner[T any]
- Batching vectors, using JSON as primary protocol
  - On-the-wire protocol encoding/decoding with WriterJSON and ReaderJSON
  - Output Writer[T any]
  - Input Scanner[T any]
- Transmitting one vector in the packet
  - Output Encoder[T any]
  - Input Decoder[T any]
The library is MIT licensed and accepts contributions via GitHub pull requests:
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Added some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request
The build and testing process requires Go version 1.21 or later.
The commit message helps us write good release notes and speeds up the review process. The message should address two questions: what changed and why. The project follows the template defined by the chapter Contributing to a Project of the Git book.
If you experience any issues with the library, please let us know via GitHub issues. We appreciate detailed and accurate reports that help us to identify and replicate the issue.