cdc: add apache arrow parquet library and writer #99288

Merged: 3 commits, merged Apr 6, 2023

Commits on Apr 6, 2023

  1. cdc: add apache arrow parquet library

    This commit adds the Apache Arrow Parquet library for Go
    (version 11.0.0) as a dependency. The release is available at:
    https://github.com/apache/arrow/releases/tag/go%2Fv11.0.0
    
    This library is licensed under the Apache License 2.0.
    
    Informs: cockroachdb#99028
    Epic: None
    Release note: None
    jayshrivastava committed 905d70b on Apr 6, 2023
  2. util/parquet: create parquet writer library

    This change implements a `Writer` struct in the new `util/parquet` package.
    It writes datums to an `io.Writer` sink using a configurable
    parquet format version (default: v2.6).
    
    Internally, the package implements the features required to write the parquet format:
    - schema creation
    - row group / column page management
    - encoding and decoding between CRDB datums and parquet datums
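To illustrate what encoding datums as parquet values involves at the byte level, the sketch below implements two layouts from the Parquet PLAIN encoding: a BYTE_ARRAY value is a 4-byte little-endian length followed by the raw bytes, and BOOLEAN values are bit-packed one per bit, least significant bit first. The function names are hypothetical. A writer built on the Arrow library would normally leave this step to the library, which may also pick other encodings such as dictionary encoding.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// plainByteArray appends v using PLAIN encoding for BYTE_ARRAY:
// a 4-byte little-endian length prefix followed by the raw bytes.
func plainByteArray(dst []byte, v []byte) []byte {
	var n [4]byte
	binary.LittleEndian.PutUint32(n[:], uint32(len(v)))
	dst = append(dst, n[:]...)
	return append(dst, v...)
}

// plainBools packs booleans using PLAIN encoding for BOOLEAN:
// one bit per value, least significant bit first within each byte.
func plainBools(vs []bool) []byte {
	out := make([]byte, (len(vs)+7)/8)
	for i, v := range vs {
		if v {
			out[i/8] |= 1 << (uint(i) % 8)
		}
	}
	return out
}

func main() {
	fmt.Println(plainByteArray(nil, []byte("ab")))     // [2 0 0 0 97 98]
	fmt.Println(plainBools([]bool{true, false, true})) // [5]
}
```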
    Currently, the writer only supports the types found in the TPCC workload, namely INT, DECIMAL, STRING,
    UUID, TIMESTAMP, and BOOL.
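Schema creation amounts to mapping each supported CRDB column type to a parquet physical type plus an optional logical annotation, and rejecting everything else. The mapping below is one plausible choice that is valid under the Parquet specification; it is an assumption for illustration, and the PR's actual choices (especially for DECIMAL and TIMESTAMP) may differ. `parquetType`, `typeMapping`, and `columnType` are hypothetical names.

```go
package main

import "fmt"

// parquetType describes a physical type plus an optional logical annotation.
type parquetType struct {
	Physical string
	Logical  string
}

// typeMapping is a hypothetical mapping from the CRDB types the writer
// supports to parquet types; it is not taken from the PR.
var typeMapping = map[string]parquetType{
	"INT":       {Physical: "INT64"},
	"BOOL":      {Physical: "BOOLEAN"},
	"STRING":    {Physical: "BYTE_ARRAY", Logical: "STRING"},
	"UUID":      {Physical: "FIXED_LEN_BYTE_ARRAY(16)", Logical: "UUID"},
	"DECIMAL":   {Physical: "BYTE_ARRAY", Logical: "DECIMAL"},
	"TIMESTAMP": {Physical: "INT64", Logical: "TIMESTAMP(MICROS)"},
}

// columnType returns the parquet type for a CRDB type name, or an error for
// types outside the supported set.
func columnType(crdbType string) (parquetType, error) {
	t, ok := typeMapping[crdbType]
	if !ok {
		return parquetType{}, fmt.Errorf("parquet writer: unsupported type %s", crdbType)
	}
	return t, nil
}

func main() {
	t, _ := columnType("UUID")
	fmt.Println(t.Physical, t.Logical) // FIXED_LEN_BYTE_ARRAY(16) UUID
	_, err := columnType("JSONB")
	fmt.Println(err) // parquet writer: unsupported type JSONB
}
```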
    
    This change also adds a benchmark, tests that verify the correctness of the
    writer, and test utilities for reading datums back from parquet files.
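Correctness tests of this kind typically write values, read them back, and compare. The sketch below shows that round-trip shape on the Parquet PLAIN encoding of INT64 (8-byte little-endian values). `encodeInt64s`, `decodeInt64s`, and `roundTrip` are hypothetical helpers, not the test utilities added in this PR, which read datums from complete parquet files.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeInt64s writes values using PLAIN encoding for INT64 (8-byte little-endian).
func encodeInt64s(vs []int64) []byte {
	out := make([]byte, 8*len(vs))
	for i, v := range vs {
		binary.LittleEndian.PutUint64(out[8*i:], uint64(v))
	}
	return out
}

// decodeInt64s reverses encodeInt64s and rejects buffers whose length is not
// a multiple of 8.
func decodeInt64s(b []byte) ([]int64, error) {
	if len(b)%8 != 0 {
		return nil, fmt.Errorf("truncated INT64 data: %d bytes", len(b))
	}
	out := make([]int64, len(b)/8)
	for i := range out {
		out[i] = int64(binary.LittleEndian.Uint64(b[8*i:]))
	}
	return out, nil
}

// roundTrip reports whether decoding the encoded values reproduces the input.
func roundTrip(vs []int64) bool {
	got, err := decodeInt64s(encodeInt64s(vs))
	if err != nil || len(got) != len(vs) {
		return false
	}
	for i := range vs {
		if got[i] != vs[i] {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(roundTrip([]int64{0, -1, 1 << 40})) // true
}
```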
    
    Informs: cockroachdb#99028
    Epic: None
    Release note: None
    jayshrivastava committed 47c8727 on Apr 6, 2023
  3. changefeedccl: add parquet writer

    This change adds the file `parquet.go`, which contains
    helper functions for creating parquet writers
    and exporting data supplied as `cdcevent.Row` structs.
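One job such a helper has is turning a changefeed row into datums in the column order the writer's schema expects. The sketch below uses a hypothetical, simplified `Row` type as a stand-in for `cdcevent.Row` (the real type has a different API), and `datumsForSchema` is likewise a hypothetical name; it only illustrates ordering values by schema and failing on missing columns.

```go
package main

import "fmt"

// Row is a hypothetical, simplified stand-in for cdcevent.Row: decoded
// values keyed by column name. The real cdcevent.Row API differs.
type Row struct {
	Values map[string]interface{}
}

// datumsForSchema returns the row's values ordered to match schemaCols and
// errors if the row lacks any schema column.
func datumsForSchema(schemaCols []string, r Row) ([]interface{}, error) {
	out := make([]interface{}, 0, len(schemaCols))
	for _, c := range schemaCols {
		v, ok := r.Values[c]
		if !ok {
			return nil, fmt.Errorf("row is missing column %q", c)
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	r := Row{Values: map[string]interface{}{"id": int64(1), "name": "a"}}
	d, err := datumsForSchema([]string{"id", "name"}, r)
	fmt.Println(d, err) // [1 a] <nil>
}
```

Keying by the schema rather than by the row's own iteration order keeps every written row aligned with the columns declared when the writer was created.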
    
    This change also adds tests to ensure rows are written
    to parquet files correctly.
    
    Epic: None
    Release note: None
    jayshrivastava committed d6acc93 on Apr 6, 2023