Parquet

Version 2.10.0

New Feature

  • PARQUET-758 - Add Float16/Half-float logical type
  • PARQUET-2261 - Add statistics for better estimating unencoded/uncompressed sizes and finer grained filtering

Improvement

Task

  • Document dictionary page position
  • Fix broken link for Plain Boolean
  • Fix typo under "Unsigned Integers"
  • MINOR: Add FIXED_LEN_BYTE_ARRAY Type
  • MINOR: Fix typo in parquet.thrift
  • MINOR: Fix typo in PageIndex.md

Bug

Version 2.9.0

Bug

  • PARQUET-1862 - Fix comment on statistics field in Thrift file
  • PARQUET-2011 - Update the doc for data types having parameters as precision instead of unit

Improvement

Task

  • PARQUET-1777 - Add Parquet logo vector files to repo
  • PARQUET-2013 - [Format] Mention that converted types are deprecated

Version 2.8.0

New Feature

Improvement

  • PARQUET-1672 - [DOC] Broken link to "How To Contribute" section in Parquet-MR project
  • PARQUET-1708 - Fix Thrift compiler warning

Task

Version 2.7.0

Sub-task

Bug

  • PARQUET-1437 - Misleading comment in parquet.thrift
  • PARQUET-1554 - Compilation error when upgrading Scrooge version
  • PARQUET-1561 - Inconsistencies in the Parquet Delta Encoding specification

New Feature

Improvement

Task

  • PARQUET-1433 - Parquet-format doesn't compile with Thrift 0.10.0
  • PARQUET-1572 - Clarify the definition of timestamp types
  • PARQUET-1585 - Update old external links in the code base
  • PARQUET-1627 - Update specification so that legacy timestamp logical types can be written for local semantics as well

Version 2.6.0

Bug

  • PARQUET-1266 - LogicalTypes union in parquet-format doesn't include UUID

Improvement

  • PARQUET-1290 - Clarify maximum run lengths for RLE encoding
  • PARQUET-1387 - Nanosecond precision time and timestamp - parquet-format
  • PARQUET-1400 - Deprecate parquet-mr related code in parquet-format

Task

Version 2.5.0

Bug

  • PARQUET-323 - INT96 should be marked as deprecated
  • PARQUET-1064 - Deprecate type-defined sort ordering for INTERVAL type
  • PARQUET-1065 - Deprecate type-defined sort ordering for INT96 type
  • PARQUET-1145 - Add license to .gitignore and .travis.yml
  • PARQUET-1156 - dev/merge_parquet_pr.py problems
  • PARQUET-1236 - Upgrade org.slf4j:slf4j-api:1.7.2 to 1.7.12
  • PARQUET-1242 - parquet.thrift refers to wrong releases for the new compressions
  • PARQUET-1251 - Clarify ambiguous min/max stats for FLOAT/DOUBLE
  • PARQUET-1258 - Update scm developer connection to github

New Feature

Improvement

Task

Version 2.4.0

Bug

  • PARQUET-255 - Typo in decimal type specification
  • PARQUET-322 - Document ENUM as a logical type
  • PARQUET-412 - Format: Do not shade slf4j-api
  • PARQUET-419 - Update dev script in parquet-cpp to remove incubator.
  • PARQUET-655 - The LogicalTypes.md link in README.md points to the old Parquet GitHub repository
  • PARQUET-1031 - Fix spelling errors, whitespace, GitHub urls
  • PARQUET-1032 - Change link in Encodings.md for variable length encoding
  • PARQUET-1050 - The comment of Parquet Format Thrift definition file error
  • PARQUET-1076 - [Format] Switch to long key ids in KEYs file
  • PARQUET-1091 - Wrong and broken links in README
  • PARQUET-1102 - Travis CI builds are failing for parquet-format PRs
  • PARQUET-1134 - Release Parquet format 2.4.0
  • PARQUET-1136 - Makefile is broken

Improvement

  • PARQUET-371 - Bumps Thrift version to 0.9.3
  • PARQUET-407 - Incorrect delta-encoding example
  • PARQUET-428 - Support INT96 and FIXED_LEN_BYTE_ARRAY types
  • PARQUET-601 - Add support in Parquet to configure the encoding used by ValueWriters
  • PARQUET-609 - Add Brotli compression to Parquet format
  • PARQUET-757 - Add NULL type to bring Parquet logical types on par with Arrow
  • PARQUET-804 - parquet-format README.md still links to the old Google group
  • PARQUET-922 - Add index pages to the format to support efficient page skipping
  • PARQUET-1049 - Make thrift version a property in pom.xml

Task

  • PARQUET-450 - Small typos/issues in parquet-format documentation
  • PARQUET-667 - Update committers lists to point to apache website
  • PARQUET-1124 - Add new compression codecs to the Parquet spec
  • PARQUET-1125 - Add UUID logical type

Version 2.2.0

Version 2.1.0

  • ISSUE 84: Add metadata in the schema for storing decimals.
  • ISSUE 89: Added statistics to the data page header
  • ISSUE 86: Fix minor formatting, correct some wording under the "Error recovery" se...
  • ISSUE 82: exclude thrift source from jar
  • ISSUE 80: Upgrade maven-shade-plugin to 2.1 to compile with mvn 3.1.1

Version 2.0.0

  • ISSUE 79: Reorganize encodings and add details
  • ISSUE 78: Added sorted flag to dictionary page headers.
  • ISSUE 77: fix plugin versions
  • ISSUE 75: refactor dictionary encoding
  • ISSUE 64: new data page and stats
  • ISSUE 74: deprecate and remove group_var_int encoding
  • ISSUE 76: add mention of boolean on RLE
  • ISSUE 73: reformat encodings
  • ISSUE 71: refactor documentation for 2.0 encodings
  • ISSUE 66: Block strings
  • ISSUE 67: Add ENUM ConvertedType
  • ISSUE 58: Correct unterminated comment for SortingColumn.
  • ISSUE 51: Add metadata to specify row groups are sorted.

Version 1.0.0

  • ISSUE 46: Update readme to include 4 byte length in rle columns
  • ISSUE 47: fixed typo in readme.md
  • ISSUE 45: Typo in describing preferred row group size
  • ISSUE 43: add dictionary encoding details
  • ISSUE 41: Update readme with details about RLE encoding
  • ISSUE 39: Added created_by optional file metadata.
  • ISSUE 40: add details about the page size fields
  • ISSUE 35: this embeds and renames the thrift dependency in the jar, allowing people to use a different version of thrift in parallel
  • ISSUE 36: adding the encoding to the dictionary page
  • ISSUE 34: Corrected typo
  • ISSUE 32: Add layout diagram to README and fix typo
  • ISSUE 31: Restore encoding changes