Commit

Merge branch 'main' into readme
rtyler authored Sep 15, 2023
2 parents fa149f1 + 9d1857d commit a44f148
Showing 118 changed files with 6,365 additions and 2,597 deletions.
9 changes: 4 additions & 5 deletions .github/workflows/build.yml
@@ -67,8 +67,7 @@ jobs:
runs-on: ${{ matrix.os }}
env:
# Disable full debug symbol generation to speed up CI build and keep memory down
-# "1" means line tables only, which is useful for panic tracebacks.
-RUSTFLAGS: -C debuginfo=1
+RUSTFLAGS: -C debuginfo=line-tables-only
# Disable incremental builds by cargo for CI which should save disk space
# and hopefully avoid final link "No space left on device"
CARGO_INCREMENTAL: 0
@@ -94,8 +93,8 @@ jobs:
env:
CARGO_INCREMENTAL: 0
# Disable full debug symbol generation to speed up CI build and keep memory down
-# "1" means line tables only, which is useful for panic tracebacks.
-RUSTFLAGS: "-C debuginfo=1"
+# <https://doc.rust-lang.org/cargo/reference/profiles.html>
+RUSTFLAGS: "-C debuginfo=line-tables-only"
# https://github.com/rust-lang/cargo/issues/10280
CARGO_NET_GIT_FETCH_WITH_CLI: "true"
RUST_BACKTRACE: "1"
@@ -149,7 +148,7 @@ jobs:
parquet2_test:
runs-on: ubuntu-latest
env:
-RUSTFLAGS: "-C debuginfo=0"
+RUSTFLAGS: "-C debuginfo=line-tables-only"
CARGO_INCREMENTAL: 0

steps:
16 changes: 9 additions & 7 deletions .github/workflows/python_build.yml
@@ -39,7 +39,7 @@ jobs:
name: Python Build (Python 3.7 PyArrow 8.0.0)
runs-on: ubuntu-latest
env:
-RUSTFLAGS: "-C debuginfo=0"
+RUSTFLAGS: "-C debuginfo=line-tables-only"
CARGO_INCREMENTAL: 0

# use the same environment we have for python release
@@ -66,14 +66,16 @@ jobs:
- name: Build and install deltalake
run: |
+# Needed for openssl build
+yum install -y perl-IPC-Cmd
pip install virtualenv
virtualenv venv
source venv/bin/activate
make setup
# Install minimum PyArrow version
pip install -e .[pandas,devel] pyarrow==8.0.0
env:
-RUSTFLAGS: "-C debuginfo=0"
+RUSTFLAGS: "-C debuginfo=line-tables-only"

- name: Run tests
run: |
@@ -123,13 +125,13 @@ jobs:
- name: Run tests
run: |
source venv/bin/activate
-python -m pytest -m '((s3 or azure) and integration) or not integration'
+python -m pytest -m '((s3 or azure) and integration) or not integration and not benchmark'
- name: Test without pandas
run: |
source venv/bin/activate
pip uninstall --yes pandas
-python -m pytest -m "not pandas and not integration"
+python -m pytest -m "not pandas and not integration and not benchmark"
pip install pandas
- name: Build Sphinx documentation
@@ -141,7 +143,7 @@ jobs:
name: Python Benchmark
runs-on: ubuntu-latest
env:
-RUSTFLAGS: "-C debuginfo=0"
+RUSTFLAGS: "-C debuginfo=line-tables-only"
CARGO_INCREMENTAL: 0

steps:
@@ -191,7 +193,7 @@ jobs:
name: PySpark Integration Tests
runs-on: ubuntu-latest
env:
-RUSTFLAGS: "-C debuginfo=0"
+RUSTFLAGS: "-C debuginfo=line-tables-only"
CARGO_INCREMENTAL: 0

steps:
@@ -231,7 +233,7 @@ jobs:
name: Running with Python ${{ matrix.python-version }}
runs-on: ubuntu-latest
env:
-RUSTFLAGS: "-C debuginfo=0"
+RUSTFLAGS: "-C debuginfo=line-tables-only"
CARGO_INCREMENTAL: 0

strategy:
12 changes: 12 additions & 0 deletions .github/workflows/python_release.yml
@@ -39,6 +39,14 @@ jobs:
steps:
- uses: actions/checkout@v3

+# We use extra recent Cargo.toml syntax, so we need at least Rust 1.71.0
+- name: Install newer rust
+uses: actions-rs/toolchain@v1
+with:
+profile: default
+toolchain: stable
+override: true
+
- name: Publish to pypi (without sdist)
uses: messense/maturin-action@v1
env:
@@ -95,6 +103,8 @@ jobs:
target: x86_64-unknown-linux-gnu
command: publish
args: --skip-existing -m python/Cargo.toml ${{ env.FEATURES_FLAG }}
+# for openssl build
+before-script-linux: yum install -y perl-IPC-Cmd

- name: Publish manylinux to pypi aarch64 (without sdist)
uses: messense/maturin-action@v1
@@ -104,6 +114,8 @@ jobs:
target: aarch64-unknown-linux-gnu
command: publish
args: --skip-existing -m python/Cargo.toml --no-sdist ${{ env.FEATURES_FLAG }}
+# for openssl build
+before-script-linux: yum install -y perl-IPC-Cmd

release-docs:
needs:
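These workflows now pass `-C debuginfo=line-tables-only`, and the release job installs a newer toolchain because, per its comment, the Cargo.toml syntax needs at least Rust 1.71.0. As a minimal sketch (not part of this commit), a local build script could choose the flag based on the installed `rustc`. The helper names `version_ge` and `debuginfo_flags_for` are hypothetical. The sketch assumes GNU `sort -V` is available, that `rustc --version` prints `rustc X.Y.Z ...`, and, as the workflow comment implies, that 1.71.0 is the first release that accepts `line-tables-only`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on GNU `sort -V` (version sort), an assumption about the host.
version_ge() {
  # The smaller of the two versions sorts first; if that is $2, then $1 >= $2.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: pick RUSTFLAGS the given rustc version should accept.
# Assumption: 1.71.0 is the minimum version for `line-tables-only`.
debuginfo_flags_for() {
  if version_ge "$1" "1.71.0"; then
    echo "-C debuginfo=line-tables-only"
  else
    # Older toolchains: fall back to the numeric level the workflows used before.
    echo "-C debuginfo=1"
  fi
}

# Usage (only when rustc is installed): take "X.Y.Z" from `rustc --version`.
if command -v rustc >/dev/null 2>&1; then
  RUSTC_VERSION="$(rustc --version | awk '{print $2}')"
  RUSTFLAGS="$(debuginfo_flags_for "$RUSTC_VERSION")"
  export RUSTFLAGS
  echo "RUSTFLAGS=$RUSTFLAGS"
fi
```

Comparing versions with `sort -V` avoids hand-parsing the `X.Y.Z` components. The fallback keeps the numeric level from before this change rather than dropping debug info entirely.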
57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,62 @@
# Changelog

## [rust-v0.15.0](https://github.com/delta-io/delta-rs/tree/rust-v0.15.0) (2023-09-06)

[Full Changelog](https://github.com/delta-io/delta-rs/compare/rust-v0.14.0...rust-v0.15.0)

**Implemented enhancements:**

- Configurable number of retries for transaction commit loop [\#1595](https://github.com/delta-io/delta-rs/issues/1595)

**Fixed bugs:**

- Unable to read table using VM Managed Identity on Azure [\#1462](https://github.com/delta-io/delta-rs/issues/1462)
- Unable to query by partition column [\#1445](https://github.com/delta-io/delta-rs/issues/1445)

**Merged pull requests:**

- fix: update python test [\#1608](https://github.com/delta-io/delta-rs/pull/1608) ([wjones127](https://github.com/wjones127))
- chore: update datafusion to 30, arrow to 45 [\#1606](https://github.com/delta-io/delta-rs/pull/1606) ([scsmithr](https://github.com/scsmithr))
- fix: just make pyarrow 12 the max [\#1603](https://github.com/delta-io/delta-rs/pull/1603) ([wjones127](https://github.com/wjones127))
- fix: support partial statistics in JSON [\#1599](https://github.com/delta-io/delta-rs/pull/1599) ([CurtHagenlocher](https://github.com/CurtHagenlocher))
- feat: allow configurable number of `commit` attempts [\#1596](https://github.com/delta-io/delta-rs/pull/1596) ([cmackenzie1](https://github.com/cmackenzie1))
- fix: querying on date partitions \(fixes \#1445\) [\#1594](https://github.com/delta-io/delta-rs/pull/1594) ([watfordkcf](https://github.com/watfordkcf))
- refactor: clean up arrow schema defs [\#1590](https://github.com/delta-io/delta-rs/pull/1590) ([polynomialherder](https://github.com/polynomialherder))
- feat: add metadata for operations::write::WriteBuilder [\#1584](https://github.com/delta-io/delta-rs/pull/1584) ([abhimanyusinghgaur](https://github.com/abhimanyusinghgaur))
- feat: add metadata for deletion vectors [\#1583](https://github.com/delta-io/delta-rs/pull/1583) ([aersam](https://github.com/aersam))
- fix: remove alpha classifier [\#1578](https://github.com/delta-io/delta-rs/pull/1578) ([marcelotrevisani](https://github.com/marcelotrevisani))
- refactor: use pa.table.cast in delta\_arrow\_schema\_from\_pandas [\#1573](https://github.com/delta-io/delta-rs/pull/1573) ([ion-elgreco](https://github.com/ion-elgreco))

## [rust-v0.14.0](https://github.com/delta-io/delta-rs/tree/rust-v0.14.0) (2023-08-01)

[Full Changelog](https://github.com/delta-io/delta-rs/compare/rust-v0.13.0...rust-v0.14.0)

**Implemented enhancements:**

- Define common dependencies in Cargo Workspace [\#1572](https://github.com/delta-io/delta-rs/issues/1572)
- Make `delta_datafusion::find_files` public [\#1559](https://github.com/delta-io/delta-rs/issues/1559)

**Fixed bugs:**

- Excessive integration test sizes causing builds to fail [\#1550](https://github.com/delta-io/delta-rs/issues/1550)
- Slack invite link is not working [\#1530](https://github.com/delta-io/delta-rs/issues/1530)

**Merged pull requests:**

- fix: correct whitespace in delta protocol reader minimum version error message [\#1576](https://github.com/delta-io/delta-rs/pull/1576) ([polynomialherder](https://github.com/polynomialherder))
- chore: move deps to `[workspace.dependencies]` [\#1575](https://github.com/delta-io/delta-rs/pull/1575) ([cmackenzie1](https://github.com/cmackenzie1))
- chore: update `datafusion` to `28` and arrow to `43` [\#1571](https://github.com/delta-io/delta-rs/pull/1571) ([cmackenzie1](https://github.com/cmackenzie1))
- ci: don't run benchmark in debug mode [\#1566](https://github.com/delta-io/delta-rs/pull/1566) ([wjones127](https://github.com/wjones127))
- ci: install newer rust for macos python release [\#1565](https://github.com/delta-io/delta-rs/pull/1565) ([wjones127](https://github.com/wjones127))
- feat: make find\_files public [\#1560](https://github.com/delta-io/delta-rs/pull/1560) ([yjshen](https://github.com/yjshen))
- feat!: bulk delete for vacuum [\#1556](https://github.com/delta-io/delta-rs/pull/1556) ([Blajda](https://github.com/Blajda))
- chore: address some integration test bloat of disk usage for development [\#1552](https://github.com/delta-io/delta-rs/pull/1552) ([rtyler](https://github.com/rtyler))
- docs: port docs to mkdocs [\#1548](https://github.com/delta-io/delta-rs/pull/1548) ([MrPowers](https://github.com/MrPowers))
- chore: disable incremental builds in CI for saving space [\#1545](https://github.com/delta-io/delta-rs/pull/1545) ([rtyler](https://github.com/rtyler))
- fix: revert premature merge of an attempted fix for binary column statistics [\#1544](https://github.com/delta-io/delta-rs/pull/1544) ([rtyler](https://github.com/rtyler))
- chore: increment python version [\#1542](https://github.com/delta-io/delta-rs/pull/1542) ([wjones127](https://github.com/wjones127))
- feat: add restore command in python binding [\#1529](https://github.com/delta-io/delta-rs/pull/1529) ([loleek](https://github.com/loleek))

## [rust-v0.13.1](https://github.com/delta-io/delta-rs/tree/rust-v0.13.1) (2023-07-18)

**Fixed bugs:**
50 changes: 46 additions & 4 deletions Cargo.toml
@@ -1,10 +1,52 @@
[workspace]
members = [
"rust",
"python",
]
members = ["rust", "python"]
exclude = ["proofs", "delta-inspect"]
resolver = "2"

[profile.release-with-debug]
inherits = "release"
debug = true

# Reducing the debuginfo for the test profile in order to trim the disk and RAM
# usage during development
# <https://github.com/delta-io/delta-rs/issues/1550>
[profile.test]
debug = "line-tables-only"

[workspace.dependencies]
# arrow
arrow = { version = "45" }
arrow-array = { version = "45" }
arrow-buffer = { version = "45" }
arrow-cast = { version = "45" }
arrow-ord = { version = "45" }
arrow-row = { version = "45" }
arrow-schema = { version = "45" }
arrow-select = { version = "45" }
parquet = { version = "45" }

# datafusion
datafusion = { version = "30" }
datafusion-expr = { version = "30" }
datafusion-common = { version = "30" }
datafusion-proto = { version = "30" }
datafusion-sql = { version = "30" }
datafusion-physical-expr = { version = "30" }

# serde
serde = { version = "1", features = ["derive"] }
serde_json = "1"

# "stdlib"
bytes = { version = "1" }
chrono = { version = "0.4", default-features = false, features = ["clock"] }
regex = { version = "1" }
thiserror = { version = "1" }
url = { version = "2" }
uuid = { version = "1" }

# runtime / async
async-trait = { version = "0.1" }
futures = { version = "0.3" }
tokio = { version = "1" }
num_cpus = { version = "1" }
6 changes: 3 additions & 3 deletions dev/release/update_change_log.sh
@@ -16,8 +16,8 @@
set -e

LANGUAGE="rust"
-SINCE_VERSION="0.6.0"
-FUTURE_RELEASE="0.7.0"
+SINCE_VERSION=${SINCE_VERSION:-"0.6.0"}
+FUTURE_RELEASE=${FUTURE_RELEASE:-"0.7.0"}

# only consider tags of the correct language
if [ "$LANGUAGE" == "rust" ]; then
@@ -62,4 +62,4 @@ sed -i.bak "$(( $LINE_COUNT-3 )),$ d" "${OUTPUT_PATH}"
cat $HISTORIAL_PATH >> $OUTPUT_PATH

# Remove temporary files
-rm $HISTORIAL_PATH
+rm $HISTORIAL_PATH
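With this change, `update_change_log.sh` no longer has to be edited before each release. `${VAR:-default}` expands to the environment value when `VAR` is set and non-empty, and to the hard-coded default otherwise. The sketch below isolates that pattern in a hypothetical `resolve_versions` function (not part of the script) so the three cases are easy to compare:

```shell
#!/usr/bin/env bash
# Hypothetical function using the same defaulting pattern as update_change_log.sh:
# take the environment value if it is set and non-empty, else the fallback.
resolve_versions() {
  local since=${SINCE_VERSION:-"0.6.0"}
  local future=${FUTURE_RELEASE:-"0.7.0"}
  echo "$since..$future"
}

resolve_versions                                             # prints 0.6.0..0.7.0
SINCE_VERSION=0.14.0 FUTURE_RELEASE=0.15.0 resolve_versions  # prints 0.14.0..0.15.0
SINCE_VERSION= resolve_versions                              # empty counts as unset: 0.6.0..0.7.0
```

The real script can presumably be invoked the same way, e.g. `SINCE_VERSION=0.14.0 FUTURE_RELEASE=0.15.0 dev/release/update_change_log.sh`, though it may need other settings that this diff does not show. Writing `${VAR-default}` (without the colon) would instead keep an explicitly empty value.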
7 changes: 7 additions & 0 deletions docs/index.md
@@ -0,0 +1,7 @@
# Python deltalake package

This is the documentation for the native Python implementation of Delta Lake. It is based on the delta-rs Rust library and requires no Spark or JVM dependencies. For the PySpark implementation, see [delta-spark](https://docs.delta.io/latest/api/python/index.html) instead.

This module provides the capability to read, write, and manage [Delta Lake](https://delta.io/) tables from Python without Spark or Java. It uses [Apache Arrow](https://arrow.apache.org/) under the hood, so it is compatible with other Arrow-native or Arrow-integrated libraries such as [Pandas](https://pandas.pydata.org/), [DuckDB](https://duckdb.org/), and [Polars](https://www.pola.rs/).

Note: This module is under active development and some features are experimental. It is not yet as feature-complete as the PySpark implementation of Delta Lake. If you encounter a bug, please let us know in our [GitHub repo](https://github.com/delta-io/delta-rs/issues).
9 changes: 9 additions & 0 deletions docs/installation.md
@@ -0,0 +1,9 @@
# Installation

## Using Pip

``` bash
pip install deltalake
```

NOTE: Official binary wheels are statically linked against OpenSSL for remote object store communication. To request a critical OpenSSL upgrade, please file a GitHub issue.