Releases: bodo-ai/Bodo
2025.2
🎉 Highlights
We've started to revamp our CSV and JSON readers and writers to bring them more in line with our Parquet I/O. As part of this release, we standardized several smaller features and filesystem support. We’ve just begun, so look forward to more changes!
We also started improving compilation time, beginning with a global cache for functions internal to Bodo. Because the cache persists across program runs, it can provide dramatic speedups on subsequent uses of Bodo.
✨ New Features
- Support reading CSV/JSON/Parquet data from HuggingFace by @ehsantn in #175 and #176
- Support Google Cloud Storage (GCS) for CSV/JSON/Parquet I/O by @ehsantn in #169
- Support glob format in CSV read by @ehsantn in #180
- Support zstd compression in CSV/JSON I/O by @ehsantn in #27
- Improved reliability of spawn destructors for lazy data structures by @ehsantn in #161
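To illustrate what reading CSV data through a glob pattern means, here is a minimal sketch using only the Python standard library. It is not Bodo's implementation, and `read_csv_glob` is a hypothetical helper introduced for this example: the pattern is expanded to the matching files, and their rows are concatenated in sorted path order.

```python
import csv
import glob
import os
import tempfile


def read_csv_glob(pattern):
    """Read every CSV file matching `pattern` and return the rows as dicts.

    Hypothetical helper for illustration only; it is not part of Bodo.
    """
    rows = []
    # Sort the matches so the result order is deterministic.
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


# Build two small CSV files with the same header in a temporary directory.
tmpdir = tempfile.mkdtemp()
for name, values in [("part-0.csv", [1, 2]), ("part-1.csv", [3])]:
    with open(os.path.join(tmpdir, name), "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["x"])
        writer.writerows([[v] for v in values])

# One pattern selects both files; values come back as strings.
rows = read_csv_glob(os.path.join(tmpdir, "part-*.csv"))
```

All files matched by one pattern are assumed to share the same header, which is the usual layout for partitioned CSV output.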
🏎️ Performance Improvements
- Improved compilation time of dataframe unboxing by @ehsantn in #147
- Improved compilation time by caching functions internal to Bodo across program runs by @DrTodd13 in #154
🐞 Bug Fixes
- Improved the error message when Bodo detects an OOM to provide potential solutions and paths forward by @srilman in #205
- Fixed a bug in parallel read of JSON lines files by @ehsantn in #178
- Fixed the Pandas warning that appears when using to_csv or to_json by @hadia206 in #216 and #234
⚙️ Dependency Upgrades
- Upgraded Calcite to 1.38 by @IsaacWarren in #148, #190, and #195
- Upgraded Numba to 0.61 by @ehsantn in #98 and #201
💖 New Contributors
Full Changelog: 2025.1...2025.2
2025.1
What's Changed
- DockerHub Image Workflow by @hadia206 in #55
- [BSE-4389] Add Bodo Platform SDK Guide in the quick start. by @mbojanowski in #84
- Add bioinformatics kmers example by @ehsantn in #95
- Add LLM query example with Ollama by @ehsantn in #94
- Rename "bodo.submit" to "bodo.spawn" by @ehsantn in #93
- Update scikit-learn version in docs by @ehsantn in #97
- [BSE-4386] Add Linux ARM Support by @srilman in #83
- BSE-4358: Python S3 Table support by @IsaacWarren in #96
- Add a manual testing file for BodoSQL by @njriasan in #104
- Add preprocess the pile AI example by @ehsantn in #82
- Skip ollama/preprocess_thepile examples in Nightly tests "Run Examples" by @scott-routledge2 in #102
- Add files via upload by @marquisdepolis in #105
- Disable the ARM Nightly Tests by @srilman in #109
- Adding 2024.12.3 release notes by @knassre-bodo in #110
- Update docs for @bodo.wrap_python by @ehsantn in #112
- BSE-4403: Manylinux 2_28 by @IsaacWarren in #74
- BSE-4403: Fix hdf5 import by @IsaacWarren in #116
- BSE-4358: Use a separate random generator for table name by @IsaacWarren in #117
- Move Relational Algebra Python APIs to an explicit Python Entry Point API by @njriasan in #107
- Refactor Parser Configuration to a Helper function by @njriasan in #119
- Standardize non-constructor APIs to use Python Entry Point by @njriasan in #108
- Moved validator configuration to its own function by @njriasan in #121
- [BSE-4452] Remove simple Java Constructors for PythonEntryPoint by @njriasan in #113
- Use warning instead of value error when tests are not being run in check_query by @scott-routledge2 in #118
- Move intervalToNanos out of the Calcite files and into our parser utils functions by @njriasan in #120
- [BSE-4441] Fix Nightly E2E tests. by @scott-routledge2 in #103
- [BSE-4454] Tweak Buffer Pool Memory Allocation Default to Avoid OS OOMs outside of Platform by @srilman in #114
- [BSE-4452] Port remaining Python Entry Code to use the Defined Interface by @njriasan in #115
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #123
- Fix bodosql tests nightly by @scott-routledge2 in #125
- BSE-4460: Add java test infra to PR CI by @IsaacWarren in #127
- Replace feedback repo with main repo in docs by @ehsantn in #132
- Fix typo in NO_HDF5 check by @IsaacWarren in #131
- BSE-4358: BodoSQL support for S3 tables by @IsaacWarren in #126
- Replace stream operator state typing to use Numba type refinement by @ehsantn in #124
New Contributors
- @pre-commit-ci made their first contribution in #123
Full Changelog: 2024.12.3...2025.1
2024.12.3
What's Changed
- [BSE-4392] Add README and docs getting started examples to unittests by @scott-routledge2 in #73
- Switch to GitHub Hosted Instances for CI by @srilman in #70
- Handle nested objmode functions in UDF inlining by @ehsantn in #81
- Add Codecov by @hadia206 in #69
- 2024.12.2 Release notes by @IsaacWarren in #85
- [BSE-4404] Add local notebooks, update README by @scott-routledge2 in #77
- Remove df.append and other removed pandas APIs from docs by @ehsantn in #87
- [BSE-4374] Patch numba for object mode globals in spawn mode by @scott-routledge2 in #80
- Update License Copyright by @njriasan in #91
- [BSE-4415] Support large counts in MPI scatterv by @scott-routledge2 in #86
- [BSE-4419] Support large counts in MPI gatherv by @scott-routledge2 in #89
- Create a simple JIT wrapper for regular Python functions by @ehsantn in #88
- Make bodo compatible with newer pandas versions to enable UDF engine support by @scott-routledge2 in #92
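For context on the "large counts" items above: the classic MPI `scatterv` and `gatherv` interfaces take element counts as C `int` values, so a single count cannot exceed 2**31 - 1. The sketch below shows one common workaround, splitting a large count into chunks that each fit in a 32-bit signed integer. This is an assumption for illustration and not necessarily how the PRs above handle it; `split_count` is a hypothetical helper.

```python
INT_MAX = 2**31 - 1  # largest value a 32-bit signed C int can hold


def split_count(count, limit=INT_MAX):
    """Split `count` elements into chunk sizes that each fit within `limit`.

    Hypothetical helper for illustration only; it is not part of Bodo.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    full, rest = divmod(count, limit)
    # `full` chunks of the maximum size, plus one smaller chunk if needed.
    return [limit] * full + ([rest] if rest else [])


# Five billion elements do not fit in one 32-bit count, so they are
# divided into several counts that each do.
chunks = split_count(5_000_000_000)
```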
Full Changelog: 2024.12.2...2024.12.3
2024.12.2
What's Changed
- Update docs quick start examples by @ehsantn in #67
- Add files via upload by @marquisdepolis in #68
- [BSE-4385] Add locally runnable benchmarks by @scott-routledge2 in #65
- Handle slicing with negative start/step in lazy wrappers by @ehsantn in #71
- Fix spawn CI hang by @ehsantn in #75
- Fix JSON write is_lines handling by @ehsantn in #72
- Avoid logging in del functions by @ehsantn in #76
- Add GitHub Repo Link to PyPi by @srilman in #79
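As background for the slicing fix above, a wrapper that implements its own indexing has to resolve negative `start` and `step` values against the actual length. The sketch below shows how Python's built-in `slice.indices` does that normalization. It illustrates the language semantics only, not Bodo's lazy wrapper code, and `normalize_slice` is a hypothetical helper.

```python
def normalize_slice(s, length):
    """Return the concrete indices selected by slice `s` on `length` items."""
    # slice.indices resolves None and negative values into explicit bounds.
    start, stop, step = s.indices(length)
    return list(range(start, stop, step))


data = list("abcdef")

# Full reversal: negative step with open-ended bounds.
reverse = normalize_slice(slice(None, None, -1), len(data))
# Negative start: the last two elements.
tail = normalize_slice(slice(-2, None), len(data))
# Negative start and negative step combined.
from_back = normalize_slice(slice(-1, 1, -2), len(data))
```

Indexing `data` with the returned positions gives the same elements as applying the original slice directly.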
Full Changelog: 2024.10...2024.12.2
2024.12.1
What's Changed
- [BSE-4340] Spark version of NYC Taxi benchmark by @ehsantn in #19
- Update Azure user environment variable name by @hadia206 in #24
- [BSE-4249] to_parquet by @yingyee0111 in #8
- BSE-4343: Fix sccache in pip build by @IsaacWarren in #25
- [BSE-4337] Trigger BodoSQL customer tests when changes in BodoSQL/calcite_sql by @mbojanowski in #23
- Only Test Python 3.12 for MacOS Conda Builds in Pull Requests by @srilman in #30
- [BSE-4271] Add benchmark scripts for dask by @scott-routledge2 in #14
- [BSE-4325] Remove AWS PR CI Files by @srilman in #4
- Exclude develop branch from nightly runs by @hadia206 in #32
- Update examples for open source release by @ehsantn in #28
- [BSE-4356] Move the Dev Docker Code to the buildscripts/ Folder by @srilman in #36
- Fix contact link by @ehsantn in #35
- Enable Pre-Commit CI by @srilman in #37
- BSE-4355: Add readme for azurefs-sas-token-provider by @IsaacWarren in #38
- Remove old readme file by @ehsantn in #34
- Add 2024.11 and 2024.12 release notes by @hadia206 in #42
- [BSE-4204] Arrow 18 Upgrade by @srilman in #12
- Update README.md with description by @marquisdepolis in #41
- [BSE-4341] Benchmark against Modin/Ray by @scott-routledge2 in #22
- Fix conda release issues by @hadia206 in #44
- Support caching for IPython cells by @ehsantn in #45
- [BSE-4352] Fix to_parquet nightly issues by @yingyee0111 in #43
- BSE-4340: Update spark benchmark by @IsaacWarren in #47
- Adding logo files by @marquisdepolis in #48
- Update README.md by @marquisdepolis in #49
- Fix BodoSQL use in Jupyter (without JIT) by @ehsantn in #51
- BSE-4340: Spark benchmark EMR terraform by @IsaacWarren in #50
- [BSE-4203] Use Pixi for Pip Build Deps by @srilman in #46
- [BSE-4249] to_sql by @yingyee0111 in #27
- [BSE-4341] Update benchmarking and finalize scripts by @scott-routledge2 in #31
- BSE-4340: Run emr benchmark 3 times by @IsaacWarren in #56
Full Changelog: 2024.12...2024.12.1
2024.12
What's Changed
- Fix pyarrow import in CI pipelines by @scott-routledge2 in #11
- add openjdk version by @IsaacWarren in #15
- Minor fixes in platform install script by @ehsantn in #13
- BSE-4290: Add pip to docs by @IsaacWarren in #5
- [BSE-4271] Add NYC Taxi benchmark code by @ehsantn in #6
- Rename develop -> main by @hadia206 in #20
- [BSE-4333] Avoid Cythonizing transform files by @ehsantn in #18
- BSE-4272: Remove fftw as a dependency by @IsaacWarren in #3
- Only import bodosql if necessary in gatherv by @IsaacWarren in #16
New Contributors
- @scott-routledge2 made their first contribution in #11
Full Changelog: 2024.11...2024.12