This release contains several new features, many fixes, and two exciting new experimental integrations:
- Experimental new CSV parser based on Deephaven-CSV. See below for more information.
- Experimental new GeoDataFrame class for working with geographical data (from GeoJson/Shapefile) and plotting it with Kandy. See below for more information.
- Full BigInteger support: Just like we support the BigDecimal numbers, DataFrame now also supports BigInteger in parsing, converting, statistics, column arithmetics, etc.
- Custom SQL Database registration (read user guide)
- Improved parsing: Parsing and converting String columns to other types is now faster. We added String -> Char parsing. We also introduce the new experimental ParserOptions.useFastDoubleParser setting, which uses FastDoubleParser for faster and more flexible Double parsing.
- We continue improving our Compiler Plugin with every release. See below for more information.
- See this notebook for some more information about the changes.
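As a minimal sketch of the new parsing setting mentioned above, the snippet below parses a String column into Double with the experimental FastDoubleParser enabled. It assumes that ParserOptions accepts useFastDoubleParser as a named constructor argument and that parse() takes a ParserOptions instance; the column name and values are made up for illustration.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.ParserOptions
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.api.parse
import org.jetbrains.kotlinx.dataframe.api.print
import java.util.Locale

// Builds a small frame with one String column and parses it.
// The column name "value" and its contents are hypothetical.
fun parseWithFastDoubleParser(): DataFrame<*> {
    val raw = dataFrameOf("value")("1.5", "2.25", "3e2")
    // Assumption: useFastDoubleParser is a named constructor parameter of ParserOptions.
    val options = ParserOptions(locale = Locale.US, useFastDoubleParser = true)
    return raw.parse(options)
}

fun main() {
    parseWithFastDoubleParser().print()
}
```

Since the setting is experimental, its name and default may change in later releases.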
New Experimental CSV integration
DataFrame's CSV parsing has been based on Apache Commons CSV from the beginning. While this has been sufficient for most applications, it had some issues: it could run out of memory, its performance was limited, and our API lacked clarity, documentation, and completeness.
For DataFrame 0.15, we introduce a new separate package org.jetbrains.kotlinx:dataframe-csv which tries to solve all these issues at once. It's based on Deephaven-CSV, which makes it faster and more memory efficient. And since we built it from the ground up, we made sure the API was complete, predictable, and documented carefully.
To try it yourself, explicitly add the dependency org.jetbrains.kotlinx:dataframe-csv to your project. In notebooks you can add enableExperimentalCsv=true to the %use-magic, like %use dataframe(enableExperimentalCsv=true).
Use the new DataFrame.readCsv() / DataFrame.readTsv() / DataFrame.readDelim() functions instead of the old DataFrame.readCSV() ones.
We happily await your feedback!
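For a Gradle project using the Kotlin DSL, adding the new module could look like the sketch below. The version 0.15.0 is assumed to match this release, and the core dataframe artifact is assumed to be declared elsewhere in the build.

```kotlin
// build.gradle.kts
dependencies {
    // Experimental CSV module; version assumed to match this release.
    implementation("org.jetbrains.kotlinx:dataframe-csv:0.15.0")
}
```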
New Experimental Geo integration
Kandy v0.8 introduces geo-plotting which allows you to visualize geospatial/geographical data using the awesome Kandy DSL. To make working with this geographical data (from GeoJson/Shapefile) easier, we happily accepted the GeoDataFrame PR from the Kandy team.
To try it yourself, explicitly add the dependency org.jetbrains.kotlinx:dataframe-geo to your project or notebook (with the repository maven("https://repo.osgeo.org/repository/release")) and use GeoDataFrame.readGeoJson() or GeoDataFrame.readShapeFile() to get started!
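In a Gradle Kotlin DSL build, the repository and dependency described above might be declared as in this sketch. The version 0.15.0 is an assumption based on this release, and mavenCentral() is included on the assumption that the remaining dependencies are resolved from there.

```kotlin
// build.gradle.kts
repositories {
    mavenCentral()
    // Extra repository named in these notes for the geo module's dependencies.
    maven("https://repo.osgeo.org/repository/release")
}

dependencies {
    // Experimental geo module; version assumed to match this release.
    implementation("org.jetbrains.kotlinx:dataframe-geo:0.15.0")
}
```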
Features
- New CSV implementation by @Jolanrensen in #903
- GeoDataFrame init by @AndreiKingsley in #909
- Change default flatten parent-child separator to "_" by @Jolanrensen in #920
- Split OpenAPI in module needed for user projects and module needed for code-generation by @koperagen in #916
- Support read unstructured excel file by @khm0651 in #901
- Fast double parser by @Jolanrensen in #935
- Implemented custom SQL DB registration by @zaleslaw in #917
- Render FormattedFrame stored inside columns as HTML by @koperagen in #944
- Adding some missing converters by @Jolanrensen in #958
- Full BigInteger support by @Jolanrensen in #972
Compiler Plugin
- [Compiler plugin] Lower frontend generated implicit receivers by @koperagen in #869
- Generate valid code in transform(call) when interpret(call) fails by @koperagen in #907
- [Compiler plugin] Support dataFrameOf(Pair<String, List) by @koperagen in #908
- [Compiler plugin] Add a mechanism to handle function calls to stdlib that can appear as df api arguments by @koperagen in #914
- [Compiler plugin] Generate ColumnName annotations on frontend for all names that contain illegal characters by @koperagen in #913
- Revert insertGenericTreeImpl by @koperagen in #923
- [Compiler plugin] Propagate nullability in toDataFrame tree conversion by @koperagen in #942
- Add castTo(Function) overload for workflows that use compiler plugin by @koperagen in #948
- [Compiler plugin] Setup call transformer pipeline to handle (...) -> DataRow functions by @koperagen in #918
- Compiler plugin read improvements by @koperagen in #949
- [Compiler plugin] Support valueCounts by @koperagen in #951
Fixes
- Adding contracts for Anycol.isValueColumn etc. for smart-casting by @Jolanrensen in #882
- Fix publish indexes docs by @Jolanrensen in #885
- Update algolia index builder by @koperagen in #895
- Find KSP Configurations that are Added Later by @mgroth0 in #881
- Partially inline AnyFrame typealias in return type position by @koperagen in #888
- Deprecating DataFrame.read("", delimiter =) by @Jolanrensen in #902
- Parsing improvements by @Jolanrensen in #874
- Fixed local classes being inferred as Any by changing visibility check by @Jolanrensen in #929
- Open File in readExcel in read-only mode by @koperagen in #931
- Adds the binary compatibility validator plugin by @Jolanrensen in #938
- Fixes nulls in framecols and improves column creation situation by @Jolanrensen in #925
- Specify slf4j-api instead of slf4j-simple. by @erikogenvik in #934
- [Important Fix!] Parse started to remove unselected columns by @Jolanrensen in #947
- Fixed error message for ColumnAccessor by @zaleslaw in #953
- fix crs by @AndreiKingsley in #955
- Added inferNullability test for other databases by @zaleslaw in #954
- Fix: disabled FastDoubleParser debug logs overload in the tests by @Jolanrensen in #956
- describe() fixes by @Jolanrensen in #937
- Fix IO closing & add new useful extensions by @AndreiKingsley in #960
- Remove dependency on fuel by @koperagen in #969
- [fix] Parser should be skipped after it fails to parse value by @koperagen in #975
- small fix for file clash at impl/io/readDelim.kt by @Jolanrensen in #976
- Fixed csv dependencies by @Jolanrensen in #977
- Bumped deprecations of startsWith and endsWith in CS DSL to Error by @Jolanrensen in #978
- Version bumps for 0.15 by @Jolanrensen in #980
Docs and Examples
- READMEs by @Jolanrensen in #900
- Make it clear that dataframes are immutable by @dmcg in #924
- Add examples for rename by @koperagen in #952
New Contributors
- @khm0651 made their first contribution in #901
- @erikogenvik made their first contribution in #934
- @dmcg made their first contribution in #924
- @AndreiKingsley made their first contribution in #909
Full Changelog: v0.14.2...v0.15.0