Releases: mlcommons/croissant
v1.0.10
What's Changed
- Update README.md by @ccl-core in #749
- Example of a dataset with nested fields. by @ccl-core in #745
- Add the web-of-science dataset (from parquet) by @ccl-core in #752
- Remove editor tests by @ccl-core in #753
- BoundingBox feature defaults to crs 1.0 by @ccl-core in #755
- Release 1.0.10 by @ccl-core in #756
Full Changelog: v1.0.9...v1.0.10
v1.0.9
What's Changed
- Isolate a .call() method in operations. by @marcenacp in #736
- Keys in a RecordSet should be a list of id references. by @ccl-core in #740
- Cache the result of each operation. by @marcenacp in #741
- Allow datasets with joins when generating with Apache Beam. by @marcenacp in #743
- Fix discrepancies with the specs by @ccl-core in #742
- Use ids to reference a field or a node. by @ccl-core in #744
- Check that the mapping is valid after setting it. by @marcenacp in #747
- New release mlcroissant==1.0.9 by @ccl-core in #748
Full Changelog: v1.0.8...v1.0.9
v1.0.8
What's Changed
- Adding the levanti dataset. by @ccl-core in #727
- Make nodes and operations picklable. by @marcenacp in #729
- Add splits to the huggingface-mnist dataset by @ccl-core in #726
- Allow parallelizing operations in mlcroissant with Apache Beam. by @marcenacp in #730
- More features around Beam. by @marcenacp in #731
- Remove pipeline argument from ReadFromCroissant and use beam.ptransform_fn. by @marcenacp in #734
- New release mlcroissant==1.0.8. by @marcenacp in #735
Full Changelog: v1.0.7...v1.0.8
v1.0.7
What's Changed
- Add URLs to pyproject.toml by @PGijsbers in #705
- Implement filtering in the case of filename regular expression and add a test for this feature. by @marcenacp in #716
- Fix broken Unit tests. by @ccl-core in #717
- Add more info links on how to do releases. by @ccl-core in #718
- Apply filters to a Hugging Face dataset to avoid repeating all variants. by @marcenacp in #719
- Move filters from Dataset init to self.records by @ccl-core in #720
- Release 1.0.7 by @ccl-core in #721
Full Changelog: v1.0.6...v1.0.7
v1.0.6
What's Changed
- Download FileObjects with git lfs and read gzipped files by @ccl-core in #636
- Update README code example to the new HF and Croissant API by @luisoala in #642
- Add a dataset with a repeated field by @ccl-core in #644
- Updates to the Croissant turtle definition to align with the spec, and… by @benjelloun in #634
- Remove flores notebook from the automatically checked notebooks. by @marcenacp in #652
- Use regex-based version casting that accept by @ccl-core in #658
- Update README with paper proceedings info by @luisoala in #665
- Editor RAI tab by @JoanGi in #578
- Fix typo in schema:Enumeration name by @benjelloun in #669
- Small fixes to the Croissant specification by @benjelloun in #666
- Add four record sets to Anthropic hh-rlhf by @ccl-core in #670
- Fix end-to-end tests by @marcenacp in #672
- Rerun Croissant Health reports for Hugging Face and OpenML by @marcenacp in #660
- Fix small bugs for splits. by @ccl-core in #680
- Camera-ready PDF link by @luisoala in #701
- Introduce mlc.DataType.SPLIT for consistency. by @ccl-core in #709
- Add example output of a dataset with splits. by @ccl-core in #710
- Change DataType.SPLIT to use croissant 1.0 specs by @ccl-core in #712
- When creating an mlc.Metadata object, share the graph with all nodes. by @marcenacp in #713
- DataTypes should be URIRef by @ccl-core in #714
- Publish mlcroissant==1.0.6. by @ccl-core in #715
Full Changelog: v1.0.5...v1.0.6
v1.0.5
v1.0.4
v1.0.3
v1.0.2
- Continue the implementation of Croissant standard 1.0.
- Implement the new reference mechanism.
- Add more schema.org properties.
- Fix the cardinality of JSON-LD properties.