from_parquet: fix partition keys extraction when caching is enabled
When caching is enabled, the dataset is opened using the path to the cache. This means that the original partition keys are lost and need to be re-derived from the original path. This commit adds a helper function that extracts the partition keys from the original file path and updates the record with them. I could not find a better way than this _hacky_ solution; feel free to suggest one.

I have added a test. Without this fix, the test fails because `first_name` is not found in the record and defaults to `None`. In real `DataChain` code, this would violate the schema and raise a pydantic `ValidationError`.
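To illustrate the idea of re-deriving partition keys from the original path, here is a minimal sketch. It is not the helper added in this commit: the function name `partition_keys_from_path`, the example path, and the assumption of Hive-style `key=value` directory segments are all hypothetical and chosen only for illustration.

```python
from urllib.parse import unquote


def partition_keys_from_path(path: str) -> dict[str, str]:
    """Collect Hive-style `key=value` directory segments from a file path.

    Hypothetical helper for illustration; assumes Hive-style partitioning.
    """
    keys: dict[str, str] = {}
    # Skip the final segment: it is the file name, not a partition directory.
    for segment in path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        if sep and key:
            # Partition values may be percent-encoded in the path.
            keys[key] = unquote(value)
    return keys


# A record read from the cached copy lacks the partition column,
# so it is filled in from the original (uncached) path.
record = {"last_name": "Smith"}
original_path = "s3://bucket/people/first_name=Alice/part-0.parquet"
record.update(partition_keys_from_path(original_path))
```

Note that this sketch returns every value as a string; a real implementation would likely need to cast values to the types declared in the schema.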
Showing 2 changed files with 54 additions and 1 deletion.