-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable defining nested data types #193
Conversation
6261e97
to
5be2a42
Compare
5be2a42
to
08bbc95
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @PhilippeMoussalli, left some small comments.
fondant/schema.py
Outdated
""" | ||
Types based on: | ||
- https://arrow.apache.org/docs/python/api/datatypes.html#api-types | ||
- https://pola-rs.github.io/polars/py-polars/html/reference/datatypes.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now just based on arrow, right?
fondant/schema.py
Outdated
Returns: | ||
The validated `pa.DataType` object. | ||
""" | ||
if isinstance(data_type, str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if isinstance(data_type, str): | |
if not isinstance(data_type, Type): |
I think this is a bit more robust. What if I pass in something that is not a Type
nor a string? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch
fondant/schema.py
Outdated
if json_schema["type"] == "array": | ||
items = json_schema["items"] | ||
if isinstance(items, dict): | ||
return cls.list(Type.from_json(items)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return cls.list(Type.from_json(items)) | |
return cls.list(cls.from_json(items)) |
fondant/schema.py
Outdated
if isinstance(items, dict): | ||
return cls.list(Type.from_json(items)) | ||
else: | ||
return Type(json_schema["type"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return Type(json_schema["type"]) | |
return cls(json_schema["type"]) |
@@ -60,16 +60,16 @@ def test_subset_fields(): | |||
subset = Subset(specification=subset_spec, base_path="/tmp") | |||
|
|||
# add a field | |||
subset.add_field(name="data2", type_=Type.binary) | |||
subset.add_field(name="data2", type_=Type("binary")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit unfortunate that we lose the ability to reference types by attribute (and therefore autocomplete, static analysis, etc., ...). Unfortunately I don't see a straightforward way to keep them either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah indeed, you would need to know/type all possible combinations which is not very feasible. But it's not user-facing so it should not have that big of an impact.
PR that addresses #178 Inspiration: https://swagger.io/docs/specification/data-models/data-types/#:~:text=the%20null%20value.-,Arrays,-Arrays%20are%20defined (credit to @GeorgesLorre) It includes: * Changing the common json schema to enable defining array types * Changing the `Type` class to be dynamic rather than typed to enable defining nested structures without explicitly defining them * Adding an addition test file for `schema.py` * Updating the current existing examples (embedding + segmentation)
PR that addresses #178
Inspiration: https://swagger.io/docs/specification/data-models/data-types/#:~:text=the%20null%20value.-,Arrays,-Arrays%20are%20defined (credit to @GeorgesLorre)
It includes:
Type
class to be dynamic rather than typed to enable defining nested structures without explicitly defining themschema.py