
Enable defining nested data types #193

Merged
merged 9 commits into from
Jun 9, 2023


@PhilippeMoussalli (Contributor) commented Jun 8, 2023

PR that addresses #178
Inspiration: https://swagger.io/docs/specification/data-models/data-types/#:~:text=the%20null%20value.-,Arrays,-Arrays%20are%20defined (credit to @GeorgesLorre)

It includes:

  • Changing the common JSON schema to enable defining array types
  • Changing the `Type` class to be dynamic rather than a fixed set of predefined types, so that nested structures can be defined without declaring each one explicitly
  • Adding an additional test file for `schema.py`
  • Updating the existing examples (embedding + segmentation)
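To make the array support concrete, here is a minimal, stdlib-only sketch of how a dynamic `Type` might parse a Swagger-style nested array definition recursively. It is not the code from this PR: the primitive names, the `name`/`item` fields, and `to_json` are assumptions for illustration. Judging from the review snippets, the actual class works with `pa.DataType` objects instead.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Dict, Optional

# Hypothetical primitive names; a stand-in for the Arrow types the real class supports.
PRIMITIVES = {"int32", "int64", "float32", "float64", "string", "binary", "bool"}


@dataclass(frozen=True)
class Type:
    """Sketch of a dynamic type: either a primitive name or a list of another Type."""

    name: str
    item: Optional[Type] = None  # only set when name == "list"

    @classmethod
    def list(cls, item: Type) -> Type:
        return cls("list", item)

    @classmethod
    def from_json(cls, json_schema: Dict[str, Any]) -> Type:
        # Arrays nest recursively through their "items" schema (Swagger/OpenAPI style).
        if json_schema["type"] == "array":
            return cls.list(cls.from_json(json_schema["items"]))
        if json_schema["type"] not in PRIMITIVES:
            raise ValueError(f"Unknown type: {json_schema['type']!r}")
        return cls(json_schema["type"])

    def to_json(self) -> Dict[str, Any]:
        # Inverse of from_json: lists become {"type": "array", "items": ...}.
        if self.name == "list":
            return {"type": "array", "items": self.item.to_json()}
        return {"type": self.name}

    def __str__(self) -> str:
        return f"list<{self.item}>" if self.name == "list" else self.name
```

With this sketch, `{"type": "array", "items": {"type": "array", "items": {"type": "float32"}}}` parses to `list<list<float32>>`, and `to_json` turns it back into the same dictionary. No nested type has to be declared up front; it is composed from the schema on demand.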

@PhilippeMoussalli added the Core (Core framework) and Components (Implementation of components) labels Jun 8, 2023
@PhilippeMoussalli added this to the 0.2.0 milestone Jun 8, 2023
@PhilippeMoussalli self-assigned this Jun 8, 2023
@PhilippeMoussalli linked an issue Jun 8, 2023 that may be closed by this pull request
@PhilippeMoussalli force-pushed the enable-nested-data branch 4 times, most recently from 6261e97 to 5be2a42 June 9, 2023 07:15
@RobbeSneyders (Member) left a comment

Thanks @PhilippeMoussalli, left some small comments.

"""
Types based on:
- https://arrow.apache.org/docs/python/api/datatypes.html#api-types
- https://pola-rs.github.io/polars/py-polars/html/reference/datatypes.html
Member

This is now just based on arrow, right?

Returns:
The validated `pa.DataType` object.
"""
if isinstance(data_type, str):
Member

Suggested change:
- if isinstance(data_type, str):
+ if not isinstance(data_type, Type):

I think this is a bit more robust. What if I pass in something that is neither a Type nor a string? :)

Contributor Author

good catch
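The reviewer's point can be shown with a minimal sketch. The branch bodies are assumed (the excerpt above only shows the `if` line), and `KNOWN_TYPES`, `validate_str_only`, and `validate_data_type` are hypothetical names: checking "is a string" lets unexpected inputs pass through untouched, while checking "is not a Type" routes every other input through validation.

```python
from typing import Any

# Hypothetical set of accepted type names, for illustration only.
KNOWN_TYPES = {"int32", "int64", "float32", "float64", "string", "binary", "bool"}


class Type:
    """Minimal stand-in for a Type wrapper; it only tracks a validated type name."""

    def __init__(self, name: Any) -> None:
        if not isinstance(name, str) or name not in KNOWN_TYPES:
            raise ValueError(f"{name!r} is not a valid data type")
        self.name = name


def validate_str_only(data_type: Any) -> Any:
    # Original-style check: only strings are converted; anything else is returned unchecked.
    if isinstance(data_type, str):
        data_type = Type(data_type)
    return data_type


def validate_data_type(data_type: Any) -> Type:
    # Suggested check: everything that is not already a Type goes through conversion,
    # so unexpected inputs (None, ints, ...) fail loudly here instead of later.
    if not isinstance(data_type, Type):
        data_type = Type(data_type)
    return data_type
```

Here `validate_str_only(42)` silently returns `42`, whereas `validate_data_type(42)` raises a `ValueError` at the point of validation.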

if json_schema["type"] == "array":
    items = json_schema["items"]
    if isinstance(items, dict):
        return cls.list(Type.from_json(items))
Member

Suggested change:
- return cls.list(Type.from_json(items))
+ return cls.list(cls.from_json(items))

if isinstance(items, dict):
return cls.list(Type.from_json(items))
else:
return Type(json_schema["type"])
Member

Suggested change:
- return Type(json_schema["type"])
+ return cls(json_schema["type"])
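The two `cls` suggestions matter once `Type` is subclassed. The sketch below uses hypothetical names (the `name`/`item` fields, `from_json_hardcoded`, and `CustomType` are not from this PR) to contrast a classmethod that hardcodes `Type` with one that uses `cls` throughout.

```python
from typing import Any, Dict, Optional


class Type:
    """Minimal stand-in: a type name plus an optional item type for lists."""

    def __init__(self, name: str, item: Optional["Type"] = None) -> None:
        self.name = name
        self.item = item

    @classmethod
    def list(cls, item: "Type") -> "Type":
        return cls("list", item)

    @classmethod
    def from_json_hardcoded(cls, schema: Dict[str, Any]) -> "Type":
        # Names the base class explicitly: subclasses are lost on recursion and for primitives.
        if schema["type"] == "array":
            return cls.list(Type.from_json_hardcoded(schema["items"]))
        return Type(schema["type"])

    @classmethod
    def from_json(cls, schema: Dict[str, Any]) -> "Type":
        # Uses cls throughout: the caller's (sub)class is preserved at every nesting level.
        if schema["type"] == "array":
            return cls.list(cls.from_json(schema["items"]))
        return cls(schema["type"])


class CustomType(Type):
    """Hypothetical subclass, e.g. one that adds extra behaviour."""
```

Calling `CustomType.from_json_hardcoded` on an array schema yields a `CustomType` whose item is a plain `Type`, and a primitive schema yields a plain `Type`. `CustomType.from_json` returns `CustomType` instances at every level.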

@@ -60,16 +60,16 @@ def test_subset_fields():
     subset = Subset(specification=subset_spec, base_path="/tmp")
 
     # add a field
-    subset.add_field(name="data2", type_=Type.binary)
+    subset.add_field(name="data2", type_=Type("binary"))
Member

It's a bit unfortunate that we lose the ability to reference types by attribute (and with it autocompletion, static analysis, etc.). Unfortunately, I don't see a straightforward way to keep them either.

Contributor Author

Yeah, indeed: you would need to know and type out all possible combinations, which is not very feasible. But it's not user-facing, so it should not have a big impact.

@RobbeSneyders merged commit 56c2dd9 into main Jun 9, 2023
@RobbeSneyders deleted the enable-nested-data branch June 9, 2023 13:27
Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023
Labels: Components (Implementation of components), Core (Core framework)
Projects: Archived in project
Development: Successfully merging this pull request may close these issues: Allow handling of nested data types
3 participants