structured data operators for serializing tablerows, triples and keyvalue pairs added #589
csrajmohan commented on Feb 20, 2024
- Added new serialization operators for structured data: table rows, triples, and key-value pairs.
- Renamed table_operators to struct_data_operators so that the name covers all structured data operators (tables, triples, etc.) and gives a natural home for other structured data operators added in the future.
- Added 3 new data task cards to validate the new operators:
  - wiki_bio: key-value pairs to text (Data2Text task). Uses KeyValuePairs Serializer.
  - dart: triples to text (Data2Text task). Uses TriplesSerializer.
  - tablerow_classify: table row classification using a Kaggle dataset (heart disease prediction). Uses TableRowSerializer.
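To make the idea of "serializing" structured data concrete, here is a minimal sketch of turning a table row, a list of triples, and a set of key-value pairs into flat text. The function names, separators, and output formats below are assumptions chosen for illustration; they are not the operators added in this PR, whose actual signatures and output formats are not shown in this conversation.

```python
from typing import Dict, List, Sequence, Tuple


def serialize_table_row(header: Sequence[str], row: Sequence[object]) -> str:
    """Hypothetical helper: render one table row as 'col is val, col is val'."""
    if len(header) != len(row):
        raise ValueError("header and row must have the same length")
    return ", ".join(f"{col} is {val}" for col, val in zip(header, row))


def serialize_triples(triples: List[Tuple[str, str, str]]) -> str:
    """Hypothetical helper: render (subject, relation, object) triples.

    Parts of a triple are joined with ' : ', triples with ' | '.
    """
    return " | ".join(" : ".join(str(part) for part in triple) for triple in triples)


def serialize_key_value_pairs(pairs: Dict[str, object]) -> str:
    """Hypothetical helper: render a dict as 'key is value, key is value'."""
    return ", ".join(f"{key} is {value}" for key, value in pairs.items())


# Illustrative inputs only; not taken from the datasets named in the PR.
row_text = serialize_table_row(["age", "sex"], [63, "M"])
triples_text = serialize_triples([("Alan Turing", "field", "computer science")])
pairs_text = serialize_key_value_pairs({"name": "Ada", "born": 1815})
```

Each helper produces a single string that can be placed into a prompt template, which is the general role a serializer plays in a data-to-text or classification card.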
Force-pushed from f061516 to 4b8fd85.
prepare/cards/tablerow_classify.py
Outdated
MapInstanceValues(mappers={"label": {"0": "Normal", "1": "Heart Disease"}}),
AddFields(
    fields={
        "text_type": "Person",
Suggested change:
- "text_type": "Person",
+ "text_type": "person medical record",
I think it is more descriptive, no?
Done
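For readers unfamiliar with the two preprocessing steps in the diff above, the following plain-Python sketch shows the effect they are meant to have on a single instance: raw label values are mapped to readable class names, and a constant field is added. The helper functions are hypothetical stand-ins, not the library's implementation, and the sketch assumes values are matched by their string form, as the string keys in the diff suggest. The `age` field is an invented example column.

```python
from typing import Any, Dict


def map_instance_values(
    instance: Dict[str, Any], mappers: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Hypothetical stand-in: replace field values using per-field lookup tables."""
    out = dict(instance)  # copy so the input instance is left untouched
    for field, mapping in mappers.items():
        key = str(out[field])  # assumption: lookup is done on the string form
        if key in mapping:
            out[field] = mapping[key]
    return out


def add_fields(instance: Dict[str, Any], fields: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in: add constant fields to an instance."""
    return {**instance, **fields}


instance = {"age": 63, "label": 1}
instance = map_instance_values(
    instance, {"label": {"0": "Normal", "1": "Heart Disease"}}
)
instance = add_fields(instance, {"text_type": "person medical record"})
```

After both steps the instance carries a readable label and the descriptive `text_type` value agreed on in the review thread.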
prepare/tasks/generation.py
Outdated
@@ -3,7 +3,7 @@

add_to_catalog(
    FormTask(
-        inputs=["input"],
+        inputs=["input", "type_of_input"],
Maybe it would also be good to have the type of output?
Suggested change:
- inputs=["input", "type_of_input"],
+ inputs=["input", "type_of_input", "type_of_output"],
Done
Overall this looks really good. I really appreciate that you use the existing tasks and even improve them! Go over my comments and ask for a review when done, and we can get it merged!
In case it's needed: the test is currently failing because of import from
import for test file fixed
Force-pushed from 87f1d9a to 19d1f8d.
…ange Signed-off-by: Rajmohan <rajmohanc1@in.ibm.com>
Force-pushed from 19d1f8d to 493ad02.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #589      +/-   ##
==========================================
+ Coverage   88.38%   88.46%   +0.08%
==========================================
  Files          87       87
  Lines        7698     7779      +81
==========================================
+ Hits         6804     6882      +78
- Misses        894      897       +3
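The coverage percentages in the report can be checked from the hit and line counts. The short sketch below recomputes them; it assumes the percentages are truncated (not rounded) to two decimals, which is consistent with the figures shown but is not confirmed by the report itself. The function name is illustrative.

```python
import math


def coverage_percent(hits: int, lines: int) -> float:
    """Coverage as a percentage, truncated (not rounded) to two decimals."""
    return math.floor(hits / lines * 10_000) / 100


base = coverage_percent(6804, 7698)  # main
head = coverage_percent(6882, 7779)  # this PR
delta = round(head - base, 2)        # change in project coverage

# Misses are simply lines minus hits.
base_misses = 7698 - 6804
head_misses = 7779 - 6882
```

With the counts from the table, this reproduces 88.38% for main, 88.46% for the PR, a change of +0.08, and 894 and 897 misses respectively.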
Great contribution!