-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Parse DataPack on backend #232
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still too many low-level calls and reliance on the internal serialization format. This does not utilize the power of parsing the data pack from backend, since at the back end we should use all Forte interfaces to make this task more standard. The usage of DataStore is making the problem even worse.
@@ -371,10 +351,9 @@ def new_link(request, document_id): | |||
link = received_json_data.get('data') | |||
link["py/state"]['_tid'] = link_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like here we use a certain data structure very similar to the Forte format, and need to use some internal understanding. I feel like this would make the system still depend on some certain Forte version.
""" | ||
|
||
ANNOTATION_LIST = "annotations" | ||
LINK_LIST = "links" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious why only Annotation
and Link
have the special class variable, how about Group
or other top
level ontologies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because Stave
users can only add/edit/delete these two types of top ontologies. Looks like we haven't design any mechanism for users to manage Group
or other types of ontologies through Stave
UI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually we can, by enabling the Group
extension
|
||
# Clear the previous type info to avoid conflicts in onto | ||
# definitions | ||
DataStore._type_attributes = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to access the DataStore
interface? I think this is a low level interface and if we access this that means we actually don't have backward incompatibility.
Btw, not having backward incompatibility should be fine, as long as we know which Forte-Stave version pairs work, but having access to the low level implementation details here looks problematic.
""" | ||
if self._pack is None: | ||
# Add entry to DataPack with legacy format | ||
if self._data_store._is_subclass( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Too many usages of low-level details here, can we use the public API of data pack only?
Note that using the public API we should know whether a type_name is a subclass of Annotation.
if self._data_store._is_subclass( | ||
type_name=entry_dict["py/object"], cls=Annotation | ||
): | ||
self._pack_json['py/state']["annotations"].append(entry_dict) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we have to keep the py/state
structure here? These are stuff used by jsonpickle.
def _transform_pack_json(self): | ||
""" | ||
Transform a DataPack dictionary with legacy format to a json for | ||
frontend rendering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This transform still looks manual to me. If we want to support legacy format we should directly use Forte data pack API. For example:
for annotation in pack.get("Annotation"):
annotations.append(
"span": {
"begin": annotation.begin,
"end": annotation.end
},
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes most of this part is still manual parsing. To support legacy format, we don't have access to the pack
object since the it's not deserializable using the latest forte
. We only have access to a json object like this which is derived from old version of forte.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think we can feel free to directly use Forte
to parse the pack
object. So for Stave to work, we simply need to install the correct Forte version. For example, we can update the example db with the new Forte format in the master branch now. To use the old forte format we simply downgrade Stave to a prior version.
Cross-version compatibility is another issue that we can deal with separately. My opinion is that that's not very important right now. Even if we want to deal with that, it probably should happen in Forte repo not Stave repo
Block by asyml/forte#881. |
This PR fixes #230 .
Description of changes
The
DataPack
parsing is moved from frontend to backend.Possible influences of this PR
We still maintain support to the previous DataPack format so that the example projects won't break.