Interface to fetch entries in primitive types from DataPack
#900
Codecov Report
@@ Coverage Diff @@
## master #900 +/- ##
==========================================
+ Coverage 80.87% 80.93% +0.05%
==========================================
Files 253 253
Lines 19619 19677 +58
==========================================
+ Hits 15867 15925 +58
Misses 3752 3752
There might be a few more places to refactor in the current `get` method:
- `entry_type` can just be a string. This will save the cost of `get_full_module_name`.
- `range_annotation` could be a `tid` instead of an entry object.
To maintain backward compatibility, there is a potential workaround:
- We may add a new `get_raw` method with the following signature:

      def get_raw(  # type: ignore
          self,
          entry_type: str,
          range_annotation_tid: Optional[int] = None,
          components: Optional[Union[str, Iterable[str]]] = None,
          include_sub_type: bool = True,
      ) -> Iterator[List]

- And then in the `get` method, we can do:

      def get(  # type: ignore
          self,
          entry_type: Union[str, Type[EntryType]],
          range_annotation: Optional[Union[Annotation, AudioAnnotation]] = None,
          components: Optional[Union[str, Iterable[str]]] = None,
          include_sub_type: bool = True,
      ):
          # Convert entry_type to string if it's Type[EntryType]
          ...
          # Convert range_annotation to tid
          ...
          # Convert result from get_raw to entry objects
          for entry_data in self.get_raw(...):
              yield self.get_entry(tid=entry_data[TID])

You can try out other solutions as well.
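To make the suggested split concrete, here is a minimal, self-contained sketch of the pattern: a raw generator that works only on strings and tids, and a `get` wrapper that converts its arguments and turns the raw lists back into objects. Everything in it is a hypothetical stand-in (the `ToyPack`, `Entry`, `Sentence`, and `Token` classes, the `[begin, end, tid, type_name]` layout, and the index constants); it is not Forte's actual `DataPack` or `DataStore` code, and it omits `components` and `include_sub_type`.

```python
from typing import Dict, Iterator, List, Optional, Type, Union

# Hypothetical raw layout: [begin, end, tid, type_name].
BEGIN, END, TID, TYPE = 0, 1, 2, 3


class Entry:
    """Stand-in for an entry object; not Forte's real Entry."""

    def __init__(self, tid: int, begin: int, end: int) -> None:
        self.tid, self.begin, self.end = tid, begin, end


class Sentence(Entry):
    pass


class Token(Entry):
    pass


class ToyPack:
    """Toy container illustrating the get_raw / get split."""

    _classes = {"Sentence": Sentence, "Token": Token}

    def __init__(self) -> None:
        self._store: Dict[int, List] = {}

    def add(self, type_name: str, tid: int, begin: int, end: int) -> None:
        self._store[tid] = [begin, end, tid, type_name]

    def get_entry(self, tid: int) -> Entry:
        raw = self._store[tid]
        return self._classes[raw[TYPE]](raw[TID], raw[BEGIN], raw[END])

    def get_raw(
        self, entry_type: str, range_annotation_tid: Optional[int] = None
    ) -> Iterator[List]:
        # Works purely on primitive values: a type name and an optional tid.
        span = None
        if range_annotation_tid is not None:
            outer = self._store[range_annotation_tid]
            span = (outer[BEGIN], outer[END])
        for raw in self._store.values():
            if raw[TYPE] != entry_type:
                continue
            if span is not None and not (
                span[0] <= raw[BEGIN] and raw[END] <= span[1]
            ):
                continue
            yield raw

    def get(
        self,
        entry_type: Union[str, Type[Entry]],
        range_annotation: Optional[Entry] = None,
    ) -> Iterator[Entry]:
        # Convert entry_type to a string if a class was passed.
        type_name = entry_type if isinstance(entry_type, str) else entry_type.__name__
        # Convert range_annotation to its tid.
        range_tid = None if range_annotation is None else range_annotation.tid
        # Convert raw lists from get_raw back into entry objects.
        for raw in self.get_raw(type_name, range_tid):
            yield self.get_entry(tid=raw[TID])
```

With this shape, existing callers of `get` keep receiving objects, while callers that only need primitive data can use `get_raw` and skip object construction.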
A quick comment on the title: this is not "fetch entries directly from Data Store" but "fetch primitive types from data pack". The data store is still invisible to users.
This PR is the first step towards fixing #881
Description of changes
Currently, when fetching entries from a `DataPack` or `MultiPack` using the `get` method, Forte converts data store entries into object form. We wanted a way for users to interact directly with `DataStore` entries. In this PR, we modify the `get` method of `DataPack` so that it can return an entry in its primitive form directly from `DataStore`, without converting it to an object.
Additionally, since `DataStore` entries are not very interpretable (they are stored in a `list` format), this PR introduces a way to keep data store entries in their primitive form while representing them more readably, by converting them to a dictionary. This is done by the `transform_data_store_entry` method in `data_store.py`.
Possible influences of this PR.
By allowing `DataPack` or `MultiPack` to fetch entries in their primitive form, users can interact with `DataStore` more easily.
Test Conducted
The `get` method with the `get_raw` argument set to `True` was tested in `data_pack_test.py` and `multi_pack_test.py`.
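For reference, the list-to-dictionary conversion mentioned in the description of changes can be illustrated with a small sketch. The field names and their order below are assumptions chosen for illustration, and `entry_list_to_dict` is a hypothetical helper; the actual layout used by `DataStore` and the behavior of `transform_data_store_entry` may differ.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical field order for a raw entry; the real DataStore layout may differ.
FIELD_NAMES = ("begin", "end", "tid", "type")


def entry_list_to_dict(
    entry: List[Any], field_names: Sequence[str] = FIELD_NAMES
) -> Dict[str, Any]:
    """Pair each position of a list-form entry with a field name."""
    if len(entry) != len(field_names):
        raise ValueError(
            f"expected {len(field_names)} fields, got {len(entry)}"
        )
    return dict(zip(field_names, entry))


raw_entry = [0, 5, 42, "Token"]
readable = entry_list_to_dict(raw_entry)
```

The dictionary carries the same primitive values as the list, so no entry object is constructed; only the positional indices are replaced by names.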