Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interface to fetch entries in primitive types from DataPack #900

Merged
merged 13 commits into from
Jan 3, 2023

Conversation

Pushkar-Bhuse
Copy link
Collaborator

This PR is the first step towards fixing #881

Description of changes

Current, when fetching entries from a DataPack or MultiPack using the get method, Forte converts data store entries into object form. We wanted a way for users to directly interact with DataStore entries. In this PR, we provide a modification to the get method of DataPack to be able to return an entry in its primitive form directly from DataStore without needing to be converted to an object.

Additionally, since DataStore entries are not very interpretable (since they are in a list format), this PR introduces a way to retain data store entries in their primitive form and also represent them in a more interpretable way by converting it to a dictionary. This happens by the transform_data_store_entry method in data_store.py. An example of this is as follows:

# Entry of type 'ft.onto.base_ontology.Sentence'
            data_store_entry = [
                171792711812874531962213686690228233530,
                'ft.onto.base_ontology.Sentence',
                0,
                164,
                0,
                '-',
                0,
                {},
                {},
                {}
            ]

            transformed_entry = pack.transform_data_store_entry(
                data_store_entry
            )

            # transformed_entry = {
            #   'begin': 0,
            #   'end': 164,
            #   'payload_idx': 0,
            #   'speaker': '-',
            #   'part_id': 0,
            #   'sentiment': {},
            #   'classification': {},
            #   'classifications': {},
            #   'tid': 171792711812874531962213686690228233530,
            #   'type': 'ft.onto.base_ontology.Sentence'}
            # }

Possible influences of this PR.

By allowing DataPack or MultiRack to fetch entries in their primitive form, users can interact with DataStore more easily.

Test Conducted

The working of the get method with the get_raw attribute set to True was tested in data_pack_test.py and multi_pack_test.py

@Pushkar-Bhuse Pushkar-Bhuse requested a review from mylibrar August 24, 2022 20:32
@codecov
Copy link

codecov bot commented Aug 24, 2022

Codecov Report

Merging #900 (77f4483) into master (72e8bce) will increase coverage by 0.05%.
The diff coverage is 92.59%.

@@            Coverage Diff             @@
##           master     #900      +/-   ##
==========================================
+ Coverage   80.87%   80.93%   +0.05%     
==========================================
  Files         253      253              
  Lines       19619    19677      +58     
==========================================
+ Hits        15867    15925      +58     
  Misses       3752     3752              
Impacted Files Coverage Δ
tests/forte/data/data_store_serialization_test.py 98.43% <ø> (ø)
tests/forte/data/data_store_test.py 95.58% <ø> (ø)
forte/data/multi_pack.py 83.01% <80.00%> (+0.82%) ⬆️
forte/data/data_pack.py 84.90% <86.36%> (-0.37%) ⬇️
forte/data/data_store.py 93.31% <95.23%> (+0.39%) ⬆️
forte/data/base_pack.py 76.75% <100.00%> (+0.07%) ⬆️
forte/data/ontology/top.py 78.16% <100.00%> (+0.05%) ⬆️
tests/forte/data/data_pack_test.py 98.98% <100.00%> (+0.13%) ⬆️
tests/forte/data/multi_pack_test.py 97.05% <100.00%> (+0.15%) ⬆️
... and 2 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Collaborator

@mylibrar mylibrar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be a few more places to refactor in the current get method:

  • entry_type can just be a string. This will save the cost of get_full_module_name
  • range_annotation could be a tid instead of an entry object.

To maintain backward compatibility, there is a potential workaround:

  • We may add a new get_raw method with the following signature
    def get_raw(  # type: ignore
        self,
        entry_type: str,
        range_annotation_tid: Optional[int] = None,
        components: Optional[Union[str, Iterable[str]]] = None,
        include_sub_type: bool = True,
    ) -> Iterator[List]
  • And then in the get method, we can do:
    def get(  # type: ignore
          self,
          entry_type: Union[str, Type[EntryType]],
          range_annotation: Optional[Union[Annotation, AudioAnnotation]] = None,
          components: Optional[Union[str, Iterable[str]]] = None,
          include_sub_type: bool = True
      ):
        # Convert entry_type to string if it's Type[EntryType]
        ...
        # Convert range_annotation to tid
        ...
        # Convert result from get_raw to entry objects
        for entry_data in get_raw(...):
          yield self.get_entry(tid=entry_data[TID])

You can try out other solutions as well.

forte/data/base_pack.py Outdated Show resolved Hide resolved
forte/data/data_pack.py Outdated Show resolved Hide resolved
forte/data/base_pack.py Outdated Show resolved Hide resolved
forte/data/data_pack.py Show resolved Hide resolved
forte/data/data_pack.py Show resolved Hide resolved
forte/data/data_pack.py Outdated Show resolved Hide resolved
@hunterhector
Copy link
Member

quick comment on the title, not "fetch entries directly from Data Store", but fetch primitive types from data pack. Data store is still invisible to users.

@Pushkar-Bhuse Pushkar-Bhuse changed the title Interface to fetch entries directly from Data Store Interface to fetch entries in primitive types from DataPack Aug 30, 2022
forte/data/base_pack.py Show resolved Hide resolved
forte/data/data_pack.py Show resolved Hide resolved
forte/data/data_pack.py Outdated Show resolved Hide resolved
forte/data/data_store.py Outdated Show resolved Hide resolved
forte/data/ontology/top.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants