Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Checking for Series Types (Representation Series, Tokenized Series, ...). #69

Merged
merged 7 commits into from
Aug 5, 2020

Conversation

henrifroese
Copy link
Collaborator

This is mainly meant as a first step towards closing #60 and should not be merged yet. It implements three types of custom pandas Series that implement the most common usages in the library. Also adds decorators for easier development using the types. See the discussion in #60 for more information.

File _helper.py implements Series types for the library.

Would implement jbesomi#60 .
@jbesomi
Copy link
Owner

jbesomi commented Jul 12, 2020

Henri you are awesome!! 🎉🎉

@henrifroese henrifroese changed the title First implementation of custom Series types in. Checking for Series Types (Representation Series, Tokenized Series, ...). Waiting until Representation Series is implemented Jul 27, 2020
henrifroese and others added 2 commits August 1, 2020 14:22
Co-authored-by: Maximilian Krahn <maximilian.krahn@icloud.com>
@henrifroese henrifroese changed the title Checking for Series Types (Representation Series, Tokenized Series, ...). Waiting until Representation Series is implemented Checking for Series Types (Representation Series, Tokenized Series, ...). Aug 1, 2020
@henrifroese
Copy link
Collaborator Author

henrifroese commented Aug 1, 2020

We have now worked on this again; should be ready for a review/discussion (and maybe a merge) @jbesomi .

Summary (everything is explained more extensively in docstrings in _helper.py):

  • Add Classes TextSeries, TokenSeries, VectorSeries, DocumentRepresentationSeries where each class has a function that can check for a Series s if s fulfills the type
  • Add Decorator InputSeries(allowed_hero_series_type=) to check the input series type of a function
  • Tests for the decorator

Use Cases:

We can now do two things:

>>> import pandas as pd
>>> from texthero._helper import *
>>> def tfidf(s: TokenSeries) -> DocumentRepresentationSeries:
>>>      # tfidf stuff

This is very expressive for the users in the documentation (and also for the developers)! Everyone can immediately see what tfidf is working on and what the output looks like. Doing help(TokenSeries) now also explains to the users what's going on.

This does not force users to do anything differently from before, they should just continue using Pandas Series as usual; it's mainly for documentation / ease of use / expressiveness. This also automatically adds good explainations through the docstrings.

Second of all, the decorator (probably self-explanatory):

>>> from texthero._helper import *
>>> import pandas as pd
>>> @InputSeries(TokenSeries)
... def f(s):
...     pass
>>> f(pd.Series("Not tokenized")) # doctest: +SKIP
>>> # throws a type error with a nice explaination
>>> f(pd.Series([["I", "am", "tokenized"]]))
>>> # passes

All of this is implemented very lightweight and adds basically no overhead; the type checking etc. is all O(1)

@henrifroese henrifroese marked this pull request as ready for review August 3, 2020 17:05
@henrifroese henrifroese added the enhancement New feature or request label Aug 3, 2020
Copy link
Owner

@jbesomi jbesomi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing job! 🎉 That's soo nice and soo cool (and soo simple too) !!

tests/test_helpers.py Show resolved Hide resolved
texthero/_helper.py Outdated Show resolved Hide resolved
- rename "DocumentRepresentationSeries" to "RepresentationSeries"
@henrifroese
Copy link
Collaborator Author

Just renamed "DocumentRepresentationSeries" to "RepresentationSeries"

@jbesomi
Copy link
Owner

jbesomi commented Aug 5, 2020

Everything looks great! I will merge it now ... the only observation: what if we move this under a separate file? Like types.py or heroseries.py or something like that?

@jbesomi jbesomi merged commit cc2ceee into jbesomi:master Aug 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants