You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While adding type hints to the library i found various places where in my opinion improvments could be made.
I add them here to discuss because most of them will be breaking changes, and should only be considered once a major revision is planned.
1)
classEmojiMatch:
defdata_copy(self) ->Dict[str, Any]:
""" Returns a copy of the data from :data:`EMOJI_DATA` for this match with the additional keys ``match_start`` and ``match_end``. """ifself.data:
emj_data=self.data.copy()
emj_data['match_start'] =self.startemj_data['match_end'] =self.endreturnemj_dataelse:
return {
'match_start': self.start,
'match_end': self.end
}
This method returns 2 completely different dicts, i don’t think its evident for the user what they will get here, and if the user wants to prevent bugs, even with type hints, after each call the user needs to test which kind of dict he got.
I got a branch where all dicts are typed, but this is still a pain here.
Also deepcopy should be considered here, because the emoji data contain itself mutable objects again.
The user wants to call token.value.emoji but he first needs to test what he got with isinstance(token.value, str)
3)
In data_dict.py: STATUS should be an IntEnum, i found no reason why this is a dict.
4)
defget_emoji_unicode_dict(lang: str) ->Dict[str, Any]:
"""Generate dict containing all fully-qualified and component emoji name for a language The dict is only generated once per language and then cached in _EMOJI_UNICODE[lang]"""if_EMOJI_UNICODE[lang] isNone:
_EMOJI_UNICODE[lang] = {data[lang]: emjforemj, datainEMOJI_DATA.items()
iflangindataanddata['status'] <=STATUS['fully_qualified']}
return_EMOJI_UNICODE[lang]
defget_aliases_unicode_dict() ->Dict[str, str]:
"""Generate dict containing all fully-qualified and component aliases The dict is only generated once and then cached in _ALIASES_UNICODE"""ifnot_ALIASES_UNICODE:
_ALIASES_UNICODE.update(get_emoji_unicode_dict('en'))
foremj, datainEMOJI_DATA.items():
if'alias'indataanddata['status'] <=STATUS['fully_qualified']:
foraliasindata['alias']:
_ALIASES_UNICODE[alias] =emjreturn_ALIASES_UNICODE
a)
Instead of returning mutable dicts, we could return a MappingProxy Object which wraps the dict, this makes it (as much as you can in python) immutable. And the user cannot accidentally modify it.
b)
Instead of the constant _EMOJI_UNICODE and _ALIASES_UNICODE one could use @functools.lru_cache which would make the code much smaller.
5)
A lot of test code imports the private cache _EMOJI_UNICODE. While this is probably not really a problem it makes it so we have to add many type: ignore comments.
A simple helper method in the test module which iterates the languages and builds the whole cache would let us get rid of all the comments.
The text was updated successfully, but these errors were encountered:
Sounds great. I would generally be open to any of those.
Moving the EMOJI_DATA to json files with a separate json file for each language ( #280 ) would be a breaking change, so that should be a major version change anyway.
Yes. I think Python 2 had some problem with enums, I don't exactly remember why I chose dict.
While adding type hints to the library i found various places where in my opinion improvments could be made.
I add them here to discuss because most of them will be breaking changes, and should only be considered once a major revision is planned.
1)
This method returns 2 completely different dicts, i don’t think its evident for the user what they will get here, and if the user wants to prevent bugs, even with type hints, after each call the user needs to test which kind of dict he got.
I got a branch where all dicts are typed, but this is still a pain here.
Also
deepcopy
should be considered here, because the emoji data contain itself mutable objects again.2)
The user wants to call
token.value.emoji
but he first needs to test what he got withisinstance(token.value, str)
3)
In
data_dict.py
:STATUS
should be anIntEnum
, i found no reason why this is a dict.4)
a)
Instead of returning mutable dicts, we could return a
MappingProxy
Object which wraps the dict, this makes it (as much as you can in python) immutable. And the user cannot accidentally modify it.b)
Instead of the constant
_EMOJI_UNICODE
and_ALIASES_UNICODE
one could use@functools.lru_cache
which would make the code much smaller.5)
A lot of test code imports the private cache
_EMOJI_UNICODE
. While this is probably not really a problem it makes it so we have to add manytype: ignore
comments.A simple helper method in the test module which iterates the languages and builds the whole cache would let us get rid of all the comments.
The text was updated successfully, but these errors were encountered: