Currently, the data model that is used throughout the whole repository is defined in items.py. This data model is neither really equivalent to the EduSharing service's data model (which is why there are so many transformations in es_connector.py), nor does it really cover what most of the crawlers produce.
I propose to:
define a data model that is oriented towards the problem domain (i.e. it only contains the metadata that is actually relevant for end users)
define this model with either Python dataclasses or pydantic (which allows automatic validation via type annotations)
define a second data model which exactly reflects the EduSharing service's requirements
write one big transformation step (possibly as a pipeline) to transform one into the other, see the sketch below
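
A minimal sketch of what the pydantic variant could look like. All field names, the flat `properties` dict and the EduSharing property keys are illustrative assumptions here, nothing is taken verbatim from items.py or es_connector.py:

```python
from typing import Dict, List, Optional

from pydantic import BaseModel, Field, HttpUrl


class LearningResource(BaseModel):
    """Domain-oriented model: only the metadata that is relevant for end users.
    Field names are illustrative, not copied from the current items.py."""

    title: str
    description: Optional[str] = None
    url: HttpUrl
    keywords: List[str] = Field(default_factory=list)
    license: Optional[str] = None


class EduSharingNode(BaseModel):
    """Model mirroring what the EduSharing service expects.
    The flat properties dict and the keys used below are assumptions."""

    properties: Dict[str, List[str]] = Field(default_factory=dict)


def to_edu_sharing(resource: LearningResource) -> EduSharingNode:
    """The single, central transformation step (could later live in a
    Scrapy pipeline instead of being spread across es_connector.py)."""
    return EduSharingNode(
        properties={
            "cclom:title": [resource.title],
            "cclom:general_description": [resource.description or ""],
            "ccm:wwwurl": [str(resource.url)],
            "cclom:general_keyword": list(resource.keywords),
        }
    )
```

Validation then happens at the model boundary: a spider that produces a `LearningResource` with a missing title or a malformed URL fails immediately with a clear pydantic error, instead of surfacing as an obscure EduSharing API error later on.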
This will
remove a lot of complexity from the crawler/spider layer of the repository
make debugging of mapping / transformation issues a lot easier
allow for automatic, spider-independent unit tests (we can simply populate the input data model and validate the transformed output, as sketched below)
remove complexity from the es_connector.py classes / files
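
Assuming the hypothetical `LearningResource` / `to_edu_sharing` sketch above, such a spider-independent test could look roughly like this:

```python
# Hypothetical module name for the sketch above.
from learning_resource_sketch import LearningResource, to_edu_sharing


def test_title_and_keywords_are_mapped():
    """No spider involved: populate the input model directly,
    then assert on the transformed output."""
    resource = LearningResource(
        title="Introduction to Fractions",
        url="https://example.org/fractions",
        keywords=["math", "fractions"],
    )

    node = to_edu_sharing(resource)

    assert node.properties["cclom:title"] == ["Introduction to Fractions"]
    assert node.properties["cclom:general_keyword"] == ["math", "fractions"]
```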