Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Importer restructure #5624

Open
wants to merge 20 commits into
base: master
Choose a base branch
from
Open

Conversation

semohr
Copy link

@semohr semohr commented Feb 9, 2025

Description

Hello y'all, when working on the importer.py file in a previous PR I noticed that this file grew quite large and badly needs a restructuring. Restructuring should improve our ability to apply changes to it in the future and isolate sub-functionalities within the importer.

Overview

For now I only changed the structure keeping the code (mostly) unchanged.

I split the functions and classes in the importer.py into the following responsibilities:

  • importer/session.py : Includes the ImportSession class.
  • importer/stages.py : Includes all stage functions, I prefixed the helper functions with a _ to allow distinguishing between stages and helper functions more easily.
  • importer/state.py : Includes the logic for the ImportState handling i.e. the resume feat.
  • importer/tasks.py : Includes the ImportTask class and all derived classes. Also includes the Action enum which I have renamed from action.
  • importer/__init__.py : Identified all public facing classes and functions and added them to __all__

Potential future changes

I don't want to add this to this PR but there are some places here where I see possible improvements for our code:

  • There are quite some config parsing related functions in the ImportSession which could be isolated (see e.g. set_config, want_resume). Maybe a mixin class which handles the config operations could be useful?
  • The ImportSession should be abstract if it is not used directly (I think it shouldn't). The function definitions which raise NotImplemented errors are quite weird imo and could be avoided by making the class abstract.
  • For me it was difficult to understand the flow of the importer as stages call session function and it is not clear which function is called by which stage and when. Maybe a naming convention for the stage functions in conjunction with the session methods could help here. Not sure how this will look in practice but right now it is quite hard to follow imo. Alternatively splitting the session into a outfacing session and a session context which is passed to the stages could help.
  • The use of the stage decorator is highly inconsistent. Maybe a better way to handle the stages could be found. This is more of a pipeline related issue and not directly related to the restructuring but I think it is worth mentioning.
  • Similar to the ImportSession, I think the ImportTask should be abstract as well, maybe we can put a bit more thought into the task hierarchy. This might also automatically improve the flow of the importer pipeline.

Am happy to tackle some of these issues in future PRs if you also think they are worth it.

Best,
Sebastian

Note: This PR is based on #5611 and can only be merged once the typing additions are accepted.

semohr and others added 20 commits February 1, 2025 13:16
Makes this more readable in my opinion, we also now have typehints for
the import state.
The return type of the stage decorator should in theory be `T|None`
but the return of task types is not consistent in its usage. Would need
some bigger changes for which I'm not ready at the moment.
responsibilities. I identified the following:
- session
- state
- tasks
- stages
Should allow to make changes more rapidly in the future.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant