Rewrite staff importers (UserImporter, EnrollmentImporter) #1634
didn't read till the end, but here are some thoughts along the way
- At 1400 lines, this is rather long. Consider splitting this up, e.g. putting every importer in its own file. Since this is basically a full rewrite, there's no git history lost either ;)
- The importer classes are currently only scopes for a set of methods; they are never instantiated. With separate files, there wouldn't be any need for the classes. I also suggest marking all of their methods except the process one as private.
Generally, this is really a lot of code to understand. I know it was your goal to make this more easily extensible, but I think the current structure achieves this only partially: IF you figure out the structure, it's simple to not make any mistakes. BUT currently you need a lot of time to understand the structure, and people might not be willing to spend this time before extending this.
- E.g., try to understand the `CourseDataAdapter`. It's only 14 LOC and a few comments, but it's using/mentioning a total of 7 classes, containing a `finalize` which does nothing (?), requires explanation why Exceptions are being swallowed, and all it does is basically `course_data_checker.check_course_data(row.get_course_data(...))`. Can this be inlined into `CourseDataMismatchChecker`? Or maybe I'm just not used to the adapter pattern and if I were, I could just gloss over this...
- Also in the enrollment importer, course data is mapped three times: first in `Invalid[Degree|CourseType|IsGraded]Checker` and `CourseNameChecker`, then in `CourseDataAdapter` for the `CourseDataMismatchChecker`, then in `map_enrollment_input_rows`. This also uses three Mapper classes and a mixture of exceptions and error messages. All this took a while to understand.
  - Here's a proposal: `map_enrollment_input_rows` runs first, and somehow uses `Invalid[Degree|CourseType|IsGraded]Checker`. If those say that the value is not valid, they make it `None` or an error value. Then `CourseNameChecker` and `CourseDataMismatchChecker` would run on parsed rows, ignoring any error value.
  - Alternatively the mapping could run after the first three checkers, so that the other two can run on the parsed data.
  - Maybe that's not the best way. But I do have the feeling that dissolving the mappers and adapters would result in a lot less structure to understand, and the remaining classes might not be that much larger.
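To make the proposal above concrete, here is a minimal sketch of "map once, mark invalid values as `None`, then run the remaining checkers on parsed rows". Every name in it (`ParsedRow`, `VALID_DEGREES`, `check_course_data_mismatch`, the two-column row layout) is a hypothetical stand-in for illustration and not taken from the PR's code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical set of accepted values; the real importer looks these up elsewhere.
VALID_DEGREES = {"Bachelor", "Master"}


@dataclass
class ParsedRow:
    course_name: str
    degree: Optional[str]  # None marks a value that failed validation


def map_enrollment_input_rows(raw_rows: List[Tuple[str, str]], errors: List[str]) -> List[ParsedRow]:
    """Map raw rows exactly once; invalid values are reported and become None."""
    parsed = []
    for index, (course_name, degree) in enumerate(raw_rows):
        valid_degree: Optional[str] = degree
        if degree not in VALID_DEGREES:
            errors.append(f"row {index}: invalid degree {degree!r}")
            valid_degree = None
        parsed.append(ParsedRow(course_name, valid_degree))
    return parsed


def check_course_data_mismatch(parsed_rows: List[ParsedRow], errors: List[str]) -> None:
    """Runs on parsed rows and ignores values already marked as invalid."""
    seen = {}
    for row in parsed_rows:
        if row.degree is None:
            continue
        previous = seen.setdefault(row.course_name, row.degree)
        if previous != row.degree:
            errors.append(f"course {row.course_name!r}: conflicting degrees")


errors: List[str] = []
rows = map_enrollment_input_rows(
    [("Algebra", "Bachelor"), ("Algebra", "Master"), ("Logic", "PhD")], errors
)
check_course_data_mismatch(rows, errors)
```

With this shape the later checker never has to re-map raw input, which is the reduction in structure the comment is asking for.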
There are multiple Checkers which have no error aggregation, but you seem to know that already. But there are also a couple checkers already doing some aggregation, but not using the Tracker.
Also, I would suggest to not split the classes any further. Smaller code units are not always better ;) Smaller code units might make the individual parts easier to understand, but it doesn't really remove the complexity, but rather moves it from the code itself into the code structure. kinda like microservices :D One could argue that small code units are easier to test. In this case here, the individual parts are implementation details and shouldn't be tested anyway in my opinion. Something something about all the boilerplate code something, and the pendulum swinging from monoliths to small units and back. I'm rather tired now :D
One last thought. You said you want to have the invariant that the input data is always valid or something like that. I didn't really get that feeling from the code. I think more assertions and less checking-again-for-good-measure would communicate your intention better.
Random thought, if you want to scope-creep your refactor, you could improve testability of the importer. A big step towards better testability might be having the test data in code instead of less-nice-to-edit xls files. Either the test data from the code could first be converted to an xls file, or the importers would need to enable tests to insert non-xls input somewhere.
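As a rough illustration of the second option (letting tests insert non-xls input), the row reader could be passed in as a parameter. This is only a sketch: `import_users`, `Row`, and `rows_from_code` are hypothetical names, and the real importer's signature is not shown in this thread.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[str, str, str]  # hypothetical layout: (first name, last name, email)


def import_users(content: bytes, read_rows: Callable[[bytes], Iterable[Row]]) -> List[str]:
    """Hypothetical importer core: the row reader is injected by the caller."""
    return [email.lower() for _first, _last, email in read_rows(content)]


def rows_from_code(_content: bytes) -> Iterable[Row]:
    # Test data lives next to the test instead of in a spreadsheet file.
    return [("Ada", "Lovelace", "Ada@Example.com")]


emails = import_users(b"", rows_from_code)
```

Production code would pass a reader that parses the Excel bytes, while tests pass a reader like `rows_from_code`.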
@karyon @niklasmohrin all the stuff suggested that I marked as resolved is now incorporated. The structuring of components across files is still open, there's one TODO in the code for that. IDK. If you want to review again, you can do that. I can also try to move stuff around files first if you prefer that. It's not much less code than before, mainly because the distinction between "PartialCourseData", which might be incomplete, and "CourseData" which is guaranteed to be complete, forces a translation step between them. I couldn't bring myself to store invalid values without any reminders/guarantees in CourseData though. I think the current approach allows humans to reason about correctness that wouldn't be given otherwise.
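The translation step between the two types might look roughly like the sketch below. The field names and the `to_course_data` helper are assumptions made for illustration; only the type names `PartialCourseData` and `CourseData` come from the comment above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartialCourseData:
    # Fields are hypothetical; any of the optional ones may be missing.
    name: str
    degree: Optional[str]
    is_graded: Optional[bool]


@dataclass(frozen=True)
class CourseData:
    # Same hypothetical fields, but none of them may be None.
    name: str
    degree: str
    is_graded: bool


def to_course_data(partial: PartialCourseData) -> CourseData:
    """The explicit translation step: fails loudly instead of storing invalid values."""
    if partial.degree is None or partial.is_graded is None:
        raise ValueError(f"incomplete course data for {partial.name!r}")
    return CourseData(partial.name, partial.degree, partial.is_graded)


complete = to_course_data(PartialCourseData("Algebra", "Bachelor", True))
```

The extra type costs a conversion function, but code that receives a `CourseData` can rely on completeness without re-checking.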
(review in progress)
I like what I see 👍 (not completely through yet)
I really like the changes, I feel like it really helps when trying to find / understand where something particular is happening without having to backward-slice (swt namedrop, I know) for some importer-wide state modifications 👍 I haven't proof-read whether the logic is actually correctly mapped to the new code, but I would defer that until final review (that is, until @karyon is through again); and the tests should have already done most of this work for me :^)
I've removed the importer classes, they are now methods ( Some discussions in this PR are still unresolved, waiting for input there. Apart from that, waiting for review.
I'd say only xlsx is fine. No need to add more complexity to the importer :)
haven't read everything (again), good stuff
In person we found that functions / methods that don't have any annotations in the signature are not checked; in a follow-up, we / I should add `-> None` as a default.
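A small illustration of the effect, assuming the checker in question is mypy with default settings (the thread does not name the tool). The function names are made up for the example.

```python
def unchecked():
    # No annotations in the signature: with mypy's defaults (check_untyped_defs
    # disabled) the body is skipped, so this bad assignment goes unreported.
    count: int = "not an int"
    return None


def checked() -> None:
    # Adding "-> None" makes the function annotated, so the type checker
    # analyzes the body and flags the assignment below.
    count: int = "not an int"


# At runtime both functions behave identically; only static checking differs.
results = (unchecked(), checked())
```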
Looked over `user.py`, seems good, some notes:
Now read until the `CourseNameChecker` in `enrollment.py`
done with enrollment
reading

```python
with ConvertExceptionsToMessages(importer_log):
    pipeline = Pipeline(
        ExcelFileRowMapper(skip_first_n_rows=1, row_cls=EnrollmentInputRow),
        RaiseIfErrors(),
        EnrollmentInputRowMapper(),
        TooManyEnrollmentsChecker(test_run),
        CourseDataAdapter(CourseNameChecker(test_run, semester=semester)),
        CourseDataAdapter(CourseDataMismatchChecker(test_run)),
        UserDataAdapter(UserDataEmptyFieldsChecker(test_run)),
        UserDataAdapter(UserDataMismatchChecker(test_run)),
        UserDataAdapter(UserDataValidationChecker(test_run)),
        RaiseIfErrors(),
        importer_log=importer_log,
    )
    parsed_rows = pipeline.run(excel_content)
```

This will however require some more (typing?) effort without much "business value" - maybe it will be interesting to look into at some point though :^)
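For readers wondering what such a `Pipeline` could look like underneath, here is one minimal sketch. It is not the PR's implementation: the step signature `(data, log) -> data`, the list-based log, and `ImporterError` are all assumptions chosen to keep the example short.

```python
class ImporterError(Exception):
    """Hypothetical error type that aborts an import."""


class Pipeline:
    """Runs steps in order; each step receives the previous step's output."""

    def __init__(self, *steps, importer_log):
        self.steps = steps
        self.importer_log = importer_log

    def run(self, data):
        for step in self.steps:
            data = step(data, self.importer_log)
        return data


def skip_header(rows, log):
    # Mapper-like step: transforms the data.
    return rows[1:]


def check_not_empty(rows, log):
    # Checker-like step: only adds messages, passes the data through.
    if not rows:
        log.append("no data rows found")
    return rows


def raise_if_errors(rows, log):
    # Gate step: aborts once any message has been collected.
    if log:
        raise ImporterError("; ".join(log))
    return rows


importer_log = []
pipeline = Pipeline(skip_header, check_not_empty, raise_if_errors, importer_log=importer_log)
parsed_rows = pipeline.run([["name"], ["Ada"]])
```

The typing effort mentioned above would come from describing how each step's output type must match the next step's input type, which this untyped sketch sidesteps.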
Created #1795 for this.
🎉
The idea / model is that importers start with the binary Excel content and then put their current state through a series of steps, each of which gets them a bit closer to their result. On errors, a step simply raises an error, aborting the import. After each step, checkers can be run to add errors or warnings that are more user-friendly. This should make it easy to understand the existing checks and to add new ones.
Preparation for solving #1574.
Minimal implementation to pass all existing tests. Putting this here already to get some feedback on the overall architecture.
The idea is that importers start with the binary Excel content and then put their current state through a series of steps, each of which gets them a bit closer to their result. On errors, a step simply raises an error, aborting the import. After each step, checkers can be run to add errors or warnings that are more user-friendly. This hopefully makes it easy to understand the existing checks and to add new ones.
I tried to separate and encapsulate the different concerns as much as possible: Logic required for an import, errors/warnings, message handling
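A rough sketch of that separation, under stated assumptions: comma-separated bytes stand in for the binary Excel content, and `ImporterLog`, `Level`, `ImportAborted`, `parse`, and `check_emails` are hypothetical names rather than the PR's actual classes.

```python
from enum import Enum


class Level(Enum):
    WARNING = "warning"
    ERROR = "error"


class ImportAborted(Exception):
    """Hypothetical exception raised to abort an import."""


class ImporterLog:
    """Message handling: collects user-facing errors and warnings."""

    def __init__(self):
        self.messages = []

    def add(self, level, text):
        self.messages.append((level, text))

    def raise_if_errors(self):
        if any(level is Level.ERROR for level, _text in self.messages):
            raise ImportAborted()


def parse(content: bytes):
    # Step: turn raw content into rows of [name, email].
    return [line.split(",") for line in content.decode().splitlines()]


def check_emails(rows, log):
    # Checker: adds user-friendly messages without changing the data.
    seen = set()
    for name, email in rows:
        if not email:
            log.add(Level.ERROR, f"missing email for {name}")
        elif email in seen:
            log.add(Level.WARNING, f"duplicate email {email}")
        seen.add(email)


def run_import(content: bytes, log: ImporterLog):
    rows = parse(content)
    check_emails(rows, log)
    log.raise_if_errors()  # warnings pass through, errors abort
    return rows


log = ImporterLog()
rows = run_import(b"Ada,ada@example.com\nAda L,ada@example.com", log)
```

Adding a new check in this shape means writing one more function that takes the rows and the log, then calling it after the step whose output it inspects.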
It's probably best to check out what the user import and enrollment import files look like before looking at the code.
related code changes that need to be integrated: