
Importer doesn't create general contributions #1839

Closed
janno42 opened this issue Dec 28, 2022 · 6 comments · Fixed by #1843
Assignees
Labels
[C] Backend Focuses on backend implementation [P] Major Major priority [T] Bug This is a bug. We don't like it. Please get rid of it.

Comments

@janno42
Member

janno42 commented Dec 28, 2022

When importing enrollment data, no general_contributions are created for the newly added evaluations. This prevents managers from, e.g., using the "assign questionnaires" feature which fails with an AttributeError:

File "staff/views.py", line 802, in semester_questionnaire_assign
    evaluation.general_contribution.questionnaires.set(form.cleaned_data[evaluation.course.type.name])

Exception Type: AttributeError
Exception Value: 'NoneType' object has no attribute 'questionnaires'

The importer should make sure that every new evaluation has a general_contribution.
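One way to fix this while keeping the fast bulk path is to restore the invariant in a second bulk step. A minimal sketch of that idea (stand-in classes, not the real EvaP models or ORM; `general_contributions_for` is an invented helper name):

```python
# Hypothetical sketch: after bulk-creating evaluations, build the matching
# general contributions in one pass and bulk-create those too.
# Contribution here is a stand-in for the real model.
class Contribution:
    def __init__(self, evaluation, contributor=None):
        self.evaluation = evaluation
        self.contributor = contributor  # None marks the general contribution


def general_contributions_for(new_evaluations):
    """Build one general contribution per freshly inserted evaluation."""
    return [Contribution(evaluation=e, contributor=None) for e in new_evaluations]

# The importer would then pass this list to a single bulk-create call.
```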

@janno42 janno42 added [C] Backend Focuses on backend implementation [T] Bug This is a bug. We don't like it. Please get rid of it. [P] Major Major priority labels Dec 28, 2022
@niklasmohrin
Member

niklasmohrin commented Dec 28, 2022

Hm, shouldn't the questionnaire assign view just create it if users try to use the general contribution? Or, if we don't want to allow that, it shouldn't display an option to assign to the general questionnaire, right?

Do we currently have some invariants about what questionnaires exist for an evaluation? If so, we should probably enforce them with database constraints (#1800 could get another point)

@richardebeling
Member

So the problem is that the importer uses bulk_create instead of save to be faster, but we rely on save to create additional objects, in this case the general contribution.

I wouldn't want to duplicate the "when a new evaluation is created, we also must create a general contribution for it" logic across the codebase.
I think the approach of overriding save and having pre_save / post_save signals, while also having methods like bulk_create that completely circumvent them, is generally broken in Django.

We could defer creation of the general contribution to its first access using the getter for the general_contribution property. However, code can also try to access the database object via Contribution.objects.filter(..., contributor=None). This would bypass any getters, causing errors. Searching for contributor=None gives me 17 places that look like they might break if the object doesn't exist in the database.

If we use save instead of bulk_* in the importer, the importer tests will get 4 seconds slower (11s instead of 7s), and the import of a somewhat realistically sized file (500 students, 100 courses, 4 enrollments per student) will get ~8s slower (currently 40s vs. 48s, though most of that time is due to a performance bug in openpyxl; there is no good reason for the importer to take longer than a second, I think).

I would really like to keep the bulk operations for performance, but the only way to make them work with save() overrides and pre_save/post_save signals is a custom, third method that performs all the additional on_save() logic and takes a list of objects, so it can work together with bulk operations. That would be a lot of extra infrastructure.
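That "third method" idea could look roughly like this. This is a minimal stand-in sketch, not EvaP or Django code; the class and the name `run_on_save_logic` are invented for illustration:

```python
# Sketch of a bulk-aware hook that both save() and the bulk code paths
# funnel through, so the "create the general contribution" logic lives
# in exactly one place. Stand-in class, not the real Evaluation model.
class Evaluation:
    def __init__(self):
        self.general_contribution = None

    @classmethod
    def run_on_save_logic(cls, evaluations):
        # everything an overridden save() would normally trigger,
        # applied to a whole batch (e.g. right after bulk_create)
        for evaluation in evaluations:
            if evaluation.general_contribution is None:
                # placeholder for creating the real Contribution row
                evaluation.general_contribution = object()

    def save(self):
        # a single-object save funnels through the same batched hook
        type(self).run_on_save_logic([self])
```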

As a quick fix, we can revert to save() in update_existing_and_create_new_courses, store_participations_in_db, and update_existing_and_create_new_user_profiles.

For production right now, calling save() on all instances without a general contribution should fix the problem, but might create weird log entries.
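That hotfix loop could look roughly like this (a stand-in sketch under the assumption, matching the real model, that save() recreates a missing general contribution; the helper name is invented):

```python
# Hedged sketch of the production hotfix: re-save every evaluation whose
# general contribution is missing. Stand-in class, not the real model.
class Evaluation:
    def __init__(self):
        self.general_contribution = None

    def save(self):
        # assumption: save() creates the general contribution if absent
        if self.general_contribution is None:
            self.general_contribution = object()  # placeholder for the created row


def fix_missing_general_contributions(evaluations):
    """Re-save evaluations lacking a general contribution; return how many."""
    fixed = 0
    for evaluation in evaluations:
        if evaluation.general_contribution is None:
            evaluation.save()
            fixed += 1
    return fixed
```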

@richardebeling
Member

After #1841, it's ~7s (bulk_create) vs ~12s (save) for me locally for a 6000 rows import file. Probably acceptable for now, as it only affects the import run and not test runs.

@niklasmohrin
Member

Sooooo, should we take the 8s hit in a hotfix now-ish, and make it faster again next year? Or should I review the performance PR and we merge that in the next couple of days?

@richardebeling
Member

Sooooo, should we take the 8s hit in a hotfix now-ish, and make it faster again next year? Or should I review the performance PR and we merge that in the next couple of days?

The performance PR should be unrelated, I just wanted to have it to have some numbers regarding performance impact of save() vs bulk_create. We can revert to using save in the importer for now.


I just stumbled across this test:

def test_has_enough_questionnaires(self):
    # manually circumvent Evaluation's save() method to have an Evaluation without a general contribution
    # the semester must be specified because of https://github.com/vandersonmota/model_bakery/issues/258
    course = baker.make(Course, semester=baker.make(Semester), type=baker.make(CourseType))
    Evaluation.objects.bulk_create([baker.prepare(Evaluation, course=course)])
    evaluation = Evaluation.objects.get()
    self.assertEqual(evaluation.contributions.count(), 0)
    self.assertFalse(evaluation.general_contribution_has_questionnaires)
    self.assertFalse(evaluation.all_contributions_have_questionnaires)

and I was a bit confused here. We create an evaluation without a general_contribution instance and expect the model methods to still work properly?

Also, the getter for the general contribution explicitly handles the "does not exist yet" state, and explicitly returns None instead of just creating the object in the database:

@cached_property
def general_contribution(self):
    if self.pk is None:
        return None
    try:
        return self.contributions.get(contributor=None)
    except Contribution.DoesNotExist:
        return None

Is the intention here to handle Evaluation instances that were not yet saved to the database? The test doesn't express that as it explicitly uses bulk_create with prepare instead of just prepare.

I guess this comes back to @niklasmohrin's point in #1839 (comment): What is our invariant? I see three possibilities

  1. We do not allow Evaluations without a general contribution.
    • This does not work for newly created instances that are not yet saved to the database.
  2. We do not allow Evaluations stored in the database without a general contribution.
    • This matches our implementations of save and (at least some) methods that could be called with unsaved objects.
    • It requires all accesses to instances that are not known to be in the database to handle the case that the contribution does not yet exist -- I'd bet there's cases where we don't properly check this.
    • The test above would be wrong in this case.
  3. Evaluations might not have a contribution.
    • Shouldn't be harder to handle than 2? Code that requires existence could just use a get-or-create getter.
    • The importer would be correct in this case.
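The get-or-create getter mentioned in option 3 could be sketched like this (stand-in classes, not the real models; in the actual codebase this would likely wrap an ORM get_or_create call):

```python
# Sketch of a getter that creates the general contribution on first access,
# so callers never observe a missing one. Stand-in classes for illustration.
class Contribution:
    def __init__(self, contributor=None):
        self.contributor = contributor


class Evaluation:
    def __init__(self):
        self.contributions = []

    def get_or_create_general_contribution(self):
        # return the existing general contribution if there is one ...
        for contribution in self.contributions:
            if contribution.contributor is None:
                return contribution
        # ... otherwise create it on the fly
        contribution = Contribution(contributor=None)
        self.contributions.append(contribution)
        return contribution
```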

We considered allowing evaluations without a general contribution in #1246; it seems there was no good fundamental reason to enforce the existence of the general contribution, it just happened to be what we first implemented. We didn't consider bulk_ operations circumventing save(), though.

Related (and funny) PR / issue combination: #1259 and #1260.

@niklasmohrin
Member

Is the intention here to handle Evaluation instances that were not yet saved to the database? The test doesn't express that as it explicitly uses bulk_create with prepare instead of just prepare.

If that's the case, I suppose the try-except could be obsolete with the pk is None check I had to add with Django 4.1.


I was trying to come up with a good solution for thirty minutes or so, and I keep coming back to wondering whether we should have class SingleResult(Evaluation) and class VotingResult(Evaluation) (or OfflineResult, OnlineResult or whatever). This would probably be the assert set(Answer.__subclasses__()) == {TextAnswer, RatingAnswerCounter} situation all over again. Honestly, they can (or should) even reside in the same table and share a key-space for other models to use, but that's just not how Django works :(

Maybe at the other end: class GeneralContribution(Contribution), class PersonContribution(Contribution)? Then, we add a OneToOneField and sleep more easily, but get more complex templates (maybe) and probably somewhat slower pages (is there something like always_prefetch or inline_foreign_key?).


If we keep the models as they are: I don't know, I think always having the general contribution seems like the easier invariant to reason about.

I'd bet there's cases where we don't properly check this.

Yeah, probably. Let's annoy our future selves and just throw an exception in the general_contribution getter and add the db constraint (and re-check that the constraint holds in production). I am tempted to vote for get-or-create, but I feel like we are going to regret that. Imagine debugging another O(n) queries on contributor-secondary-page-this-is-cool-info just because we queried whether people usually answer question X about them and some code somewhere used the wrong getter. Let's at least fail early (and in the DB, beyond the weird Django behaviors).
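A raising getter along those lines might look like this (stand-in classes; `MissingGeneralContribution` is an invented exception name, not EvaP code):

```python
# Sketch of the "fail early" variant: the getter raises instead of
# returning None when the general contribution is missing.
class Contribution:
    def __init__(self, contributor=None):
        self.contributor = contributor


class MissingGeneralContribution(Exception):
    pass


class Evaluation:
    def __init__(self, contributions):
        self.contributions = contributions

    @property
    def general_contribution(self):
        for contribution in self.contributions:
            if contribution.contributor is None:
                return contribution
        # surface the broken invariant immediately instead of letting a
        # None leak into unrelated code paths
        raise MissingGeneralContribution("evaluation has no general contribution")
```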
