
Importer doesn't create general contributions #1839

Closed
janno42 opened this issue Dec 28, 2022 · 6 comments · Fixed by #1843
Assignees
Labels
[C] Backend Focuses on backend implementation [P] Major Major priority [T] Bug This is a bug. We don't like it. Please get rid of it.

Comments

@janno42
Member

janno42 commented Dec 28, 2022

When importing enrollment data, no general_contributions are created for the newly added evaluations. This prevents managers from, e.g., using the "assign questionnaires" feature which fails with an AttributeError:

File "staff/views.py", line 802, in semester_questionnaire_assign
    evaluation.general_contribution.questionnaires.set(form.cleaned_data[evaluation.course.type.name])

Exception Type: AttributeError
Exception Value: 'NoneType' object has no attribute 'questionnaires'

The importer should make sure that every new evaluation has a general_contribution.
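One way to fix this while keeping the fast bulk path is to restore the invariant in a second bulk step. A minimal sketch of that idea (stand-in classes, not the real EvaP models or ORM; `general_contributions_for` is an invented helper name):

```python
# Hypothetical sketch: after bulk-creating evaluations, build the matching
# general contributions in one pass and bulk-create those too.
# Contribution here is a stand-in for the real model.
class Contribution:
    def __init__(self, evaluation, contributor=None):
        self.evaluation = evaluation
        self.contributor = contributor  # None marks the general contribution


def general_contributions_for(new_evaluations):
    """Build one general contribution per freshly inserted evaluation."""
    return [Contribution(evaluation=e, contributor=None) for e in new_evaluations]

# The importer would then pass this list to a single bulk-create call.
```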

@janno42 janno42 added [C] Backend Focuses on backend implementation [T] Bug This is a bug. We don't like it. Please get rid of it. [P] Major Major priority labels Dec 28, 2022
@niklasmohrin
Member

niklasmohrin commented Dec 28, 2022

Hm, shouldn't the questionnaire assign view just create it if users try to use the general contribution? Or, if we don't want to allow that, it shouldn't display an option to assign to the general questionnaire, right?

Do we currently have some invariants about what questionnaires exist for an evaluation? If so, we should probably enforce them with database constraints (#1800 could get another point)

@richardebeling
Member

So the problem is that the importer uses bulk_create instead of save to be faster, but we rely on save to create additional objects, in this case the general contribution.

I wouldn't want to duplicate the "when a new evaluation is created, we also must create a general contribution for it" logic across the codebase.
I think the approach of overriding save and having pre_save / post_save signals, while also having methods like bulk_create that completely circumvent them, is generally broken in Django.

We could defer creation of the general contribution to its first access using the getter for the general_contribution property. However, code can also try to access the database object via Contribution.objects.filter(..., contributor=None). This would bypass any getters, causing errors. Searching for contributor=None gives me 17 places that look like they might break if the object doesn't exist in the database.

If we use save instead of bulk_* in the importer, the importer tests will get 4 seconds slower (11s instead of 7s), and the import of a somewhat realistically sized file (500 students, 100 courses, 4 enrollments per student) will get ~8s slower (currently 40s vs. 48s, though most of that time is due to a performance bug in openpyxl; there is no good reason for the importer to take longer than a second, I think).

I would really like to keep the bulk operations for performance, but the only way to make them work with save() overrides and pre_save/post_save signals is a custom, third method that performs all the additional on_save() logic and takes a list of objects, so it can work together with bulk operations. That would be a lot of extra infrastructure.
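That "third method" idea could look roughly like this. This is a minimal stand-in sketch, not EvaP or Django code; the class and the name `run_on_save_logic` are invented for illustration:

```python
# Sketch of a bulk-aware hook that both save() and the bulk code paths
# funnel through, so the "create the general contribution" logic lives
# in exactly one place. Stand-in class, not the real Evaluation model.
class Evaluation:
    def __init__(self):
        self.general_contribution = None

    @classmethod
    def run_on_save_logic(cls, evaluations):
        # everything an overridden save() would normally trigger,
        # applied to a whole batch (e.g. right after bulk_create)
        for evaluation in evaluations:
            if evaluation.general_contribution is None:
                # placeholder for creating the real Contribution row
                evaluation.general_contribution = object()

    def save(self):
        # a single-object save funnels through the same batched hook
        type(self).run_on_save_logic([self])
```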

As a quick fix, we can revert to save() in update_existing_and_create_new_courses, store_participations_in_db, and update_existing_and_create_new_user_profiles.

For production right now, calling save() on all instances without a general contribution should fix the problem, but might create weird log entries.
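That hotfix loop could look roughly like this (a stand-in sketch under the assumption, matching the real model, that save() recreates a missing general contribution; the helper name is invented):

```python
# Hedged sketch of the production hotfix: re-save every evaluation whose
# general contribution is missing. Stand-in class, not the real model.
class Evaluation:
    def __init__(self):
        self.general_contribution = None

    def save(self):
        # assumption: save() creates the general contribution if absent
        if self.general_contribution is None:
            self.general_contribution = object()  # placeholder for the created row


def fix_missing_general_contributions(evaluations):
    """Re-save evaluations lacking a general contribution; return how many."""
    fixed = 0
    for evaluation in evaluations:
        if evaluation.general_contribution is None:
            evaluation.save()
            fixed += 1
    return fixed
```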

@richardebeling
Member

After #1841, it's ~7s (bulk_create) vs ~12s (save) for me locally for a 6000 rows import file. Probably acceptable for now, as it only affects the import run and not test runs.

@niklasmohrin
Member

Sooooo, should we take the 8s hit in a hotfix now-ish, and make it faster again next year? Or should I review the performance PR and we merge that in the next couple of days?

@richardebeling
Member

Sooooo, should we take the 8s hit in a hotfix now-ish, and make it faster again next year? Or should I review the performance PR and we merge that in the next couple of days?

The performance PR should be unrelated, I just wanted to have it to have some numbers regarding performance impact of save() vs bulk_create. We can revert to using save in the importer for now.


I just stumbled across this test:

def test_has_enough_questionnaires(self):
    # manually circumvent Evaluation's save() method to have an Evaluation without a general contribution
    # the semester must be specified because of https://github.com/vandersonmota/model_bakery/issues/258
    course = baker.make(Course, semester=baker.make(Semester), type=baker.make(CourseType))
    Evaluation.objects.bulk_create([baker.prepare(Evaluation, course=course)])
    evaluation = Evaluation.objects.get()
    self.assertEqual(evaluation.contributions.count(), 0)
    self.assertFalse(evaluation.general_contribution_has_questionnaires)
    self.assertFalse(evaluation.all_contributions_have_questionnaires)

and I was a bit confused here. We create an evaluation without a general_contribution instance and expect the model methods to still work properly?

Also, the getter for the general contribution explicitly handles the "does not exist yet" state, and explicitly returns None instead of just creating the object in the database:

@cached_property
def general_contribution(self):
    if self.pk is None:
        return None
    try:
        return self.contributions.get(contributor=None)
    except Contribution.DoesNotExist:
        return None

Is the intention here to handle Evaluation instances that were not yet saved to the database? The test doesn't express that as it explicitly uses bulk_create with prepare instead of just prepare.

I guess this comes back to @niklasmohrin's point in #1839 (comment): What is our invariant? I see three possibilities

  1. We do not allow Evaluations without a general contribution.
    • This does not work for newly created instances that are not yet saved to the database.
  2. We do not allow Evaluations stored in the database without a general contribution.
    • This matches our implementations of save and (at least some) methods that could be called with unsaved objects.
    • It requires all accesses to instances that are not known to be in the database to handle the case that the contribution does not yet exist -- I'd bet there's cases where we don't properly check this.
    • The test above would be wrong in this case.
  3. Evaluations might not have a contribution.
    • Shouldn't be harder to handle than 2? Code that requires existence could just use a get-or-create getter.
    • The importer would be correct in this case.
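The get-or-create getter mentioned in option 3 could be sketched like this (stand-in classes, not the real models; in the actual codebase this would likely wrap an ORM get_or_create call):

```python
# Sketch of a getter that creates the general contribution on first access,
# so callers never observe a missing one. Stand-in classes for illustration.
class Contribution:
    def __init__(self, contributor=None):
        self.contributor = contributor


class Evaluation:
    def __init__(self):
        self.contributions = []

    def get_or_create_general_contribution(self):
        # return the existing general contribution if there is one ...
        for contribution in self.contributions:
            if contribution.contributor is None:
                return contribution
        # ... otherwise create it on the fly
        contribution = Contribution(contributor=None)
        self.contributions.append(contribution)
        return contribution
```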

We considered allowing evaluations without a general contribution in #1246; it seems there was no good fundamental reason to enforce the existence of the general contribution, it just happened to be what we first implemented. We didn't consider bulk_ operations circumventing save(), though.

Related (and funny) PR / issue combination: #1259 and #1260.

@niklasmohrin
Member

Is the intention here to handle Evaluation instances that were not yet saved to the database? The test doesn't express that as it explicitly uses bulk_create with prepare instead of just prepare.

If that's the case, I suppose the try-except could be obsolete with the pk is None check I had to add with Django 4.1.


I was trying to come up with a good solution for thirty minutes or so, and I keep coming back to wondering whether we should have class SingleResult(Evaluation) and class VotingResult(Evaluation) (or OfflineResult, OnlineResult or whatever). This would probably be the assert set(Answer.__subclasses__()) == {TextAnswer, RatingAnswerCounter} situation all over again. Honestly, they can (or should) even reside in the same table and share a key-space for other models to use, but that's just not how Django works :(

Maybe at the other end: class GeneralContribution(Contribution), class PersonContribution(Contribution)? Then, we add a OneToOneField and sleep more easily, but get more complex templates (maybe) and probably somewhat slower pages (is there something like always_prefetch or inline_foreign_key?).


If we keep the models as they are: I don't know, I think always having the general contribution seems like the easier invariant to reason about.

I'd bet there's cases where we don't properly check this.

Yeah, probably. Let's annoy our future selves and just throw an exception in the general_contribution getter and add the db constraint (and re-check that the constraint holds in production). I am tempted to vote for get-or-create, but I feel like we are going to regret that. Imagine debugging another O(n) queries on contributor-secondary-page-this-is-cool-info just because we queried whether people usually answer question X about them and some code somewhere used the wrong getter. Let's at least fail early (and in the DB, beyond the weird Django behaviors).
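A raising getter along those lines might look like this (stand-in classes; `MissingGeneralContribution` is an invented exception name, not EvaP code):

```python
# Sketch of the "fail early" variant: the getter raises instead of
# returning None when the general contribution is missing.
class Contribution:
    def __init__(self, contributor=None):
        self.contributor = contributor


class MissingGeneralContribution(Exception):
    pass


class Evaluation:
    def __init__(self, contributions):
        self.contributions = contributions

    @property
    def general_contribution(self):
        for contribution in self.contributions:
            if contribution.contributor is None:
                return contribution
        # surface the broken invariant immediately instead of letting a
        # None leak into unrelated code paths
        raise MissingGeneralContribution("evaluation has no general contribution")
```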
