Merge pull request #2 from RGOODSFR/FactoryBoy-optim
FactoryBoy optim
Showing 2 changed files with 154 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
Title: Factory-Boy : Optimize bulk creation | ||
Date: 2024-09-12 16:42 | ||
Id: 0005 | ||
Slug: optimize-bulk-creation-factory_boy | ||
Lang: en | ||
Category: development | ||
Tags: django, tests | ||
Summary: Speed up large dataset creation with factory_boy | ||
|
||
# The problem | ||
|
||
When creating large datasets with [factory_boy](https://pypi.org/project/factory-boy/), you may find yourself using [`MyFactory.create_batch()`](https://factoryboy.readthedocs.io/en/stable/reference.html#factory.create_batch), which is convenient for specifying a size but performs poorly with factories based on Django models. | ||
|
||
Here is the relevant source code: | ||
```python | ||
@classmethod | ||
def create_batch(cls, size, **kwargs): | ||
    """Create a batch of instances of the given class, with overridden attrs. | ||
    Args: | ||
        size (int): the number of instances to create | ||
    Returns: | ||
        object list: the created instances | ||
    """ | ||
    return [cls.create(**kwargs) for _ in range(size)] | ||
``` | ||
|
||
This means that each iteration builds one instance and saves it with its own `INSERT`, so a batch of N objects costs at least N SQL queries, and more if your factory uses `SubFactory` (related model factories), since every related object is created separately as well. | ||
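
To make that cost concrete, here is a toy sketch that uses neither factory_boy nor Django. The `FakeTable` class and both helper functions are hypothetical names made up for this illustration: the table simply counts one "query" per `insert` call and one per `bulk_insert` call, mimicking the difference between saving instances one at a time and inserting them in bulk.

```python
class FakeTable:
    """Toy stand-in for a database table that counts the queries it receives."""

    def __init__(self):
        self.rows = []
        self.query_count = 0

    def insert(self, row):
        # One INSERT statement per row, like saving instances one at a time.
        self.query_count += 1
        self.rows.append(row)

    def bulk_insert(self, rows):
        # One INSERT statement for the whole batch.
        self.query_count += 1
        self.rows.extend(rows)


def create_one_by_one(table, size):
    """Mimic a create_batch-style loop: one query per instance."""
    for i in range(size):
        table.insert({"id": i})
    return table.query_count


def create_in_bulk(table, size):
    """Build all rows in memory first, then send them in a single query."""
    table.bulk_insert([{"id": i} for i in range(size)])
    return table.query_count


one_by_one_queries = create_one_by_one(FakeTable(), 1000)
bulk_queries = create_in_bulk(FakeTable(), 1000)
print(one_by_one_queries, bulk_queries)
```

The real numbers depend on your models and database, but the shape is the same: the per-instance loop grows linearly with the batch size, while the bulk path stays roughly constant.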
|
||
# The solution | ||
|
||
To avoid issuing too many SQL queries, it is better to use `bulk_create` from the Django manager, which inserts a whole list of objects in very few queries. | ||
|
||
A simple solution is to generate unsaved instances and then save them all at once. You can also pass parameters (`notifications_enabled`, for example): | ||
```python | ||
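from factory.django import DjangoModelFactory | ||
|
||
# `Contact` is assumed to be an existing Django model imported from your app | ||
class ContactFactory(DjangoModelFactory): | ||
    class Meta: | ||
        model = Contact | ||
|
||
# Build a thousand unsaved contacts (create=False builds without saving) | ||
contact_list = ContactFactory.simple_generate_batch( | ||
    create=False, size=1000, notifications_enabled=True | ||
) | ||
# Save them all through the manager's bulk insert | ||
contact_list = Contact.objects.bulk_create(contact_list) | ||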
``` | ||
|
||
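But what if our factory has a `SubFactory`? Creating the instances one by one would certainly hit an N+1 problem. This approach relies on `bulk_create` setting primary keys on the objects it returns, which depends on your database backend. To overcome the problem, you can bulk create each model in sequence, parents first, and reuse the primary keys of the saved parents: | ||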
|
||
```python | ||
import factory | ||
from factory.django import DjangoModelFactory | ||
|
||
# `Contact` and `Notification` are assumed to be existing Django models | ||
class ContactFactory(DjangoModelFactory): | ||
    class Meta: | ||
        model = Contact | ||
|
||
class NotificationFactory(DjangoModelFactory): | ||
    contact = factory.SubFactory(ContactFactory) | ||
|
||
    class Meta: | ||
        model = Notification | ||
|
||
size = 1000 | ||
# Create contacts | ||
contact_list = ContactFactory.simple_generate_batch( | ||
    create=False, size=size, notifications_enabled=True | ||
) | ||
contact_list = Contact.objects.bulk_create(contact_list) | ||
|
||
# Create a notification for each contact | ||
# (contact=None overrides the SubFactory, so no extra Contact is built) | ||
obj_list = NotificationFactory.simple_generate_batch( | ||
    create=False, size=size, contact=None | ||
) | ||
# Pair each notification with the saved contact at the same position | ||
for obj, contact in zip(obj_list, contact_list): | ||
    obj.contact_id = contact.pk | ||
Notification.objects.bulk_create(obj_list) | ||
``` |
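
The pairing step above can be tried out without a database. The following is a minimal sketch under stated assumptions: the `Contact` and `Notification` dataclasses, `fake_bulk_create`, and `link_by_position` are all hypothetical stand-ins written for this illustration, not Django or factory_boy APIs. It assumes the bulk insert assigns primary keys to the parent objects in list order, which is what the positional loop in the article depends on.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class Contact:
    pk: Optional[int] = None


@dataclass
class Notification:
    contact_id: Optional[int] = None
    pk: Optional[int] = None


_pk_sequence = count(1)


def fake_bulk_create(objs):
    """Toy stand-in for a bulk insert: assigns primary keys in list order."""
    for obj in objs:
        obj.pk = next(_pk_sequence)
    return objs


def link_by_position(children, parents):
    """Attach each child to the parent at the same index, using its pk."""
    if len(children) != len(parents):
        raise ValueError("children and parents must have the same length")
    for child, parent in zip(children, parents):
        if parent.pk is None:
            # Without a primary key there is nothing to point the child at.
            raise ValueError("parent has no primary key; save parents first")
        child.contact_id = parent.pk
    return children


# Parents first, so that their primary keys exist before linking.
contacts = fake_bulk_create([Contact() for _ in range(3)])
notifications = link_by_position([Notification() for _ in range(3)], contacts)
notifications = fake_bulk_create(notifications)
```

The explicit checks are a design choice for the sketch: if the parents come back without primary keys, failing early is easier to debug than inserting children with an empty foreign key.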
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
Title : Factory-Boy : Optimize bulk creation | ||
Date : 2024-09-12 16:42 | ||
Id : 0005 | ||
Slug : optimiser-creation-en-masse-factory_boy | ||
Lang : fr | ||
Category : development | ||
Tags : django, tests | ||
Summary : Speed up the creation of large datasets with factory_boy | ||
|
||
|
||
# The problem | ||
|
||
When you create large datasets with [factory_boy](https://pypi.org/project/factory-boy/), you may find yourself using [`MyFactory.create_batch()`](https://factoryboy.readthedocs.io/en/stable/reference.html#factory.create_batch), which is great for specifying the size of the list but is limited in terms of performance for factories based on Django models. | ||
|
||
Here is the relevant source code: | ||
|
||
```python | ||
@classmethod | ||
def create_batch(cls, size, **kwargs): | ||
    """Create a batch of instances of the given class, with overridden attrs. | ||
    Args: | ||
        size (int): the number of instances to create | ||
    Returns: | ||
        object list: the created instances | ||
    """ | ||
    return [cls.create(**kwargs) for _ in range(size)] | ||
``` | ||
|
||
This means that an instance is generated and created on each iteration, resulting in many SQL queries, especially if your factory uses `SubFactory` (when a model is related through a `ForeignKey`). | ||
|
||
# The solution | ||
|
||
To avoid issuing too many SQL queries, it is better to use `bulk_create` from the Django manager. | ||
|
||
A simple solution is to generate the instances and then save them. You can also pass parameters (such as `notifications_enabled`): | ||
|
||
```python | ||
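from factory.django import DjangoModelFactory | ||
|
||
# `Contact` is assumed to be an existing Django model imported from your app | ||
class ContactFactory(DjangoModelFactory): | ||
    class Meta: | ||
        model = Contact | ||
|
||
# Build a thousand unsaved contacts (create=False builds without saving) | ||
contact_list = ContactFactory.simple_generate_batch( | ||
    create=False, size=1000, notifications_enabled=True | ||
) | ||
# Save them all through the manager's bulk insert | ||
contact_list = Contact.objects.bulk_create(contact_list) | ||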
``` | ||
|
||
But what if our factory has a `SubFactory`? You will certainly run into an N+1 problem. To work around it, you can bulk create sequentially, keeping the primary keys: | ||
|
||
|
||
```python | ||
import factory | ||
from factory.django import DjangoModelFactory | ||
|
||
# `Contact` and `Notification` are assumed to be existing Django models | ||
class ContactFactory(DjangoModelFactory): | ||
    class Meta: | ||
        model = Contact | ||
|
||
class NotificationFactory(DjangoModelFactory): | ||
    contact = factory.SubFactory(ContactFactory) | ||
|
||
    class Meta: | ||
        model = Notification | ||
|
||
size = 1000 | ||
# Create contacts | ||
contact_list = ContactFactory.simple_generate_batch( | ||
    create=False, size=size, notifications_enabled=True | ||
) | ||
contact_list = Contact.objects.bulk_create(contact_list) | ||
|
||
# Create a notification for each contact | ||
# (contact=None overrides the SubFactory, so no extra Contact is built) | ||
obj_list = NotificationFactory.simple_generate_batch( | ||
    create=False, size=size, contact=None | ||
) | ||
# Pair each notification with the saved contact at the same position | ||
for obj, contact in zip(obj_list, contact_list): | ||
    obj.contact_id = contact.pk | ||
Notification.objects.bulk_create(obj_list) | ||
``` |