Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modify change field to be a json field. #407

Merged

Conversation

sum-rock
Copy link
Contributor

@sum-rock sum-rock commented Aug 3, 2022

Storing the object changes as a json is preferred because it allows SQL
queries to access the change values. This work moves the burden of
handling json objects from an implementation of python's json library in
this package and puts it instead onto the ORM. Ultimately, having the
text field store the changes was leaving them less accessible to external
systems and code that is written outside the scope of the django
auditlog.

This change was accomplished by updating the field type on the model and
then removing the JSON dumps invocations on write and JSON loads
invocations on read. Test were updated to assert equality of
dictionaries rather than equality of JSON parsable text.

Separately, it was asserted that postgres will make these changes to
existing data. Therefore, existing postgres installations should update the
type of existing field values without issue.

@sum-rock sum-rock closed this Aug 3, 2022
@sum-rock sum-rock reopened this Aug 3, 2022
@codecov
Copy link

codecov bot commented Aug 3, 2022

Codecov Report

Merging #407 (56f9da0) into master (c65c38e) will increase coverage by 0.46%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #407      +/-   ##
==========================================
+ Coverage   93.55%   94.01%   +0.46%     
==========================================
  Files          29       30       +1     
  Lines         869      869              
==========================================
+ Hits          813      817       +4     
+ Misses         56       52       -4     
Impacted Files Coverage Δ
auditlog/receivers.py 100.00% <ø> (ø)
auditlog/migrations/0015_alter_logentry_changes.py 100.00% <100.00%> (ø)
auditlog/models.py 88.65% <100.00%> (+0.68%) ⬆️
auditlog/mixins.py 91.07% <0.00%> (+1.78%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@sum-rock sum-rock force-pushed the modify-change-field-to-json branch from 0997c30 to f325b97 Compare August 3, 2022 22:20
@hramezani
Copy link
Member

Thanks @sum-rock for the patch 👍
I haven't checked it completely, but I think migration is missed and you need to add a changelog as well.

@alieh-rymasheuski @Linkid Could you please take a look?

Copy link
Contributor

@Linkid Linkid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. But in fact, I'm agree with @hramezani.

@sum-rock sum-rock force-pushed the modify-change-field-to-json branch from f325b97 to 5b1ccc6 Compare August 4, 2022 14:38
CHANGELOG.md Show resolved Hide resolved
@sum-rock
Copy link
Contributor Author

sum-rock commented Aug 4, 2022

@hramezani @Linkid Thanks for the review. I added a migration and updated the changelog. Sorry about the missing migration! I'm not experienced in contributing to Django packages with migrations. I managed to make it happen in the python console with the following... Out of curiosity, is there a better way to do that?

import os
import django
from django.core.management import call_command

os.environ["DJANGO_SETTINGS_MODULE"] = "auditlog_tests.test_settings"
django.setup()
call_command('makemigrations')

@hramezani
Copy link
Member

@hramezani @Linkid Thanks for the review. I added a migration and updated the changelog. Sorry about the missing migration! I'm not experienced in contributing to Django packages with migrations. I managed to make it happen in the python console with the following... Out of curiosity, is there a better way to do that?

import os
import django
from django.core.management import call_command

os.environ["DJANGO_SETTINGS_MODULE"] = "auditlog_tests.test_settings"
django.setup()
call_command('makemigrations')

Thanks @sum-rock for your update.
I usually have a local project that uses my local django-auditlog. So, I change the model and run the make makemigrations command on my test project.

CHANGELOG.md Outdated Show resolved Hide resolved
@sum-rock
Copy link
Contributor Author

sum-rock commented Aug 4, 2022

@hramezani I noticed in the migration file that you suggested it changed the null=True to blank=True should this be changed on the model too? I can understand if you'd like to maintain a blank=True if that's going to conflict with existing installations. I just want to make sure the models match the migrations.

(Also, that last force push was done to squash a commit that formatted the migration file with black since this failed the auto-checks.)

@aleh-rymasheuski
Copy link
Contributor

aleh-rymasheuski commented Aug 4, 2022

I like the idea of this pull request. I've got just two major concerns:

  1. Are there any users of django-auditlog who don't have native support for JSON field in their DBMS?
  2. How this migration will perform on databases with millions of log entries? This migration will lock the whole table for the duration of the transformation.

And a lesser concern: this is a breaking change because it changes the existing schema (== the API of the package) so it will require us to bump the semver to 2.0 3.0.

@hramezani
Copy link
Member

@hramezani I noticed in the migration file that you suggested it changed the null=True to blank=True should this be changed on the model too? I can understand if you'd like to maintain a blank=True if that's going to conflict with existing installations. I just want to make sure the models match the migrations.

(Also, that last force push was done to squash a commit that formatted the migration file with black since this failed the auto-checks.)

Sorry for that. null=True is the right one.

@hramezani
Copy link
Member

@alieh-rymasheuski

Are there any users of django-auditlog who don't have native support for JSON field in their DBMS?

As all official DB backends in Django support JSONField I think it would be ok

How this migration will perform on databases with millions of log entries? This migration will lock the whole table for the duration of the transformation.

This is my concern as well. I was about to test locally and see what's happening. It would be great if you @sum-rock and @alieh-rymasheuski do it as well.

@hramezani
Copy link
Member

@sum-rock @alieh-rymasheuski I've tested the change on SQLite3 and PostgreSQL. here are the results:

On PostgreSQL, with 320000 records in LogEntry. The migration took 5 second:

Here is the sqlmigrate management command output:

BEGIN;
--
-- Alter field changes on logentry
--
ALTER TABLE "auditlog_logentry" ALTER COLUMN "changes" TYPE jsonb USING "changes"::jsonb, ALTER COLUMN "changes" DROP NOT NULL;
COMMIT;

On SQLlite, with 50000 records in LogEntry. The migration took less than one second:

Here is the sqlmigrate management command output:

BEGIN;
--
-- Alter field changes on logentry
--
CREATE TABLE "new__auditlog_logentry" ("id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "changes" text NULL CHECK ((JSON_VALID("changes") OR "changes" IS NULL)), "object_pk" varchar(255) NOT NULL, "object_id" bigint NULL, "object_repr" text NOT NULL, "action" smallint unsigned NOT NULL CHECK ("action" >= 0), "timestamp" datetime NOT NULL, "actor_id" integer NULL REFERENCES "auth_user" ("id") DEFERRABLE INITIALLY DEFERRED, "content_type_id" integer NOT NULL REFERENCES "django_content_type" ("id") DEFERRABLE INITIALLY DEFERRED, "remote_addr" char(39) NULL, "additional_data" text NULL CHECK ((JSON_VALID("additional_data") OR "additional_data" IS NULL)));
INSERT INTO "new__auditlog_logentry" ("id", "object_pk", "object_id", "object_repr", "action", "timestamp", "actor_id", "content_type_id", "remote_addr", "additional_data", "changes") SELECT "id", "object_pk", "object_id", "object_repr", "action", "timestamp", "actor_id", "content_type_id", "remote_addr", "additional_data", "changes" FROM "auditlog_logentry";
DROP TABLE "auditlog_logentry";
ALTER TABLE "new__auditlog_logentry" RENAME TO "auditlog_logentry";
CREATE INDEX "auditlog_logentry_object_pk_6e3219c0" ON "auditlog_logentry" ("object_pk");
CREATE INDEX "auditlog_logentry_object_id_09c2eee8" ON "auditlog_logentry" ("object_id");
CREATE INDEX "auditlog_logentry_action_229afe39" ON "auditlog_logentry" ("action");
CREATE INDEX "auditlog_logentry_timestamp_37867bb0" ON "auditlog_logentry" ("timestamp");
CREATE INDEX "auditlog_logentry_actor_id_959271d2" ON "auditlog_logentry" ("actor_id");
CREATE INDEX "auditlog_logentry_content_type_id_75830218" ON "auditlog_logentry" ("content_type_id");
COMMIT;

@sum-rock sum-rock force-pushed the modify-change-field-to-json branch from 4442147 to ffa6f02 Compare August 4, 2022 19:47
@sum-rock
Copy link
Contributor Author

sum-rock commented Aug 5, 2022

@hramezani I created 502,000 LogEntry records using a fresh installation of auditlog in a development environment running our project. I am using a local Postgres instance running in docker. All LogEntry objects were created with a script that was executed in the terminal. I made test changes to actual model objects to trigger the new LogEntry records, (as opposed to creating LogEntry objects directly). 2,000 records were large updates with lots of things to track. 500,000 records were created by adding dummy records to a smaller object.

I pulled this branch and installed the update. When I ran manage.py migrate it took 6.418 seconds.

CHANGELOG.md Outdated Show resolved Hide resolved
@sum-rock sum-rock requested a review from hramezani August 5, 2022 14:30
@hramezani
Copy link
Member

Thanks @sum-rock The patch seems good to me now.
I will merge it when we are going to prepare version 2 (I don't know when exactly)

@sum-rock
Copy link
Contributor Author

sum-rock commented Aug 5, 2022

Thanks everyone for the collaboration! I felt like this was a natural progression to the code and fits within the vision for your project. I'm glad you guys agree!

I'm in the process of creating another PR that would include a different breaking change. However, this next one is more opinionated so you guys might not want it. It's necessary for our project though.

Linkid
Linkid previously approved these changes Aug 5, 2022
@Linkid Linkid self-requested a review August 5, 2022 15:11
@Linkid Linkid dismissed their stale review August 5, 2022 15:13

I let @hramezani approve the PR to unlock the merge, but I'm ok with this PR.

@hramezani
Copy link
Member

Thanks everyone for the collaboration! I felt like this was a natural progression to the code and fits within the vision for your project. I'm glad you guys agree!

I'm in the process of creating another PR that would include a different breaking change. However, this next one is more opinionated so you guys might not want it. It's necessary for our project though.

I would suggest first create an issue for that. Then we can discuss there

@aleh-rymasheuski aleh-rymasheuski added this to the 3.0 milestone Nov 14, 2022
@hramezani
Copy link
Member

@sum-rock we are preparing v3.0. Could you please refresh this PR?

@sum-rock
Copy link
Contributor Author

@hramezani Absolutely. I'll take a look next week, after the holiday.

@sum-rock sum-rock force-pushed the modify-change-field-to-json branch 3 times, most recently from 5be7d58 to 410121c Compare December 27, 2022 21:39
sum-rock and others added 8 commits December 27, 2022 15:41
Storing the object changes as a json is preferred because it allows SQL
queries to access the change values. This work moves the burden of
handling json objects from an implementation of python's json library in
this package and puts it instead onto the ORM. Ultimately, having the
text field store the changes was leaving them less accessible to external
systems and code that is written outside the scope of the django
auditlog.

This change was accomplished by updating the field type on the model and
then removing the JSON dumps invocations on write and JSON loads
invocations on read. Test were updated to assert equality of
dictionaries rather than equality of JSON parsable text.

Separately, it was asserted that postgres will make these changes to
existing data. Therefore, existing postgres installations should update the
type of existing field values without issue.
The "Modify change field to be a json field" commit reduced test
coverage on the mixins.py file by 0.03%. The reduction in coverage was
the result of reducing the number of operations required to achieve the
desired state. An additional test was added to increase previously
uncovered code. The net effect is an increase in test case coverage.
Better markdown formatting

Co-authored-by: Hasan Ramezani <hasan.r67@gmail.com>
More specific language in the improvement section regarding `LogEntry.change`

Co-authored-by: Hasan Ramezani <hasan.r67@gmail.com>
Co-authored-by: Hasan Ramezani <hasan.r67@gmail.com>
Running the migration to update the field type of `LogEntry.change` is a breaking change.

Co-authored-by: Hasan Ramezani <hasan.r67@gmail.com>
The create log method on the LogEntry manager required an additional
kwarg for a call to create an instance regardless of a change or not.
This felt brittle anyway. The reason it had worked prior to these
changes was that the `change` kwarg was sending a string "null" and
not a None when there were no changes.
def log_create(self, instance, **kwargs):
def log_create(self, instance, force_log: bool = False, **kwargs):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hramezani This is a recent addition. When the receivers were calling this method, the changes kwarg would equal "null" when changes were None. This transformation was caused by the json.dumps method being called on None.

Because we don't need to transform with the json library anymore, we need a way to signal that a log should be created when there are no changes. Specifically this is required for the new ACCESS action type. I extended the idea of passing along the force_log flag in those cases.

@sum-rock
Copy link
Contributor Author

@hramezani The branch is now refreshed.

@BigVaibhav
Copy link

Any update on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants