Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(accessLogExport)!: create new AccessLogExportTask to generate a csv of access logs TASK-871 #5258

Merged
merged 17 commits into from
Dec 2, 2024

Conversation

RuthShryock
Copy link
Member

@RuthShryock RuthShryock commented Nov 12, 2024

👀 Preview steps

  1. Ensure that the Export all data button for ProjectViews still works as normal and that the email header has the appropriate title: "Project View Report Complete"
  2. Ensure that exports for submission data works correctly:
  • make submissions to a project
  • navigate to the Project < Data < Downloads
  • 🟢 ensure the Export button is exporting data correctly
  1. Preview the AccessLogExportTask from the shell:
  • ℹ️ have a user
  • enter the Django shell: ./manage.py shell_plus
  • get your user: test_user = User.objects.get(username='test')
  • create an AccessLogExportTask:
task = AccessLogExportTask.objects.create(
           user=test_user,
           get_all_logs=False,
           data={'type': 'access_logs_export'},
  )
  • 🟢 print the task object and notice that an AccessLogExportTask object has been created: print(task)
  • run the task: task.run()
  • 🟢 print the result and notice that it generated a link to the csv export: print(task.result)

💭 Notes

  • Changed the field named user to user_url because naming it user conflicts with an existing field in the AccessLog model
  • Renamed the file titled project_view_exports.py to data_exports.py because it now generates exports for both project view and access log exports
  • Refactored the ProjectViewExportTask to have a base CommonExportTask which both the project view and access log export classes inherit from
  • Updated the email (in kpi/tasks.py) to have the correct subject title based on the type of export report it is sending
  • Alphabetized the functions in data_exports.py

…ate csv files for access logs, and added tests
Copy link

@RuthShryock RuthShryock changed the title feat(accessLogExportTask): create new AccessLogExportTask to generate a csv of access logs TASK-871 feat(accessLogExport): create new AccessLogExportTask to generate a csv of access logs TASK-871 Nov 12, 2024
@RuthShryock RuthShryock self-assigned this Nov 12, 2024
@RuthShryock RuthShryock marked this pull request as ready for review November 13, 2024 16:55
@magicznyleszek magicznyleszek removed their request for review November 14, 2024 08:54
@RuthShryock RuthShryock changed the title feat(accessLogExport): create new AccessLogExportTask to generate a csv of access logs TASK-871 feat(accessLogExport)!: create new AccessLogExportTask to generate a csv of access logs TASK-871 Nov 14, 2024
Copy link
Member

@jnm jnm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A nitpick that I need to somehow get into our organization-wide development guidelines: SubmissionsExport… should be SubmissionExport. Another example of something similar would be that we use AssetSerializer instead of AssetsSerializer

Copy link
Contributor

@jamesrkiger jamesrkiger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should figure out what is going on with the diff here (208 files changed) before we proceed with this. I'm sure whoever else looks at this will notice, but I'm marking it as change requested out of an abundance of caution.

@RuthShryock
Copy link
Member Author

@jamesrkiger Yes sorry I'm working on cleaning this up, something with my force push went wrong.

Copy link
Contributor

@rgraber rgraber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simplify this refactor a bit.
If we make CommonExportTask a mixin we won't need to do the difficult migration.
Also, I think instead of having a lot of methods that are like

do_thing(task):
    if task is project_view:
        do_stuff
    elif task is export:
        do other stuff

we can make the task classes themselves responsible for generating the data.
Then export_task_in_background would look like (pseudocode)

model = get_model()
task = model.get(uid=uid)
task.run()
if status is complete:
      subject = model.default_email_subject
       ...get the result and send the email

and in CommonExportTask, run_task could do something like

data = self.get_data()
url = create_export_file()
return url

and AccessLogExportTask and ProjectViewExportTask would each have their own implementations of get_data() and a value for default_email_subject.
Usually inheritance is preferable to if-else chains because the code it's easier to change one subclass without changing methods that affect other subclasses.

@@ -482,27 +482,23 @@ def export_upload_to(self, filename):
return posixpath.join(self.user.username, 'exports', filename)


class ProjectViewExportTask(ImportExportTask):
uid = KpiUidField(uid_prefix='pve')
class CommonExportTask(ImportExportTask):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have caught this before, but I think we may be able to avoid the complicated migration if we make this a mixin rather than a separate model.

}


def create_data_export(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do it this way we'll have to add to this method every time we create a new kind of export task. I think it would make more sense to have this method take just take data and headers and create a csv from it. The AccessLogExportTask and ProjectViewExportTask classes would do whatever they had to do to get the data and then call this method with the data they collected.

if isinstance(val, str):
val = val.replace('\n', '')
return val
def get_data(filtered_queryset: QuerySet, export_type: str) -> QuerySet:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that we have more cases, we should just have each type of ExportTask have its own method for getting data so we don't just keep adding elifs for every new kind of export task.

'abstract': False,
},
),
migrations.RunPython(populate_common_export_tasks),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot: if we end up keeping the migration, this needs to be reversible, even if the reverse is RunPython.noop. Otherwise unapplying migrations breaks

Copy link
Contributor

@rgraber rgraber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small things

@@ -610,7 +610,7 @@ def create_from_related_request(
action = modify_action
if action:
# some actions on related objects do not need to be logged,
# eg deleting an ExportTask
# eg deleting an SubmissionExportTask
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# eg deleting an SubmissionExportTask
# eg deleting a SubmissionExportTask

nit, definitely non-blocking

filename = self._build_export_filename(
export_type, self.user.username, view
)
def _run_task_base(self, messages: list, buff) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def _run_task_base(self, messages: list, buff) -> None:
def _export_data_to_file(self, messages: list, buff) -> None:

since it's no longer doing the whole task

return 'Access Log Report Complete'

def get_data(self, filtered_queryset: QuerySet) -> QuerySet:
user_url = Concat(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didn't know you could do this outside a query. neat.

return 'Report Complete'

def _get_export_details(self) -> tuple:
return self.data['type'], self.data['view']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should account for data not having a view key since we don't need one for access logs exports. the filename can just be export_type-username-time for tasks without a view.

return 'Report Complete'

def _get_export_details(self) -> tuple:
return self.data['type'], self.data['view']

def _build_export_filename(
self, export_type: str, username: str, view: str
) -> str:
time = timezone.now().strftime('%Y-%m-%dT%H:%M:%SZ')
return f'{export_type}-{username}-view_{view}-{time}.csv'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous comment. This should be updated to allow an empty/None view parameter

'ip_address',
'initial_user_username',
'initial_user_uid',
'auth_app_name',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
'auth_app_name',
'authorized_app_name',

Copy link
Contributor

@rgraber rgraber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the ordering thing and then I think we're good to go!

from kobo.apps.audit_log.models import AccessLog
from kpi.models.import_export_task import AccessLogExportTask

User = get_user_model()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (non-blocking)
If you want to avoid this boilerplate, you can use from kobo.apps.kobo_auth.shortcuts import User


self.assertIsNotNone(task.result, 'The task.result should not be None.')
expected_pattern = (
rf'{self.superuser.username}/exports/access_logs_export-'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are way more patient than I am to write this regex. respect.

@@ -79,9 +92,43 @@
'key': METADATA,
'columns': USER_FIELDS + METADATA_FIELDS,
},
'access_logs_export': {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should order by '-date-created to be consistent with the endpoint

Copy link
Contributor

@rgraber rgraber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@RuthShryock RuthShryock merged commit 6992e8a into main Dec 2, 2024
8 checks passed
@RuthShryock RuthShryock deleted the access-log-export-task-class branch December 2, 2024 19:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants