3205 reparse command refactor #3361

Open
wants to merge 10 commits into
base: develop
from

Conversation

raftmsohani
@raftmsohani raftmsohani commented Dec 11, 2024

Summary of Changes

Pull request closes #3205

Moved all the utility functions to utils.py, moved the frontend reparse command to a separate function, and cleaned up the clean_and_reparse management command.

How to Test

Follow the steps below:

  1. Run task up.
  2. Open http://localhost:3000/ and sign in.
  3. Go to the admin page.
  4. Make sure you have the OFA admin role (needed to see the reparse command in the admin), then go to the datafiles list.
  5. Run the reparse admin command and watch the logs.

Deliverables

More details on how deliverables herein are assessed included here.

Deliverable 1: Accepted Features

Checklist of ACs:

  • [insert ACs here]
  • lfrohlich and/or adpennington confirmed that ACs are met.

Deliverable 2: Tested Code

  • Are all areas of code introduced in this PR meaningfully tested?
    • If this PR introduces backend code changes, are they meaningfully tested?
    • If this PR introduces frontend code changes, are they meaningfully tested?
  • Are code coverage minimums met?
    • Frontend coverage: [insert coverage %] (see CodeCov Report comment in PR)
    • Backend coverage: [insert coverage %] (see CodeCov Report comment in PR)

Deliverable 3: Properly Styled Code

  • Are backend code style checks passing on CircleCI?
  • Are frontend code style checks passing on CircleCI?
  • Are code maintainability principles being followed?

Deliverable 4: Accessible

  • Does this PR complete the epic?
  • Are links included to any other gov-approved PRs associated with epic?
  • Does PR include documentation for Raft's a11y review?
  • Did automated and manual testing with iamjolly and ttran-hub using Accessibility Insights reveal any errors introduced in this PR?

Deliverable 5: Deployed

  • Was the code successfully deployed via automated CircleCI process to development on Cloud.gov?

Deliverable 6: Documented

  • Does this PR provide background for why coding decisions were made?
  • If this PR introduces backend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces frontend code, is that code easy to understand and sufficiently documented, both inline and overall?
  • If this PR introduces dependencies, are their licenses documented?
  • Can reviewer explain and take ownership of these elements presented in this code review?

Deliverable 7: Secure

  • Does the OWASP Scan pass on CircleCI?
  • Do manual code review and manual testing detect any new security issues?
  • If new issues detected, is investigation and/or remediation plan documented?

Deliverable 8: User Research

Research product(s) clearly articulate(s):

  • the purpose of the research
  • methods used to conduct the research
  • who participated in the research
  • what was tested and how
  • impact of research on TDP
  • (if applicable) final design mockups produced for TDP development

@raftmsohani raftmsohani self-assigned this Dec 13, 2024
@raftmsohani raftmsohani added the raft review This issue is ready for raft review label Dec 19, 2024
@@ -1,6 +1,4 @@
# Base Docker compose for all environments
version: "3.4"
Author

This is deprecated.

@jtimpe jtimpe left a comment

Working as expected! One minor organization comment

delete_associated_models,
count_total_num_records,
calculate_timeout,
handle_datafiles,

Does it make sense to just have all these functions in the reparse.py file?

I like the idea of that as well

Author

The idea was to keep the reparse file cleaner while also letting us reuse some of these functions if needed. However, it might make sense to move some of them to reparse; for example, handle_datafiles is specific to reparse.

)

is_sequential = assert_sequential_execution(log_context)
should_exit(not is_sequential)

Should we update this to raise an exception/return an error to the user so that they know why the reparse didn't happen? Even writing to the console would be a start.

Also, can we move this to the very beginning of the function to avoid unnecessary computation if we aren't sequential?
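
For illustration, the two suggestions in this thread could be combined roughly as follows. This is only a sketch, not the code in this PR: `ReparseNotSequentialError`, `reparse_files`, and the `running_reparses` parameter are hypothetical, and the stubbed `assert_sequential_execution` stands in for the real check, whose behavior is not shown in this diff.

```python
class ReparseNotSequentialError(RuntimeError):
    """Hypothetical exception raised when another reparse is still in progress."""


def assert_sequential_execution(log_context, running_reparses=0):
    """Stand-in for the real check: True only when no other reparse is running.

    The `running_reparses` parameter is an assumption made for this sketch.
    """
    return running_reparses == 0


def reparse_files(selected_files, log_context, running_reparses=0):
    """Hypothetical entry point that fails fast and tells the caller why."""
    # Run the check first so no other work happens if a reparse is in progress.
    if not assert_sequential_execution(log_context, running_reparses):
        raise ReparseNotSequentialError(
            "A previous reparse has not finished; refusing to start another."
        )
    # ... backup, deletion, and re-queueing of the selected files would go here ...
    return list(selected_files)
```

The caller (an admin action or a management command) could then catch the exception and surface its message to the user instead of exiting silently.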

fiscal_quarter = None
fiscal_year = None
all_reparse = False
new_indices = False

In a future ticket, could we deduce these fields from the selected datafiles?

Author

I agree; I will leave a TODO.
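
One possible shape for that future change, sketched under assumptions: `FileMeta` and `deduce_fiscal_period` are hypothetical names, and the `year` and `quarter` attributes are assumed rather than taken from the actual DataFile model. A field is deduced only when every selected file agrees on it; otherwise it stays None, as in the current code.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical stand-in for the data file fields that matter here."""

    year: int
    quarter: str


def deduce_fiscal_period(
    files: Iterable[FileMeta],
) -> Tuple[Optional[int], Optional[str]]:
    """Return (fiscal_year, fiscal_quarter); each is None unless all files agree."""
    files = list(files)  # allow generators, since we iterate twice
    years = {f.year for f in files}
    quarters = {f.quarter for f in files}
    fiscal_year = years.pop() if len(years) == 1 else None
    fiscal_quarter = quarters.pop() if len(quarters) == 1 else None
    return fiscal_year, fiscal_quarter
```

For example, two files from 2024 covering Q1 and Q2 would give a fiscal year of 2024 and no fiscal quarter.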

######
files = DataFile.objects.filter(id__in=selected_files)
backup_file_name = "/tmp/reparsing_backup"
backup_file_name += "_selected_files"

Should this be removed since we won't put all the file IDs or other file meta in the backup name?


delete_associated_models(meta_model, file_ids, new_indices, log_context)

meta_model.timeout_at = meta_model.created_at + calculate_timeout(

I know it'll be a bit slower, but could we calculate this at the very top of the function by summing the number of records in the selected files first, instead of deleting records first and then updating the timeout? It feels like that could save us some grief in the future.
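
A rough sketch of the ordering suggested above, under stated assumptions: `calculate_timeout_sketch` and `plan_reparse` are hypothetical, and the one-second-per-record formula is invented for illustration; the real calculate_timeout signature and formula are not shown in this diff.

```python
from datetime import datetime, timedelta


def calculate_timeout_sketch(num_records, seconds_per_record=1):
    """Hypothetical timeout formula; the real calculate_timeout may differ."""
    return timedelta(seconds=num_records * seconds_per_record)


def plan_reparse(created_at, record_counts):
    """Compute the timeout from per-file record counts before deleting anything.

    `record_counts` maps a file ID to the number of records parsed from it.
    """
    num_records = sum(record_counts.values())
    timeout_at = created_at + calculate_timeout_sketch(num_records)
    return num_records, timeout_at


# The timeout is known up front; deletion would only start after this point.
num_records, timeout_at = plan_reparse(datetime(2025, 1, 1), {1: 100, 2: 50})
```

Because the counts are read while the records still exist, the timeout does not depend on anything that the delete step has already removed.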

codecov bot commented Dec 26, 2024

Codecov Report

Attention: Patch coverage is 61.97719% with 100 lines in your changes missing coverage. Please review.

Project coverage is 91.09%. Comparing base (ff01121) to head (73942d5).
Report is 2 commits behind head on develop.

Files with missing lines | Patch % | Lines
tdrs-backend/tdpservice/search_indexes/utils.py | 67.61% | 52 Missing and 5 partials ⚠️
tdrs-backend/tdpservice/search_indexes/reparse.py | 17.39% | 38 Missing ⚠️
...h_indexes/management/commands/clean_and_reparse.py | 89.74% | 4 Missing ⚠️
tdrs-backend/tdpservice/data_files/tasks.py | 50.00% | 1 Missing ⚠️
Additional details and impacted files

@@             Coverage Diff             @@
##           develop    #3361      +/-   ##
===========================================
- Coverage    91.48%   91.09%   -0.40%     
===========================================
  Files          299      301       +2     
  Lines         8595     8644      +49     
  Branches       636      637       +1     
===========================================
+ Hits          7863     7874      +11     
- Misses         615      653      +38     
  Partials       117      117              
Flag | Coverage Δ
dev-backend | 90.87% <61.97%> (-0.45%) ⬇️
dev-frontend | 92.65% <ø> (ø)

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
tdrs-backend/tdpservice/data_files/tasks.py | 73.91% <50.00%> (ø)
...h_indexes/management/commands/clean_and_reparse.py | 87.67% <89.74%> (+14.50%) ⬆️
tdrs-backend/tdpservice/search_indexes/reparse.py | 17.39% <17.39%> (ø)
tdrs-backend/tdpservice/search_indexes/utils.py | 67.61% <67.61%> (ø)

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0984b95...73942d5.

return True


def should_exit(condition):
Author

This should not be needed after the management command is removed.

@raftmsohani raftmsohani requested a review from elipe17 January 2, 2025 21:17
Labels
backend raft review This issue is ready for raft review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Re-parse command refactor
4 participants