Add Quality Checker and Run on GitHub Action #170
base: main
implement the quality checker that reports errors for
- wrong escape characters
- wrong starting letters
- presence of non-UTF-8 characters
and reports warnings for
- duplicate entries
- same full forms
- same abbreviations
- outdated 'Manage' abbreviation
prevent the script from stopping by error-triggered exit
- ignore single-name journals with same abbreviation as full name
- generate error summary and deploy checker on GitHub Action
change upload-artifact@v2 to @v3
fix mismatched quality check file name
provide better visualisation of error/warning output
- enhance invalid escape character check
- group full name duplication and abbreviation duplication into the same warning
- ignore articles and prepositions in the wrong-beginning-letters check
make quality check action exit with code 1 if errors are present
print out error and warning messages on GitHub Action under quality check
Removed the forced printing of the error summary in quality-check.yml, since the default GitHub Actions console could not accommodate the size of the error/warning summary. Partial error messages can be seen in the Run Quality Check step.
deleted a redundant quality checker
Add quality checker and set up GitHub Action file to run the checker
the function now considers abbreviations valid if they are similar to the full text, even when they do not strictly share the full name's starting letters
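The beginning-letter rule described in the commits above (skip articles and prepositions; accept abbreviations that are similar to the full name even when the initials differ) can be sketched as follows. This is an illustrative sketch, not the PR's check_wrong_beginning_letters: the stop-word list, the "initials appear in order" rule and the 0.6 similarity threshold are all assumptions, and the similarity fallback uses Python's difflib.SequenceMatcher.

```python
import difflib
import re

# Hypothetical list of articles/prepositions to skip; the checker's real list is not shown in this PR.
IGNORED_WORDS = {"a", "an", "the", "of", "and", "for", "in", "on", "to", "de", "der", "la", "le"}

# Hypothetical similarity threshold for the fallback comparison.
SIMILARITY_THRESHOLD = 0.6


def significant_initials(text: str) -> list[str]:
    """Lowercase first letters of all words except ignored articles/prepositions."""
    words = re.findall(r"[^\W\d_]+", text)  # runs of letters; drops '.', ':', digits
    return [w[0].lower() for w in words if w.lower() not in IGNORED_WORDS]


def is_subsequence(needle: list[str], haystack: list[str]) -> bool:
    """True if every item of needle appears in haystack in the same order."""
    it = iter(haystack)
    return all(item in it for item in needle)  # 'in' consumes the iterator


def has_valid_beginning_letters(full_name: str, abbreviation: str) -> bool:
    """Accept the abbreviation if its initials follow the full name's initials
    (abbreviations may drop words), or else if both strings are similar enough."""
    if is_subsequence(significant_initials(abbreviation), significant_initials(full_name)):
        return True
    ratio = difflib.SequenceMatcher(None, abbreviation.lower(), full_name.lower()).ratio()
    return ratio >= SIMILARITY_THRESHOLD
```

For example, "J. Am. Chem. Soc." passes for "Journal of the American Chemical Society" because "of" and "the" are skipped, while an unrelated string only passes if the similarity fallback accepts it.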
scripts/check_quality.py
if filename.endswith(".csv"):
    filepath = os.path.join(JOURNALS_FOLDER_PATH, filename)

    # Run the checks
    check_non_utf8_characters(filepath)
    check_wrong_escape(filepath)
    check_wrong_beginning_letters(filepath)
    check_duplicates(filepath)
    check_full_form_identical_to_abbreviation(filepath)
    check_outdated_abbreviations(filepath)
I think it's possible to read the CSV file here, so that each file is read only once and we avoid repeated, costly I/O operations.
scripts/check_quality.py
# Write to summary file
with open(SUMMARY_FILE_PATH, 'w', encoding='utf-8') as summary_file:
    summary_file.writelines(summary_output)
As koppor mentioned (lychee), we can post the logs to the action's job summary; see: https://docs.github.com/en/actions/writing-workflows/choosing-what-your-workflow-does/workflow-commands-for-github-actions#adding-a-job-summary
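A minimal sketch of posting results to the job summary, assuming the errors and warnings have already been collected as lists of strings. GitHub Actions exposes the path of the summary file in the GITHUB_STEP_SUMMARY environment variable, and Markdown appended to that file is rendered on the run's summary page. The helper names and the per-category cap are hypothetical; the cap is one way to address the size concern that comes up in later commits, since GitHub limits each step's summary to 1 MiB.

```python
import os

# Hypothetical cap on listed items per category, to keep the summary small.
MAX_LISTED_ITEMS = 50


def format_summary(errors: list[str], warnings: list[str], max_items: int = MAX_LISTED_ITEMS) -> str:
    """Build a Markdown report that lists at most max_items entries per category."""
    lines = ["## Abbreviation quality check", ""]
    for title, items in (("Errors", errors), ("Warnings", warnings)):
        lines.append(f"### {title} ({len(items)})")
        lines.extend(f"- {item}" for item in items[:max_items])
        if len(items) > max_items:
            lines.append(f"- ... and {len(items) - max_items} more")
        lines.append("")
    return "\n".join(lines)


def write_job_summary(errors: list[str], warnings: list[str]) -> bool:
    """Append the report to the job summary file; return False outside GitHub Actions."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return False
    # Append mode: other steps or tools may already have written to this file.
    with open(summary_path, "a", encoding="utf-8") as summary_file:
        summary_file.write(format_summary(errors, warnings) + "\n")
    return True
```

The full, untruncated report can still be uploaded as an artifact; the summary only needs to show the counts and the first few entries.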
The content of each abbreviation CSV file is now loaded into memory once and passed to the check functions, instead of being re-read on every function call.
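A sketch of the read-once structure described in this reply, assuming each CSV row holds the full name in the first column and the abbreviation in the second. All function names are hypothetical, and the two checks are simplified stand-ins for the PR's duplicate and identical-abbreviation checks; the second one skips single-word names, following the earlier commit that ignores single-name journals whose abbreviation equals the full name.

```python
import csv
from collections import Counter
from typing import Callable

Row = list[str]


def load_rows(filepath: str) -> list[Row]:
    """Read one abbreviation CSV file from disk exactly once."""
    with open(filepath, encoding="utf-8", newline="") as csv_file:
        return list(csv.reader(csv_file))


def find_duplicate_full_names(rows: list[Row]) -> list[str]:
    """Warn about full names (assumed first column) that occur more than once."""
    counts = Counter(row[0] for row in rows if row)
    return [f"duplicate full name: {name}" for name, n in counts.items() if n > 1]


def find_identical_abbreviations(rows: list[Row]) -> list[str]:
    """Warn about multi-word full names whose abbreviation is identical to them."""
    return [
        f"abbreviation equals full name: {row[0]}"
        for row in rows
        if len(row) >= 2 and row[0] == row[1] and len(row[0].split()) > 1
    ]


def run_checks(rows: list[Row], checks: list[Callable[[list[Row]], list[str]]]) -> list[str]:
    """Run every check on the same in-memory rows and collect their messages."""
    messages: list[str] = []
    for check in checks:
        messages.extend(check(rows))
    return messages
```

Usage would be rows = load_rows(filepath) followed by run_checks(rows, [find_duplicate_full_names, find_identical_abbreviations]), so the file is opened once no matter how many checks are added.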
Attempt to write issue report to GITHUB_STEP_SUMMARY
Fix quality checker name error
Try uploading large error report as artifact
Shorten error/warning message for smaller summary size
This reverts commit 512c4c4.
Shorten message and provide a more efficient error summary
Add quality checker (Update on PR JabRef#170)
Attempt to solve #149
Closes #149
Improvement
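Added a quality checker for abbreviation CSV files that detects errors due to
- wrong escape characters
- wrong beginning letters
- non-UTF-8 characters
and warnings due to
- duplicate entries (same full forms or same abbreviations)
- outdated 'Manage' abbreviations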
Also added a workflow (yml) file that runs the checker script on GitHub Actions and exits with an error code if issues are found in the abbreviation files. A full, downloadable error/warning summary is generated on each run; partial reports can be seen in the Run Quality Check step.
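The non-UTF-8 check and the exit-code behaviour can be sketched as below. This is a minimal sketch under assumptions: files are read as raw bytes so that invalid sequences are reported per line instead of aborting the whole read, and, following the earlier commit that makes the action exit with code 1 when errors are present, only errors (not warnings) fail the run. find_non_utf8_lines, exit_code and main are hypothetical names.

```python
def find_non_utf8_lines(raw: bytes) -> list[int]:
    """Return the 1-based numbers of lines that are not valid UTF-8."""
    bad_lines = []
    for number, line in enumerate(raw.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append(number)
    return bad_lines


def exit_code(errors: list[str]) -> int:
    """Errors fail the workflow step (exit code 1); warnings alone do not."""
    return 1 if errors else 0


def main(paths: list[str]) -> int:
    """Check each CSV file, print all messages, and return the process exit code."""
    errors: list[str] = []
    warnings: list[str] = []  # would be filled by the warning-level checks
    for path in paths:
        with open(path, "rb") as csv_file:  # bytes, so decoding errors stay per line
            raw = csv_file.read()
        errors.extend(f"{path}:{n}: non-UTF-8 characters" for n in find_non_utf8_lines(raw))
    for message in errors + warnings:
        print(message)
    return exit_code(errors)
```

In the script this would be wired up as sys.exit(main(sys.argv[1:])); the non-zero exit status is what marks the GitHub Actions step as failed.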
Limitations:
check_wrong_beginning_letters(filepath): currently considers abbreviations longer than the full text as invalid, even if the abbreviations seem valid, e.g.