Added Un-escaped Ampersands Checker for #107 #108

Merged on Oct 27, 2022 (12 commits)
12 changes: 11 additions & 1 deletion .github/workflows/tests.yml
@@ -8,7 +8,7 @@ on:
# always run on pull requests
jobs:
  markdown-check:
    name: Check markdown
    name: Check Markdown
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source
@@ -18,3 +18,13 @@ jobs:
        with:
          config: './.markdownlint.yml'
          args: .

  ampersands-check:
    name: Check Ampersands are Unescaped
    runs-on: ubuntu-latest
    steps:
      - name: Checkout source
        uses: actions/checkout@v2
      - name: Run Python Ampersands Script
        run: python3 check_ampersands.py
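
The new job relies on the usual convention that a `run` step fails when its command exits with a non-zero status, which is what an uncaught Python exception produces. A minimal sketch of that behaviour follows; it uses stand-in one-line scripts rather than `check_ampersands.py` itself, so the commands shown are illustrative assumptions only.

```python
import subprocess
import sys

# Stand-in for a failing check: a child interpreter that raises an uncaught ValueError.
failing = subprocess.run(
    [sys.executable, "-c", "raise ValueError('Found Escaped Ampersands')"],
    capture_output=True,
    text=True,
)

# Stand-in for a clean run: a child interpreter that finishes normally.
passing = subprocess.run(
    [sys.executable, "-c", "pass"],
    capture_output=True,
    text=True,
)

# A non-zero return code is what marks the CI step as failed.
print("failing run exit code:", failing.returncode)
print("passing run exit code:", passing.returncode)
```

Raising an exception keeps the script short, at the cost of printing a traceback in the job log alongside the list of locations.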

19 changes: 19 additions & 0 deletions CHANGELOG.md
@@ -3,8 +3,27 @@
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [UNRELEASED]

## 2022-10

Added Escaped Ampersands Checker

### Added

- `check_ampersands.py`, which checks all CSV journal files in the `journals` folder to make
sure all ampersands are unescaped

### Changed

- `.github/workflows/tests.yml`: added the above script to the GitHub workflow so the check runs every time the main branch is pushed to
- Minor format changes in `README.md` and `LICENSE.md`, as the old GitHub Actions check was already failing
- Found an escaped ampersand in `journal_abbreviations_dainst.csv` using the new script, so this was amended


## 2021-09

Initial tagged release


<!-- markdownlint-disable-file MD012 MD024 MD033 -->
2 changes: 1 addition & 1 deletion LICENSE.md
@@ -2,7 +2,7 @@

CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED HEREUNDER.

### Statement of Purpose
## Statement of Purpose

The laws of most jurisdictions throughout the world automatically confer exclusive Copyright and Related Rights (defined below) upon the creator and subsequent owner(s) (each and all, an "owner") of an original work of authorship and/or a database (each, a "Work").

2 changes: 1 addition & 1 deletion README.md
@@ -50,6 +50,6 @@ In case of duplicate appearances in the journal lists, the last occuring abbrevi
* Frontend: <https://marcinwrochna.github.io/abbrevIso/>
* API: <https://tools.wmflabs.org/abbreviso/>

It takes the official list of ISO4 abbreviations of single words, plus the general rules defined in the ISO4 specifications to deduce the abbreviation for any journal name you input.
It takes the official list of ISO4 abbreviations of single words, plus the general rules defined in the ISO4 specifications to deduce the abbreviation for any journal name you input.

Could be an alternative or complementary (when missing in the lists) approach to abbreviate journal names. But of course, it does not handle unabbreviation, for which there is no alternative to lists. It can also be a way to check the consistency of existing lists and it might make sense to link to the frontend on the abbrv.jabref website, so that people who want to add abbreviations can check for the correct one.
50 changes: 50 additions & 0 deletions check_ampersands.py
@@ -0,0 +1,50 @@
#!/usr/bin/env python3

"""
Python script for checking if all Ampersands in .csv journal abbreviation files are
unescaped. This convention is enforced to ensure that abbreviations of journal titles
can be done without error.

The script will raise a ValueError() in case escaped ampersands are found, and will
also provide the row and column in which they were found (1-indexed). The script does
NOT automatically fix these errors. This should be done manually.

The script will automatically run whenever there is a push to the main branch of the
abbreviations repo (abbrv.jabref.org) using GitHub Actions.
"""

import os
import itertools

# Get all file names in journal folders
PATH_TO_JOURNALS = "./journals/"
fileNames = next(itertools.islice(os.walk(PATH_TO_JOURNALS), 0, None))[2]

# Store ALL locations of escaped ampersands so they can all be printed upon failure
errFileNames = []
errRows = []
errCols = []

for file in fileNames:
    if file.endswith(".csv"):
        # For each .csv file in the folder, open in read mode
        # (UTF-8 is stated explicitly because the lists contain non-ASCII titles)
        with open(PATH_TO_JOURNALS + file, "r", encoding="utf-8") as f:
            for i, line in enumerate(f):
                # For each line, if it has \&, store the fname, row and columns
                # ('\\&' is a literal backslash followed by an ampersand)
                if '\\&' in line:
                    errFileNames.append(file)
                    errRows.append(i + 1)
                    errCols.append([index + 1 for index in range(len(line)) if line.startswith('\\&', index)])


# In the case where we do find escaped &, the len() will be non-zero
if len(errFileNames) > 0:
    err_msg = "["
    # For each file, append every row:col location to the error message
    for i, fname in enumerate(errFileNames):
        for col in errCols[i]:
            err_msg += "(" + fname + ", " + str(errRows[i]) + ":" + str(col) + "), "
    # Format end of string and return as Value Error to 'fail' GitHub Actions process
    err_msg = err_msg[:len(err_msg) - 2]
    err_msg += "]"
    raise ValueError("Found Escaped Ampersands at: " + err_msg)
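
For readers who want to try the same scan outside of CI, here is a minimal sketch of the check as a reusable function. The name `find_escaped_ampersands` and the in-memory `sample` lines are hypothetical and not part of this PR; the row and column numbering follows the 1-indexed convention described in the script's docstring.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical helper, not part of the PR.
# The pattern matches a literal backslash immediately followed by '&'.
ESCAPED_AMPERSAND = re.compile(r"\\&")


def find_escaped_ampersands(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (row, col) pairs, both 1-indexed, for every escaped ampersand."""
    hits = []
    for row, line in enumerate(lines, start=1):
        for match in ESCAPED_AMPERSAND.finditer(line):
            hits.append((row, match.start() + 1))
    return hits


# Two illustrative lines in the semicolon-separated style of the journal lists.
sample = [
    "Antike Denkmäler;AD",
    "Suna \\& Inan;Adalya",
]
print(find_escaped_ampersands(sample))
```

Returning the locations instead of raising makes the logic easy to unit-test; a caller can still raise or exit with a non-zero status when the list is non-empty.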
2 changes: 1 addition & 1 deletion journals/journal_abbreviations_dainst.csv
@@ -55,7 +55,7 @@ Acta Universitatis Nicolai Copernici. Archaeologia;ActaTorunA
Acta Universitatis Nicolai Copernici. Historia;ActaTorunHist
Antike Denkmäler;AD
Abhandlungen des Deutschen Archäologischen Instituts, Abteilung Kairo;ADAIK
Adalya. Annual of the Suna \& Inan Kiraç-Research Institute on Mediterranean Civilizations;Adalya
Adalya. Annual of the Suna & Inan Kiraç-Research Institute on Mediterranean Civilizations;Adalya
Αρχαιολογικόν Δελτίον (Μελέτες);ADelt A
Αρχαιολογικόν Δελτίον (Χρονικά);Adelt B
Arkeoloji dergisi. Ege Üniversitesi Edebiyat Fakültesi;ADerg
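
The manual fix in this hunk amounts to dropping the backslash in front of the ampersand. A minimal sketch of that transformation follows, assuming a hypothetical helper `unescape_ampersands` that is not part of this PR (the PR's script only reports problems and leaves the fix to the contributor).

```python
def unescape_ampersands(line: str) -> str:
    """Replace every backslash-escaped ampersand with a plain one."""
    return line.replace("\\&", "&")


# The line from journal_abbreviations_dainst.csv as it looked before this PR.
before = (
    "Adalya. Annual of the Suna \\& Inan Kiraç-Research Institute "
    "on Mediterranean Civilizations;Adalya"
)
after = unescape_ampersands(before)
print(after)
```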
1 change: 1 addition & 0 deletions update_mathscinet.py
@@ -19,3 +19,4 @@

# Save the end file in the same path as the old one
df.to_csv(file_out, sep=";", escapechar="\\", index=False, header=False)