BibTexNanny

BibTexNanny is a tool to check the consistency of BibTex files, fix common mistakes and generate simplified versions of a bibliography.

BibTex Parser

BibTexNanny uses biblib to parse and generate BibTex files.

The following fixes and changes should be made to biblib:

Add BibDesk-compatibility mode for BibTex output
Fix issues with loading bad month information
- Can't replicate issue anymore, not sure what changed.
Add ability to handle duplicate keys
Prevent BibTex Parser from dropping metadata and comment lines
- BibTexNanny internal work-around
When names are parsed, curly braces need to be handled correctly

BibTex file consistency checker

Find duplicates
- Duplicate keys
  - added biblib work-around to load files with duplicate keys.
- Duplicate paper titles
  - Grade badness of duplicate by how much of the rest matches
  - Consider cases where duplicates might be acceptable
    - Pairs of entries for presentation and paper (what is the entry type for the presentation?).
      - Allow users to define entry types that should be ignored when looking for duplicate titles. This way you can for example model presentations as @misc entries and have them be ignored
    - Pre-print and published version of paper.
    - Author who actually named different papers differently (in what cases would this happen?)
    - Different editions of a book.
    - Possibly paper and extended version of it as journal article.
Warnings for missing fields
- Optional warning for optional fields
Tex-Unicode conversion
- LaTeX to Unicode conversion
  - Fix loss of curly braces
- Unicode to BibTeX conversion
  - Check if URLs require special handling
Warnings for bad formatting
- Warning for non-standard entry type
- Warning for fields whose value has no curly braces, but is not a known macro
- Warnings for non-secured capitalisation in name field
- Warnings for unnecessary curly braces
  - Curly braces are not only for uppercase characters but also for encoding special characters, e.g. \'{e} to get é
  - Allow user preference for wrapping characters or whole words.
  - What is the difference between single and double braces?
- Warnings for badly formatted page numbers
- Find badly formatted names (author and editor fields)
  - All-caps names
  - Bad use of LaTeX commands
  - Missing spaces between initials
  - Other bad formattings
- Warning for all-caps texts
- Notice bad months
- Check if desired key format is followed (see entry key format)
Warnings for inconsistent formatting
- Different names for conferences (see dictionary of conference names)
- Name formatting
  - Names or parts of names written in all caps (MICKEY MOUSE or Mickey MOUSE)
    - Identify when an all-caps name part is actually initials written without period or whitespace
  - Name initials
    - Initial written without period (Mickey D Mouse)
    - Multiple initials written without whitespace (Mickey A.B. Mouse)
    - Multiple initials written without periods or whitespace (Mickey AD Mouse)
    - Warning when first names are only initials
    - Warning when only some names of a paper are full and some have initials
- Location names
  - Indicate when there is a country without a city
  - Indicate when there is a city without a country
  - States missing from US locations
- Inferrable information for conferences/journals is inconsistent
Allow limiting search to citations found in aux file
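
The duplicate-title check described under "Find duplicates" could be sketched as follows. This is a minimal sketch, not the actual implementation: the entry structure (a list of dicts with `key`, `type` and `title` items), the function names and the default `ignored_types` value are assumptions made for illustration and are not part of biblib.

```python
import re
from collections import defaultdict


def normalise_title(title):
    """Lower-case a title and strip braces, punctuation and extra whitespace."""
    title = re.sub(r"[{}]", "", title)
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def find_duplicate_titles(entries, ignored_types=("misc",)):
    """Group entry keys by normalised title, skipping ignored entry types.

    `entries` is a hypothetical list of dicts with "key", "type" and "title".
    """
    groups = defaultdict(list)
    for entry in entries:
        if entry["type"].lower() in ignored_types or "title" not in entry:
            continue
        groups[normalise_title(entry["title"])].append(entry["key"])
    # Only titles shared by more than one entry count as duplicates.
    return {title: keys for title, keys in groups.items() if len(keys) > 1}


entries = [
    {"key": "mouse2001", "type": "inproceedings", "title": "A {B}etter Mousetrap"},
    {"key": "mouse2001talk", "type": "misc", "title": "A Better Mousetrap"},
    {"key": "mouse2002", "type": "article", "title": "A better mousetrap."},
]
duplicates = find_duplicate_titles(entries)
```

Here the `@misc` presentation entry is ignored, while the two remaining entries are reported under the same normalised title. Grading the badness of a duplicate by how much of the rest matches would need an additional field comparison on top of this.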

BibTex Fixer

Infer fields from other entries
- Basic inference functionality
- Add more inferrable fields (see Field Inference)
- Add functionality for mapping information across types (e.g. from proceedings to inproceedings)
Infer full names
- Infer full name form of initials when the full name is used elsewhere
- Infer proper non-ASCII spelling of a name when it is used elsewhere
Fix inconsistent fields
- Replace conference name variations with main name (see dictionary of conference names)
- Expand name initials to full names
  - Infer full name form of initials when the full name is used elsewhere
  - Infer proper non-ASCII spelling of a name when it is used elsewhere
- Make locations more informative (City, [State], Country)
  - Add missing country
  - Add missing city
  - Add state (USA only)
  - Extend state initials to full state name
- Have consistent file order
Fix formatting
- Replace non-ASCII characters in keys
- Add wraps around capitalised characters in name field
  - Add option to wrap entire words instead of only the capitalised characters
- Remove unnecessary {}-wraps
- Fix badly formatted page numbers
- Fix all-caps text (but not single all caps words)
  - Separate handling for names
- Fix bad but understandable months (e.g. numbers)
- Correct handling for escaped sequences
  - Escaped by curly braces
  - Escaped by math mode
- Name formatting
  - Change format of name to non-ambiguous "Last, First" format
  - Fix special character formatting
    - Use consistent braces format (e.g. write {\"o} instead of \"{o})
    - Replace LaTeX commands (e.g. replace \textasciicaron{}e with {e})
  - Fix all-caps names (MICKEY MOUSE or Mickey MOUSE)
  - Fix initials format
    - Initials must be followed by a period
    - Multiple initials must be separated by spaces
  - Test if text starts with "and"
Rename entry keys
- Provide a format to specify the desired key names
- Key format might differ for different entry types.
- Key format should consist of only ASCII characters
Multi-bibliography merger
- Identify entries that are the same
  - Option 1: Same key
  - Option 2: Match on major fields (e.g. name plus authors?)
- Merge
  - Identical fields are accepted
  - Fields available in only one version are accepted
  - Fields that clash cause user prompt or trigger other fixer functions
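
The merge rules above can be illustrated with a small sketch. It assumes that two entries have already been identified as the same and that each is available as a plain dict of field names to values; `merge_entries` is a hypothetical helper, not an existing function.

```python
def merge_entries(first, second):
    """Merge two field dicts that describe the same entry.

    Returns (merged, clashes). `merged` holds fields that are identical in
    both versions or present in only one of them. `clashes` maps each
    conflicting field name to the pair of competing values, so that the
    caller can prompt the user or trigger other fixer functions.
    """
    merged, clashes = {}, {}
    for field in sorted(set(first) | set(second)):
        if field not in second:
            merged[field] = first[field]
        elif field not in first:
            merged[field] = second[field]
        elif first[field] == second[field]:
            merged[field] = first[field]
        else:
            clashes[field] = (first[field], second[field])
    return merged, clashes


a = {"title": "Cheese", "year": "2001", "pages": "1-10"}
b = {"title": "Cheese", "year": "2002", "publisher": "Mouse Press"}
merged, clashes = merge_entries(a, b)
```

In this example `title`, `pages` and `publisher` are accepted, while `year` is returned as a clash. Comparing values after normalisation (for instance after LaTeX to Unicode conversion) would probably reduce the number of spurious clashes.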

BibTex simplifier

Simplify conference names
- Use dictionary of conference names
- Allow regex or sed replacement
Simplify Names
- Turn full first names into initials
- Turn full middle names into initials
Simplify Locations
- Drop entirely
- Drop city
- Drop state
- Shorten state to initials
- Copy location to address (even though technically it is incorrect)
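
Name simplification could look roughly like the sketch below. It assumes names are already in the two-part "Last, First" format and contain no LaTeX commands or braces; the function names and the `keep_first` option are made up for illustration.

```python
def to_initials(given_names, keep_first=False):
    """Abbreviate given names to initials followed by a period.

    With keep_first=True the first given name is kept as written and only
    the middle names are abbreviated.
    """
    # Separate initials that were written without whitespace, e.g. "A.B.".
    parts = given_names.replace(".", ". ").split()
    initials = [part[0].upper() + "." for part in parts]
    if keep_first and parts:
        initials[0] = parts[0]
    return " ".join(initials)


def simplify_name(name, keep_first=False):
    """Simplify a "Last, First" name; other formats are returned unchanged."""
    if "," not in name:
        return name
    last, given = (part.strip() for part in name.split(",", 1))
    if not given:
        return last
    return f"{last}, {to_initials(given, keep_first)}"
```

For example, `simplify_name("Mouse, Mickey Donald")` gives `"Mouse, M. D."`. Hyphenated given names and the three-part "Last, Jr, First" format are not handled by this sketch.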

Auxiliary

Dictionary of conference names

Allow full name, name variation, short name
Names should allow for number placeholder
How to link regularly named conferences with years where they were held in conjunction with something?
Additional script to suggest possible name variations
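
One possible shape for such a dictionary is sketched below. The `{n}` placeholder syntax, the `CONFERENCES` structure and the function names are assumptions for illustration only; short names and co-located events are left out.

```python
import re

# Matches a number with an optional English ordinal suffix, e.g. "5" or "5th".
NUMBER = r"(\d+(?:st|nd|rd|th)?)"
PLACEHOLDER = "{n}"

# Hypothetical dictionary: main name -> list of known name variations.
CONFERENCES = {
    "Proceedings of the {n} Conference on Examples": [
        "Proc. of the {n} Conf. on Examples",
        "{n} Conference on Examples",
    ],
}


def variation_to_regex(variation):
    """Turn a name with number placeholders into a case-insensitive regex."""
    parts = [re.escape(part) for part in variation.split(PLACEHOLDER)]
    return re.compile("^" + NUMBER.join(parts) + "$", re.IGNORECASE)


def canonical_name(booktitle):
    """Return the main name for a known variation, or None if unknown."""
    for main, variations in CONFERENCES.items():
        for variation in [main, *variations]:
            match = variation_to_regex(variation).match(booktitle.strip())
            if match:
                number = match.group(1) if match.groups() else ""
                return main.replace(PLACEHOLDER, number)
    return None
```

With this dictionary, `canonical_name("Proc. of the 5th Conf. on Examples")` yields the main name with the number carried over. The same lookup could serve the consistency checker (report a variation) and the fixer (replace it).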

Key formatting

There might already be an open source system for standardising BibTex keys, which is also used by Zotero. This needs to be checked.

Relevant factors for key formatting

First author last name
- capitalised
- lower case
Year
Word from Title
- capitalised
- lower case
- all caps
Disambiguating characters
- lowercase a,b,c

Common formats

lastnameYEAR
LastnameYEAR
LastnameYEARkeyword
LastnameYEARdisambig
lastname_keyword_year
TITLEWORD
LastnameYEAR or KEYWORD

How to choose format

Number of hardcoded options
- Easy to implement, little flexibility
RegEx
- Easy to implement, flexible, but limited functionality (can't check other fields)
- Actually, if you use named groups, you could use those names to trigger additional checks for them.
Custom format
- Lots of work to implement, full functionality, probably quite flexible
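
The named-group idea for the RegEx option might work as in the following sketch, shown for the LastnameYEARdisambig format. The pattern, the `CHECKS` table and the entry structure (a dict with an `author` field in "Last, First" format and a `year` field) are assumptions for illustration.

```python
import re

# Hypothetical user-supplied key format: LastnameYEAR plus optional letter.
KEY_FORMAT = re.compile(r"^(?P<lastname>[A-Z][a-z]+)(?P<year>\d{4})(?P<disambig>[a-z]?)$")


def first_author_last_name(entry):
    """Last name of the first author, assuming "Last, First and ..." format."""
    first_author = entry["author"].split(" and ")[0]
    return first_author.split(",")[0].strip()


def check_lastname(value, entry):
    return value.lower() == first_author_last_name(entry).lower()


def check_year(value, entry):
    return value == entry.get("year")


# Group names that trigger an additional check against other fields.
CHECKS = {"lastname": check_lastname, "year": check_year}


def check_key(key, entry, key_format=KEY_FORMAT):
    """Return the names of all failed checks ("format" if the regex fails)."""
    match = key_format.match(key)
    if match is None:
        return ["format"]
    return [
        name
        for name, value in match.groupdict().items()
        if name in CHECKS and not CHECKS[name](value, entry)
    ]
```

Groups without an entry in `CHECKS`, such as `disambig`, are only validated by the regular expression itself. This keeps the format flexible while still allowing checks against other fields.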

Field Inference

article: journal + year + volume => month
article: journal + year + month => volume
book: booktitle + year + volume/number => inbook: author, editor, publisher, series, edition, month
book: booktitle + year + volume/number => incollection: editor, publisher, series, edition, month
conference: booktitle + year => address, month, editor, organization, publisher
inbook: title + year => address, month, editor, publisher
incollection: booktitle + year => address, month, editor, publisher
inproceedings: booktitle + year => address, month, editor, organization, publisher
proceedings: booktitle + year => inproceedings: address, month, editor, organization, publisher
If proceedings title contains an index (e.g. "Proceedings of the 5th Conference on Examples") we can infer year and all other pieces of information from it.
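
A rule table like the one above could drive inference roughly as follows. This sketch covers only two same-type rules and assumes entries are dicts with `key` and `type` items next to their fields; `INFERENCE_RULES`, `identity` and `infer_fields` are hypothetical names. A value is only suggested when all other entries agree on it.

```python
# entry type -> (identifying fields, inferrable fields); subset of the rules.
INFERENCE_RULES = {
    "inproceedings": (
        ("booktitle", "year"),
        ("address", "month", "editor", "organization", "publisher"),
    ),
    "incollection": (("booktitle", "year"), ("address", "month", "editor", "publisher")),
}


def identity(entry):
    """Tuple identifying the venue of an entry, or None if no rule applies."""
    rule = INFERENCE_RULES.get(entry["type"])
    if rule is None or any(field not in entry for field in rule[0]):
        return None
    return (entry["type"],) + tuple(entry[field] for field in rule[0])


def infer_fields(entries):
    """Return (key, field, value) suggestions for missing inferrable fields."""
    known = {}
    for entry in entries:
        ident = identity(entry)
        if ident is None:
            continue
        for field in INFERENCE_RULES[entry["type"]][1]:
            if field in entry:
                known.setdefault((ident, field), set()).add(entry[field])
    suggestions = []
    for entry in entries:
        ident = identity(entry)
        if ident is None:
            continue
        for field in INFERENCE_RULES[entry["type"]][1]:
            values = known.get((ident, field), set())
            # Only infer when the other entries agree on a single value.
            if field not in entry and len(values) == 1:
                suggestions.append((entry["key"], field, next(iter(values))))
    return suggestions
```

Fields with more than one known value are skipped here; those cases belong to the consistency checker ("Inferrable information for conferences/journals is inconsistent").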

BibTexNanny Input Parameters

Input methods

Use Python's configparser, which allows INI-like config files
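
As a minimal sketch of what this could look like, the snippet below reads an INI-like config from a string. The section names and the `duplicateTitles` option are invented for illustration; only the `configparser` calls themselves are standard library API.

```python
import configparser

# Hypothetical config content; a real file would be read with parser.read(path).
EXAMPLE_CONFIG = """
[consistency]
duplicateKeys = True
duplicateTitles = False

[fixer]
duplicateKeys = Try
"""

parser = configparser.ConfigParser()
# By default option names are lower-cased; keep them as written instead.
parser.optionxform = str
parser.read_string(EXAMPLE_CONFIG)

check_duplicate_keys = parser.getboolean("consistency", "duplicateKeys")
fixer_mode = parser.get("fixer", "duplicateKeys")
```

`getboolean` accepts values such as `True`/`False`, `yes`/`no` and `on`/`off`, so it covers the consistency checker directly. Non-boolean values such as `Try` have to be read as strings and interpreted separately.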

Internal processing

~~Dict~~
- Straightforward, but need to keep the key strings straight
Custom object with lots of boolean fields
- More design effort, but probably more flexible
- Should have different class for each Nanny component
  - As the tasks overlap considerably, there should be a NannyConfig superclass and inheriting classes for the components.
  - Accessing config info should be done via functions, not fields, to allow custom processing of the stored information

Required states for custom variables

Consistency checker

True (check value)
False (don't check value)

Fixer

True/Autofix/Auto (autofix value)
Tryfix/Try (autofix if trivial, otherwise prompt to fix)
Promptfix/Prompt (Prompt to fix)
False (don't check value)

Consistency + Fixer

How information for both scripts can be given in the same config file

Single value for both (Try and Prompt are treated as True)
~~Tuple: False,Tryfix (CONSISTENCY,FIXER)~~
Variables for only one of the two configs, e.g. duplicateKeys-consistency
Different sections for giving instructions for both or just either
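
The fixer states and the "single value for both" option could be modelled as in the sketch below. The `FixMode` enum and the helper names are assumptions; the accepted spellings are the ones listed above.

```python
from enum import Enum


class FixMode(Enum):
    AUTO = "auto"      # autofix value
    TRY = "try"        # autofix if trivial, otherwise prompt
    PROMPT = "prompt"  # always prompt
    OFF = "off"        # don't check value


_ALIASES = {
    "true": FixMode.AUTO, "autofix": FixMode.AUTO, "auto": FixMode.AUTO,
    "tryfix": FixMode.TRY, "try": FixMode.TRY,
    "promptfix": FixMode.PROMPT, "prompt": FixMode.PROMPT,
    "false": FixMode.OFF,
}


def parse_fix_mode(text):
    """Map a config value to a FixMode, ignoring case and outer whitespace."""
    try:
        return _ALIASES[text.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown fixer setting: {text!r}") from None


def as_check_flag(mode):
    """Consistency-checker view of a shared value: all but OFF mean True."""
    return mode is not FixMode.OFF
```

With this, a shared setting such as `Tryfix` enables the check in the consistency checker and selects the try-then-prompt behaviour in the fixer.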

Simplifier

Should have separate config files.

Blacklist: List fields that should be removed
Whitelist: List only the fields that are wanted
Variables for conversion functions

============================================================

Interface

Good way to set parameters?

Argument calls
- Set list of wanted fields (if None, all are wanted)
- Set list of unwanted fields (optional)
Config files
- allows for templates
- More complex to set up
Prompts during processing, asking for user decisions
- Could also be used to auto-generate config files

External information files

LaTeX style files

.bst: BibTex format file (difficult to parse)
.sty: LaTeX style file (can this contain the bst info?)
.cls: LaTeX class file (can this contain the bst info?)

LaTeX temp files

.aux: Lists citations and labels
- Single line to parse: \citation{citationlabel}
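
Extracting the cited keys from an .aux file could be as simple as the following sketch. `cited_keys` is a hypothetical helper; it also splits comma-separated keys, since a single `\citation` line can list several of them, and it does not follow other .aux files referenced from the main one.

```python
import re

CITATION = re.compile(r"\\citation\{([^}]*)\}")


def cited_keys(aux_text):
    """Return the set of citation keys found in the text of an .aux file."""
    keys = set()
    for match in CITATION.finditer(aux_text):
        for key in match.group(1).split(","):
            key = key.strip()
            if key:
                keys.add(key)
    return keys


aux_text = (
    "\\relax\n"
    "\\citation{mouse2001}\n"
    "\\citation{duck1999,mouse2001}\n"
    "\\bibstyle{plain}\n"
)
keys = cited_keys(aux_text)
```

The resulting set can then be used to limit the consistency check to entries that are actually cited.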

BibTexNanny files

Dictionary of conference names
Style config file
Tool config files
- Consistency checker config file
- Fixer config file
- Simplifier config file

BibTex field requirements

We need to be able to check the following aspects for fields:

What type of entry are we looking at?
What are the generally required and optional fields for this entry?
- This bit can be hardcoded as it is always true for all BibTex files
- Look up BibTex documentation to determine these values
For a particular bibliography type, which are the required and optional fields, which fields are ignored?
- Easy solution: Manually create a config file that lists fields as mandatory, optional and ignored
  - Requires config file design
- Better solution: Load style files to automatically extract this kind of information.
  - Are there Python tools that can load sty and cls files for us?
Design a config file that allows users to set which info they want to drop and which they need enforced
- List by entry type
  - Allow defining fields for more than one entry type at once
- Define fields as mandatory, optional, unused and maybe as hidden
Three layer approach:
1. In-built BibTex entry definitions
2. Config file for bibliography style requirements
3. Config file for simplification requirements
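
The three layers could be combined by letting each later layer override the earlier ones, as in this sketch. The dict layout and the `resolve_requirements` helper are assumptions; the built-in layer shows only a subset of the standard fields of `@article`, and the style and simplifier layers are invented examples.

```python
# Layer 1: in-built BibTex entry definitions (subset of @article shown).
BUILTIN = {
    "article": {
        "author": "mandatory", "title": "mandatory",
        "journal": "mandatory", "year": "mandatory",
        "volume": "optional", "pages": "optional", "month": "optional",
    },
}
# Layer 2: hypothetical bibliography style requirements.
STYLE = {"article": {"pages": "mandatory", "month": "unused"}}
# Layer 3: hypothetical simplification requirements.
SIMPLIFIER = {"article": {"volume": "hidden"}}


def resolve_requirements(entry_type, *layers):
    """Merge field requirements for one entry type; later layers win."""
    resolved = {}
    for layer in layers:
        resolved.update(layer.get(entry_type, {}))
    return resolved


article = resolve_requirements("article", BUILTIN, STYLE, SIMPLIFIER)
```

Here `pages` becomes mandatory through the style layer, `month` becomes unused and `volume` is hidden by the simplifier, while all other fields keep their built-in status.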

People working on related tools

Titus von der Malsburg
Marten van Schijndel
- Dave's fork
