marc-matcher - a macro for working with MARC data #4
Comments
Awesome! Now I need a z39.50 client!
A Racket one would be nice! 😆
if only I had time - and I switched from libraries to health about 12 years ago so I'm into HL7 instead of MARC21 now. There is an ASN.1 Library if it is of interest
Thank you for your contribution! If you haven’t already please take the time to fill in the form https://forms.gle/Z5CN2xzK13dfkBnF7 Bw
bennn added a commit to syntax-objects/syntax-parse-example that referenced this issue Sep 28, 2021
bennn added a commit to syntax-objects/syntax-parse-example that referenced this issue Oct 27, 2021
bennn added a commit to syntax-objects/syntax-parse-example that referenced this issue Oct 27, 2021
Please enter the bee by submitting code (or links to code) for:
Thank you for your submission!
Macro
This is a very domain-specific macro, developed for a particular bibliographic metadata use-case. The macro definition itself is given below, and the required files containing helper definitions have been attached to this issue.
This macro aims to make it easier to do regex-like matching over a structured bibliographic data format known as MARC 21. MARC records contain a sequence of fields whose data are string values that look like this:
$aCarroll, Lewis,$d1832-1898,$eauthor.
In each field, individual subfields are separated using a separator character (in this case `$`); the character immediately following the separator is called the subtag; and the substring up to the next separator or end-of-string is the subfield data. So in the example above, there are three subfields, `$a`, `$d`, and `$e`, whose data are, respectively, `Carroll, Lewis,`, `1832-1898,`, and `author.`.

Parsing subfields out of this is often done using regular expressions, but it gets really difficult when trying to deal with subfield repetitions. I'll use field 264 to illustrate. This field mainly contains the following pieces of publication information: the `$a` subfield contains place of publication; the `$b` contains the entity responsible for publication; and the `$c` contains the date of publication. There are several possible repetition patterns for these subfields which require different semantic interpretations. To give a few examples:

- `a+bc`: multiple places of publication with the same publisher
  `$aLondon ;$aNew York :$bRoutledge,$c2017.` [1]
- `ab+c`: multiple publishers with the same place of publication
  `$aNew York, NY :$bBarnes & Noble :$bSterling Publishing Co., Inc.,$c2012.` [2]
- `(ab)+c`: multiple publications, each with different places and publishers
  `$aBoston :$bLee and Shepard, publishers ;$aNew York :$bLee, Shepard, and Dillingham,$c1872.` [3]

Writing a regex to intelligently parse this information out of the string is a pain, but regexes are already a popular and well understood tool in the metadata community. Thus, `marc-matcher` lets users specify regular expressions that match subgroups within the field they want to parse, and define variables they can use in their code containing the results of those matches, which allows more complex kinds of processing to be done with simpler code.

Example
This example defines a lambda called `parse-264` using `marc-matcher`:

The first clause of the `marc-matcher` expression is a list of variable definitions, similar to a parameter list for a lambda. For example, `[#px"ab" #:as place-entity-groups]` defines a variable called `place-entity-groups`, which will be a list of all the groups (which are themselves lists of structs) consisting of a single subfield `$a` followed by a single subfield `$b`. The second clause is the computation the user wishes to do with the values extracted from the field, and can refer to the variables defined in the first clause.

The `parse-264` function above can then be used as follows:

Here is another example, using table of contents data [4]:
Before and After
This would probably count as a code cleaning macro, though the before code doesn't exist (because I've not previously done this kind of metadata work in Racket).
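Since no "before" code exists, the following is only a rough sketch of what a hand-written version might involve. It is not the author's code and not how `marc-matcher` is implemented. It assumes `$` is the separator and never appears inside subfield data, and the names `subfield`, `parse-subfields`, and `match-groups` are hypothetical, chosen for illustration (the helper definitions attached to this issue may look quite different). The idea is to split the field into subfields, join the subtags into a string such as `"ababc"`, run the regex over that string, and map each match back to the corresponding run of subfields.

```racket
#lang racket

;; Hypothetical struct for illustration; not the struct used by marc-matcher.
(struct subfield (subtag data) #:transparent)

;; Split a field string such as "$aLondon ;$bRoutledge," into subfield structs.
;; Assumes "$" is the separator and never occurs inside subfield data.
(define (parse-subfields field-string)
  (for/list ([chunk (in-list (string-split field-string "$"))]
             #:when (non-empty-string? chunk))
    ;; First character is the subtag, the rest is the subfield data.
    (subfield (substring chunk 0 1) (substring chunk 1))))

;; Return every run of subfields whose subtag sequence matches `pattern`.
;; Each subfield contributes exactly one character to the subtag string,
;; so match positions index directly into the subfield list.
(define (match-groups pattern subfields)
  (define subtags (string-append* (map subfield-subtag subfields)))
  (for/list ([pos (in-list (regexp-match-positions* pattern subtags))])
    (take (drop subfields (car pos)) (- (cdr pos) (car pos)))))

;; The (ab)+c example from the text: two place/publisher pairs and one date.
(define field-264
  (parse-subfields
   "$aBoston :$bLee and Shepard, publishers ;$aNew York :$bLee, Shepard, and Dillingham,$c1872."))

;; Data of each group made of one $a followed by one $b.
(define place-entity-data
  (for/list ([group (in-list (match-groups #px"ab" field-264))])
    (map subfield-data group)))

place-entity-data
;; => '(("Boston :" "Lee and Shepard, publishers ;")
;;      ("New York :" "Lee, Shepard, and Dillingham,"))
```

Even in this small sketch, the splitting, position bookkeeping, and data extraction all have to be written by hand for each field, which is the kind of repetition the macro is meant to hide.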
Licence
I confirm that the code is under the same MIT license as the Racket language, and associated text is under Creative Commons Attribution 4.0 International License
Contact