OM allows you to define a “terminology” to ease translation between XML and Ruby objects: you can query the XML for nodes or node values without ever writing a line of XPath.
OM “terms” are Ruby symbols you define (in the terminology) that map specific XML content into Ruby object attributes.
The API documentation at [http://rdoc.info/github/projecthydra/om] provides additional, more targeted information. We will provide links to the API as appropriate.
- Install OM and run it in IRB
- Build an OM Terminology
- Use OM XML Document class
- Create XML from the OM XML Document
- Load existing XML into an OM XML Document
- Query OM XML Document to get term values
- Access the Terminology of an OM XML Document
- Retrieve XPath from the terminology
To get started, you will create a new folder, set up a Gemfile to install OM, and then run bundler.
mkdir omtest
cd omtest
Using whichever editor you prefer, create a file called Gemfile in the omtest directory with the following contents:
source 'https://rubygems.org'
gem 'om'
Now run Bundler to install the gem (you will need the bundler gem installed):
bundle install
You should now be set to use irb to run the following example.
To experiment with abbreviated terminology examples, irb is your friend. If you are working on a persistent terminology and have to experiment to make sure you declare your terminology correctly, we recommend writing test code (e.g. with rspec). You can see examples of this here
irb
require "rubygems"
=> true
require "om"
=> true
Create a simple (simplish?) Terminology Builder (OM::XML::Terminology::Builder) based on a couple of elements from the MODS schema.
terminology_builder = OM::XML::Terminology::Builder.new do |t|
  t.root(:path=>"mods", :xmlns=>"http://www.loc.gov/mods/v3", :schema=>"http://www.loc.gov/standards/mods/v3/mods-3-2.xsd")
  # This is a mods:name. The underscore is purely to avoid namespace conflicts.
  t.name_ {
    t.namePart
    t.role(:ref=>[:role])
    t.family_name(:path=>"namePart", :attributes=>{:type=>"family"})
    t.given_name(:path=>"namePart", :attributes=>{:type=>"given"}, :label=>"first name")
    t.terms_of_address(:path=>"namePart", :attributes=>{:type=>"termsOfAddress"})
  }
  # Re-use the structure of a :name Term with a different @type attribute
  t.person(:ref=>:name, :attributes=>{:type=>"personal"})
  t.organization(:ref=>:name, :attributes=>{:type=>"corporate"})
  # This is a mods:role, which is used within mods:name elements
  t.role {
    t.text(:path=>"roleTerm",:attributes=>{:type=>"text"})
    t.code(:path=>"roleTerm",:attributes=>{:type=>"code"})
  }
end
Now tell the Terminology Builder to build your OM::XML::Terminology:
terminology = terminology_builder.build
Generally you will use an OM::XML::Document to work with your XML. Here’s how to define a Document class that uses the same Terminology as above.
In a separate window (so you can keep irb running), create the file my_mods_document.rb in the omtest directory, with this content:
class MyModsDocument < ActiveFedora::NokogiriDatastream
  include OM::XML::Document

  set_terminology do |t|
    t.root(:path=>"mods", :xmlns=>"http://www.loc.gov/mods/v3", :schema=>"http://www.loc.gov/standards/mods/v3/mods-3-2.xsd")
    # This is a mods:name. The underscore is purely to avoid namespace conflicts.
    t.name_ {
      t.namePart
      t.role(:ref=>[:role])
      t.family_name(:path=>"namePart", :attributes=>{:type=>"family"})
      t.given_name(:path=>"namePart", :attributes=>{:type=>"given"}, :label=>"first name")
      t.terms_of_address(:path=>"namePart", :attributes=>{:type=>"termsOfAddress"})
    }
    # Re-use the structure of a :name Term with a different @type attribute
    t.person(:ref=>:name, :attributes=>{:type=>"personal"})
    t.organization(:ref=>:name, :attributes=>{:type=>"corporate"})
    # This is a mods:role, which is used within mods:name elements
    t.role {
      t.text(:path=>"roleTerm",:attributes=>{:type=>"text"})
      t.code(:path=>"roleTerm",:attributes=>{:type=>"code"})
    }
  end

  # Generates an empty MODS document (used when you call MyModsDocument.new without passing in existing XML)
  # (overrides the default behavior of creating a plain XML document)
  def self.xml_template
    # use Nokogiri to build the XML
    builder = Nokogiri::XML::Builder.new do |xml|
      xml.mods(:version=>"3.3", "xmlns:xlink"=>"http://www.w3.org/1999/xlink",
               "xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance",
               "xmlns"=>"http://www.loc.gov/mods/v3",
               "xsi:schemaLocation"=>"http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd") {
        xml.titleInfo(:lang=>"") {
          xml.title
        }
        xml.name(:type=>"personal") {
          xml.namePart(:type=>"given")
          xml.namePart(:type=>"family")
          xml.affiliation
          xml.computing_id
          xml.description
          xml.role {
            xml.roleTerm("Author", :authority=>"marcrelator", :type=>"text")
          }
        }
      }
    end
    # return a Nokogiri::XML::Document, not an OM::XML::Document
    return builder.doc
  end
end
(Note that this class subclasses ActiveFedora::NokogiriDatastream, so it also needs the ActiveFedora gem to be installed and loaded.)
OM::XML::Document provides the set_terminology method to handle the details of creating a TerminologyBuilder and building the terminology for you. This allows you to focus on defining the structures of the Terminology itself.
By default, new OM Document instances will create an empty XML document, but if you override self.xml_template to return a different object (e.g. Nokogiri::XML::Document), that will be created instead.
In the example above, we have overridden xml_template to build an empty, relatively simple MODS document as a Nokogiri::XML::Document. We use Nokogiri::XML::Builder and call its .doc method at the end of xml_template in order to return the Nokogiri::XML::Document object. Instead of using Nokogiri::XML::Builder, you could put your template into an actual XML file and have xml_template use Nokogiri::XML::Document.parse to load it. That’s up to you. Create the documents however you want, just return a Nokogiri::XML::Document.
To try the template built with Nokogiri::XML::Builder, load the class in irb and create a new document:
require "./my_mods_document"
newdoc = MyModsDocument.new
newdoc.to_xml
=> NoMethodError: undefined method `to_xml' for nil:NilClass
To load existing XML into your OM Document, use #from_xml.
For an example, download hydrangea_article1.xml into your working directory (omtest), then run this in irb:
sample_xml = File.new("hydrangea_article1.xml")
doc = MyModsDocument.from_xml(sample_xml)
Take a look at the document object’s xml that you’ve just populated. We will use this document for the next few examples.
doc.to_xml
Using the Terminology associated with your Document, you can query the xml for nodes or node values without ever writing a line of XPath.
You can use OM::XML::Document.find_by_terms to retrieve xml nodes from the datastream. It returns Nokogiri::XML::Node objects:
doc.find_by_terms(:person)
doc.find_by_terms(:person).length
doc.find_by_terms(:person).each {|n| puts n.to_xml}
You might prefer to use nodes as a way of getting multiple values pertaining to a node, rather than doing more expensive lookups for each desired value.
If you want to get directly to the values within those nodes, use OM::XML::Document.term_values:
doc.term_values(:person, :given_name)
doc.term_values(:person, :family_name)
If the XPath points to XML nodes that contain other nodes, the response to term_values will contain Nokogiri::XML::Node objects instead of text values:
doc.term_values(:name)
For more examples of querying OM Documents, see Querying Documents.
For more examples of updating OM Documents, see Updating Documents.
If you have an XML schema defined in your Terminology’s root Term, you can validate any XML document by calling .validate on any instance of your Document classes.
doc.validate
Note: this method requires an internet connection, as it will download the XML schema from the URL you have specified in the Terminology’s root term.
Directly accessing the Nokogiri::XML::Document and the OM::XML::Terminology
OM::XML::Document is implemented as a container for a Nokogiri::XML::Document. It uses the associated OM Terminology to provide a bunch of convenience methods that wrap calls to Nokogiri. If you ever need to operate directly on the Nokogiri Document, simply call ng_xml and do what you need to do. OM will not get in your way.
ng_document = doc.ng_xml
If you need to look at the Terminology associated with your Document, call #terminology on the Document’s class.
MyModsDocument.terminology
doc.class.terminology
Because the Terminology is essentially a mapping from XPath queries to Ruby object attributes, in most cases you won’t need to know the actual XPath queries. When you do want to know the XPath for a term (e.g. to check that your terminology is correct), the Terminology can generate XPath queries based on the structures you’ve defined (see OM::XML::TermXPathGenerator).
Here are the XPaths for :name and two variants of :name that were created using the :ref argument in the Terminology Builder:
terminology.xpath_for(:name)
=> "//oxns:name"
terminology.xpath_for(:person)
=> "//oxns:name[@type=\"personal\"]"
terminology.xpath_for(:organization)
=> "//oxns:name[@type=\"corporate\"]"
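To make the mapping concrete, here is a toy sketch of how a term's :path and :attributes could be turned into XPath strings like the ones above. The terms hash and the xpath_for_term helper are hypothetical and greatly simplified; they are not OM's TermXPathGenerator, which also handles nested terms, refs, and other options.

```ruby
# Hypothetical, simplified term definitions mirroring the terminology above.
terms = {
  name:         { path: "name" },
  person:       { path: "name", attributes: { type: "personal" } },
  organization: { path: "name", attributes: { type: "corporate" } }
}

# Build an XPath for a top-level term: "//oxns:<path>" followed by one
# [@attr="value"] predicate per attribute.
def xpath_for_term(term)
  predicates = term.fetch(:attributes, {}).map do |attr, value|
    %([@#{attr}="#{value}"])
  end
  "//oxns:#{term[:path]}#{predicates.join}"
end

terms.each { |name, term| puts "#{name}: #{xpath_for_term(term)}" }
```

The sketch shows why :person and :organization differ from :name only by a predicate: a :ref reuses the same path and adds the attribute constraint.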
The solrizer gem provides support for indexing XML documents into Solr based on OM Terminologies. That process is documented in the solrizer documentation.