Initial OAI-PMH Endpoint (issues 498 & 1192) #4

Merged
merged 10 commits into from
Sep 9, 2019

Conversation

seth-shaw-unlv
Contributor

GitHub Issue: Islandora/documentation#1192

This PR represents the 'short term' solution described in the issue thread.

What does this Pull Request do?

Adds an islandora_oaipmh submodule using rest_oai_pmh to enable an OAI-PMH endpoint. Includes a README providing details about how it works.

What's new?

  • Added the submodule.
  • Does this change require documentation to be updated? Perhaps adding a page to the Islandora docs?
  • Does this change add any new dependencies? Only if the module is enabled; requires rest_oai_pmh.
  • Does this change require any other modifications to be made to the repository
    (i.e., regeneration activity, etc.)? No.
  • Could this change impact execution of existing code? No.

How should this be tested?

  • Make some repository items (if you don't have any already).
  • Apply the PR
  • Install rest_oai_pmh (e.g. composer require drupal/rest_oai_pmh)
  • Enable the module: drush en -y islandora_oaipmh
  • Trigger the OAI-PMH indexer: Click the button found on the page at 'http://localhost:8000/admin/config/services/rest/oai-pmh/queue' (or wait for cron)
  • Query the OAI-PMH Endpoint. E.g. http://localhost:8000/oai/request?verb=ListRecords&metadataPrefix=oai_dc
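
The last step above can also be checked programmatically. The following is a minimal Python sketch, not part of this PR: it assumes the ListRecords response has already been fetched into a string (for example with curl), and the function name `list_record_summaries` is illustrative.

```python
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH responses carrying oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_record_summaries(xml_text):
    """Return (identifier, datestamp, title) tuples from a ListRecords response."""
    root = ET.fromstring(xml_text)
    summaries = []
    for record in root.findall("./oai:ListRecords/oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        datestamp = record.findtext("oai:header/oai:datestamp", namespaces=NS)
        # A record may lack dc:title; findtext then returns None.
        title = record.findtext(".//dc:title", namespaces=NS)
        summaries.append((identifier, datestamp, title))
    return summaries
```

Calling `list_record_summaries()` on the saved response should give one tuple per repository item, which makes it easy to confirm that the items created in the first step were indexed.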

Bonus:

  • (Read the README first...)
  • Add an Entity Reference display mode to any existing view you have.
  • Add the new display mode as a set on the OAI-PMH configuration page.
  • Trigger the OAI-PMH indexer
  • See your set in the endpoint. 'http://localhost:8000/oai/request?verb=ListSets'

Additional Notes:

The linked agents field complicates the mapping to Dublin Core. This module makes some assumptions in islandora_oaipmh_preprocess_rest_oai_pmh_record() (in islandora_oaipmh.module) about which MARC relators are creators and assumes the rest are contributors. If someone could specifically review that list (e.g. @rosiel), that would be great.
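
To make the creator/contributor split easier to review, here is an illustrative Python sketch of the idea. It is not the module's PHP code: the relator codes in `CREATOR_RELATORS` are a hypothetical subset chosen for the example, and the `(relator, name)` input shape is an assumption; the actual list is the one in islandora_oaipmh_preprocess_rest_oai_pmh_record().

```python
# Hypothetical subset of MARC relator codes treated as creators.
# The real list lives in islandora_oaipmh.module and may differ.
CREATOR_RELATORS = {"aut", "cre", "art", "cmp", "pht"}


def map_linked_agents(linked_agents):
    """Split (relator, name) pairs into dc:creator and dc:contributor lists."""
    mapped = {"dc:creator": [], "dc:contributor": []}
    for relator, name in linked_agents:
        # Accept both prefixed ("relators:aut") and bare ("aut") codes.
        code = relator.split(":")[-1]
        if code in CREATOR_RELATORS:
            mapped["dc:creator"].append(name)
        else:
            # Every relator not listed as a creator falls back to contributor.
            mapped["dc:contributor"].append(name)
    return mapped
```

The fallback branch is the assumption reviewers should look at most closely: any relator missing from the creator list silently becomes a contributor.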

@joecorall indicated the rest_oai_pmh module is still in alpha pending handling of deleted content. I personally don't think that should stop us from making it available, although I should probably add a note about its experimental status to the README. I can also close the request if we think this is too soon.

Interested parties

@Islandora-CLAW/committers

@mjordan mjordan self-requested a review August 14, 2019 16:49
@mjordan
Contributor

mjordan commented Aug 14, 2019

@seth-shaw-unlv I'll review and test this.

Contributor

@mjordan mjordan left a comment

Works as advertised, although I hit a snag when I tried creating new sets (see below). Here's the output from http://localhost:8000/oai/request?verb=ListRecords&metadataPrefix=oai_dc:

<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd" name="OAI-PMH">
  <responseDate>2019-09-05T13:56:46Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://localhost:8000/oai/request</request>
  <ListRecords>
    <resumptionToken/>
    <record>
      <header>
        <identifier>oai:localhost:node-6</identifier>
        <datestamp>2019-08-23T14:13:08Z</datestamp>
        <setSpec>oai_pmh:all_repository_items</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Greenwhich.jpg</dc:title>
          <dc:description>Object ingested by: mark@mark-ThinkPad-X1-Carbon-6th.

Basic technical metadata: /home/mark/Pictures/pics/Greenwhich.jpg: JPEG image data, Exif standard: [TIFF image data, little-endian, direntries=10, description=                               , manufacturer=Canon, model=Canon PowerShot G12, orientation=upper-left, xresolution=204, yresolution=212, resolutionunit=2, datetime=2013:07:13 06:56:08], baseline, precision 8, 3648x2432, frames 3

SHA256 hash: a1c9c00867ce71bcc63ef7e715b1deadaf7251f43ec6f4f92f02985345ec4a60</dc:description>
          <dc:extent>1 item</dc:extent>
          <dc:identifier>/home/mark/Pictures/pics/Greenwhich.jpg</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:localhost:node-7</identifier>
        <datestamp>2019-09-02T19:12:47Z</datestamp>
        <setSpec>oai_pmh:all_repository_items</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>My test Islandora object</dc:title>
          <dc:description>Object ingested by: mark@mark-ThinkPad-X1-Carbon-6th.

Basic technical metadata: /home/mark/Downloads/replace_media.png: PNG image data, 833 x 437, 8-bit/color RGBA, non-interlaced

SHA256 hash: 048caaecfd9abab788afd0becdf657b8bb66ab74381b38e0aadf33b6d5a814bc</dc:description>
          <dc:extent>1 item</dc:extent>
          <dc:identifier>/home/mark/Downloads/replace_media.png</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:localhost:node-9</identifier>
        <datestamp>2019-09-05T13:49:13Z</datestamp>
        <setSpec>oai_pmh:all_repository_items</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Small boats in Havana Harbour</dc:title>
          <dc:description>Taken on vacation in Cuba.</dc:description>
          <dc:extent>1 item</dc:extent>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:localhost:node-10</identifier>
        <datestamp>2019-09-05T13:49:22Z</datestamp>
        <setSpec>oai_pmh:all_repository_items</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Manhatten Island</dc:title>
          <dc:description>Taken from the ferry from downtown New York to Highlands, NJ. Weather was windy.</dc:description>
          <dc:extent>1 item</dc:extent>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:localhost:node-12</identifier>
        <datestamp>2019-09-05T13:49:33Z</datestamp>
        <setSpec>oai_pmh:all_repository_items</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Amsterdam waterfront</dc:title>
          <dc:description>Amsterdam waterfront on an overcast day.</dc:description>
          <dc:extent>1 item</dc:extent>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header>
        <identifier>oai:localhost:node-13</identifier>
        <datestamp>2019-09-05T13:49:42Z</datestamp>
        <setSpec>oai_pmh:all_repository_items</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
          <dc:title>Alcatraz Island</dc:title>
          <dc:description>Taken from Fisherman's Wharf, San Francisco.</dc:description>
          <dc:extent>1 item</dc:extent>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>

Resolved review thread on modules/islandora_oaipmh/README.md
@mjordan
Contributor

mjordan commented Sep 5, 2019

Nice! Only glitch I ran into, and it might be a PEBCAK error, is I can't get sets to work. I've added an Entity Reference display to an existing View (Taxonomy Term), checked it in the OAI-PMH "What to expose to OAI-PMH" settings, and rebuilt my OAI-PMH index. ListSets is coming up empty:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd" name="OAI-PMH"><responseDate>2019-09-05T14:17:27Z</responseDate><request verb="ListSets">http://localhost:8000/oai/request</request><ListSets/></OAI-PMH>

When I uncheck my new set in the "What to expose" list and rebuild, ListSets is returning the default set:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd" name="OAI-PMH"><responseDate>2019-09-05T14:25:48Z</responseDate><request verb="ListSets">http://localhost:8000/oai/request</request><ListSets><set><setSpec>oai_pmh:all_repository_items</setSpec><setName>All Repository Items</setName></set></ListSets></OAI-PMH>

Am I missing something?

@mjordan
Contributor

mjordan commented Sep 5, 2019

More info on my set issue: results from http://localhost:8000/oai/request?verb=ListRecords&metadataPrefix=oai_dc do not list the new set in their <setSpec> element. So I suspect the problem is with my View. Here's a screenshot of its config:

[Screenshot of the View's configuration]

@joecorall
Member

As far as the sets not showing: it looks like you're using a taxonomy vocab as a contextual filter. I'm guessing you'd want each term in the vocab to be treated as a set? And that you have an RDF mapping (or some other form of metadata mapping) for the vocab? I'm not 100% sure if entity types other than nodes are available for the sets feature. I'll look into that tomorrow.

@mjordan
Contributor

mjordan commented Sep 5, 2019

@joecorall it occurred to me that the contextual filter might be the problem here too, it's essentially empty. I'll try with a specific view and report back.

@mjordan
Contributor

mjordan commented Sep 7, 2019

Removing the contextual filter and replacing it with a specific filter using a taxonomy term makes the set show up in the OAI-PMH settings. So based on my testing, filters can't be contextual.

Now that I've got my view as a set, I'm seeing some unexpected behavior with the contents of the OAI requests, but at this point I'm still narrowing down what's going on. I'll continue over the weekend and report back here.

@mjordan
Contributor

mjordan commented Sep 8, 2019

Got sets persisting:

curl -v -o sets.xml http://localhost:8000/oai/request?verb=ListSets gives me:

<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd" name="OAI-PMH">
  <responseDate>2019-09-08T21:17:25Z</responseDate>
  <request verb="ListSets">http://localhost:8000/oai/request</request>
  <ListSets>
    <set>
      <setSpec>oai_pmh:all_repository_items</setSpec>
      <setName>All Repository Items</setName>
    </set>
  </ListSets>
  <ListSets>
    <set>
      <setSpec>taxonomy_term:entity_reference_1</setSpec>
      <setName>Entity Reference</setName>
    </set>
  </ListSets>
</OAI-PMH>
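
Because the response above carries two separate `<ListSets>` elements, a consumer may want to collect `<set>` entries wherever they occur rather than reading only the first container. A minimal Python sketch, assuming the response is already available as a string (the function name `list_sets` is illustrative):

```python
import xml.etree.ElementTree as ET

# ElementTree's "{namespace}tag" form for the OAI-PMH namespace.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def list_sets(xml_text):
    """Return a dict mapping each setSpec to its setName."""
    root = ET.fromstring(xml_text)
    sets = {}
    # iter() walks the whole tree, so <set> elements are found even when
    # they are spread across more than one <ListSets> container.
    for set_el in root.iter(OAI + "set"):
        spec = set_el.findtext(OAI + "setSpec")
        sets[spec] = set_el.findtext(OAI + "setName")
    return sets
```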

But a request for a set returns items that aren't in that set. For example, this request (there are 2 nodes in the "taxonomy_term:entity_reference_1" view display):

curl -v -o oaipmhoutput.xml "http://localhost:8000/oai/request?verb=ListRecords&metadataPrefix=oai_dc&setSpec=taxonomy_term:entity_reference_1"

Returns results that contain records not in that set, e.g.:

    <record>
      <header>
        <identifier>oai:localhost:node-13</identifier>
        <datestamp>2019-09-05T13:49:42Z</datestamp>
        <setSpec>oai_pmh:all_repository_items</setSpec>
      </header>
    </record>

Records that should be in the set are:

    <record>
      <header>
        <identifier>oai:localhost:node-12</identifier>
        <datestamp>2019-09-07T17:32:02Z</datestamp>
        <setSpec>taxonomy_term:entity_reference_1</setSpec>
        <setSpec>oai_pmh:all_repository_items</setSpec>
      </header>
    </record>

Rebuilding my OAI index doesn't have any effect on this.
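
A check like the one described above can be automated. The sketch below is an illustration in Python, assuming the ListRecords response was saved to a string; `records_outside_set` is a hypothetical helper name. It returns the identifiers of records whose header does not carry the requested setSpec.

```python
import xml.etree.ElementTree as ET

# ElementTree's "{namespace}tag" form for the OAI-PMH namespace.
OAI = "{http://www.openarchives.org/OAI/2.0/}"


def records_outside_set(xml_text, set_spec):
    """Return identifiers of records whose header lacks the given setSpec."""
    root = ET.fromstring(xml_text)
    outside = []
    for header in root.iter(OAI + "header"):
        # A header may carry several <setSpec> elements.
        specs = [el.text for el in header.findall(OAI + "setSpec")]
        if set_spec not in specs:
            outside.append(header.findtext(OAI + "identifier"))
    return outside
```

An empty list means every returned record belongs to the set; any identifiers in the result are the unexpected records.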

@seth-shaw-unlv
Contributor Author

I'll see what I can do.

@seth-shaw-unlv
Contributor Author

Of course, I say that 4 hours ago and only now get around to it...

@seth-shaw-unlv
Contributor Author

@mjordan Can I get your new view config so I can test it locally? Thanks.

@mjordan
Contributor

mjordan commented Sep 9, 2019

@seth-shaw-unlv here's my dumped config. It's for the "Taxonomy Term" View, in particular the "Entity Reference" display.

pr4-views.view.taxonomy_term.yml.txt

@seth-shaw-unlv
Contributor Author

seth-shaw-unlv commented Sep 9, 2019

Hmm... the view isn't pulling up any of my taxonomy terms. Does the view's preview pane work on yours, @mjordan? I'm wondering what is different.

@mjordan
Contributor

mjordan commented Sep 9, 2019

Yes, for that display the preview shows the expected nodes.

@mjordan
Contributor

mjordan commented Sep 9, 2019

@seth-shaw-unlv I added my own terms to the vocabulary that the view is using, so you won't have them on your VM. You should be able to adjust it to use any term however.

@seth-shaw-unlv
Contributor Author

@mjordan, I noticed that. Adding a term actually has the effect of displaying the node several (4) times over. Do you have repeating results?

FWIW I wouldn't use the taxonomy_term view as my base. I would use the content view as my base and add the filters to that.

@seth-shaw-unlv
Contributor Author

Okay, it was repeated for every taxonomy term it had associated with it (Islandora Model, Islandora Access, Linked Agent, and Subject). Removing one of the node's associated terms dropped the number of times it appeared.

@mjordan
Contributor

mjordan commented Sep 9, 2019

Did you get it to work as an OAI set?

@seth-shaw-unlv
Contributor Author

No, I was just trying to get the view working so I could debug the rest. In any case, I do see the bug you mentioned. I'm wondering if there was a regression in the rest_oai_pmh module since I submitted the PR, because I know this worked before. I will dig into this some more.

@seth-shaw-unlv
Contributor Author

@mjordan the OAI URL appeared to be incorrect. Try http://localhost:8000/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=taxonomy_term:entity_reference_1 (it replaces 'setSpec' with 'set').

@seth-shaw-unlv
Contributor Author

According to the docs, we use the 'set' argument with the value of a 'setSpec'.

@seth-shaw-unlv
Contributor Author

So, really, this is 'works as designed' rather than a bug.

@mjordan
Contributor

mjordan commented Sep 9, 2019

Using set worked (the two nodes I expected to show up in the results were the only ones) but setSpec is the correct argument. Maybe I should file that on the rest_oai_pmh module's issue queue?

@seth-shaw-unlv
Contributor Author

'setSpec' is the name of the value, but for an HTTP ListRecords query, the argument is 'set'.
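
To make the distinction concrete, here is a small Python sketch that builds a selective-harvest URL. The helper name `list_records_url` is illustrative; the point is that the setSpec value is passed in the `set` query argument.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None):
    """Build a ListRecords request URL, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # The query argument is named "set"; its value is a setSpec.
        params["set"] = set_spec
    # safe=":" keeps the colon in the setSpec readable instead of %3A.
    return base_url + "?" + urlencode(params, safe=":")
```

For example, `list_records_url("http://localhost:8000/oai/request", "oai_dc", "taxonomy_term:entity_reference_1")` produces the corrected URL suggested earlier in this thread.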

@seth-shaw-unlv
Contributor Author

[Screenshot: Screen Shot 2019-09-09 at 1 56 26 PM]

@mjordan
Contributor

mjordan commented Sep 9, 2019

Wow, which argument is correct is pretty unclear, particularly given the lack of examples. Approving and merging. Sorry for the rabbit hole.

@seth-shaw-unlv
Contributor Author

@mjordan, no worries! Better to run down the hole. I'd rather be safe than sorry!

@mjordan
Contributor

mjordan commented Sep 9, 2019

Nice work @joecorall and @seth-shaw-unlv on this very important feature!

@mjordan mjordan merged commit 72c490b into Islandora:8.x-1.x Sep 9, 2019
@seth-shaw-unlv seth-shaw-unlv deleted the issue-498 branch September 9, 2019 21:42