DSU Winter Practicum Report: Haoran Wang (Using OAI-PMH with Islandora)

[The following post is shared on behalf of Haoran Wang, one of the practicum students hosted by the Digital Scholarship Unit this term.]

Make Metadata Discoverable via the OAI-PMH in WorldCat

For the past few months, I worked as a practicum student at the University of Toronto Scarborough Library, figuring out how to use the OAI module to help the Digital Scholarship Unit (DSU) make its metadata discoverable in WorldCat via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). From the course INF2186 Metadata Schemas & Apps at UofT, I already had some basic knowledge of how to share metadata with a local, state, or regional digital metadata repository and how to expose existing metadata for OAI harvesting. This tutorial walks through what I did, step by step.

Let’s start with some basic terms.

Step 0 - Terms to Get Started

The Open Archives Initiative (OAI) is an initiative to develop and promote interoperability standards that aim to facilitate the efficient dissemination of content.

OAI Protocol for Metadata Harvesting (OAI-PMH) is a lightweight harvesting protocol for sharing metadata between services. In the OAI context, harvesting refers specifically to the gathering together of metadata from a number of distributed repositories into a combined data store.

There are two classes of participants in the OAI-PMH framework:

  • Data Providers administer systems that support the OAI-PMH as a means of exposing metadata; and
  • Service Providers use metadata harvested via the OAI-PMH as a basis for building value-added services.

Data Providers (open archives, repositories) provide free access to metadata, and may, but do not necessarily, offer free access to full texts or other resources. OAI-PMH provides an easy-to-implement, low-barrier solution for Data Providers.

Service Providers use the OAI interfaces of the Data Providers to harvest and store metadata. Note that this means that there are no live search requests to the Data Providers; rather, services are based on the harvested data via OAI-PMH. Service Providers may select certain subsets from Data Providers (e.g., by set hierarchy or date stamp). Service Providers offer (value-added) services on the basis of the metadata harvested, and they may enrich the harvested metadata in order to do so.
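As a rough illustration of selective harvesting, here is a minimal Python sketch that builds a ListRecords request narrowed by set and date stamp. The `set`, `from`, and `until` arguments come from the OAI-PMH specification; the function name, the set name `example_set`, and the date are hypothetical placeholders, not values from the DSU repository.

```python
from urllib.parse import urlencode

# Base URL used in this tutorial.
BASE_URL = "http://dsu-beta.utsc.utoronto.ca/projects/oai2"


def selective_list_records_url(base_url, metadata_prefix="oai_dc",
                               set_spec=None, from_date=None, until_date=None):
    """Build a ListRecords URL limited to one set and/or a date range."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec        # harvest only this set
    if from_date:
        params["from"] = from_date      # records changed on or after this date
    if until_date:
        params["until"] = until_date    # records changed on or before this date
    return base_url + "?" + urlencode(params)


# "example_set" and the date are made-up values for illustration only.
print(selective_list_records_url(BASE_URL, set_spec="example_set",
                                 from_date="2015-01-01"))
```

A harvester that remembers the date of its last run can pass it as `from_date` so that only new or changed records are requested.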

Basic functioning of OAI-PMH

  • The OAI-PMH protocol is based on HTTP.
  • Responses are encoded in XML syntax. OAI-PMH supports any metadata format encoded in XML. Dublin Core is the minimal format specified for basic interoperability.
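To make the "HTTP in, XML out" idea concrete, the sketch below parses a hand-written Identify response with Python's standard library. The sample XML is an assumption made up for illustration (it is not the DSU server's actual output); only the OAI-PMH namespace and element names come from the protocol.

```python
import xml.etree.ElementTree as ET

# Namespace used by every OAI-PMH 2.0 response.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response (illustrative values only).
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2015-04-01T12:00:00Z</responseDate>
  <request verb="Identify">http://dsu-beta.utsc.utoronto.ca/projects/oai2</request>
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://dsu-beta.utsc.utoronto.ca/projects/oai2</baseURL>
    <protocolVersion>2.0</protocolVersion>
  </Identify>
</OAI-PMH>"""


def parse_identify(xml_text):
    """Return the children of <Identify> as a {local-name: text} dictionary."""
    root = ET.fromstring(xml_text)
    identify = root.find("oai:Identify", OAI_NS)
    # Tags look like "{namespace}name"; keep only the part after "}".
    return {child.tag.split("}", 1)[1]: child.text for child in identify}


info = parse_identify(SAMPLE_IDENTIFY)
print(info["repositoryName"], info["protocolVersion"])
```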

 

The diagram below is the overview and structure model of OAI-PMH.

Step 1 - Set Up Your OAI Module

The DSU currently uses Islandora, an open-source, OAIS-based digital preservation repository and asset management system built on Drupal. First, go to the DSU home page and select Islandora → Islandora Utility Modules → Islandora OAI from the navigation bar.

Next, the OAI module lets you configure the URL path to the repository. In this example, the base URL is http://dsu-beta.utsc.utoronto.ca/projects/oai2. To return more records per response, enter the number you want under “Maximum Response Size”. The default is 20 records per response.
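Because each response is capped at the Maximum Response Size, a harvester has to page through larger lists. In OAI-PMH this is done with a resumptionToken: an incomplete response carries a token, and the next request sends that token back. The Python sketch below shows the loop under stated assumptions: the two sample pages, the identifiers, and the `fake_fetch` helper are invented so the example runs without a network connection; in real use you would pass a function that downloads the URL.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest_identifiers(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield every record identifier, following resumptionTokens across pages.

    `fetch` is any callable that takes a URL and returns the response XML as text.
    """
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.findtext(f".//{OAI}resumptionToken")
        if not token:  # missing or empty token means this was the last page
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}


# Two made-up pages standing in for a real server.
PAGE_1 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>
  <header><identifier>oai:example.org:1</identifier></header>
  <header><identifier>oai:example.org:2</identifier></header>
  <resumptionToken>token-page-2</resumptionToken>
</ListIdentifiers></OAI-PMH>"""
PAGE_2 = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>
  <header><identifier>oai:example.org:3</identifier></header>
  <resumptionToken/>
</ListIdentifiers></OAI-PMH>"""


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP download."""
    return PAGE_2 if "resumptionToken=token-page-2" in url else PAGE_1


print(list(harvest_identifiers("http://example.org/oai2", fake_fetch)))
```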

Click the Configure button below to find more settings for the OAI request handler.

In the OAI request handler settings, select the dc.identifier.thumbnail option. If selected, a URL to the object's thumbnail will be added as a dc:identifier.thumbnail if the object has a thumbnail.

The DSU currently uses MODS for all generic content going forward. In the past the DSU used Dublin Core, but Islandora natively prefers MODS, and MODS is more flexible for complex objects. Every field that you want to display in WorldCat has to be mapped to Dublin Core, so I chose to transform MODS to Dublin Core.
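In Islandora the MODS to Dublin Core transformation is done by an XSLT stylesheet. Purely to illustrate what such a crosswalk does, here is a small Python sketch that maps a handful of MODS elements to DC elements. The four mappings are a simplified subset chosen for this example, and the sample record is invented; the real stylesheet covers many more fields.

```python
import xml.etree.ElementTree as ET

MODS = {"mods": "http://www.loc.gov/mods/v3"}

# Simplified crosswalk: (path inside the MODS record, Dublin Core element).
CROSSWALK = [
    ("mods:titleInfo/mods:title", "title"),
    ("mods:name/mods:namePart", "creator"),
    ("mods:originInfo/mods:dateIssued", "date"),
    ("mods:abstract", "description"),
]

# Made-up MODS record for illustration.
SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example title</title></titleInfo>
  <name><namePart>Example, Author</namePart></name>
  <originInfo><dateIssued>1975</dateIssued></originInfo>
</mods>"""


def mods_to_dc(mods_xml):
    """Return a list of (dc element, value) pairs for the mapped MODS fields."""
    root = ET.fromstring(mods_xml)
    pairs = []
    for path, dc_name in CROSSWALK:
        for element in root.findall(path, MODS):
            if element.text and element.text.strip():
                pairs.append((f"dc:{dc_name}", element.text.strip()))
    return pairs


for name, value in mods_to_dc(SAMPLE_MODS):
    print(name, "=", value)
```

MODS fields with no entry in the crosswalk are silently dropped, which is why it is worth checking that every field you want in WorldCat has a mapping.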

Services like WorldCat expect a link back to the object, such as a Handle URL. If your metadata doesn't have one, self-transforming XSLTs can be used to add specific elements tailored to individual needs.
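The logic of such a transform is simple: if the record has no identifier that looks like a URL, add one. The Python sketch below shows that logic on an oai_dc record. It is only an illustration of the idea, not the XSLT that Islandora runs, and the sample record and the handle URL are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
# Keep the familiar prefixes when the record is written back out.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)

# Made-up record whose only identifier is a PID, not a link.
SAMPLE_DC = (
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Example object</dc:title>"
    "<dc:identifier>utsc:519</dc:identifier>"
    "</oai_dc:dc>"
)

# Placeholder link; a real record would use the object's own Handle URL.
EXAMPLE_URL = "http://hdl.handle.net/0000/example"


def ensure_link_identifier(dc_xml, object_url):
    """Add a dc:identifier holding object_url unless a URL identifier already exists."""
    root = ET.fromstring(dc_xml)
    identifiers = [el.text or "" for el in root.findall(f"{{{DC_NS}}}identifier")]
    if not any(text.startswith(("http://", "https://")) for text in identifiers):
        ET.SubElement(root, f"{{{DC_NS}}}identifier").text = object_url
    return ET.tostring(root, encoding="unicode")


print(ensure_link_identifier(SAMPLE_DC, EXAMPLE_URL))
```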

Make sure you save all the settings at the end by clicking the Save Configuration button.

Step 2 - Test Your Base URL

OAI-PMH supports six request types (known as "verbs"). You can use them by appending a verb and its arguments to the base URL.

URLs for GET requests have keyword arguments appended to the base URL, separated from it by a question mark [?]. For example, a GetRecord request to the DSU base URL, http://dsu-beta.utsc.utoronto.ca/projects/oai2, could be:

http://dsu-beta.utsc.utoronto.ca/projects/oai2?verb=GetRecord&identifier=oai:dsu-beta.utsc.utoronto.ca:utsc:519&metadataPrefix=oai_dc

Here is an explanation of all six request types:

  • GetRecord: retrieves an individual metadata record from a repository.
  • Identify: retrieves information about a repository.
  • ListIdentifiers: retrieves only record headers rather than full records.
  • ListMetadataFormats: retrieves the metadata formats available from a repository.
  • ListRecords: harvests records from a repository.
  • ListSets: retrieves the set structure of a repository.
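The six verbs and the arguments each one requires can be captured in a small lookup table. The Python sketch below builds request URLs from that table and rejects unknown verbs or missing arguments. The function and table names are my own, and the table is simplified: it ignores optional arguments and the case where a resumptionToken replaces the other arguments.

```python
from urllib.parse import urlencode

# Base URL used in this tutorial.
BASE_URL = "http://dsu-beta.utsc.utoronto.ca/projects/oai2"

# Required arguments for each of the six OAI-PMH verbs (simplified).
REQUIRED_ARGS = {
    "GetRecord": {"identifier", "metadataPrefix"},
    "Identify": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListMetadataFormats": set(),
    "ListRecords": {"metadataPrefix"},
    "ListSets": set(),
}


def build_request(base_url, verb, **args):
    """Return a GET URL for the verb, checking the verb name and its required arguments."""
    if verb not in REQUIRED_ARGS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    missing = REQUIRED_ARGS[verb] - set(args)
    if missing:
        raise ValueError(f"{verb} is missing required argument(s): {sorted(missing)}")
    # urlencode percent-encodes reserved characters, so ":" becomes "%3A".
    return base_url + "?" + urlencode({"verb": verb, **args})


print(build_request(BASE_URL, "GetRecord",
                    identifier="oai:dsu-beta.utsc.utoronto.ca:utsc:519",
                    metadataPrefix="oai_dc"))
```

Asking for the misspelled verb "GetRecords" raises a ValueError here; a real repository would answer the same request with a badVerb error.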

After you have exposed content types and some fields, your repository is available at /oai2.

Some example requests are as follows:

http://dsu-beta.utsc.utoronto.ca/projects/oai2?verb=Identify

http://dsu-beta.utsc.utoronto.ca/projects/oai2?verb=ListMetadataFormats

http://dsu-beta.utsc.utoronto.ca/projects/oai2?verb=ListIdentifiers&metadataPrefix=oai_dc

http://dsu-beta.utsc.utoronto.ca/projects/oai2?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:dsu-beta.utsc.utoronto.ca:utsc:519

http://dsu-beta.utsc.utoronto.ca/projects/oai2?verb=ListRecords&metadataPrefix=oai_dc
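When testing these URLs from a script rather than a browser, it helps to check whether the repository answered with an OAI-PMH error element. The Python sketch below does that. The `fetch` helper uses the standard library to download a URL but is not called here; the sample error response is hand-written for illustration (badVerb is one of the error codes defined by the protocol).

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def fetch(url, timeout=30):
    """Download an OAI-PMH response and return it as text (assumes UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def oai_errors(xml_text):
    """Return a list of (code, message) pairs; an empty list means no errors."""
    root = ET.fromstring(xml_text)
    return [(el.get("code"), (el.text or "").strip())
            for el in root.findall(OAI + "error")]


# Hand-written example of what a repository returns for a misspelled verb.
SAMPLE_ERROR = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2015-04-01T12:00:00Z</responseDate>
  <request>http://dsu-beta.utsc.utoronto.ca/projects/oai2</request>
  <error code="badVerb">Illegal OAI verb</error>
</OAI-PMH>"""

print(oai_errors(SAMPLE_ERROR))

# Against a live repository you could run, for example:
# print(oai_errors(fetch("http://dsu-beta.utsc.utoronto.ca/projects/oai2?verb=Identify")))
```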

Step 3 - Build a Gateway from WorldCat to Add Records from OAI-PMH

In order to use the Gateway, be sure that the following conditions are met:  

  • Your OAI-PMH compliant repository is running.  
  • You have one or more existing collections with metadata fields mapped to Dublin Core and/or Qualified Dublin Core (dcterms).  
  • You have an OCLC-supplied Key for the Gateway.

If your institution does not already have a Gateway account, go to the Gateway registration page at http://worldcat.org/digitalcollectiongateway/register.jsp to register.

In a few days, OCLC will send a welcome email that includes user credentials you can use to log in to the Gateway and add additional users. After you have registered and received your Gateway user credentials, you can log in to the Gateway and begin synchronizing metadata from your OAI repositories with WorldCat.

After you have registered your account with the Gateway, you need to associate your repositories with the appropriate Gateway key.

  1. Go to the Gateway login page and log in: http://www.worldcat.org/DigitalCollectionGateway/login.jsp
  2. If you’re not already in the Manage Account tab, click to select it now.
  3. Click Keys and Repositories.
  4. Click to select the key for which you want to add repositories.
  5. Click Add Repository. You’ll see a display something like this:

 

6. When the Add Repository window appears, enter the OAI-PMH base URL for the selected repository. Then click Test.

7. When the repository has been tested successfully, Gateway displays the message “All OAI tests passed.” You can now click Add to associate the repository with your Key.

8. After you have successfully added the repository, you’ll be able to edit and manage settings for the repository you just added.

 

 

In the Repository area of the page, the following information is displayed:  

  • Institution symbol (OCLC symbol)  
  • Gateway license key  
  • URL, name – The OAI-PMH base URL and name of this repository
  • Type – To change the repository type, use the pull-down menu to select one of the following: CONTENTdm (pre version 5), DSpace, Fedora, Eprints, Digital Commons or other. After changing the type, you must click Change to save your choice.

You can use the Show Sets in Collection List? pull-down menu to configure the way the Gateway harvests content from a repository.

Your OAI repository allows you to manage sets (collections of records) separately in the Gateway. Using sets is the default approach. In the Gateway, a set name is the same as a collection name.
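To see which sets the Gateway will find, you can look at the repository's ListSets response. The Python sketch below turns such a response into a dictionary of set identifiers and names. The sample response and its two sets are invented for illustration; only the element names (set, setSpec, setName) come from the protocol.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hand-written ListSets response with two made-up sets.
SAMPLE_LIST_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>example_collection_a</setSpec><setName>Example Collection A</setName></set>
    <set><setSpec>example_collection_b</setSpec><setName>Example Collection B</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text):
    """Return {setSpec: setName} for every set in a ListSets response."""
    root = ET.fromstring(xml_text)
    return {s.findtext(OAI + "setSpec"): s.findtext(OAI + "setName")
            for s in root.iter(OAI + "set")}


for spec, name in parse_sets(SAMPLE_LIST_SETS).items():
    print(spec, "->", name)
```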

By default, Show Sets… is Yes. This default setting allows you to set up different metadata maps for each collection (or set) in a repository.

If you want to create a single metadata map for all records in your OAI repository, regardless of what collection the records are in, you can select No from the pull-down menu. Selecting No will create a special collection named Entire Repository. When you create a metadata map for that special collection, your mappings apply to the entire repository.  

Note: If you select No, you cannot subsequently undo that setting in the Gateway. For this reason, we strongly recommend that you do not change the default setting. Moreover, with multiple sets you may choose to apply one profile to several (or all) of the sets at any time.

In this example, I selected No to apply mappings to the entire repository.

 

9. Since your license key may be used by more than one Gateway user, you can assign users with that key to particular collections. Those users can then map metadata and synchronize with WorldCat for each collection to which they are assigned. Next, select the type of record processing for the collection and prepare it for synchronization with WorldCat through the Gateway.

Then, on the home page, you will find several sections:  

  • Collection Details – In addition to the general information displayed for this collection, you can set the WorldCat Record Processing type, collection-level record, and more.  
  • Sync Details – You can edit the synchronization schedule for this collection, view its synchronization history, or view a synchronization status report.
  • Metadata Map – You can click the link to edit the collection’s metadata map.  

 

Congratulations! You will now be able to see your collections in WorldCat.

The entire repository is available at: http://www.worldcat.org/search?q=on:DGCNT+http://dsu-beta.utsc.utoronto.ca/projects/oai2+DCG_ENTIRE_REPOSITORY+CN7UT&qt=results_page

QA Analysis for Current Repository

Finally, I also ran a quality assurance analysis of the DSU’s repository. As you can see, the total DC completeness is 73.12%. Some collections still need dates added.
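A completeness figure like this can be computed in several ways. The Python sketch below shows one plausible definition, which is an assumption for illustration and not necessarily the method behind the 73.12% above: for each record, count how many of the 15 simple Dublin Core elements have at least one non-empty value, then average across records. The two sample records are invented.

```python
# The 15 elements of simple Dublin Core.
DC_ELEMENTS = [
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
]


def record_completeness(record):
    """Fraction of the 15 DC elements with at least one non-empty value."""
    filled = sum(
        1 for name in DC_ELEMENTS
        if any(value.strip() for value in record.get(name, []))
    )
    return filled / len(DC_ELEMENTS)


def collection_completeness(records):
    """Average record completeness, as a percentage."""
    return 100 * sum(record_completeness(r) for r in records) / len(records)


def records_missing(records, element):
    """Count the records with no non-empty value for one element, e.g. 'date'."""
    return sum(
        1 for r in records
        if not any(value.strip() for value in r.get(element, []))
    )


# Made-up records: each maps a DC element name to a list of values.
SAMPLE_RECORDS = [
    {"title": ["Example A"], "identifier": ["utsc:1"], "date": ["1975"]},
    {"title": ["Example B"], "identifier": ["utsc:2"], "date": [""]},
]

print(f"{collection_completeness(SAMPLE_RECORDS):.2f}% complete")
print(records_missing(SAMPLE_RECORDS, "date"), "record(s) missing a date")
```

Counting missing values per element, as `records_missing` does, is what points to specific gaps such as missing dates.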

Enjoy your harvesting!

References

Lagoze, C. & Van de Sompel, H. (2002). The Open Archives Initiative Protocol for Metadata Harvesting. Available: <https://www.openarchives.org/OAI/openarchivesprotocol.html>.

OCLC Online Computer Library Center, Inc. (2012). The WorldCat Digital Collection Gateway Tutorial. Available: <https://www.oclc.org/content/dam/oclc/gateway/gettingstarted/tutorial.pdf>.

Tripp, E. (2014). Get Discovered: Sitemaps, OAI and More. Available: <http://islandora.ca/sites/default/files/iCampCo_GetDiscovered_Oct15_2014.pdf>.

Jackson et al. (2008). Dublin Core metadata harvested through OAI-PMH. Available: <https://www.ideals.illinois.edu/bitstream/handle/2142/9091/JLM.pdf?sequence=2>.

Shreeves et al. (2006). Moving towards shareable metadata. Available: <http://firstmonday.org/ojs/index.php/fm/article/view/1386/1304>.