Thursday, April 2, 2015 - 10:10

Are you a doctoral student (or an advanced MA student) wondering about all the buzz around “the Digital Humanities”? Do you wish you could get a concise introduction to this emerging field as it applies to your own research (and meet a community of scholars working in this area)? If so, please join UTSC faculty, digital librarians, and other graduate students for a day-long workshop on April 21st, 2015. In addition to an overview of the field of DH and a sneak peek at some exciting projects being developed at UTSC, you’ll learn about key principles, methods, and tools available to you right now. No coding or prior experience necessary -- just register to secure your spot by filling out this brief survey. As a bonus, your feedback will help us finalize the schedule to ensure that the training provided is as useful as possible. We will provide lunch, instructional materials, and lots of food for thought. All you need to bring is your laptop and questions.

Please note: All graduate students are welcome to register for this free event, but as seating is limited, priority will be given to pre-candidacy students in the Departments of History, Classics, East Asian Studies, Near and Middle Eastern Civilizations, and the Collaborative Programs in Women and Gender Studies and South Asian Studies. Register by April 13th, and you’ll be notified of acceptance by April 15th.

This event is hosted by the Department of Historical and Cultural Studies, in partnership with the UTSC Library.

If you have any questions, please contact us at

Social Media

Tweet the day at #utscDH15

Join our Facebook Group.


Getting Here

Location - the Social Sciences Building (MW) at the University of Toronto Scarborough, RM 120



According to the TTC Trip Planner
Take the 198 ROCKET towards U OF T SCARBOROUGH - EAST
198 Bus Schedule


Preliminary Schedule

9:30 - Welcome, housekeeping, roundtable, what is DH?

10:00 - Introduction to Digital Humanities at UTSC (Faculty Projects)

10:30 - Open Discussion/break

10:45 - Zotero, and Beyond Bibliographic Management - What is Zotero? How do you use features for data collection and research collaboration? Intro to open-source community and plugins

12:00 - Lunch (Provided)

1:00 - Academic Blogging, Digital & Data Storytelling, Natural Language Processing & Social Media Analysis

3:00 - break

3:15 - Structuring and Analyzing Data (Network Visualizations) - Structured data and your thesis

4:15 - Closing remarks (Frontiers & Data Curation)

4:30 - Exit survey

Register Now

Event Updates!

We have a full house! Thanks to all for applying to attend this workshop. We're excited at the amount of interest it has garnered. For those of you attending, we have some updates.

Learn more about who's attending!

Join our Facebook Group.


Thursday, October 23, 2014 - 09:51

We’re on day 4 of Open Access (OA) Week! We had a great turnout yesterday at the button-making station outside the library and the social media activity is still going strong. Photos are posted on the @digitalUTSC Instagram and Twitter accounts, as well as the EPSA Facebook page. Thanks to all who have been participating!

There are still great OA Week activities happening across the three campuses for the rest of the week. Check out what is happening today across all 3 U of T campuses.

Of particular interest to those at UTSC:

We're back TODAY 10:30am-3pm outside the library, so if you have a spare moment, please drop by to say hello, make a button and share your thoughts on OA.
Drop by the Library Instruction Lab (AC286A) 2-3pm today and Friday if you have any questions about depositing copies of your publications in our research repository (TSpace) or want to know more about publishing in an OA journal.
If you’re looking to publish in an OA journal but don’t have any funding remaining, you may want to consider the Library’s OA Author Fund pilot, which has been extended into 2015. We also have RSC Gold-for-Gold vouchers if you’re publishing in an RSC publication. Questions? Email me or come to one of the drop-in sessions listed above.
I’ve been periodically tweeting out links to OA publications/open data/other open research outputs to showcase the amazing scholarly output of UTSC researchers. If you have something you’d like me to highlight, please either email me or include the Library's Digital Scholarship Unit (@digitalUTSC) or my personal (@4Bes) Twitter handle if you decide to tweet out a link yourself.
There will also be more oaweek trivia coming tomorrow! Show off your OA skills on the @UofTSCCO FB page to win a Starbucks gift card.

Thursday, October 23, 2014 - 09:28

Open Access Week 2014: Events Listing

Monday, October 20

Open Access Week 2014 Kick Off Event at the World Bank: Generation Open (WEBCAST)

3:00 - 4:00 pm (EDT)

VIEW THE WEBCAST: (pre-registration not required)

Join the World Bank and SPARC (The Scholarly Publishing and Academic Resources Coalition) as they host the International Open Access Week Kick Off Event, live streamed from Washington D.C. The event seeks to provide a forum for early-career researchers and students to discuss how a transition to open access could affect researchers at various stages in their careers.  A panel of experts will also discuss how academic and research institutions can become involved in supporting early-career researchers to make their scholarly articles and data accessible to all.

The following panelists will be involved in the event:

Stefano Bertuzzi: Executive Director, American Society for Cell Biology
José-Marie Griffiths: Vice President for Academic Affairs, Bryant University
Meredith Niles: Postdoctoral Research Fellow, Sustainability Science Program, Harvard University
Jerry Sheehan: Assistant Director for Policy Development, National Library of Medicine

Wednesday, October 22

Humanities Informational Drop-In Session with a Scholarly Communications Librarian

2:00 - 3:00 pm

AC286A Library Instruction Lab

Have questions about publishing your research? Inquiries about copyright or open access? Or just want to explore your options for the future? Come in from 2 to 3 pm to the Library Instruction Lab to talk one-on-one with the UTSC Scholarly Communications Librarian!

  Student Social Media & Button Table

10:00 am - 3:00 pm

Outside the Library

Is access to research important to you? Are you considering publication of your research someday? Come chat with Sarah Forbes, the UTSC Scholarly Communication Librarian, and EPSA representatives to learn more about open access and show your support by making personalized buttons and sharing on social media how much these issues matter to UTSC students.


Thursday, October 23

Social Sciences Informational Drop-In Session with a Scholarly Communications Librarian

2:00 - 3:00 pm

AC286A Library Instruction Lab

Have questions about publishing your research? Inquiries about copyright or open access? Or just want to explore your options for the future? Come in from 2 to 3 pm to the Library Instruction Lab to talk one-on-one with the UTSC Scholarly Communications Librarian!


Student Social Media & Button Table

10:30 am - 3:00 pm

Outside the Library

Is access to research important to you? Are you considering publication of your research someday? Come chat with Sarah Forbes, the UTSC Scholarly Communication Librarian, and EPSA representatives to learn more about open access and show your support by making personalized buttons and sharing on social media how much these issues matter to UTSC students.


Friday, October 24

Sciences Informational Drop-In Session with a Scholarly Communications Librarian

2:00 - 3:00 pm

AC286A Library Instruction Lab

Have questions about publishing your research? Inquiries about copyright or open access? Or just want to explore your options for the future? Come in from 2 to 3 pm to the Library Instruction Lab to talk one-on-one with the UTSC Scholarly Communications Librarian!


Complete listing of tri-campus events

For more information contact:

Sarah Forbes

Wednesday, October 15, 2014 - 10:29

For the original post, visit

Recently we found that we needed to revisit our old friend the Drupal Feeds module and get it to play nice with Zotero's API. This time we wanted to use it with Culinaria's Zotero library. Our goal was to pull all of the items and their content from the library into their UTSC website (much like how an RSS feed works). In the original post, Kirsta brought up that she was having trouble when the processor was set to periodically import items to keep the feed up to date with the Zotero library. Every time the import ran, the processor wouldn't just add new items and update existing ones; it would create a new item for every entry returned by the API. This meant that there were lots of duplicate entries in Drupal that needed to be removed. We've got it working now, but first:

A Recap

How to set up a Zotero feed in Drupal:
1. Create 2 new content types: Custom Feeds Processor, Custom Feed Item (under Structure)
2. Add a new Feed Importer (under Structure)
3. Add a new Custom Feed Processor (under Add Content)
4. Configure the new Feed Importer: attach it to the Custom Feed Processor, map it to the fields in the Custom Feed Item which will be parsed with XPath (under Structure)
5. Go to the Custom Feed Processor -> use XPath as your field output (under Content)
6. Run the Import in your Custom Feed Processor. You should see that every entry in the Zotero library has created a new custom feed item in Drupal in your Content view. You can then create a page with all of your feed items using the Views module.


You can see the final page here:

We needed to set a unique key that we could use to match each incoming entry against any existing feed item in Drupal. It seemed to work best when we used the element zapi:key in the Title field. That way, every time the import runs, it checks whether that key already exists; if it does, it updates the existing feed item with that key instead of creating a new one.
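The update-instead-of-duplicate behaviour can be pictured with a few lines of Python. This is only a sketch of the matching logic, not how the Feeds module is implemented; the function name `import_items`, the dictionary layout and the sample keys and titles are all invented for the illustration.

```python
def import_items(existing, incoming):
    """Upsert incoming feed items into `existing`, keyed on the unique key.

    `existing` maps a key (the value mapped to the Title field) to a stored item.
    Returns (created, updated) counts for this import run.
    """
    created = updated = 0
    for item in incoming:
        key = item["key"]
        if key in existing:
            existing[key].update(item)  # same key: refresh the stored item
            updated += 1
        else:
            existing[key] = dict(item)  # unseen key: create a new item
            created += 1
    return created, updated


# Made-up sample data standing in for two entries from the feed.
store = {}
feed = [
    {"key": "ABCD1234", "title": "First entry"},
    {"key": "EFGH5678", "title": "Second entry"},
]
first_run = import_items(store, feed)
second_run = import_items(store, feed)  # re-importing the same feed
```

Running the import a second time only updates the two stored items, which is the behaviour we were after with the periodic import.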

These are the fields we selected in the Feed Importer to map to the processor. We also set Title as the unique key in the target configuration column.

Here are the XPaths we used in our Custom Feeds Processor:


Other Tips

The Zotero API maxes out at 100 items per request, but we had 115 items. Our workaround was to import all the items in the library by running the processor twice: once with the items sorted in descending order and once with them sorted in ascending order. Then we set the API to sort in descending order from here on out, so it will only grab the 100 most recent items.
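To see why two passes are enough for 115 items, here is a small Python simulation. It does not call the Zotero API; `fetch` is a hypothetical stand-in for one capped request, and the library contents and the `dateAdded` field are made up for the example.

```python
LIMIT = 100  # the per-request cap described above


def fetch(library, direction, limit=LIMIT):
    """Stand-in for one API request: return up to `limit` items sorted by date."""
    ordered = sorted(
        library,
        key=lambda item: item["dateAdded"],
        reverse=(direction == "desc"),
    )
    return ordered[:limit]


def import_all(library):
    """Run the import twice (newest first, then oldest first) and merge by key."""
    merged = {}
    for direction in ("desc", "asc"):
        for item in fetch(library, direction):
            merged[item["key"]] = item  # the unique key prevents duplicates
    return merged


# A made-up library of 115 items, matching the count in the post.
library = [{"key": f"ITEM{n:04d}", "dateAdded": n} for n in range(115)]
imported = import_all(library)
```

The two passes overlap on 85 items, and the unique key collapses those duplicates. Note that this trick only covers libraries of up to twice the cap; a bigger library would need a different approach.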

Earlier we used Oxygen to get the XPaths, but Google Chrome has an XML tree extension that you can use that will also quickly get you automatic XPaths:

In your Feed Importer, it's useful to use the Tamper settings to clean up your feeds. We used HTML entity decode and URL decode, which convert encoded values such as "&amp;" into "&". You can also use plugins such as Find and Replace, Filter empty values, or Explode.
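As a rough illustration of what those two decoding steps do to a value, here is the same clean-up using Python's standard library. This is not the Tamper plugin code; the function name `tamper` and the sample string are made up.

```python
from html import unescape
from urllib.parse import unquote


def tamper(value):
    """Decode percent-encoded characters, then HTML entities."""
    return unescape(unquote(value))


# A made-up feed value containing both kinds of encoding.
cleaned = tamper("Salt &amp; Pepper%3A a history")
```

Here `%3A` becomes a colon and `&amp;` becomes a plain ampersand.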

You can turn tags in Zotero into a taxonomy in Drupal, then create a Menu for those terms. First you'll need to create the new terms from your Feed Importer:

Thursday, October 2, 2014 - 11:40

Date: Monday October 6th, 2014

Time: 4:00 pm to 5:30 pm

Location: Doris McCarthy Gallery, University of Toronto Scarborough 

Co-presented by the Art History Program, Chemistry Program, and the Doris McCarthy Gallery

Calling all art and science students!

Attend this FREE workshop to learn about the role of conservation in technical art history - led by Art Gallery of Ontario conservators:

Margaret Haupt - Deputy Director, Collections Management and Conservation

Maria Sullivan - Manager, Conservation


Monday, September 1, 2014 - 20:53

I feel no shame in the sense of accomplishment that I got from learning how to make a solution pack module for Islandora!

Working at the DSU has given me the chance to explore the possibilities and push the boundaries of what I'm capable of. It was great because I came out of the process learning a lot more about the mechanics behind Islandora, Fedora and Drupal and how the existing processes facilitate the lifecycle of a digital object in the system.

To me, a solution pack module is like the blueprint used to set up a digital object factory in Islandora city™. The module specifies what objects to produce, how they should look and what data types and packages are associated with an object. This is all done in the programming languages understood by Drupal and Islandora.

When it is installed, the Living Research Lab solution pack designed for the DSU creates 5 collections (Births, Mice, Protocols, Publications, Experiments) that are used to manage and add objects. The objects in each collection are ingested using specially designed forms in Darwin Core Standard (Mouse, Birth, Protocol, Experiment, Publication, and Data Session). What I've done is a really basic setup. You can and should add hooks and configure so much more to streamline the way your data is ingested, processed and accessed.

The github link for the Living Research Lab Solution Pack can be found here:

I should note that so far the module works well in development mode, but it is in definite need of debugging and refining before it can actually be used in a production environment.

It began with XML

In one of our projects we needed to build custom XML forms to store metadata in Islandora. We soon realized that customization would be really important in order to automate some of the actions and to improve usability. Building a solution pack was the next logical step to explore customization.

Tips to get started

Look at other solution packs! When you're starting with nothing, it helps to look at something. What I did was look at simple examples, going through and breaking down the code line by line, and trying to trace the lines in the code and when each step is called in Islandora.
Draw. Creating diagrams made it easier for me to make connections between the different little parts that contribute to the module's functionality.

What I used*

Sandbox Virtual Machine Image - the testing environment
Creating a shared folder from a local folder to the VM - that way I could modify the module on my desktop
Drupal's Devel module - Selected Display $page array, Display machine names of permissions and modules, Krumo backtrace. Used to debug and quickly re-install the module
Islandora Documentation
Sublime Text - the code editor

Module Examples - a reference; comparing across multiple solution packs helps when you're trying to start your own

Islandora Porcus - basic and heavily commented
Biological Entity - usage of Darwin Core and multiple content models
Biodiversidad - lots of customization and uses Darwin Core

Update 2014-09-11 - other useful resources:

PHP code checker:
Documentation:

The start of something

The first step I took to understand module building was to create a map to see where and how variables and files were being referenced. What I soon discovered is that most files, functions and variables are initially called from the .module file. The .module file is the main file that contains the configuration, hooks and variables. It's in there that you add hooks, and where you wield a lot of power if you know what you're doing.

The map below is a neater version of my original sketch. It lists all of the hooks from the .module file and what files they refer to. It also shows what component each one affects in the Islandora system architecture.

Here are my original notes that went with the map when I was going through the Islandora Porcus module. I wrote out the names of all of the files contained in the module and within those files I wrote out all of the function names and tried to figure out what they did:

islandora_porcus.install > install and uninstall the module under Modules, module_load_include draws from islandora modules usually .inc
islandora_porcus.module > everything comes together > SEE BELOW > info under Modules
./css/islandora_porcus.css > for Object View Page called from .tpl.php
./images/piggie.png > for Object View Page, called from
./includes/ > SEE BELOW > upload form > admin config page when you go to Islandora > Click on Module
./js/islandora_porcus.js > for Object View Page from .tpl.php
./theme/islandora-porcus.tpl.php > Object View Page, when you create a new object this is what appears
./xml/islandora_porcus_form_mods.xml > the form!

Questions:
So content models are customized for your own datastreams and the way you want things to be?
What is a hook even - when you go to a page, something (function, template) is activated?

function islandora_porcus_menu() > the admin menu item, getting things started and loading
function islandora_porcus_theme($existing, $type, $theme, $path) > the Theme is the Template - the Object View Page
function islandora_porcus_preprocess_islandora_porcus(array &$variables) > sets up variables to be placed within the template (.tpl.php) files. From Drupal 7 they apply to templates and functions
function islandora_porcus_islandora_porcusCModel_islandora_view_object($object, $page_number, $page_size) > first content model association, if object is a Cmodel that you specify here
function islandora_porcus_islandora_porcusCModel_islandora_ingest_steps(array $configuration) > hooks ingest steps
function islandora_porcus_islandora_porcusCModel_islandora_object_ingested($object) > calls
function islandora_porcus_islandora_xml_form_builder_forms() > load form
function islandora_porcus_islandora_content_model_forms_form_associations() > form association, for the form
function islandora_porcus_islandora_required_objects(IslandoraTuque $connection) > construct content model object, ingests it
function islandora_porcus_create_all_derivatives(FedoraObject $object) > derivative spec for the .txt uploaded object, potential file conversion
function islandora_porcus_transform_text($input_text) > for the .txt uploaded object
function islandora_porcus_upload_form(array $form, array &$form_state) > http://localhost:8181/islandora/object/porcus%3Atest/manage/overview/ingest this page to upload the file
function islandora_porcus_upload_form_submit(array $form, array &$form_state) > submit into what datastream

islandora-porcus.tpl.php connects to things from preprocess theme
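One way to picture the "what is a hook even" question from my notes: Drupal 7 finds hook implementations by naming convention, so a function called modulename_hookname gets called whenever that hook is invoked. The sketch below is a Python analogy rather than Drupal code; the module names `porcus` and `gallery`, the paths and the `invoke_all` helper are all invented for the illustration.

```python
def porcus_menu():
    """Hook implementation: the hypothetical 'porcus' module adds a menu item."""
    return {"admin/porcus": "Porcus settings"}


def porcus_theme():
    """Hook implementation: 'porcus' registers a template for its object page."""
    return {"porcus_object": "porcus-object.tpl"}


def gallery_menu():
    """Hook implementation from a second hypothetical module."""
    return {"admin/gallery": "Gallery settings"}


ENABLED_MODULES = ["porcus", "gallery"]
FUNCTIONS = {f.__name__: f for f in (porcus_menu, porcus_theme, gallery_menu)}


def invoke_all(hook):
    """Call every `<module>_<hook>` function that exists and merge the results."""
    results = {}
    for module in ENABLED_MODULES:
        implementation = FUNCTIONS.get(f"{module}_{hook}")
        if implementation is not None:
            results.update(implementation())
    return results


menu_items = invoke_all("menu")
theme_entries = invoke_all("theme")
```

Both modules answer the menu hook, only one answers the theme hook, and a hook that nobody implements simply returns nothing. That is roughly why adding a correctly named function to the .module file is enough to change what the system does.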

Even if you don't want to create a module from scratch, this should help you modify existing modules in little ways that could make using Islandora easier.  Feel the power.

[*] My secret wish for tutorials is to list tools and strategies for troubleshooting.  That would help immensely.

Wednesday, August 27, 2014 - 10:28

Working at the Digital Scholarship Unit this summer has been amazing. I can’t believe how complementary it has been for my education. I have learned so much in a practical sense but also in a much broader sense of real workplace experience in a library with digital initiatives. It’s hard to list all the skills and knowledge I have acquired from such an immersive experience but let me talk a bit about the highlights of this summer.

Right out of the gate I was given the daunting task of preparing metadata for 7000 plus photographs in UTSC’s Photographic Services Collection, one of the archival collections currently housed at the DSU. The end goal of the item level description was to create detailed metadata to be used in an online collection that was intended to be representative of the collection and to complement UTSC’s 50th Anniversary celebrations in 2015.

My first task was to design a workflow and selection criteria for the series. I was assisted by two excellent work-study students, Vrishti Dutta and Mary-Ellen Brown, who took on the very exciting work of individually numbering each image. This was extremely time consuming but very necessary as it allows us to easily find and process specific images. The numbering, measuring and metadata creation took approximately one month.

During this time I was also selecting images for the future online collection. Though, as my eyes started to hurt and my brain was sluggish I inexplicably stopped this practice halfway through. This turned out to be a blessing in disguise. By the time I finished processing everything I was able to look at the collection differently. Having seen every image, I had gathered a better understanding of what would best represent the collection and was able to create five sub-series. The sub-series or subjects are Student Life, Academic Life, Campus, Faculty and Staff, and Community. It was very exciting when we finally reached the last box and I realised that I had written metadata for over 7000 images. My feelings of brain-melting and elation are preserved via Twitter and reproduced here for future generations.

But even through all the brain hurt I could tell that this project was so useful. Here I was able to create and execute a curation and digitization workflow. This provided me with a better understanding of what it means to work with digital initiatives. After the metadata was finished, we used OpenRefine and exported the MODS spreadsheet lines as .XML files. I had assigned each image a digital identifier so we used that identifier to name both the .XML files and the .TIF files of the images, creating neat little packages of images and their metadata. These packages were then ingested into Islandora over the course of a week and made available online.
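A quick check like the following can confirm that every package is complete before ingest. It is a hedged sketch rather than the script we used; the function name `find_incomplete` and the `psc_0001` style identifiers are hypothetical.

```python
from pathlib import PurePath

REQUIRED = {".xml", ".tif"}  # one metadata file and one image per identifier


def find_incomplete(file_names):
    """Return identifiers that are missing either their .xml or their .tif file."""
    found = {}
    for name in file_names:
        path = PurePath(name)
        suffix = path.suffix.lower()
        if suffix in REQUIRED:
            found.setdefault(path.stem, set()).add(suffix)
    return sorted(stem for stem, suffixes in found.items() if suffixes != REQUIRED)


# Made-up file names: the second identifier has metadata but no image yet.
files = ["psc_0001.xml", "psc_0001.TIF", "psc_0002.xml"]
incomplete = find_incomplete(files)
```

Because the identifier is the only thing the two files share, comparing file stems is enough to spot a package that is missing half of its pair.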

In addition to processing the series I was also able to assist with archival reference questions. Some people were interested in finding interesting photographs for the 50th Anniversary celebration projects. One user was interested in discovering why Pierre Elliott Trudeau had visited UTSC between 1979 and 1980; we were able to find a photograph and a newspaper article, though no reason was given. Another user was hoping to find a photograph of her sister performing in UTSC’s student theatre for her upcoming 50th birthday party. When we were able to find a previously unseen photograph she was so happy there were tears and hugs. My previous experience with Archives had been as a researcher so it was exciting to be given the chance to be on the other side of the exchange.

To round out the experience I had the opportunity to attend workshops, camps and conferences. First, a workshop on AtoM and Archivematica (see earlier blog post). A couple of weeks back I volunteered to help out at Islandora Camp GTA. In between coffee breaks I was able to sit in on the admin track sessions. The sessions were expertly run and were very useful whether you were just starting out or building on a foundation. I’ve already been working with the Islandora system for some time now but with the help of the sessions I was able to better understand what I had been working with all this time. I look forward to continuing to learn about all the customizable options in Islandora’s offerings.

Lastly, I was able to attend many of the sessions offered at the Digital Pedagogy Institute put on by the Digital Scholarship Unit at UTSC. There I learned about all the incredible educators, librarians, and faculty members that were making use of digital tools in their classrooms. So many of the presenters were embroiled in the most engaging projects and it was refreshing to see such novel uses for technology in the interest of learning and teaching. One great take away was how lucky we seem to be at the University of Toronto to have administrators and librarians dedicated to digital initiatives and how this kind of openness will only become more important as we continue to move in that direction. One attendee of the DPI told me that her takeaway was TTYL (Talk To Your Librarian). I’m constantly blown away by the expertise and commitment of people working in the fields where libraries, archives and technology intersect.

Some final points I’ve gleaned from my time here are general but nonetheless important for someone starting in this field.

1) Professionalism, attention to detail and the ability to communicate well are key.

2) Staying connected to colleagues via social media is extremely useful. There are many interesting discussions and helpful tips that go down on Twitter on a daily basis.

3) Whatever you don’t know, you can learn.

4) If you have a question, Google it first. Chances are someone else has already answered it.

5) Volunteer at conferences. They are excellent opportunities to learn and meet people. I can’t believe I made it this far without volunteering.

6) The archivist/librarian divide is often arbitrary in a digital context and the rivalry is dumb.

I am so happy to have been given the opportunity to work at the DSU this summer. I feel as though my experiences here have taught me new skills and given me the confidence to pursue fields where I may have been uncomfortable before. I feel privileged to have worked with so many incredible people here, that they have let me pick their brains and that they have offered me excellent advice for the future. Thanks to them, I have now secured part-time employment as the Digital Curation Intern at Information Technology Services (ITS) at Robarts Library this school year. I am confident that my time at the DSU has prepared me for new challenges both at ITS and after graduation. I am so very thankful for everything they have done for me this summer.

Rachelann Pisani

Wednesday, August 6, 2014 - 11:37

by Sara Allain

Islandora Camp GTA kicked off with nearly 40 librarians, developers, and archivists gathered at the University of Toronto Scarborough. Fortified by coffee and muffins, we got down to the business of getting to know each other. Campers hailed from throughout Ontario and the East Coast, as well as Oklahoma, Michigan, Ohio, and Florida, and represented a diverse range of use cases and experience levels. The group ran the gamut from people who'd heard the word "Islandora" thrown around but had never touched the platform to folks who've been developing/administering Islandora for years. Leading us through the day's activities were Nick Ruest (York University), Jordan Dukart (discoverygarden), Kirsta Stapelfeldt (UTSC), and David Wilcox (Duraspace).

Introductions included the question, "If you were a sandwich, what kind of sandwich would you be?"

Turkey reuben, an acquired taste (our own Kirsta)
Ice cream sandwich, because he's also interested in Android development (University of Toronto ITS's Ken Yang)
Poutine, because she'd always choose poutine over a sandwich (University of Toronto ITS's Kelli Babcock)
Montreal smoked meat with wayyyy too much stuff in the middle (DuraSpace's David Wilcox)

(That's all I can remember - if you recall any more, leave them in the comments!)

New Release and Future Developments

Nick described the new modules and tools that are contained within the newest release, Islandora 7.x-1.3. In particular, he talked about the suite of modules - Checksum and Checksum Checker, FITS, BagIt, and PREMIS - that make Islandora so much stronger as a preservation system. Some excellent work here.

David talked about Fedora 4, which is a major rearchitecting of the repository software that Islandora works on. David highlighted the way that Fedora will now structure data within the repository, as well as the linked data capabilities and performance enhancements. Jordan Dukart talked about Drupal 8, which is much more object-oriented. 

Community Overview

We looked at some cool things being done with Islandora in the community:

Nick presented York U's browse map using Solr queries + Leaflet.js
MJ Suhonos presented Ryerson's usage stats tracking module
Nick talked about the Islandora Deployments github repo, a place where developers can write out their deployment stories, which he created with Mark Jordan during Open Repositories 2014
Mark Jordan's Background Processes Discussion Paper, looking at what's going on under the hood when Islandora ingests objects
University of South Carolina's Moving Image Research Collections, which uses PBCore and runs a lot of non-standard processes
York U's Solr Views galleries - specifically, dogs and cats
Using Fedora Connector to represent Islandora objects via Omeka

Interest Groups

There are currently four Islandora Interest Groups. The groups are formed and maintained by community members (read: anyone who wants to convene one) in order to address specific problems or questions related to the software and/or the community.

Nick Ruest, Donald Moses, and Mark Jordan convened the Preservation Interest Group to standardize and steward some of the new preservation modules in Islandora, including Checksum, FITS, BagIt, Vault, and PREMIS. The Preservation Interest Group is also working with Archivematica (/Artefactual Systems).

The Documentation Interest Group, convened by Kirsta, Kelli Babcock, and Gabriela Mircea, is focused on improving the Islandora documentation wiki as well as creating new documentation for training and development purposes.

David convened the Fedora 4 Interest Group to help plan how Islandora will integrate with the new version of the repository during the next phase of development.

The newest group is the Archival Interest Group, convened by me, which focuses on how archivists and archival collections interact with Islandora, incorporating questions of training, development, and linked services.


Random notes and recurring themes:

Deployment - specifically, issues with deployment that are consistent across implementations
Integration with other systems, specifically archival description systems (AtoM and ArchivesSpace) and Omeka
Ontologies, migrations, systems, integration with other systems, deployment
Good idea/bad idea: multi-sites
Drupal 8 and Fedora 4 and how Islandora 7/8 releases will play with one or both of these
"It depends"

Friday, June 27, 2014 - 13:21

by Sara Allain

Lately we've been trying to come up with a better way to create metadata for batch ingestion into Islandora. We just started preparing the UTSC Photographic Services Collection to go online - our lovely Young Canada Works summer student, Rachel, has been diligently selecting a few hundred candidates for the first phase of digitization - and it makes sense to start creating the metadata as well so that once we have digital surrogates we can bundle it all into Islandora via the batch ingest quickly. Since metadata creation/manipulation takes up a lot of my day, I started thinking about the most effective way to create XML using a workflow that would be optimal for our students, our systems, and me.

This is fairly long and detailed, so feel free to jump to the bottom for the highlights.

We often work with faculty and other people outside of the unit to create metadata for the various digital scholarship projects that we steward. Spreadsheets are an easy and accessible way for faculty, students, researchers - whomever - to come to grips with structured data. Things are tidy, they're easy to manipulate, we can derive CSV files - but most importantly, our project collaborators are familiar with how they work. There's no learning curve. We use a range of products from Excel to LibreOffice to Google Drive to do this - whatever's most suited to the project.

Step 1 - Set Up Your Spreadsheet

We're using MODS for all generic content going forward - in the past we used Dublin Core, but Islandora natively prefers MODS and it's more flexible for complex objects. (We may use other schemas for subject-specific content in the future, like Darwin Core for biodiversity data, which will be an interesting blog post in itself.) I set up a Google spreadsheet that uses human-friendly versions of the smallest child elements in MODS as column headers; that specific spreadsheet doesn't reflect all the fields in MODS that are available, so think of it as an infinitely extensible collection mechanism. In truth, it doesn't even matter what the headers are, as long as they map easily to MODS and the content is consistent.
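To make the idea of headers that map easily to MODS concrete, here is a minimal Python sketch that turns one spreadsheet row into a MODS record. It is an illustration only and not part of our workflow; the column headers, the `HEADER_TO_MODS` table and the sample row are assumptions, while the element names (titleInfo/title, originInfo/dateCreated, abstract) and the namespace come from the MODS schema.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Hypothetical human-friendly column headers mapped to a path of MODS elements.
HEADER_TO_MODS = {
    "Title": ("titleInfo", "title"),
    "Date Created": ("originInfo", "dateCreated"),
    "Description": ("abstract",),
}


def row_to_mods(row):
    """Build a MODS record from one spreadsheet row, skipping empty cells."""
    root = ET.Element(f"{{{MODS_NS}}}mods")
    for header, path in HEADER_TO_MODS.items():
        value = row.get(header, "").strip()
        if not value:
            continue
        parent = root
        for tag in path:  # walk down, creating each nested element
            parent = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}")
        parent.text = value
    return ET.tostring(root, encoding="unicode")


# A made-up row; the Description cell is missing, so no abstract is written.
record = row_to_mods({"Title": "Students outside the library",
                      "Date Created": "1975-09-15"})
```

Adding a new column then only means adding one line to the mapping table, which is what makes the spreadsheet approach so extensible.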

Step 2 - Add Some Metadata

This step is pretty simple. We have generic guidelines for creating metadata - things like "Transcribe title from the object or create a title that describes the object." or "Use the format YYYY-MM-DD." Our goal in the DSU is to intervene as little as possible into this process. Usually all we'll do is a bit of clean-up before making it publicly available. You can see the instructions that we provide for users as comments if you hover over the column headers on the spreadsheet.
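A guideline like "Use the format YYYY-MM-DD" is easy to check automatically during clean-up. The following is a small sketch in Python, assuming the dates arrive as plain strings; the function name is hypothetical and this is not part of our actual workflow.

```python
from datetime import datetime


def is_valid_date(value):
    """Return True if value is a real calendar date written as YYYY-MM-DD."""
    text = value.strip()
    try:
        parsed = datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    # strptime also accepts unpadded values such as "2014-6-2",
    # so compare against the zero-padded form to enforce the guideline.
    return parsed.strftime("%Y-%m-%d") == text


for candidate in ["2014-06-12", "12/06/2014", "2014-02-30", "2014-6-2"]:
    print(candidate, is_valid_date(candidate))
```

Only the first of the four sample values passes; the others are the wrong format, an impossible date, and an unpadded date.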

Step 3 - Import into Open Refine

Open Refine (formerly called Google Refine) allows you to perform sophisticated manipulations on tabular data. It supports regular expressions and a host of other ways to mash up your info. Once you have the program installed, it works in the Chrome browser. One word of warning, though - a desktop install can only handle so many rows of content before it will die on you. It's possible to allocate more memory if the program is having trouble parsing the data that you import.

The import process is simple - export the spreadsheet from Google as .xls, then import into Open Refine using the Create Project function. It looks like this:

Make sure that your data is rendering properly in the preview window and click on Create Project. You'll end up with - surprise! - another spreadsheet, this time in Open Refine.

Step 4 - Refine the Data

You might want to take this time to refine your data, since that's the whole point of Open Refine. You can do things like removing trailing spaces or splitting columns as needed. In the Google spreadsheet, for example, the Subject field includes multiple entities delimited by semicolons; Open Refine will do the work of isolating each of these into a separate column for you, if you should so desire. As mentioned above, it supports regular expressions and is very powerful at manipulating data.
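Open Refine handles this through its menus, but the two clean-ups mentioned above (trimming stray spaces and splitting a semicolon-delimited Subject cell) are simple enough to sketch in Python for anyone who wants to see the logic. The function name and the sample subjects are made up for illustration.

```python
from typing import List


def split_subjects(cell: str, delimiter: str = ";") -> List[str]:
    """Split a multi-valued cell, trimming whitespace and dropping empties."""
    parts = (part.strip() for part in cell.split(delimiter))
    return [part for part in parts if part]


# Made-up cell with a trailing space and an accidental double semicolon.
print(split_subjects("Painting; Landscapes ;; Ontario "))
```

Each value in the returned list would become its own column in Open Refine, or its own subject element in the final record.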

Step 5 - Export as MODS

This is the trickiest part, and by "trickiest" I mean surprisingly simple once you've figured it out. Open Refine has several options for exporting data; the one I use to export as MODS is Templating. When you click on it, you get a form that looks like this:

Within the exporter, you can build any schema you desire. On the left is the editable template and on the right is a preview of how your file will look once it's exported. In this case we want MODS, which was easy to model. You simply need to add the proper tags around the jsonize tags. Here is a template for Open Refine that will show you exactly what to put where - the only thing that might need to be changed is the content within the square brackets in the jsonize tag - Title in this example: {{jsonize(cells["Title"].value)}} (this is the column header from your spreadsheet). The exporter with the MODS template applied looks like this:

Click export and you'll get a big .txt file of structured data that you can work with - once you save it as .xml it will be valid MODS XML. I like to split that huge file using xml_split, part of the XML::Twig package, but there are any number of different ways of doing it. Zip your individual MODS records up with your objects and everything is ready to batch ingest into Islandora!
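xml_split does the splitting for me, but as one of those "different ways of doing it", here is a rough Python sketch. It assumes the exported file wraps the records in a single modsCollection root element in the MODS namespace, which depends on how your template is written; the function name and the record-001.xml naming pattern are hypothetical.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("", MODS_NS)  # write MODS as the default namespace


def split_mods_collection(xml_text, out_dir):
    """Write each <mods> child of the root to its own file; return the paths."""
    root = ET.fromstring(xml_text)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, record in enumerate(root.findall(f"{{{MODS_NS}}}mods"), start=1):
        path = out_dir / f"record-{number:03d}.xml"
        ET.ElementTree(record).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written


# Made-up two-record collection standing in for the real export.
SAMPLE = """<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods><titleInfo><title>First</title></titleInfo></mods>
  <mods><titleInfo><title>Second</title></titleInfo></mods>
</modsCollection>"""

with tempfile.TemporaryDirectory() as tmp:
    names = [p.name for p in split_mods_collection(SAMPLE, Path(tmp))]
    print(names)
```

In practice you would name each file after its object rather than numbering them, since the records need the same filenames as the objects they describe.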




This spreadsheet will make metadata creation easy.

Open Refine will make metadata editing easy.

This template will make exporting MODS from Open Refine easy.

Everything is now easy.

Thursday, June 12, 2014 - 14:16

The UTSC Library, in collaboration with the Centre for Digital Scholarship, the Office of the Dean and VP Academic, and the University of Toronto Libraries Chief Librarian’s Office, is organizing a THATCamp.

More and more frequently, professors are creating courses that are centered around digital projects, and incorporate digital tools into their courses. Part of the larger Digital Pedagogy Institute, this THATCamp will allow participants to discuss best practices around teaching courses that are centered on digital methods, and digital tools that improve and facilitate research. It is hoped that a variety of case studies will be presented and discussed in order to bring to light best practices surrounding these emerging methodologies, and the skills that faculty members and librarians need to develop in order to maximize their impact on undergraduates in this specific area.

For more information and to register, please see:

When: Friday, August 15th, 2014


Where: University of Toronto Scarborough Campus, 1265 Military Trail, Toronto, ON, M1C 1A4.

If you have any questions about THATCamp Digital Scholarship Institute, please contact us at

Thursday, June 12, 2014 - 08:14

by Sara Allain

We're really excited that our poster, entitled "Bye Bye, CONTENTdm: a migration to Islandora", was a co-winner for best poster at Open Repositories 2014! Almost 60 posters were presented at the conference on a huge range of subjects. We're incredibly proud to be part of such a diverse and intelligent group of people.

The poster was co-authored by Lingling Jiang, Kim Pham, Kirsta Stapelfeldt, Paulina Rousseau, and myself. Check it out on Slideshare.

Huge congratulations as well to our co-winners Minna Marjamaa, Tiina Tolonen, and Anna-Liisa Holmstrom, whose work on the Theseus Open Repository is inspiring.

Thursday, June 12, 2014 - 04:12

by Sara Allain

We're away at Open Repositories this week (taking lots of notes, so watch out for our blog posts after we all get back to Canada). Everybody is staying up too late since the days are so long, and I've been working on mapping the tweets of attendees. It's still a work in progress, but you can check out mapping on my personal website


Friday, June 6, 2014 - 10:42

This past week I had the opportunity to attend a free information session put on by the Toronto Area Archivist Group (TAAG) and the University of Toronto Archivist Group (UTAG). As a new summer student employee of the Digital Scholarship Unit, I found it a great opportunity for someone who is trying to break into the world of digital archival initiatives and scholarship. Courtney Mumma, MAS/MLIS, of Artefactual Systems Inc. led the session and introduced the group to Archivematica.

“Archivematica is a free and open-source digital preservation system that is designed to maintain standards-based, long-term access to collections of digital objects. Archivematica is packaged with the web-based content management system AtoM for access to your digital objects.

Archivematica uses a micro-services design pattern to provide an integrated suite of software tools that allows users to process digital objects from ingest to access in compliance with the ISO-OAIS functional model. Users monitor and control the micro-services via a web-based dashboard. Archivematica uses METS, PREMIS, Dublin Core and other best practice metadata standards.” [1]

The session was held in the Thomas Fisher Rare Book Library at the University of Toronto, St. George Campus. Sitting among the floors and floors of unique and beautiful books set up an interesting dynamic between the analog, the digital and the initiative to bring them together. The workshop started with a demonstration showing steps that may be included in a basic workflow. Mumma explained the output capabilities and used a staggering number of acronyms, which, as I gather, is par for the course in this field. She did an excellent job of explaining the program, and the demonstration helped to guide those of us who were new to it. Even with Mumma’s skill as a presenter, there was a lot of information to process and it was impossible to grasp all of the program’s capabilities in the time given. Thankfully, they have a detailed wiki that explains the basic capabilities of the program.

Archivematica, as Mumma said, “Allows an archivist to remain an archivist,” by facilitating appraisal (a forthcoming feature), preservation and metadata creation. What Archivematica has done is gather together all the best open-source tools, which they call micro-services, so that each repository can configure them to its own specifications and needs. Because the program is individually configurable, archivists can use it without fussing around with multiple and varied individual tools for discrete tasks. Archivematica is also compatible with most storage and access systems.

Archivematica can be downloaded for free and used for free, and it is open-source. It also comes with a detailed user manual and an online forum where users can discuss issues and post questions. In theory, it can be used without costs. However, for those who are uncomfortable with more robust technologies, the set-up and maintenance may be daunting without the help of an IT department. Thus, Artefactual Systems offers Archivematica set-up, configuration, tutorials and maintenance services and more, at a cost. The services provided are extensive and highly valuable but they bring up the issue that is plaguing all heritage organizations these days: money.

Artefactual is upfront about their costs (they can be found here). But many archives or library departments are small and have small budgets and some institutions do not have access to the kind of IT support needed for the DIY option. While we recognise the digital future and want to move toward it, sometimes it seems insurmountable in terms of resources.

During the workshop there was some talk about how to spread out the costs amongst institutions willing to engage in Artefactual’s services. For example, the Council of Prairie and Pacific University Libraries (COPPUL) have formed a consortium to employ and pay for Artefactual’s services amongst them. At the workshop, there was some talk of Ontario institutions also employing Artefactual’s services consortially.

Overall, the workshop was very informative and promising. It shows that there are great initiatives and great interest in the move toward digital. It is exciting to see where the push toward digital will bring archival institutions and how it will shape the heritage professions. Thanks are due to TAAG and UTAG for putting on this session, and thank you also to Courtney Mumma and Artefactual Systems for the opportunity to learn more about your services and resources.

Wednesday, May 28, 2014 - 11:20

Oh, they do so many things they never stop. Oh, the things they do there, my stars.

Why hello, I'm the new contract hire at the DSU since May. So far it's been lovely - I love the work pace and I immediately felt like I was a part of the team. The first thing I worked on here was to get content into their shiny new online repository (3 weeks my senior). I was to move all of the metadata from the Doris McCarthy Image Collection contained in CONTENTdm (the old asset management software) into Islandora (the new asset management software). My aim is to be as transparent as possible in the hopes that this will be of value to someone such as myself starting out in libraries and working with library data and metadata. Of course, I will be more than happy to answer any questions if you too share a similar pain.

Hey, let's make it easy. The code is available on GitHub.

What we had:
Doris McCarthy Simple DC Export (.xml)
Rename map (document)

What we used:
oxygen XML editor (30 day trial)
text editor (we used Sublime Text, Excel)
xml_twig (cpan needs to be installed)

Scripts:
xslt rename map (.xsl)
rename (.sh)
LOC DC2MODS (.xsl)

To start:

the exported XML file from CONTENTdm in Simple DC, which had 750 records
the renaming map project document that lists the old filename and new filename for each object (done manually)

To end:

one individual .xml record in MODS for each associated .tif object (they need to have the same filename in order to be properly batch ingested using the Large Image Content Model)
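Since the batch ingest depends on every .tif having an .xml record with the same filename, a quick check before zipping can save a failed ingest. This is a minimal sketch in Python, assuming the images and records sit together in one folder; the function name is hypothetical and it is not one of the scripts on GitHub.

```python
from pathlib import Path


def unmatched_files(folder):
    """Return (tifs without an xml record, xml records without a tif)."""
    tif_names = {path.stem for path in folder.glob("*.tif")}
    xml_names = {path.stem for path in folder.glob("*.xml")}
    return sorted(tif_names - xml_names), sorted(xml_names - tif_names)


# Example usage against the current directory:
missing_records, orphan_records = unmatched_files(Path("."))
print("Images with no MODS record:", missing_records)
print("MODS records with no image:", orphan_records)
```

Two empty lists mean every image has exactly one matching record by name.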


1. Create a rename map: create an XSLT stylesheet that reads the current name in each <dc:source> element, looks up its corresponding replacement identifier, and replaces the text with it

2. Rename transformation in oxygen xml editor -> 750 records (no loss)

a. ~20 identified duplicates: some had identical identifiers, and some just had two metadata records associated with the same object (for example, one record included an OCR transcription while its duplicate didn’t)
b. ~30 container metadata records didn’t have a mapping name so they weren’t transformed - acceptable

3. Split the files -> 750 files (no loss): using xml_twig > xml_split module

4. Rename the split files -> 730 records as predicted from 2b: using

5. Transform metadata records from DC to MODS -> 730 records (no loss from step 4): using oxygen xml editor, LOC has templates for MODS transformations that we modified to match our CONTENTdm metadata export

6. Ready for ingest: single image + xml package, book batch too (steps not included). yay
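We did the rename lookup in XSLT, but the logic behind the rename and the record counts can be sketched in a few lines of Python. This is an illustration only, not the code on GitHub: the function name, the identifiers, and the rename map contents are all made up, and the real map came from the manually built project document.

```python
from collections import Counter


def apply_rename_map(sources, rename_map):
    """Return (renamed, unmapped, duplicates) for a list of old identifiers."""
    renamed = [rename_map[old] for old in sources if old in rename_map]
    unmapped = [old for old in sources if old not in rename_map]
    duplicates = sorted(old for old, count in Counter(sources).items() if count > 1)
    return renamed, unmapped, duplicates


# Hypothetical old identifiers as they might appear in <dc:source>.
sources = ["img001", "img002", "img002", "box1"]
# Hypothetical rename map: old filename -> new filename.
rename_map = {"img001": "dm_0001", "img002": "dm_0002"}

renamed, unmapped, duplicates = apply_rename_map(sources, rename_map)
print("renamed:", renamed)
print("unmapped (e.g. container records):", unmapped)
print("duplicate identifiers:", duplicates)

# "No loss" check: every record is either renamed or reported as unmapped.
assert len(renamed) + len(unmapped) == len(sources)
```

Counting at each step like this is how we knew whether records were lost or only set aside on purpose.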



In almost every step something didn’t work - you will need to go back a few steps, fix, and proceed.

It’s difficult to figure out the order to do everything, so don’t be afraid to try it a different way (doing one step first may cause more problems than doing it another way - e.g. deciding whether to do the DC to MODS transformation first or wait until the very end).

Cleanup is crucial to every step - the more time you devote to cleanup early in your workflow, the easier the rest of the process will be.

All in all, it's been a very exciting month, not just for me but for everyone at the DSU.  Or maybe it's always like this...