Monday, April 20, 2015 - 13:33

The following sessions may be of interest to Humanities faculty and library staff this Friday, April 24th.

11:30-1:30 (lunch provided), Instructional Centre, Room IC 318

"BigDIVA: Search as Research" Laura Mandell (Texas A&M)

"The Renaissance Knowledge Network as Social Knowledge Hub" Daniel Powell (King's College London), with William Bowen (UTSC) and Ray Siemens (University of Victoria)

2:00-3:00 BigDIVA update, Humanities Wing, Room HW 525C

3:00-5:00 ARC Metadata Committee presentation, Humanities Wing, Room HW 525C

Thursday, April 16, 2015 - 09:19
Thursday, April 2, 2015 - 10:10

Are you a doctoral student (or an advanced MA student) wondering about all the buzz around “the Digital Humanities”? Do you wish you could get a concise introduction to this emerging field as it applies to your own research (and meet a community of scholars working in this area)? If so, please join UTSC faculty, digital librarians, and other graduate students for a day-long workshop on April 21st, 2015. In addition to an overview of the field of DH and a sneak peek of some exciting projects being developed at UTSC, you’ll learn about key principles, methods, and tools available to you right now. No coding or prior experience necessary -- just register to secure your spot by filling out this brief survey. As a bonus, your feedback will help us finalize the schedule to ensure that the training provided is as useful as possible. We will provide lunch, instructional materials, and lots of food for thought. All you need to bring is your laptop and questions.

Please note: All graduate students are welcome to register for this free event, but as seating is limited, priority will be given to pre-candidacy students in the Departments of History, Classics, East Asian Studies, Near and Middle Eastern Civilizations, and the Collaborative Programs in Women and Gender Studies and South Asian Studies. Register by April 13th, and you’ll be notified of acceptance by April 15th.

This event is hosted by the Department of Historical and Cultural Studies, in partnership with the UTSC Library.

If you have any questions, please contact us at

Social Media

Tweet the day at #utscDH15

Join our Facebook Group.


Getting Here

Location - the Social Sciences Building (MW) at the University of Toronto Scarborough, RM 120



According to the TTC Trip Planner
Take the 198 ROCKET towards U OF T SCARBOROUGH - EAST
198 Bus Schedule


Preliminary Schedule

9:30 - Welcome, housekeeping, roundtable, what is DH?

10:00 - Introduction to Digital Humanities at UTSC (Faculty Projects)

10:30 - Open Discussion/break

10:45 - Zotero, and Beyond Bibliographic Management - What is Zotero? How do you use features for data collection and research collaboration? Intro to open-source community and plugins

12:00 - Lunch (Provided)

1:00 - Academic Blogging, Digital & Data Storytelling, Natural Language Processing & Social Media Analysis

3:00 - break

3:15 - Structuring and Analyzing Data (Network Visualizations) - Structured data and your thesis

4:15 - Closing remarks (Frontiers & Data Curation)

4:30 - Exit survey

Event Updates!

We have a full house! Thanks to all for applying to attend this workshop. We're excited at the amount of interest it has garnered. For those of you attending, we have some updates.

Learn more about who's attending!

Join our Facebook Group.


Thursday, October 23, 2014 - 09:51

We’re on day 4 of Open Access (OA) Week! We had a great turnout yesterday at the button making station outside the library and the social media activity is still going strong. Photos are posted on the @digitalUTSC Instagram and Twitter accounts, as well as the EPSA Facebook page. Thanks to all who have been participating!

There are still great OA Week activities happening across the tri-campus the rest of the week. Check out what is happening today across all 3 U of T campuses.

Of particular interest to those at UTSC:

  • We're back TODAY 10:30am-3pm outside the library so if you have a spare moment, please drop by to say hello, make a button and share your thoughts on OA. 
  • Drop by the Library Instruction Lab (AC286A) 2-3pm today and Friday if you have any questions about depositing copies of your publications in our research repository (TSpace) or want to know more about publishing in an OA journal.
  • If you’re looking to publish in an OA journal but don’t have any funding remaining, you may want to consider the Library’s OA Author Fund pilot which has been extended into 2015. We also have RSC Gold-for-Gold vouchers if you’re publishing in an RSC publication. Questions? Email me or come to one of the drop-in sessions listed above.
  • I’ve been periodically tweeting out links to OA publications/open data/other open research outputs to showcase the amazing scholarly output of UTSC researchers. If you have something you’d like me to highlight, please either email me or include the Library's Digital Scholarship Unit (@digitalUTSC) or my personal (@4Bes) Twitter handle if you decide to tweet out a link yourself.
  • There will also be more oaweek trivia coming tomorrow! Show off your OA skills on the @UofTSCCO FB page to win a Starbucks gift card.
Thursday, October 23, 2014 - 09:28

Open Access Week 2014: Events Listing


Monday, October 20

Open Access Week 2014 Kick Off Event at the World Bank: Generation Open (WEBCAST)

3:00 - 4:00 pm (EDT)

VIEW THE WEBCAST: (pre-registration not required)

Join the World Bank and SPARC (The Scholarly Publishing and Academic Resources Coalition) as they host the International Open Access Week Kick Off Event, live streamed from Washington D.C. The event seeks to provide a forum for early-career researchers and students to discuss how a transition to open access could affect researchers at various stages in their careers.  A panel of experts will also discuss how academic and research institutions can become involved in supporting early-career researchers to make their scholarly articles and data accessible to all.

The following panelists will be involved in the event:

  • Stefano Bertuzzi: Executive Director, American Society for Cell Biology
  • José-Marie Griffiths: Vice President for Academic Affairs, Bryant University
  • Meredith Niles: Postdoctoral Research Fellow, Sustainability Science Program, Harvard University
  • Jerry Sheehan: Assistant Director for Policy Development, National Library of Medicine

Wednesday, October 22

Humanities Informational Drop-In Session with a Scholarly Communications Librarian

2:00 - 3:00 pm

AC286A Library Instruction Lab

Have questions about publishing your research? Inquiries about copyright or open access? Or just want to explore your options for the future? Come in from 2 to 3 pm to the Library Instruction Lab to talk one-on-one with the UTSC Scholarly Communications Librarian!


Student Social Media & Button Table

10:00 am - 3:00 pm

Outside the Library

Is access to research important to you? Are you considering publication of your research someday? Come chat with Sarah Forbes, the UTSC Scholarly Communication Librarian, and EPSA representatives to learn more about open access and show your support by making personalized buttons and sharing on social media how much these issues matter to UTSC students.


Thursday, October 23

Social Sciences Informational Drop-In Session with a Scholarly Communications Librarian 

2:00 - 3:00 pm

AC286A Library Instruction Lab

Have questions about publishing your research? Inquiries about copyright or open access? Or just want to explore your options for the future? Come in from 2 to 3 pm to the Library Instruction Lab to talk one-on-one with the UTSC Scholarly Communications Librarian!


Student Social Media & Button Table

10:30 am - 3:00 pm

Outside the Library

Is access to research important to you? Are you considering publication of your research someday? Come chat with Sarah Forbes, the UTSC Scholarly Communication Librarian, and EPSA representatives to learn more about open access and show your support by making personalized buttons and sharing on social media how much these issues matter to UTSC students.


Friday, October 24

Sciences Informational Drop-In Session with a Scholarly Communications Librarian

2:00 - 3:00 pm

AC286A Library Instruction Lab

Have questions about publishing your research? Inquiries about copyright or open access? Or just want to explore your options for the future? Come in from 2 to 3 pm to the Library Instruction Lab to talk one-on-one with the UTSC Scholarly Communications Librarian!


Complete listing of tri-campus events


For more information contact:

Sarah Forbes

Wednesday, October 15, 2014 - 10:29

For the original post, visit

Recently we found that we needed to revisit our old friend the Drupal Feeds module and get it to play nice with Zotero's API. This time we wanted to use it with Culinaria's Zotero library. Our goal was to pull all of the items and their content from the library into their UTSC website (much like how an RSS feed works). In the original post, Kirsta brought up that she was having trouble when the processor was set to periodically import items to keep the feed up to date with the Zotero library. Every time the import ran, the processor wouldn't just add new items and update existing ones; it would create a new item for every entry returned by the API. This meant that there were lots of duplicate entries in Drupal that needed to be removed. We have it working now, but first:

A Recap

How to set up a Zotero feed in Drupal:
1. Create 2 new content types: Custom Feeds Processor, Custom Feed Item (under Structure)
2. Add a new Feed Importer (under Structure)
3. Add a new Custom Feed Processor (under Add Content)
4. Configure the new Feed Importer: attach it to the Custom Feed Processor, map it to the fields in the Custom Feed Item which will be parsed with XPath (under Structure)
5. Go to the Custom Feed Processor -> use XPath as your field output (under Content)
6. Run the Import in your Custom Feed Processor. You should see that every entry in the Zotero library has created a new custom feed item in Drupal in your Content view. You can then create a page with all of your feed items using the Views module.


You can see the final page here:

We needed to set a unique key that we can use to match up with any existing feed item in Drupal. It seemed to work best when we used the element zapi:key in the Title field. That way, every time the import runs, it checks whether that key already exists; if it does, it updates the existing feed item with that key instead of creating a new one.

These are the fields we selected in the Feed Importer to map to the processor. We also set Title as the unique key in the target configuration column.
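
As a rough illustration of that update-or-create behaviour (this is not the Feeds module's own code), here is a minimal Python sketch. The function name import_items, the in-memory store, and the sample keys are all hypothetical stand-ins for Drupal's feed items and Zotero's zapi:key values.

```python
def import_items(store, incoming, key_field="key"):
    """Upsert incoming items into store, matching on a unique key.

    store maps a unique key to the saved item (a stand-in for Drupal nodes).
    Returns a (created, updated) tuple of counts.
    """
    created = 0
    updated = 0
    for item in incoming:
        key = item[key_field]
        if key in store:
            # Key already exists: update the saved item in place.
            store[key].update(item)
            updated += 1
        else:
            # Key not seen before: create a new item.
            store[key] = dict(item)
            created += 1
    return created, updated


store = {}
first_run = import_items(store, [{"key": "ABC123", "title": "Recipe notes"}])
second_run = import_items(
    store,
    [
        {"key": "ABC123", "title": "Recipe notes (revised)"},
        {"key": "XYZ789", "title": "Menu scan"},
    ],
)
```

Without a unique key there is nothing to match on, so every run can only create, which is the duplicate problem described above.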

Here are the XPaths we used in our Custom Feeds Processor:


Other Tips

The Zotero API maxes out at 100 items per request, but we had 115 items. Our workaround was to import all the items in the library by running the processor twice: once with the items sorted in descending order, and once with them sorted in ascending order. Then we set the API to sort in descending order from here on out, so it will only grab the 100 most recent items.
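
To see why two passes are enough for 115 items, here is a small Python sketch of the idea. It does not call the Zotero API; fetch_page is a hypothetical stand-in that returns at most 100 items in the requested sort order, and the dateAdded field and item keys are made up for the example.

```python
def fetch_page(library, descending, limit=100):
    """Stand-in for one capped API request: sort, then keep the first `limit` items."""
    ordered = sorted(library, key=lambda item: item["dateAdded"], reverse=descending)
    return ordered[:limit]


def import_all(library, limit=100):
    """Run one descending and one ascending pass, merging on the unique key."""
    merged = {}
    for descending in (True, False):
        for item in fetch_page(library, descending, limit):
            # Items seen in both passes collapse onto the same key.
            merged[item["key"]] = item
    return merged


library = [{"key": f"ITEM{i:03d}", "dateAdded": i} for i in range(115)]
merged = import_all(library)
```

The two passes overlap in the middle, which is harmless once a unique key is set. Note that the trick only covers a library of up to twice the cap; anything larger would leave a gap in the middle.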

Earlier we used Oxygen to get the XPaths, but Google Chrome has an XML tree extension that you can use that will also quickly get you automatic XPaths:

In your Feed Importer, it's useful to use the Tamper settings to clean up your feeds. We used HTML entity decode and URL decode, which convert encoded values such as "&amp;" into "&". You can also use plugins such as Find and Replace, Filter empty values, or Explode.
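
For anyone curious what those two decoding steps do to a value, here is a minimal Python sketch using the standard library. It only illustrates the transformations; it is not how the Tamper plugins are implemented, and the function names and sample strings are made up.

```python
from html import unescape
from urllib.parse import unquote


def decode_entities(value):
    """Turn HTML entities such as &amp; back into plain characters."""
    return unescape(value)


def decode_url(value):
    """Turn percent-encoded sequences such as %20 back into plain characters."""
    return unquote(value)


entity_example = decode_entities("Salt &amp; Pepper")
url_example = decode_url("food%20%26%20drink")
```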

You can turn tags in the Zotero library into a taxonomy in Drupal, then create a Menu for those terms. First you'll need to create the new terms from your Feed Importer:

Thursday, October 2, 2014 - 11:40

Date: Monday October 6th, 2014

Time: 4:00 pm to 5:30 pm

Location: Doris McCarthy Gallery, University of Toronto Scarborough 

Co-presented by the Art History Program, Chemistry Program, and the Doris McCarthy Gallery

Calling all art and science students!

Attend this FREE workshop to learn about the role of conservation in technical art history - led by Art Gallery of Ontario conservators:

Margaret Haupt - Deputy Director, Collections Management and Conservation

Maria Sullivan - Manager, Conservation


Monday, September 1, 2014 - 20:53

I feel no shame in the sense of accomplishment that I got from learning how to make a solution pack module for Islandora!

Working at the DSU has given me the chance to explore the possibilities and push the boundaries of what I'm capable of. It was great because I came out of the process learning a lot more about the mechanics behind Islandora, Fedora and Drupal and how the existing processes facilitate the lifecycle of a digital object in the system.

To me, a solution pack module is like the blueprint used to set up a digital object factory in Islandora city™. The module specifies what objects to produce, how they should look and what data types and packages are associated with an object. This is all done in the programming languages understood by Drupal and Islandora.

When it is installed, the Living Research Lab solution pack designed for the DSU creates 5 collections (Births, Mice, Protocols, Publications, Experiments) that are used to manage and add objects. The objects in each collection are ingested using specially designed forms in Darwin Core Standard (Mouse, Birth, Protocol, Experiment, Publication, and Data Session). What I've done is a really basic setup. You can and should add hooks and configure so much more to streamline the way your data is ingested, processed and accessed.

The github link for the Living Research Lab Solution Pack can be found here:

I should note that so far the module works well in development mode, but it is in definite need of debugging and refining before it can actually be used in a production environment.

It began with XML

In one of our projects we needed to build custom XML forms to store metadata in Islandora. We soon realized that customization would be really important in order to automate some of the actions and to improve usability. Building a solution pack was the next logical step to explore customization.

Tips to get started

  • Look at other solution packs! When you're starting with nothing, it helps to look at something. What I did was look at simple examples, going through and breaking down the code line by line, and trying to trace the lines in the code and when each step is called in Islandora.
  • Draw. Creating diagrams made it easier for me to make connections between the different little parts that contribute to the module's functionality.

What I used*

Update 2014-09-11 - other useful resources:

The start of something

The first step I took to understand module building was to create a map to see where and how variables and files were being referenced. What I soon discovered is that most files, functions and variables are initially called from the .module file. The .module file is the main file that contains the configuration, hooks and variables. It's in there that you add hooks, and where you wield a lot of power if you know what you're doing.

The map below is a neater version of my original sketch. It lists all of the hooks from the .module file and what files they refer to. It also shows what component each one affects in the Islandora system architecture.

Here are my original notes that went with the map when I was going through the Islandora Porcus module. I wrote out the names of all of the files contained in the module and within those files I wrote out all of the function names and tried to figure out what they did:

  1. islandora_porcus.install > install and uninstall the module under Modules, module_load_include draws from islandora modules usually .inc
  2. islandora_porcus.module > everything comes together > SEE BELOW
  3. > info under Modules
  4. ./css/islandora_porcus.css > for Object View Page called from .tpl.php
  5. ./images/piggie.png > for Object View Page, called from
  6. ./includes/ > SEE BELOW
  7. > upload form
  8. > admin config page when you go to Islandora > Click on Module
  9. ./js/islandora_porcus.js > for Object View Page from .tpl.php
  10. ./theme/islandora-porcus.tpl.php > Object View Page, when you create a new object this is what appears
  11. ./xml/islandora_porcus_form_mods.xml > the form!

Questions: So content models are customized for your own datastreams and the way you want things to be? What is a hook, even - when you go to a page, something (function, template) is activated?

  1. islandora_porcus.module
    function islandora_porcus_menu() > the admin menu item, getting things started and loading
    function islandora_porcus_theme($existing, $type, $theme, $path) > the Theme is the Template - the Object View Page
    function islandora_porcus_preprocess_islandora_porcus(array &$variables) > sets up variables to be placed within the template (.tpl.php) files. From Drupal 7 they apply to templates and functions
    function islandora_porcus_islandora_porcusCModel_islandora_view_object($object, $page_number, $page_size) > first content model association, if object is a Cmodel that you specify here
    function islandora_porcus_islandora_porcusCModel_islandora_ingest_steps(array $configuration) > hooks ingest steps
    function islandora_porcus_islandora_porcusCModel_islandora_object_ingested($object) > calls
    function islandora_porcus_islandora_xml_form_builder_forms() > load form
    function islandora_porcus_islandora_content_model_forms_form_associations() > form association, for the form
    function islandora_porcus_islandora_required_objects(IslandoraTuque $connection) > construct content model object, ingests it

    function islandora_porcus_create_all_derivatives(FedoraObject $object) > derivative spec for the .txt uploaded object, potential file conversion
    function islandora_porcus_transform_text($input_text) > for the .txt uploaded object

    function islandora_porcus_upload_form(array $form, array &$form_state) > http://localhost:8181/islandora/object/porcus%3Atest/manage/overview/ingest this page to upload the file
    function islandora_porcus_upload_form_submit(array $form, array &$form_state) > submit into what datastream

  4. islandora-porcus.tpl.php connects to things from preprocess theme

Even if you don't want to create a module from scratch, this should help you modify existing modules in little ways that could make using Islandora easier.  Feel the power.

[*] My secret wish for tutorials is to list tools and strategies for troubleshooting.  That would help immensely.

Wednesday, August 27, 2014 - 10:28

Working at the Digital Scholarship Unit this summer has been amazing. I can’t believe how complementary it has been for my education. I have learned so much in a practical sense but also in a much broader sense of real workplace experience in a library with digital initiatives. It’s hard to list all the skills and knowledge I have acquired from such an immersive experience but let me talk a bit about the highlights of this summer.

Right out of the gate I was given the daunting task of preparing metadata for 7000 plus photographs in UTSC’s Photographic Services Collection, one of the archival collections currently housed at the DSU. The end goal of the item level description was to create detailed metadata to be used in an online collection that was intended to be representative of the collection and to complement UTSC’s 50th Anniversary celebrations in 2015.

My first task was to design a workflow and selection criteria for the series. I was assisted by two excellent work-study students, Vrishti Dutta and Mary-Ellen Brown, who took on the very exciting work of individually numbering each image. This was extremely time consuming but very necessary as it allows us to easily find and process specific images. The numbering, measuring and metadata creation took approximately one month.

During this time I was also selecting images for the future online collection. However, as my eyes started to hurt and my brain grew sluggish, I inexplicably stopped this practice halfway through. This turned out to be a blessing in disguise. By the time I finished processing everything I was able to look at the collection differently. Having seen every image, I had gathered a better understanding of what would best represent the collection and was able to create five sub-series. The sub-series or subjects are Student Life, Academic Life, Campus, Faculty and Staff, and Community. It was very exciting when we finally reached the last box and I realised that I had written metadata for over 7000 images. My feelings of brain-melting and elation are preserved via Twitter and reproduced here for future generations.

But even through all the brain hurt I could tell that this project was so useful. Here I was able to create and execute a curation and digitization workflow. This provided me with a better understanding of what it means to work with digital initiatives. After the metadata was finished, we used OpenRefine and exported the MODS spreadsheet lines as .XML files. I had assigned each image a digital identifier so we used that identifier to name both the .XML files and the .TIF files of the images, creating neat little packages of images and their metadata. These packages were then ingested into Islandora over the course of a week and made available online.

In addition to processing the series I was also able to assist with archival reference questions. Some people were interested in finding interesting photographs for the 50th Anniversary celebration projects. One user was interested in discovering why Pierre Elliott Trudeau had visited UTSC between 1979 and 1980; we were able to find a photograph and a newspaper article, though no reason was given. Another user was hoping to find a photograph of her sister performing in UTSC’s student theatre for her upcoming 50th birthday party. When we were able to find a previously unseen photograph she was so happy there were tears and hugs. My previous experience with Archives had been as a researcher so it was exciting to be given the chance to be on the other side of the exchange.

To round out the experience I had the opportunity to attend workshops, camps and conferences. First, a workshop on AtoM and Archivematica (see earlier blog post). A couple of weeks back I volunteered to help out at Islandora Camp GTA. In between coffee breaks I was able to sit in on the admin track sessions. The sessions were expertly run and were very useful whether you were just starting out or building on a foundation. I’ve already been working with the Islandora system for some time now but with the help of the sessions I was able to better understand what I had been working with all this time. I look forward to continuing to learn about all the customizable options in Islandora’s offerings.

Lastly, I was able to attend many of the sessions offered at the Digital Pedagogy Institute put on by the Digital Scholarship Unit at UTSC. There I learned about all the incredible educators, librarians, and faculty members that were making use of digital tools in their classrooms. So many of the presenters were embroiled in the most engaging projects and it was refreshing to see such novel uses for technology in the interest of learning and teaching. One great take away was how lucky we seem to be at the University of Toronto to have administrators and librarians dedicated to digital initiatives and how this kind of openness will only become more important as we continue to move in that direction. One attendee of the DPI told me that her takeaway was TTYL (Talk To Your Librarian). I’m constantly blown away by the expertise and commitment of people working in the fields where libraries, archives and technology intersect.

Some final points I’ve gleaned from my time here are general but nonetheless important for someone starting in this field.

1) Professionalism, attention to detail and the ability to communicate well are key.

2) Staying connected to colleagues via social media is extremely useful; there are many interesting discussions and helpful tips that go down on Twitter on a daily basis.

3) Whatever you don’t know, you can learn.

4) If you have a question, Google it first. Chances are someone else has already answered it.

5) Volunteer at conferences. They are excellent opportunities to learn and meet people. I can’t believe I made it this far without volunteering.

6) The archivist/librarian divide is often arbitrary in a digital context and the rivalry is dumb.

I am so happy to have been given the opportunity to work at the DSU this summer. I feel as though my experiences here have taught me new skills and given me the confidence to pursue fields where I may have been uncomfortable before. I feel privileged to have worked with so many incredible people here, that they have let me pick their brains and that they have offered me excellent advice for the future. Thanks to them, I have now secured part-time employment as the Digital Curation Intern at Information Technology Services (ITS) at Robarts Library this school year. I am confident that my time at the DSU has prepared me for new challenges both at ITS and after graduation. I am so very thankful for everything they have done for me this summer.

Rachelann Pisani

Wednesday, August 6, 2014 - 11:37

by Sara Allain

Islandora Camp GTA kicked off with nearly 40 librarians, developers, and archivists gathered at the University of Toronto Scarborough. Fortified by coffee and muffins, we got down to the business of getting to know each other. Campers hailed from throughout Ontario and the East Coast, as well as Oklahoma, Michigan, Ohio, and Florida, and represented a diverse range of use cases and experience levels. The group ran the gamut from people who'd heard the word "Islandora" thrown around but had never touched the platform to folks who've been developing/administering Islandora for years. Leading us through the day's activities were Nick Ruest (York University), Jordan Dukart (discoverygarden), Kirsta Stapelfeldt (UTSC), and David Wilcox (Duraspace).

Introductions included the question, "If you were a sandwich, what kind of sandwich would you be?"

New Release and Future Developments

Nick described the new modules and tools that are contained within the newest release, Islandora 7.x-1.3. In particular, he talked about the suite of modules - Checksum and Checksum Checker, FITS, BagIt, and PREMIS - that make Islandora so much stronger as a preservation system. Some excellent work here.

David talked about Fedora 4, which is a major rearchitecting of the repository software that Islandora works on. David highlighted the way that Fedora will now structure data within the repository, as well as the linked data capabilities and performance enhancements. Jordan Dukart talked about Drupal 8, which is much more object-oriented. 

Community Overview

We looked at some cool things being done with Islandora in the community:

Interest Groups

There are currently four Islandora Interest Groups. The groups are formed and maintained by community members (read: anyone who wants to convene one) in order to address specific problems or questions related to the software and/or the community.

Nick Ruest, Donald Moses, and Mark Jordan convened the Preservation Interest Group to standardize and steward some of the new preservation modules in Islandora, including Checksum, FITS, BagIt, Vault, and PREMIS. The Preservation Interest Group is also working with Archivematica (/Artefactual Systems).

The Documentation Interest Group, convened by Kirsta, Kelli Babcock, and Gabriela Mircea, is focused on improving the Islandora documentation wiki as well as creating new documentation for training and development purposes.

David convened the Fedora 4 Interest Group to help plan how Islandora will integrate with the new version of the repository during the next phase of development.

The newest group is the Archival Interest Group, convened by me, which focuses on how archivists and archival collections interact with Islandora, incorporating questions of training, development, and linked services.


Random notes and recurring themes:

  • Deployment - specifically, issues with deployment that are consistent across implementations
  • Integration with other systems, specifically archival description systems (AtoM and ArchivesSpace) and Omeka
  • Ontologies, migrations, systems, integration with other systems, deployment
  • Good idea/bad idea: multi-sites
  • Drupal 8 and Fedora 4 and how Islandora 7/8 releases will play with one or both of these
  • "It depends"
Friday, June 27, 2014 - 13:21

by Sara Allain

Lately we've been trying to come up with a better way to create metadata for batch ingestion into Islandora. We just started preparing the UTSC Photographic Services Collection to go online - our lovely Young Canada Works summer student, Rachel, has been diligently selecting a few hundred candidates for the first phase of digitization - and it makes sense to start creating the metadata as well so that once we have digital surrogates we can bundle it all into Islandora via the batch ingest quickly. Since metadata creation/manipulation takes up a lot of my day, I started thinking about the most effective way to create XML using a workflow that would be optimal for our students, our systems, and me.

This is fairly long and detailed, so feel free to jump to the bottom for the highlights.

We often work with faculty and other people outside of the unit to create metadata for the various digital scholarship projects that we steward. Spreadsheets are an easy and accessible way for faculty, students, researchers - whomever - to come to grips with structured data. Things are tidy, they're easy to manipulate, we can derive CSV files - but most importantly, our project collaborators are familiar with how they work. There's no learning curve. We use a range of products from Excel to LibreOffice to Google Drive to do this - whatever's most suited to the project.

Step 1 - Set Up Your Spreadsheet

We're using MODS for all generic content going forward - in the past we used Dublin Core, but Islandora natively prefers MODS and it's more flexible for complex objects. (We may use other schemas for subject-specific content in the future, like Darwin Core for biodiversity data, which will be an interesting blog post in itself.) I set up a Google spreadsheet that uses human-friendly versions of the smallest child elements in MODS as column headers; that specific spreadsheet doesn't reflect all the fields in MODS that are available, so think of it as an infinitely extensible collection mechanism. In truth, it doesn't even matter what the headers are, as long as they map easily to MODS and the content is consistent.

Step 2 - Add Some Metadata

This step is pretty simple. We have generic guidelines for creating metadata - things like "Transcribe title from the object or create a title that describes the object." or "Use the format YYYY-MM-DD." Our goal in the DSU is to intervene as little as possible into this process. Usually all we'll do is a bit of clean-up before making it publicly available. You can see the instructions that we provide for users as comments if you hover over the column headers on the spreadsheet.

Step 3 - Import into Google Refine

Open Refine (also called Google Refine) allows you to perform sophisticated manipulations on tabular data. It supports regular expressions and a host of other ways to mash up your info. Once you have the program installed, it works in the Chrome browser. One word of warning, though - a desktop install can only handle so many rows of content before it will die on you. It's possible to allocate more memory if the program is having trouble parsing the data that you import.

The import process is simple - export the spreadsheet from Google as .xls, then import into Open Refine using the Create Project function.

Make sure that your data is rendering properly in the preview window and click on Create Project. You'll end up with - surprise! - another spreadsheet, this time in Open Refine.

Step 4 - Refine the Data

You might want to take this time to refine your data, since that's the whole point of Open Refine. You can do things like removing trailing spaces or splitting columns as needed. In the Google spreadsheet, for example, the Subject field includes multiple entities delimited by semicolons; Open Refine will do the work of isolating each of these into a separate column for you, if you should so desire. As mentioned above, it supports regular expressions and is very powerful at manipulating data.
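For anyone curious what the semicolon split and the trailing-space clean-up amount to, here's the same idea as a small Python sketch. It's only an illustration of the logic, not how Open Refine does it internally, and the sample subjects are invented.

```python
def split_subjects(cell, delimiter=";"):
    """Split a multi-value cell and trim stray whitespace from each value."""
    parts = (part.strip() for part in cell.split(delimiter))
    # Skip empty values left behind by a trailing delimiter.
    return [part for part in parts if part]


print(split_subjects("Maps; Toronto ;Scarborough; "))
```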

Step 5 - Export as MODS

This is the trickiest part, and by "trickiest" I mean surprisingly simple once you've figured it out. Open Refine has several options for exporting data; the one I use to export as MODS is Templating. When you click on it, you get the template form.

Within the exporter, you can build any schema you desire. On the left is the editable template and on the right is a preview of how your file will look once it's exported. In this case we want MODS, which was easy to model. You simply need to add the proper tags around the jsonize tags. Here is a template for Open Refine that will show you exactly what to put where - the only thing that might need to be changed is the content within the square brackets in the jsonize tag, which is the column header from your spreadsheet (Title in this example): {{jsonize(cells["Title"].value)}}
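If it helps to see the "wrap each cell in the proper tags" idea outside of Open Refine, here's a rough Python equivalent. To be clear, this is not the template itself: it's a sketch using Python's standard library, with a hypothetical row and only two fields, just to show the shape of the record the template produces.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Serialize MODS elements without a prefix (default namespace).
ET.register_namespace("", MODS_NS)


def q(tag):
    """Qualify a tag name with the MODS namespace."""
    return f"{{{MODS_NS}}}{tag}"


def row_to_mods(row):
    """Build one MODS record (as a string) from a spreadsheet row dict."""
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = row["Title"]
    origin_info = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin_info, q("dateCreated")).text = row["Date Created"]
    # ElementTree escapes characters like & and < in the text for us.
    return ET.tostring(mods, encoding="unicode")


# Hypothetical row; the keys stand in for spreadsheet column headers.
print(row_to_mods({"Title": "Maps & Plans", "Date Created": "2014-06-12"}))
```

The keys "Title" and "Date Created" play the same role as the column header inside the square brackets of the jsonize tag: change them to match your own headers.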

Click export and you'll get a big .txt file of structured data that you can work with - once you save it as .xml it will be valid MODS XML. I like to split that huge file using xml_split, part of the XML::Twig package, but there are any number of different ways of doing it. Zip your individual MODS records up with your objects and everything is ready to batch ingest into Islandora!
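As one of those "different ways of doing it", here's a minimal Python sketch of the split. It is an alternative to xml_split, not a description of it, and it assumes the exported file has a single wrapper element (modsCollection, say) around namespaced mods records; the record-0001.xml naming is invented.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
# Serialize MODS elements without a prefix (default namespace).
ET.register_namespace("", MODS_NS)


def split_records(source_xml):
    """Map a hypothetical file name to each individual MODS record."""
    root = ET.fromstring(source_xml)
    records = {}
    for number, record in enumerate(root.iter(f"{{{MODS_NS}}}mods"), start=1):
        records[f"record-{number:04d}.xml"] = ET.tostring(record, encoding="unicode")
    return records


sample = (
    '<modsCollection xmlns="http://www.loc.gov/mods/v3">'
    "<mods><titleInfo><title>A</title></titleInfo></mods>"
    "<mods><titleInfo><title>B</title></titleInfo></mods>"
    "</modsCollection>"
)
for name, xml in split_records(sample).items():
    print(name, xml)
```

Each string can then be written out to its own file next to the matching object before zipping.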




This spreadsheet will make metadata creation easy.

Open Refine will make metadata editing easy.

This template will make exporting MODS from Open Refine easy.

Everything is now easy.

Thursday, June 12, 2014 - 14:16

The UTSC Library, in collaboration with the Centre for Digital Scholarship, the Office of the Dean and VP Academic, and the University of Toronto Libraries Chief Librarian’s Office, is organizing a THATCamp.

More and more frequently, professors are creating courses that are centered around digital projects and incorporating digital tools into their courses. Part of the larger Digital Pedagogy Institute, this THATCamp will allow participants to discuss best practices around teaching courses that are centered on digital methods, and digital tools that improve and facilitate research. It is hoped that a variety of case studies will be presented and discussed in order to bring to light best practices surrounding these emerging methodologies, and the skills that faculty members and librarians need to develop in order to maximize their impact on undergraduates in this specific area.

For more information and to register, please see:

When: Friday, August 15th, 2014


Where: University of Toronto Scarborough Campus, 1265 Military Trail, Toronto, ON, M1C 1A4.

If you have any questions about THATCamp Digital Scholarship Institute, please contact us at

Thursday, June 12, 2014 - 08:14

by Sara Allain

We're really excited that our poster, entitled "Bye Bye, CONTENTdm: a migration to Islandora", was a co-winner for best poster at Open Repositories 2014! Almost 60 posters were presented at the conference on a huge range of subjects. We're incredibly proud to be part of such a diverse and intelligent group of people.

The poster was co-authored by Lingling Jiang, Kim Pham, Kirsta Stapelfeldt, Paulina Rousseau, and myself. Check it out on Slideshare.

Huge congratulations as well to our co-winners Minna Marjamaa, Tiina Tolonen, and Anna-Liisa Holmstrom, whose work on the Theseus Open Repository is inspiring.

Thursday, June 12, 2014 - 04:12

by Sara Allain

We're away at Open Repositories this week (taking lots of notes, so watch out for our blog posts after we all get back to Canada). Everybody is staying up too late since the days are so long, and I've been working on mapping the tweets of attendees. It's still a work in progress, but you can check out the mapping on my personal website.