Friday, June 27, 2014 - 13:21

by Sara Allain

Lately we've been trying to come up with a better way to create metadata for batch ingestion into Islandora. We just started preparing the UTSC Photographic Services Collection to go online - our lovely Young Canada Works summer student, Rachel, has been diligently selecting a few hundred candidates for the first phase of digitization - and it makes sense to start creating the metadata as well, so that once we have digital surrogates we can quickly bundle everything into Islandora via batch ingest. Since metadata creation and manipulation take up a lot of my day, I started thinking about the most effective way to create XML using a workflow that would be optimal for our students, our systems, and me.

This is fairly long and detailed, so feel free to jump to the bottom for the highlights.

We often work with faculty and other people outside of the unit to create metadata for the various digital scholarship projects that we steward. Spreadsheets are an easy and accessible way for faculty, students, researchers - whomever - to come to grips with structured data. Things are tidy, they're easy to manipulate, we can derive CSV files - but most importantly, our project collaborators are familiar with how they work. There's no learning curve. We use a range of products from Excel to LibreOffice to Google Drive to do this - whatever's most suited to the project.

Step 1 - Set Up Your Spreadsheet

We're using MODS for all generic content going forward - in the past we used Dublin Core, but Islandora natively prefers MODS, and it's more flexible for complex objects. (We may use other schemas for subject-specific content in the future, like Darwin Core for biodiversity data, which will be an interesting blog post in itself.) I set up a Google spreadsheet that uses human-friendly versions of the smallest child elements in MODS as column headers; that specific spreadsheet doesn't reflect all the fields available in MODS, so think of it as an infinitely extensible collection mechanism. In truth, it doesn't even matter what the headers are, as long as they map easily to MODS and the content is consistent.
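As a sketch of the idea (these headers and mappings are illustrative examples, not the contents of our actual spreadsheet), each column header simply corresponds to a small MODS child element:

```
Title         ->  mods:titleInfo/mods:title
Date Created  ->  mods:originInfo/mods:dateCreated
Subject       ->  mods:subject/mods:topic
Description   ->  mods:abstract
```

As long as a mapping like this exists and is applied consistently, the headers themselves can say whatever makes sense to the people entering the data.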

Step 2 - Add Some Metadata

This step is pretty simple. We have generic guidelines for creating metadata - things like "Transcribe the title from the object or create a title that describes the object" or "Use the format YYYY-MM-DD." Our goal in the DSU is to intervene as little as possible in this process. Usually all we'll do is a bit of clean-up before making it publicly available. You can see the instructions that we provide for users as comments if you hover over the column headers on the spreadsheet.

Step 3 - Import into Google Refine

OpenRefine (formerly Google Refine) allows you to perform sophisticated manipulations on tabular data. It supports regular expressions and a host of other ways to mash up your info. Once installed, the program runs locally and is accessed through your web browser. One word of warning, though - a desktop install can only handle so many rows of content before it will die on you. If the program is having trouble parsing the data that you import, it's possible to allocate more memory (for example, by launching with ./refine -m 4096m on Mac or Linux).

The import process is simple - export the spreadsheet from Google as .xls, then import it into OpenRefine using the Create Project function. It looks like this:

Make sure that your data is rendering properly in the preview window and click on Create Project. You'll end up with - surprise! - another spreadsheet, this time in Open Refine.

Step 4 - Refine the Data

You might want to take this time to refine your data - that is, after all, the whole point of OpenRefine. You can do things like removing trailing spaces or splitting columns as needed. In the Google spreadsheet, for example, the Subject field includes multiple entities delimited by semicolons; OpenRefine will do the work of isolating each of these into separate columns for you, if you should so desire. As mentioned above, it supports regular expressions and is very powerful at manipulating data.
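For instance, the trimming and splitting mentioned above map onto short GREL expressions (the language used under Edit cells > Transform...) like these - the semicolon separator and the regular expression are examples to adapt to your own data:

```
value.trim()
value.split(";")
value.replace(/\s+/, " ")
```

The first strips leading and trailing whitespace, the second breaks a semicolon-delimited cell into an array of values, and the third uses a regular expression to collapse runs of whitespace. If you'd rather not type anything, the Edit column > Split into several columns... menu item will do the semicolon split interactively.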

Step 5 - Export as MODS

This is the trickiest part, and by "trickiest" I mean surprisingly simple once you've figured it out. Open Refine has several options for exporting data; the one I use to export as MODS is Templating. When you click on it, you get a form that looks like this:

Within the exporter, you can build any schema you desire. On the left is the editable template and on the right is a preview of how your file will look once it's exported. In this case we want MODS, which was easy to model. You simply need to add the proper MODS tags around the jsonize tags. Here is a template for Open Refine that will show you exactly what to put where - the only thing that might need to be changed is the column name inside the square brackets in the jsonize tag, i.e. "Title" in {{jsonize(cells["Title"].value)}} (this is the column header from your spreadsheet). The exporter with the MODS template applied looks like this:
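For illustration only (the element names and column headers below are placeholders, not the linked template itself), a row template ends up looking like MODS with jsonize tags where the values go; the <modsCollection> wrapper belongs in the exporter's prefix and suffix boxes:

```xml
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo>
    <title>{{jsonize(cells["Title"].value)}}</title>
  </titleInfo>
  <originInfo>
    <dateCreated>{{jsonize(cells["Date Created"].value)}}</dateCreated>
  </originInfo>
</mods>
```

One caveat worth checking in the preview pane: jsonize() emits JSON strings, so it wraps each value in double quotes. GREL's escape(value, "xml") escapes XML special characters without adding quotes, if you'd rather avoid cleaning those up afterwards.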

Click export and you'll get a big .txt file of structured data that you can work with - once you save it as .xml, it will be valid MODS XML. I like to split that huge file using xml_split, part of the XML::Twig Perl package, but there are any number of different ways of doing it. Zip your individual MODS records up with your objects and everything is ready to batch ingest into Islandora!
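xml_split is what I reach for, but as one hedged illustration of those "different ways of doing it", here is a minimal Python sketch using only the standard library (the filenames are examples):

```python
# Split a MODS collection file into one file per <mods> record.
# A sketch only - assumes the exported file has a modsCollection root
# whose direct children are mods records.
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"

def split_mods(collection_path, out_prefix="record"):
    """Write each <mods> child of a modsCollection to its own file;
    return the list of files written."""
    ET.register_namespace("", MODS_NS)  # keep MODS as the default namespace on output
    tree = ET.parse(collection_path)
    records = tree.getroot().findall(f"{{{MODS_NS}}}mods")
    written = []
    for i, record in enumerate(records, start=1):
        path = f"{out_prefix}-{i:04d}.xml"  # record-0001.xml, record-0002.xml, ...
        ET.ElementTree(record).write(path, encoding="utf-8", xml_declaration=True)
        written.append(path)
    return written
```

Calling split_mods("collection.xml") then yields one numbered .xml file per record, ready to be renamed to match your objects.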




This spreadsheet will make metadata creation easy.

Open Refine will make metadata editing easy.

This template will make exporting MODS from Open Refine easy.

Everything is now easy.

Thursday, June 12, 2014 - 14:16

The UTSC Library, in collaboration with the Centre for Digital Scholarship, the Office of the Dean and VP Academic, and the University of Toronto Libraries Chief Librarian’s Office, is organizing a THATCamp.

More and more frequently, professors are creating courses that are centered on digital projects and incorporate digital tools into their teaching. Part of the larger Digital Pedagogy Institute, this THATCamp will allow participants to discuss best practices around teaching courses that are centered on digital methods, and around digital tools that improve and facilitate research. It is hoped that a variety of case studies will be presented and discussed in order to bring to light best practices surrounding these emerging methodologies, and the skills that faculty members and librarians need to develop in order to maximize their impact on undergraduates in this specific area.

For more information and to register, please see:

When: Friday, August 15th, 2014


Where: University of Toronto Scarborough Campus, 1265 Military Trail, Toronto, ON, M1C 1A4.

If you have any questions about THATCamp Digital Scholarship Institute, please contact us at

Thursday, June 12, 2014 - 08:14

by Sara Allain

We're really excited that our poster, entitled "Bye Bye, CONTENTdm: a migration to Islandora", was a co-winner for best poster at Open Repositories 2014! Almost 60 posters were presented at the conference on a huge range of subjects. We're incredibly proud to be part of such a diverse and intelligent group of people.

The poster was co-authored by Lingling Jiang, Kim Pham, Kirsta Stapelfeldt, Paulina Rousseau, and myself. Check it out on Slideshare.

Huge congratulations as well to our co-winners Minna Marjamaa, Tiina Tolonen, and Anna-Liisa Holmstrom, whose work on the Theseus Open Repository is inspiring.

Thursday, June 12, 2014 - 04:12

by Sara Allain

We're away at Open Repositories this week (taking lots of notes, so watch out for our blog posts after we all get back to Canada). Everybody is staying up too late since the days are so long, and I've been working on mapping the tweets of attendees. It's still a work in progress, but you can check out the mapping on my personal website.


Friday, June 6, 2014 - 10:42

This past week I had the opportunity to attend a free information session put on by the Toronto Area Archivist Group (TAAG) and the University of Toronto Archivist Group (UTAG). As a new summer student employee of the Digital Scholarship Unit, it was a great opportunity for someone who is trying to break into the world of digital archival initiatives and scholarship. Courtney Mumma, MAS/MLIS, of Artefactual Systems Inc. led the session and introduced the group to Archivematica.

“Archivematica is a free and open-source digital preservation system that is designed to maintain standards-based, long-term access to collections of digital objects. Archivematica is packaged with the web-based content management system AtoM for access to your digital objects.

Archivematica uses a micro-services design pattern to provide an integrated suite of software tools that allows users to process digital objects from ingest to access in compliance with the ISO-OAIS functional model. Users monitor and control the micro-services via a web-based dashboard. Archivematica uses METS, PREMIS, Dublin Core and other best practice metadata standards.” [1]

The session was held in the Thomas Fisher Rare Book Library at the University of Toronto, St. George Campus. Sitting among the floors and floors of unique and beautiful books set up an interesting dynamic between the analog, the digital, and the initiative to bring them together. The workshop started with a demonstration showing steps that may be included in a basic workflow; Mumma explained the output capabilities and used a staggering number of acronyms, which, as I gather, is par for the course in this field. She did an excellent job of explaining the program, and the demonstration helped to guide those of us who were new to it. Even with Mumma's skill as a presenter, there was a lot of information to process, and it was impossible to grasp all of the program's capabilities in the time given. Thankfully, they have a detailed wiki that explains the basic capabilities of the program.

Archivematica, as Mumma said, "allows an archivist to remain an archivist" by facilitating appraisal (a forthcoming feature), preservation, and metadata creation. What Archivematica has done is gather together the best open-source tools - what they call micro-services - to allow for configuration to the specifications and needs of individual repositories. This configurability lets archivists use the program without fussing around with the multiple and varied individual tools for discrete tasks. Archivematica is also compatible with most storage and access systems.

Archivematica is open source and free to download and use. It also comes with a detailed user manual and an online forum where users can discuss issues and post questions. In theory, it can be used without costs. However, for those who are uncomfortable with more robust technologies, the set-up and maintenance may be daunting without the help of an IT department. Thus, Artefactual Systems offers Archivematica set-up, configuration, tutorials, maintenance services, and more, at a cost. The services provided are extensive and highly valuable, but they bring up the issue that is plaguing all heritage organizations these days: money.

Artefactual is upfront about their costs (they can be found here). But many archives or library departments are small and have small budgets and some institutions do not have access to the kind of IT support needed for the DIY option. While we recognise the digital future and want to move toward it, sometimes it seems insurmountable in terms of resources.

During the workshop there was some talk about how to spread out the costs amongst institutions willing to engage Artefactual's services. For example, the Council of Prairie and Pacific University Libraries (COPPUL) has formed a consortium to employ and pay for Artefactual's services collectively. At the workshop, there was some talk of Ontario institutions also employing Artefactual's services consortially.

Overall, the workshop was very informative and promising. It shows that there are great initiatives and great interest in the move toward digital. It is exciting to see where the push toward digital will bring archival institutions and how it will shape the heritage professions. Thanks are due to TAAG and UTAG for putting on this session, and thank you as well to Courtney Mumma and Artefactual Systems for the opportunity to learn more about your services and resources.

Wednesday, May 28, 2014 - 11:20

Oh, they do so many things they never stop. Oh, the things they do there, my stars.

Why hello, I'm the new contract hire at the DSU since May. So far it's been lovely - I love the work pace and I immediately felt like I was a part of the team. The first thing I worked on here was getting content into their shiny new online repository (3 weeks my senior). I was to move all of the metadata from the Doris McCarthy Image Collection out of CONTENTdm (the old asset management software) and into Islandora (the new asset management software). My aim is to be as transparent as possible in the hopes that this will be of value to someone such as myself, starting out in libraries and working with library data and metadata. Of course, I will be more than happy to answer any questions if you too share a similar pain.

Hey, let's make it easy. The code is available on GitHub.

What we had:
- Doris McCarthy Simple DC export (.xml)
- rename map (document)

What we used:
- Oxygen XML Editor (30-day trial)
- a text editor (we used Sublime Text; also Excel)
- XML::Twig (installed from CPAN)

Scripts:
- XSLT rename map (.xsl)
- rename (.sh)
- LOC DC2MODS (.xsl)

To start:

- the XML file exported from CONTENTdm in Simple DC, with 750 records
- the renaming map: a project document listing each old filename and its new filename (done manually)

To end:

- one individual .xml record in MODS for each associated .tif object (they need to have the same filename in order to be properly batch ingested using the Large Image Content Model)


1. Create a rename map: create an XML stylesheet (XSLT) to replace all the text within <dc:source> - read the current name and look up its corresponding replacement identifier in the XSLT.

2. Run the rename transformation in Oxygen XML Editor -> 750 records (no loss)

   a. ~20 identified duplicates: some had identical identifiers, and some simply had two metadata records associated with the object (e.g. one record included an OCR transcription while its duplicate didn't)
   b. ~30 container metadata records didn't have a mapping name, so they weren't transformed - acceptable

3. Split the files -> 750 files (no loss): using XML::Twig's xml_split

4. Rename the split files -> 730 records, as predicted from 2b: using the rename script (.sh)

5. Transform the metadata records from DC to MODS -> 730 records (no loss from step 4): using Oxygen XML Editor; LOC has templates for MODS transformations that we modified to match our CONTENTdm metadata export

6. Ready for ingest: single image + XML packages (book batch too; steps not included). Yay!
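The rename-map stylesheet can be sketched as an identity transform plus one rule per filename - a hedged illustration only: the <dc:source> element being matched comes from the Simple DC export above, but the filenames here are made up, and the real map has hundreds of entries (one template per rename).

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/">

  <!-- identity transform: copy every node and attribute through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- one rule per entry in the rename map (filenames are placeholders) -->
  <xsl:template match="dc:source[. = 'OLD_FILENAME_001']">
    <dc:source>NEW_FILENAME_001</dc:source>
  </xsl:template>

</xsl:stylesheet>
```

Running this over the export leaves everything untouched except the matched dc:source values, which is why the unmapped container records in 2b simply passed through untransformed.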



- In almost every step, something didn't work - you will need to go back a few steps, fix, and proceed.
- It's difficult to figure out the order in which to do everything; don't be afraid to try it a different way (doing one step first may cause more problems than doing it in another order - e.g. deciding whether to do the DC to MODS transformation first or wait until the very end).
- Cleanup is crucial at every step - the more time you devote to cleanup early in your workflow, the easier the rest of the process will be.

All in all, it's been a very exciting month, not just for me but for everyone at the DSU.  Or maybe it's always like this...

Tuesday, May 27, 2014 - 13:29

The UTSC Library, with the support of the UTSC and York University libraries, is proud to be the host of Islandora Camp GTA.

In lieu of our usual summer Islandora Camp in PEI, our 2014 Canadian camp is going to the big city. #iCampGTA will take place on the campus of the University of Toronto Scarborough from August 6-8, 2014. If you have any questions about the GTA Camp, please contact us.

- Register
- Schedule
- Accommodations and Travel
- Call for Proposals
- T-Shirt Logo Contest - win your registration!

See additional details on the Islandora Website. Hope to see you there!

Tuesday, May 27, 2014 - 13:23

Register for this event

We invite proposals for Digital Pedagogy and the Undergraduate Experience: An Institute

Proposals should contain a title, an abstract (of approximately 250 words) plus list of works cited, and the names, affiliations, and website URLs of presenters; fuller papers will be solicited after acceptance of proposals, for circulation in advance of the gathering to registered participants.

Alternatively, you can propose a workshop related to Digital Pedagogy, with the same stipulations as above. 

Please send proposals before 30th June 2014 to Paulina Rousseau,

Tuesday, May 27, 2014 - 12:41

Register for this event

See the Institute and THATCamp Schedule


Instructional Centre - Rooms IC-300, 302, 306, 1265 Military Trail, Toronto 

Join us for an event that considers the way digital scholarship is changing the landscape of undergraduate pedagogy!

Emerging technologies have had an immense impact on the way that research is now conducted by scholars in academic disciplines. There is a move toward the use of computers, applications, and larger, non-discrete data sets for what is increasingly termed "digital scholarship." Thanks to these advanced developments in computing, research in all fields has taken on a much more collaborative nature, resulting in experimentation with research outputs in new formats and the creation of new intellectual products. These major changes in research methodology mandate the development of new skill sets, both in faculty and in the training of students. As such, Digital Literacy and Pedagogy must become a priority for undergraduate and graduate students, as well as for faculty members, who must inherit and participate in new, digitally-mediated methodologies.

This institute will explore the potential impact that Digital Pedagogy can have on student experience, with specific focus on the undergraduate level. This will include the following topics:

- How can digital research methodologies be used to improve undergraduate engagement?
- What are the best methods for teaching students digital skills so that they can participate in the creation of digital research? What has proven to be successful?
- What political and ideological decisions do educators involved in digital scholarship need to make in order to benefit students, preparing them for work beyond the academy, and how can this influence the formation of canons that might help stabilize the field?
- How can faculty members shift from transmitting knowledge to facilitating projects, co-inquiring and co-learning with students in activity-centered projects?

This institute will bring together digital scholars with considerable expertise in the area of Digital Pedagogy, and will consist of plenary sessions, informational sessions, hands-on workshops involving digital tools, and workshops focusing on methods for integrating digital pedagogy into both specific courses and the larger curriculum. It will also involve presentations from students who have participated in the development of digital scholarship projects, in hopes of gaining insight into how the integration of these skill sets improves the learning experience and job readiness.

Featured events will include plenary talks by Rebecca Frost Davis, Director of Instructional and Emerging Technologies (St. Edwards University, Fellow at NITLE), and Lisa Spiro (Executive Director, Digital Scholarship Services, Rice University, Former Director of NITLE labs).

The institute will close with THATCamp, which will allow scholars to discuss the most pertinent issues that concern them in the realm of digital pedagogy, and will be preceded by Islandora Camp. 

This Institute is generously sponsored by the UTSC Library Chief Librarian, the UTSC Office of the Dean and Vice President Academic, the Office of the University Chief Librarian, University of Toronto, and UTSC's Centre for Digital Scholarship. 

See our Call for Proposals. Due by June 30th, 2014

Register for this event

Sunday, May 11, 2014 - 18:09

I'm away from the Digital Scholarship Unit this week in semi-sunny London,  as an instructor for Islandora Camp UK. Here are some of my notes:

Note: Fresh and Maybe Flawed.

The final day of Islandora Camp reunited the Developer and Administrator tracks.

People were trickling to the whiteboard to record their Github handles for addition to the Islandora Github Organization. 

Last night’s late-nighters are among those who got up this morning for a run through the city. After everybody got some coffee, we started with Alan Stanley’s presentation on producing digital editions. 

The presentation comes out of the Editing Modernism in Canada (EMIC) project and its partners. I worked on EMIC in earlier days, and it was interesting to see the progress - particularly the integration of the Desmond Schmidt/AustESE work and the CollateX tool. In Fedora Commons, versions of a work are stored as separate objects, against which these tools are run to detect differences and help in the construction of digital editions. I need to get back to a review of the AustESE workbench to explore what's been happening in the Digital Humanities community.

The module that Alan (and discoverygarden) is building also provides WYSIWYG TEI creation through the Canadian Writing and Research Collaboratory (CWRC)'s CWRCWriter application. Alan says that "it works, but it needs a lot of tweaking," meaning that we have a little while to wait before this project is generalized and released to the community, but it's very exciting to see in action.

Donald's Form Builder session came next; Form Builder is a big and complex tool, and Donald promises to post his slides from this and other presentations on the same page as the conference schedule. Beyond teaching the tool and its interface, Donald facilitated conversation about encoding validation steps for forms through the interface. Validation is currently in the hands of the form creator, or encoded by hand. It would be great to see a more generalized solution. Since camp, there's been an interesting post on the lists about validation for specific form fields.

After this, it was time for a break, followed by the awards - here are our recipients!

- "Old School Strength" to Draženko Celjak for VM installation on Windows XP (the brave soul).
- "Continuous Passion About Integration" to Simon Fox, from the Freshwater Biological Association, future Travis expert and sheriff of Islandora code.
- "Friendly Traveller" to Ken Kim from Next Aeon Korea, for coming such a long way and being such a collegial camper.
- "The Spirit Award" to Anna Jordanous, with many thanks for her hard work making camp a success - everything from finding us a space to bringing power cords and coordinating the social event.

We took a group pic at this point. You might have seen it on Twitter.

After this, we were on to the community presentations.

Luis Martinez-Uribe, from the Fundación Juan March, talked about how discoverygarden helped his organization, which distributes funding and provides stewardship for Spain's cultural life, set up Islandora. There were a lot of interesting things about this presentation: FJM chose Islandora because the project was led by a librarian (Mark Leggott), many of the views in the site were generated via exports from the Archivists' Toolkit, and there are a lot of custom views for content, some using third-party tools (like the popular Simile widget). FJM has also used Google Fusion Tables to make some neat visualizations about the artists that have been showcased over the years. The project is a testament to the value of structured data; as Luis says, "It actually pays off to prepare the data."

FJM is also interesting because they’ve gone to a completely different display layer (not Drupal). Being a Windows shop without any in-house PHP expertise, they developed a .net FJM-Islandora Library that replaces Tuque. We saw the library in action on a large collection of exhibition catalogues dating back to the early 1970s.

This is around the time that Nick turns around with twinkling eyes and says: “I wonder if I can get d3 to work with Solr” - I’m still watching his twitter feed to see what emerges. 

The last presentation before lunch was from Caleb Derven at the University of Limerick. He's spent the last few years developing infrastructure. Although many of the repositories in Ireland run DSpace, Hydra, or bespoke front ends for Fedora, Caleb worked with discoverygarden to build out 20TB of storage affiliated with an Islandora installation, citing Islandora as a more flexible approach better suited to the types of staff and expertise at his institution. He's interested in EAD support in Islandora, and I sadly had to run to feed Henry before I got to hear the rest of his presentation. I felt like Caleb and I spent most of the camp trying to get together for a conversation about Islandora and archives, but we weren't successful.

Next up were two presentations from the Freshwater Biological Association (FBA) outlining their approach to RDF. First, Nicholas Bywell showed off FBA's Object Linker module, which integrates with fixed vocabularies and provides autocomplete against a preferred term collection. The group creates authorities for terms using MADS, but notes that one could modify it to use SKOS pretty easily. Anna Jordanous followed; her presentation introduced the group's use of the Linked API from Epimorphics and sparked a discussion of how to take data in a spreadsheet and produce linked data.

After the FBA presentation, Donald Moses introduces the new IR code, which is probably worthy of its own blog post. It's a really big suite of modules, with good support for ingesting citations from things like DOI, PMID, EndNote, and RIS, through to display that leverages CSL stylesheets and the creation of custom bibliographies. Then it's time for a quick break.

After the break, Ken Kim talks about how his group has photographed thousands of Korean artifacts and made them available in a Drupal website that supports eight languages. We end camp with a discussion of the future, including the implications of Fedora 4 and Drupal 8 - while nobody had any clear timetables or deadlines, the commitment of the community to a future Islandora is pretty clear, as is the desire for a good upgrade path. For now, there are lots of new sites going up daily in Islandora 7, and I’m amazed at how far the code has come since my first camp in 2010. 

Watch for the presentation slides to go up and we'll tweet when we see them. 

If this sounds interesting, come to Islandora Camp GTA!

Thursday, May 8, 2014 - 16:32

I'm away from the Digital Scholarship Unit this week in semi-sunny London,  as an instructor for Islandora Camp UK. Here are some of my notes:

Note: Fresh and Maybe Flawed.

The second day of Islandora Camp has ended, and I’ve had far less time to take notes. This is because we split into our separate sessions (administrators and developers) and Donald Moses and I were instructing in earnest. Though we missed our developer friends, we pushed on into a deep-dive of the Islandora administrative interface. 

Typically, the administrative track starts off with an overview of basic site-building and user management functions in Drupal (a hurdle for some Islandora administrators), before moving to a review of Islandora permissions (and an overview of Fedora Commons), and ending with Solr. This day-long session was designed by Melissa Anez, the Islandora Foundation Project & Community Manager. Donald Moses and I both admire the graceful approach Melissa has taken in designing hands-on sessions that ease people into the sometimes daunting world of Drupal, Fedora Commons, and Solr, and how these applications meet in the Islandora ecosystem.

That said, this admiration didn’t stop us from getting diverted (sorry Melissa!) into discussions of media management, the philosophy behind Islandora’s extension of pre-existing Drupal modules, the art of authoring namespace prefixes, and desirable server setups (to vagrant or not to vagrant?).

Because we could not get enough of being smooshed together in small underground spaces, camp finished off with a lovely dinner at the bottom of Covent Garden. The dev and admin tracks were reunited with much comparison of personal histories, accents, and plans for Islandora. It sounds like the dev track also went well. I came back to the hotel with my family (one-year-olds don't really like talking about Islandora), but most of camp is still out there in the city, painting the town Islandora-t-shirt red.

I forgot how much I get out of these camps, and how great it is (after 4 years of Islandora) to see new faces interspersed with established Islandorians. What a lovely bunch of people!

If this sounds interesting, come to Islandora Camp GTA!

Wednesday, May 7, 2014 - 17:07

I'm away from the Digital Scholarship Unit this week in semi-sunny London,  as an instructor for Islandora Camp UK. Here are some of my notes:

Note: Fresh and Maybe Flawed.


Camp started this morning in a basement room of the King's College Strand campus. Altogether, we are 20 folks from Canada, Korea, Germany, Italy, Croatia, and areas of the UK. Our morning roundtable discussion revealed that our professions are as diverse as our countries of origin - there are developers (of course) as well as librarians, archivists, administrators, private-company service providers, and government staff. But we're unified by our interests. In no particular order, the leitmotifs of Islandora Camp UK are shaping up to be:

- The Institutional Repository
- Multisites
- RDF/Linked Data
- Long-term preservation
- Archives and Islandora
- Running at Head & Migration (Systems Sustainability)
- Internationalization


After a very animated morning break, it was time for show and tell. Nick Ruest showed off his WARC Solution pack for archiving websites.  It looks like this. Particularly cool is the idea of running a local wayback machine to show off the archived files.  

Nick then showed some great new content views he’s been building using Islandora Solr Views. This module isn’t part of the current release, basically because it doesn’t respect access control. But, it is great for an open collection, and particularly impressive in the hands of Nick and his (York’s) rich metadata. Nick showed us how he’s leveraging Infinite Scroll to show off 11,000 digitized slides, and a beautiful map of georeferenced assets using leaflet.js.

We also took a look at Innisfil Public Library’s “Faces of Innisfil” project. This awesome public library is using the Islandora Simple Workflow Module to crowdsource community photos, which are then vetted by a site administrator. 

At this point, there was a discussion of Biological Data sets in Islandora. Giancarlo Birella presented on the V2P2 repository for storing, searching, and sharing data from research on plant microorganism-virus interactions. Mike Haft, from the Freshwater Biological Association had stories from the trenches, and many good suggestions for useful tools and existing taxonomies. 
A great initial output of Giancarlo's project has been a documentation wiki, the V2P2 Repository Dev Zone. As a convener of the documentation group, I'm definitely bookmarking this to see how we might be able to promote or contribute to this work.

We talked about Darwin Core, which, as Mike pointed out, is good for samples but not for books. This sparked a discussion of repositories using mixed schemas (for example, MODS for book records and Darwin Core for specimens). An interesting offshoot of this question was whether attendees wanted to consume other taxonomical authorities or wanted to be sources of taxonomical authority. Donald also talked about the progress made in the OAI module over the last release cycle, which has really emerged as a great way of publishing metadata. Also good for folks to know: D6 had a harvester for OAI, but D7 does not.

The camp then shifted to a discussion of the peculiarities of different systems infrastructures and the challenges these pose for generalizing migration, ingestion, and update scripts across diverse server environments (particularly when the scripts need to be reviewed for any sensitive information). Donald Moses provided a tour of the Robertson Library (UPEI) GitHub organization, including the scripts his team has published. The general consensus: the more public-facing tools, the better.

Donald also showed off the Newspaper Solution Pack in use at UPEI, where his team has built a very nice landing page for the newspaper level of the collection and a calendar view of the repository contents. The newspaper pack's native ITQL queries were slowing down the system, as did an attempt to swap out ITQL for one large Solr query. The solution was to leverage smaller custom Solr queries in a module that will be released in the next cycle.
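As a rough sketch of the approach - this is not UPEI's actual module, and the Solr endpoint, field names, and PID below are all hypothetical, since Islandora Solr schemas vary from site to site - a small, targeted query for one newspaper's issues in a given year might be built like this:

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; actual cores and schemas differ per site.
SOLR_BASE = "http://localhost:8080/solr/collection1/select"

def issues_for_year(newspaper_pid, year):
    """Build a small, targeted query for one newspaper's issues in one year."""
    params = {
        "q": 'RELS_EXT_isMemberOf_uri_ms:"info:fedora/%s"' % newspaper_pid,
        "fq": "mods_dateIssued_dt:[%d-01-01T00:00:00Z TO %d-12-31T23:59:59Z]"
              % (year, year),
        "fl": "PID,fgs_label_s",
        "rows": "366",  # at most one issue per day
        "wt": "json",
    }
    return SOLR_BASE + "?" + urlencode(params)

print(issues_for_year("newspaper:examiner", 1920))
```

The idea is simply that many small scoped queries, each fetching only the fields a single view needs, stay fast where one monolithic query bogs down.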

Nick also gave us an overview of how he has overridden default URLs to make them prettier and more informative using an Islandora Path Auto module written by "Rosie," Rosemary le Faive (who is sadly not at camp). It leverages Drupal's Pathauto module to let you set up URL patterns using tokens. This hasn't been offered as a release module because the default SPARQL query is hardcoded and has to be edited to accommodate each project.

At 11:45 the air conditioning kicks in. It's a welcome moment for everybody packed into a tiny room full of people and computers. Offhanded comments are made about SKOS vs. MADS.

The last presentation before we move to the release modules is about a new viewer for the Video Solution Pack (video.js). Watch for Nick's link on the lists in the coming weeks!

After lunch, we finished a discussion of tools and modules in the latest Islandora release, which led to a discussion of why certain things wind up in the release and some things don't. That meant looking at some Travis files to see how contributors prepare contributions, plus a quick tour of the developer documentation. In the end, we toured all of the currently released modules and tools, including the new digital preservation suite, the image annotation module, command-line batch ingests, and lots of others. We also talked about media annotation and the Media Fragments spec.

Alan gave us a tour of a new XQuery module for Islandora that allows for batch editing of repository content - this means you can do things like find and replace text across the whole repository. This seems very powerful, and also terrifying (thank goodness there is a preview query function). As Alan pointed out, the bigger your XQuery, the more chance there is of making a mistake. For now, it was exciting enough to see Alan batch edit the DC of several objects to make the content UPPERCASE. Finally, Islandora has its very own Kanye West Button.
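To give a flavour of the operation - this is a plain-Python stand-in using the standard library, not the XQuery module itself, and the sample record is made up - uppercasing the title of a single DC record looks like:

```python
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"

# A single in-memory DC record; the actual module walks every
# object's DC datastream across the whole repository.
record = ET.fromstring(
    '<oai_dc xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>faces of innisfil</dc:title>"
    "</oai_dc>"
)

# Find-and-transform: uppercase the text of every dc:title element.
for title in record.iter(DC_NS + "title"):
    title.text = title.text.upper()

print(record.find(DC_NS + "title").text)  # → FACES OF INNISFIL
```

Multiply that transform by thousands of objects and you can see both the power and the terror - hence the value of the preview function.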

When we were talking about XML Form Builder, it became clear that this tool has uses we are still figuring out. In particular, various Drupal modules can be leveraged by the form builder: Nick demoed the use of Chosen for better select lists, and Donald talked about taxonomies and forms. It would be great to get a coordinate picker in there from one of Drupal's many map modules.

The end-of-day Installfest went pretty well, but revealed that some Windows users have to edit their BIOS settings in order to use the new VM (which is 64-bit instead of 32-bit). Tomorrow, we split into our administrative and developer streams. As always, I wish I could be sitting in both places at once...

If this sounds interesting, come to Islandora Camp GTA!



Tuesday, April 29, 2014 - 13:06

The annual TRY Library Staff Conference brings together librarians from the libraries of the University of Toronto, Ryerson University, and York University for a day of communal professional development and networking. DSUers Sara Allain and Sarah Forbes will be conducting a session on free and open source tools for digital curation projects, focusing on the practical application of Inkscape, GIMP, Sublime Text, and ImageMagick to process digital images and create metadata.

Sara Allain will also be presenting as UTSC's representative for the Collections UofT repository project, along with Kelli Babcock (UofT Libraries ITS), Karen Suurtamm (UofT Archives and Records Management), and Danielle Robichaud (Kelly Library, St Michael's College). Check out the session description here.


Friday, April 25, 2014 - 18:20

by Sara Allain

On April 21st, members of the DSU had the pleasure of hearing Claire Potter, Professor of History at The New School, speak about the role of digital humanities and its practitioners in academic departments. There were two areas in particular that resonated with us in the DSU: professional acceptance of digital scholarship and the gap in digital competencies.

Digital humanities reopens classic texts to new forms of scholarship.

Claire talked in depth about the problems facing humanities departments, and the problems that digital scholarship has had in addressing those issues. The presentation cited the privileging of codex-based research - that is, deep reading and other forms of book-focused knowledge generation, culminating in a monograph - in the hiring and tenure process as an ongoing struggle facing researchers in digital scholarship. She noted that the AHA just wrote its first hiring guidelines for digital scholars this year. This area also concerns the DSU - our lifeblood is the scholars who come to the library seeking a partnership that will support, sustain, and expand their digital projects. UTSC is a good place to be a digital scholar. But in order to create a professional culture that can sustain projects across years, digital scholarship must be understood as on par with codex-based research. We often wonder, "What scholarship isn't digital nowadays?" Building that understanding into hiring and tenure practices will make digital scholarship stronger as it brings a wider representation of the scholarly population into these conversations.

Computers too often feel like brooms, not pens.

Claire's second point resonated with us directly - a digital competency gap still very much exists within our institutions. In particular, to quote her verbatim: "The idea that young people are digital natives is crap." Faculty members, researchers, and students all face a steep learning curve. It is hard to envision the possibilities represented by digital scholarship when concepts like text encoding and relational databases are little understood. Claire suggested that the place to start addressing this is with our graduate students - as the next generation of scholars, they will be the individuals who lead the charge towards widespread adoption of scholarly research using digital technologies. While they may not be digital natives, there's certainly potential for them to become digital converts. Introducing graduate students to browser coding (HTML and CSS), markup (XML), encoding (TEI), and programming languages (Python, Ruby, whatever's hot) is a great place to start, giving them a point of contact with these technologies. Let's introduce graduate students to programmers and developers and give them a vocabulary to communicate with tech people - librarians included. After Claire's talk, it seems obvious that this will improve digital literacy and stand them in good stead in both academia and the business world.

The point led to an in-house discussion of how we serve faculty members, researchers, and students here at UTSC - not just the ones that are already embedded in the unit, but also the ones that might be peeking into digital scholarship from the edges. We want to bring those people in - we want to make them feel comfortable in a digital environment. Which is why, this summer, we're participating in several events that will introduce digital scholarship tools like Islandora and OpenRefine to faculty members from a variety of subject areas - at the Berkshire Conference on the History of Women, the Roots and Routes Summer Institute, and through our own pedagogical institute following August's Islandora Camp GTA (details TBA). We hope to see you there!

Check out a Storify of tweets from the event.

Friday, April 25, 2014 - 16:52

Consider us for your practicum placement!

We invite students in INF2173 at the University of Toronto's iSchool to take a look at the four practicum placements currently on offer at the Digital Scholarship Unit in the UTSC Library. The four projects encompass data migration, citation management, copyright research, and information literacy, among many other relevant skills. To learn more about working with the DSU for a practicum, check out our previous student Ned Struthers' blog post: Why do a practicum project at the DSU?

The UTSC Library is easily accessible via TTC, and as all of our projects contain a digital component there is a possibility that work can be done from home (subject to discussion between the student and supervisor). All practicum projects are listed via the iSchool's practicum website.

OpenOasis WordPress Migration

The practicum student will be responsible for assessing, advising on, and completing a migration of content from the OASIS sourcebook’s Joomla-based website to a fresh Commons in a Box installation. Commons in a Box is a WordPress-based system that packages CUNY Academic Commons functionality for use in research and teaching. The student will work with professionals and technical staff and will use project management software to track progress. Associated tasks may include the creation of a current content list, the development of a set of recommendations for a new information architecture, and migration of content.

History of the Periodic Table of Elements

The UTSC Library’s Digital Scholarship Unit is currently working with UTSC’s Department of Physical and Environmental Sciences (DPES) to create an online History of the Periodic Table of Elements. This project aims to highlight the rich history of the periodic table by assembling citations for, and increasing access to, the most significant primary sources (papers related to the development of the periodic law and periodic table) in one comprehensive, openly accessible online resource that can be used by students and scholars interested in chemistry and the history of the discipline alike. The practicum student will be responsible for identifying and searching relevant library catalogues, databases, and repositories in order to compile a listing of citations, library holdings (print and full-text electronic), and the copyright status of seminal, primary source papers related to the development of the Periodic Table of the Elements.

Digital Literacy Instructional Modules

UTSC Library’s Digital Scholarship Unit has identified a need for undergraduate and graduate students to develop skills in the area of digital scholarship in order to become fully capable researchers. As such, the Digital Scholarship Unit is seeking a practicum student to contribute to an infrastructure-building project to develop curriculum, assessment tools, and program metrics for a suite of course modules on common digital scholarship tools and topics. The practicum student will be responsible for creating instructional content in the form of online modules for the following tools: Zotero, wikis and blogs, screencasting and Prezi, text analysis, TEI, scholarly publishing, and movie editing. These modules will be delivered using the Blackboard course management software and will be integrated into a summer Institute hosted at UTSC that will focus on digital pedagogy. The practicum student will work closely with the Digital Scholarship Librarian and the Information Literacy Librarian, as well as a Liaison Librarian for a specific subject area. This is a wonderful opportunity for the practicum student to gain experience developing digital literacy curriculum and assessment tools.

CONTENTdm Migration

The UTSC Library’s Digital Scholarship Unit is currently moving its digital content from CONTENTdm to Drupal, and eventually to Islandora. The practicum student will build on the work of our previous practicum student to research, develop, and implement a migration plan for the metadata harvested from CONTENTdm. The student will work closely with library professionals and technical staff and will use project management software to track progress. Associated tasks may include the development of Drupal content models and modules, metadata crosswalks, feedback on high-level milestones, and research.