Digitization within Archives

Google’s mission is to “Organize the world’s information and make it universally accessible and useful” (Google, 2012).  In April of 2010, Google Books Library Project announced that they had digitized over 12 million books (OCLC, 2010).  While digitizing does increase the accessibility of materials, digitization also has many challenges.  Google’s endeavor, for example, has been fraught with legal battles over intellectual property rights (New York Times, 2011).  This is not surprising considering copyright is one of the more obvious challenges people face when approaching digitization projects; still, there are many other challenges that also must be overcome.

Digitization can succeed and overcome the challenges it faces, but the curator of the project must be aware of those challenges from the beginning. Florida Southern College began its first digitization project in 2006 with the Child of the Sun: Frank Lloyd Wright collection. The McKay Archive building opened in February of 2009, and the digital collection has since grown to include the Child of the Sun (digital photographs and papers), The Southern (student newspaper), a collection of fruit and vegetable crate labels, the Shirley Jackson Case Collection, and the A.P. Bolton Collection (McKay Archive, 2012). Part I details why a digital collection was necessary for the McKay Archive. Part II identifies the challenges patrons face. Part III identifies the challenges archivists face. Finally, Part IV concludes with suggestions on how to actively seek out and resolve unknown challenges.

Part I – Why Digitize

As a unit of FSC’s Roux Library, the McKay Archive shares Roux Library’s mission statement. The mission of the library and archive is “to educate students in developing lifelong, critical, information-seeking skills by identifying and making information resources available to the college community, and by providing engaged instruction for access and use of information resources” (FSC, 2011). First, digitization makes rare, inaccessible materials “available to the college community.” Second, research indicates that digitization projects increase usage statistics and therefore promote education and research (Heaney, 2011). Digitization is able to do this because, in a sense, it advertises and highlights the item. Thus, digitization is in line with the McKay Archive’s mission.

In addition to its mission statement, the McKay Archive’s collection development policy indicates that the facility is open for public use and serves to “tell a comprehensive story of the… college and the state of Florida” (2010). The most efficient way to reach the largest public audience is through digitization. In Florida, 72% of homes have internet access; because of public agencies, like libraries, 79.93% of Floridians have internet access (U.S. Census Bureau, 2012). Also, according to the National Center for Education Statistics, as of 2008, every public school has a minimum of one “instructional computer with internet access” and the ratio of students to instructional computers was 3.1 to 1 (2010). The internet is ubiquitous. Once a collection is digitized, almost everyone has access to it regardless of location (Frank-Wilson, 2010). In 2011, the European Commission added further weight to this argument by announcing plans to further invest in Europeana, a digital library (http://www.europeana.eu/portal/). Each country in the European Union is expected to contribute a set number of objects to the digital library, totaling about 11 million new objects by 2015 (Carlton, 2012). Because of how effective digitization is, the EU expects that Europeana will help create jobs and new businesses and increase “tourism, learning, design and games” (Carlton, 2012).


Part II – Challenges Patrons Face

Digitization is a useful tool, but ultimately the main question on the minds of patrons is: Is this going to work when I need it? Is the archive easy to use? Is the archive going to accommodate a variety of users, including users with disabilities? Is the archive going to accommodate our changing language? These are the challenges users face every time they attempt to use a new website. Is this going to work?

Reliability gives an archive a significant competitive advantage. While this may seem obvious, the vast number of unreliable and disorganized websites indicates that the subject of reliability merits significant discussion. McKay Archive’s web presence borders on unreliable. The source of the archive’s reliability challenge is that the archive was added on to the main Roux Library page and was not a part of the original design. Also, the McKay Archive is in a developmental stage and is not employing any standard reliability testing methods at this time, though it is investigating reliability testing options in hopes of employing them in the near future.

Still, reliability issues must be identified and addressed.  First, on the McKay Archive website the “home” button does not consistently take you back to the primary McKay Archive page.  If you are in the Florida Citrus collection, “home” will take you to the primary Florida Citrus collection page; if you are on the digital collections page, “home” will take you to the primary digital collections page.  In these examples, there is no button available that will immediately take you back to the primary McKay Archive home page.  A standard “home” key aids the user in finding their way and is a necessary component to a reliable digital archive (Witten, Bainbridge, & Nichols, 2003).

Second, the McKay Archive website does not have a simple hierarchy. Hierarchies are generally identified in the website’s site tree (often located on the left side of the page in an isolated table). A website’s hierarchy/site tree serves as the outline for the website and should give users an idea of where to find information. On the McKay Archive home page, some of the collections are listed in the site tree, while other collections are listed elsewhere. The “about” section is listed in the body of the page, not in the site tree. This makes the website difficult to navigate and negatively impacts the reliability of the archive as a whole (Witten, Bainbridge, & Nichols, 2003). Finally, the website does not have a membership area for users to sign up for archive updates. Website memberships increase the patron base. Also, the success and failure rates of these memberships can be used as indicators of whether or not the website is achieving its goals (Kostagiolas, 2011). These reliability issues are significant challenges for patrons and need to be addressed.

Physical impairments, specifically visual impairments, can also be challenges for patrons when attempting to use digital archives. Visual impairments can include everything from being legally blind, to being nearsighted or farsighted, to being color blind. Color blindness is possibly one of the more overlooked visual impairments. In addition to limiting the colors a person can see, color blindness can inhibit a child’s learning progress and impact their ability to read. In adults, color blindness can limit career choices (WebMD, 2012). Color blindness can also render some images shapeless, making it difficult to experience visual objects (Vischeck, 2008). Digital archives rely heavily on the patron’s ability to see digital objects, so visual impairments must be considered when creating a digital archive.

[Figure: original image (left) and simulated deuteranopia (right)]


Finally, linguistics plays a unique role in the challenges patrons face when using a digital archive. Most general searches use full text searching and, in general, this is what patrons expect and hope to find. The problem is that full text searching searches everything. First, a standard full text search can retrieve so many results that the desired result is buried beneath thousands of others. For example, one of the more prominent collections in the McKay Archive is the Florida Citrus Archive. If I search the digital archive for “citrus,” the search returns 1,268 results. If I am searching for a specific digital object, it will be hard to find. Second, linguistics introduces an interesting problem in how words change over time (Garrett, 2006). For example: should a patron search for color or colour? In the January 17, 1942 edition of The Southern, the word “colour” was used in discussing the proper hues for clothes in January. Later editions only use the word “color.” Because archives specialize in older documents, this type of linguistic change over time presents a significant challenge for patrons. Misspellings present similar challenges, as do typographic ligatures like Æ and Œ that are found in many Latin-based words (Sobel & Beal, 2011). In short, in a full text search world of infinite possibilities, the biggest problem is the infinite possibilities.


Part III – Resolving Challenges for Patrons

The main challenge for digital archive patrons is using the digital archive. The main challenge for archivists is meeting the needs of the patrons and resolving their challenges. As previously discussed, the main challenges archivists must resolve for their patrons are reliability, serving patrons with disabilities, and linguistics. In resolving these issues, archivists are able to help close the digital divide by providing services that are user friendly and intuitive.

Digital divide is a term used to express the substantial advantage internet use gives to those who have it and the oppressive disadvantage assigned to those who do not.  Over the past ten years public libraries and the public school system have played key roles in increasing free access to computers and the internet; because of this effort, they have made significant advances in closing the digital divide (National Center for Education Statistics, 2010 & Ruben, 2004).  However, the digital divide is not just about access to digital resources.  The digital divide is also about a person’s ability to use digital resources after access is granted (Ruben, 2004).  Archives cannot try to close the digital divide by providing a public computer lab like the public library does; archives are generally not equipped or designed for this type of endeavor.  Alternatively, archives can address the digital divide by making their websites and digital collections user friendly and reliable, by making accommodations for the visually impaired, and by including intuitive search features that can help bridge linguistic gaps.

Reliability can be tested through a variety of measures. Informally, an archive can rely on the opinions of others. The critique of McKay Archive’s digital presence in Part II is representative of an informal analysis. Archives can also use more formal analytical methods that draw on usage statistics and survival/failure rates. Finally, perceived reliability may also be integrated into the approach by surveying users. Regardless of which method is used, the general reliability of a digital archive cannot be measured by one factor alone. If the archive’s digital collection is well supported but the overall library system is not, reliability is damaged. The whole system must be analyzed (Kostagiolas, 2011).

Formal analytical methods can employ either parametric or non-parametric analysis when testing the reliability of digital interfaces. Examples of non-parametric methods include Kaplan-Meier and cumulative hazard estimation; these methods estimate probabilities directly from observed data without assuming a particular distribution. Kaplan-Meier focuses on survival rates, while cumulative hazard estimation focuses on a system’s failure rate. An example of a parametric method is Weibull analysis, which fits observed failure data to an assumed statistical distribution. Because it is a general statistical technique, it can also be applied to studies outside of user reliability (Kostagiolas, 2011). Regardless of which method is used, the reliability of an archive’s digital collection must be assessed.
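
To make the non-parametric approach concrete, the following is a minimal sketch in Python of a Kaplan-Meier survival estimate alongside a cumulative hazard estimate (the Nelson-Aalen form, which is one common choice and is an assumption here, not necessarily the form Kostagiolas uses). The data are entirely hypothetical: each number is the minutes a user spent on an archive website before hitting a failure such as a broken link, and a False flag means the session ended without any failure being observed. The function name survival_and_hazard is illustrative only.

```python
def survival_and_hazard(durations, observed):
    """Return (time, survival, cumulative_hazard) rows for each failure time.

    durations: time until failure, or until the session ended without one.
    observed:  True if a failure was seen, False if the session was censored.
    """
    records = sorted(zip(durations, observed))
    at_risk = len(records)          # sessions still running just before `time`
    survival, hazard = 1.0, 0.0
    table = []
    i = 0
    while i < len(records):
        time = records[i][0]
        failures = leaving = 0
        # Group every session that ends at this same time.
        while i < len(records) and records[i][0] == time:
            failures += 1 if records[i][1] else 0
            leaving += 1
            i += 1
        if failures:
            survival *= 1 - failures / at_risk   # Kaplan-Meier product step
            hazard += failures / at_risk         # Nelson-Aalen sum step
            table.append((time, survival, hazard))
        at_risk -= leaving
    return table


# Hypothetical sessions: minutes until a failure (or until the user left).
minutes = [2, 3, 3, 5, 8]
failed = [True, True, False, True, False]

for time, survival, hazard in survival_and_hazard(minutes, failed):
    print(f"t={time}  survival={survival:.2f}  cumulative hazard={hazard:.2f}")
```

With these made-up sessions, the estimated probability of a session surviving without failure falls to 0.80, 0.60, and 0.30 at minutes 2, 3, and 5, while the cumulative hazard rises to 0.20, 0.45, and 0.95. Sessions that end without a failure still count toward the number at risk until they leave, which is what distinguishes these estimators from a simple failure percentage.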

Addressing the challenge of visual impairments in a digital archive seems like an insurmountable task. How can a website help a patron see? However, there are simple solutions that will increase usability for the visually impaired. The McKay Archive uses CONTENTdm for its digital archive. CONTENTdm enables the user to enlarge images and zoom in and out of them. This feature is intuitive and exceptionally useful for farsighted patrons. For legally blind patrons, synthetic voice output software can assist. The archive can also pair audio files with the visual files to enhance the patron’s ability to experience the digital objects (Tedd & Large, 2005).

For color blindness, the solution can be simple and inexpensive. Google Chrome has an option within its settings to daltonize the browser’s content (Daltonize, n.d.). When an image is daltonized, the colors are adjusted so that the viewer can clearly differentiate between the elements of the image. There are three primary types of color blindness: Red/Green, Luminance, and Blue/Yellow (Vischeck, 2008 and WebMD, 2012). Daltonization works by enhancing the colors the individual is able to see and/or increasing the contrast of the colors the individual cannot see (see Appendix B). The type of daltonization process used depends on the type of color blindness the individual has. The least expensive solution would be for the archive to advertise Google Chrome’s Daltonize feature on its website. Archives could also add a daltonization feature to their websites that would allow users to enable the daltonization effect as needed.

[Figure: original image (left) and simulated deuteranopia (right)]


[Figure: daltonized image (left) and simulated deuteranopia of the daltonized image (right)]
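
For an archive considering its own daltonization feature, the idea can be sketched for a single pixel in Python. This is only an illustration under stated assumptions: the three matrices below are values commonly quoted in open-source daltonization code for deuteranopia (a Red/Green deficiency), not figures taken from Vischeck or from Chrome, and the function names are hypothetical. The sketch simulates what a deuteranopic viewer would see, measures the color information that was lost, and shifts that lost information into channels the viewer can still distinguish.

```python
# Assumed constants, as commonly quoted in open-source daltonization code.
RGB_TO_LMS = [
    [17.8824, 43.5161, 4.11935],
    [3.45565, 27.1554, 3.86714],
    [0.0299566, 0.184309, 1.46709],
]
DEUTERANOPIA = [[1, 0, 0], [0.494207, 0, 1.24827], [0, 0, 1]]
ERROR_SHIFT = [[0, 0, 0], [0.7, 1, 0], [0.7, 0, 1]]


def invert_3x3(m):
    """Invert a 3x3 matrix using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return [
        [(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
        [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
        [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det],
    ]


LMS_TO_RGB = invert_3x3(RGB_TO_LMS)


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-element vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def simulate_deuteranopia(rgb):
    """Approximate how an (R, G, B) pixel appears to a deuteranopic viewer."""
    lms = mat_vec(RGB_TO_LMS, rgb)
    return mat_vec(LMS_TO_RGB, mat_vec(DEUTERANOPIA, lms))


def daltonize(rgb):
    """Shift the color information lost to deuteranopia into visible channels."""
    simulated = simulate_deuteranopia(rgb)
    lost = [o - s for o, s in zip(rgb, simulated)]
    correction = mat_vec(ERROR_SHIFT, lost)
    return tuple(
        max(0, min(255, round(o + c))) for o, c in zip(rgb, correction)
    )


print(daltonize((255, 0, 0)))      # pure red gains green and blue components
print(daltonize((128, 128, 128)))  # neutral gray should be left unchanged
```

A production feature would apply this to every pixel of an image and would offer separate settings for the other types of color blindness, since each type needs its own simulation matrix.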


The linguistic problem is more difficult to resolve.  Sobel and Beal suggest that the linguistic problem is more common in subjective studies like the humanities than it is in the sciences because of how terms are used (2011).  Thus an archive, like the McKay Archive, which specializes in collections like religion and history, is going to have a more difficult time dealing with linguistic challenges.  One solution to the linguistic problem is making advanced search options available to patrons.  Advanced search options are simple tools that walk users through different methods for limiting and specifying their search parameters so the patron is not faced with 1,268 search results.  However, this does not solve the issue of misspellings and how words change over time (Garrett, 2006).

To resolve the issue of alternate spellings, some databases include options for the user to expand their search to include alternate spellings. There are two primary types of variant spelling search features.  In the Eighteenth Century Collections Online database, Gale Cengage Learning uses an alternate spelling search feature that uses algorithms to identify words similar to the original term (a probability based search).  The Early English Books Online database uses an alternate spelling search that will offer several variant spellings, but will also allow the user to manually include or exclude specific words as the user deems appropriate (Sobel & Beal, 2011).  Unfortunately, databases like these are expensive and their programs are proprietary.  These programs are also intended for use in libraries, not archives.  However, alternate spelling searches exist and should be discussed with the developers hired to manage/create an archive’s website.
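
As a starting point for that discussion with developers, the following Python sketch shows one inexpensive way a variant spelling search could behave. It is not how Eighteenth Century Collections Online or Early English Books Online work, since those systems are proprietary; it simply folds ligatures into plain letters and uses the similarity matching in Python’s standard difflib module to widen the search term. The sample documents, their identifiers, and the function names are all hypothetical.

```python
import difflib
import re

# Fold typographic ligatures into their plain-letter equivalents.
LIGATURES = str.maketrans({"Æ": "AE", "æ": "ae", "Œ": "OE", "œ": "oe"})


def fold(text):
    """Lowercase the text and replace ligatures so variants compare equal."""
    return text.translate(LIGATURES).lower()


def words(text):
    """Split folded text into a set of plain alphabetic words."""
    return set(re.findall(r"[a-z]+", fold(text)))


def variant_search(query, documents, cutoff=0.8):
    """Return ids of documents containing the query or a similar spelling.

    cutoff is the minimum similarity (0 to 1) for a word to count as a variant.
    """
    vocabulary = sorted(set().union(*(words(t) for t in documents.values())))
    term = fold(query)
    variants = set(difflib.get_close_matches(term, vocabulary, n=10, cutoff=cutoff))
    variants.add(term)
    return sorted(
        doc_id for doc_id, text in documents.items() if words(text) & variants
    )


# Hypothetical snippets standing in for digitized pages.
SAMPLE = {
    "southern-1942": "The proper colour for January clothes",
    "southern-1950": "A color chart for the spring season",
    "reference-01": "An Encyclopædia of citrus growing",
}

print(variant_search("color", SAMPLE))          # finds both spellings
print(variant_search("encyclopaedia", SAMPLE))  # finds the ligature form
```

Lowering the cutoff retrieves more distant spellings at the cost of more false matches, which is why a feature that lets the patron include or exclude individual variants by hand, as described above, can be a useful complement.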

Part IV – Conclusion

Reliability, accommodating disabilities, and preparing for linguistic changes are all challenges archives face, and they must be addressed. But these aren’t the only challenges archives face. In as little as one to two years from now, there may be an entirely new set of challenges and better solutions to the old challenges. The real challenge is in discovering what the next challenge is.

Purdue is currently marketing Data Curation Profiles as a means for librarians to help researchers organize their data and create data management plans (Purdue University Libraries, n.d.). The profiles are essentially extended reference interviews that detail the needs of a particular data set, but data management plans are not the only application for these profiles. The University of South Florida Polytechnic is currently designing a digital asset management system and is using Purdue’s data profiles to interview its professors and discover what they might need in such a system. Data profiles are useful because they guide the discussion through each facet of what a particular data set might need. The McKay Archive could use Purdue’s data profile to interview its archivists; this interview would help the archive discover what different collections might need. A modified version of Purdue’s data profile could be used to interview patrons to discover their evolving needs.

There is not one stable set of needs or challenges an archive faces.  Needs and challenges change and evolve over time and because of this, archives must constantly seek out the new challenges in order to address them.  Purdue’s data profile interviews are one method for seeking out new challenges, but archivists can also choose to follow the innovations and discoveries of other archives.  Regardless of how an archive chooses to address current and future challenges, archives must develop plans that accommodate current and future challenges.  It is not the patron’s responsibility to advise the archive about the archive’s challenges.  It is the archive’s responsibility to actively seek out and address these challenges.


