WA Secretary of State Blogs

Newspaper Discussion: Preservation and Access Issues

Monday, April 22nd, 2013 Posted in Articles, Digital Collections, For Libraries, For the Public, State Library Collections, Technology and Resources


From the desk of NDNP Coordinator Shawn Schollmeyer: From our NDNP office in the basement of Suzzallo Library at the University of Washington, we share this insight into the world of newspaper digitization and preservation by guest writer Casey Lansinger. Casey participated as an intern in our program and will graduate with an MLIS in June 2013.

In July of 2012, I left my sunny and dry hometown of Denver, CO for wet and green Seattle. I suddenly found myself in a world where drivers are uncomfortably polite, the coffee is understandably strong, and this Colorado girl had to buy her first raincoat and pair of galoshes (yet still manages to get dripping wet with or without them). In Seattle, I would finish up my third and final year at the University of Washington’s iSchool, where I am pursuing a Master’s in Library and Information Science. My life in Denver, however, was all about journalism and writing. Prior to the big move I had spent five years at The Denver Post as an editorial assistant and occasional freelance writer. The connection here is a life-long infatuation with the written word. I’ll admit I did what we were all advised not to do on a library school application: I explained that part of why I want to become a librarian is that I am in love with books. They accepted me anyway.

From an early age, I’ve digested everything I could get my hands on; books have introduced me to characters that felt like friends; countless hours have been spent with my nose stuck in anything from embarrassingly trashy tabloid magazines to fascinating social justice articles from Mother Jones; and, of course, newspapers have opened my mind to what really matters to me. I like to highlight favorite passages in books and later transfer those passages to a journal. Or, in an act that tells me I’m turning more and more into my mother, I rip out articles from magazines or newspapers and stow them away for future reference. A big part of the connection for me is the tactile experience of handling the medium that carries the written word. I love taking an old book off a shelf and smelling its musty pages; and, although I hated when it got on my clothing, I secretly loved the charcoal stain newsprint left on my hands while working at the Post. All of these experiences led to my involvement with the National Digital Newspaper Program (NDNP) through the Washington State Library.

When I first heard about NDNP, I envisioned an experience in which I could marry my two career interests: journalism and library science. The obvious draw was the word ‘Newspaper’; the word I was hesitant about, however, was ‘Digital’. Don’t get me wrong, the practicality of digitizing content has not been lost on me, nor has the reasoning behind some news sources going completely digital for that matter; but this doesn’t mean I have been without concern for my beloved “old-fashioned” mediums. However, as a budding librarian in an environment that is experiencing sweeping change, I knew that being a part of NDNP would be an invaluable learning experience for me. I knew there was an entire conversation about digitization that I was missing out on; and here was my chance to be a part of that conversation.

NDNP is a country-wide initiative to digitize historic newspapers published between 1836 and 1922. The Library of Congress (LOC) and National Endowment for the Humanities (NEH) partnered to make this project possible. Each state chooses one institution to apply for a grant to be a part of the program; after a grant is awarded, this institution can partner with other institutions in the state to complete the digitization process. Each state is also responsible for selecting its own newspaper titles. In Washington’s case, Washington State Library and University of Washington have taken the reins. Additionally, such agencies as The Association for African American History and Preservation Research, Seattle Public Library, Washington State University History Department, Everett Public Library and Central Washington University have had representatives on the advisory committee for Washington State. Washington became involved with NDNP in 2008 and, as of March 2013, has contributed over 200,000 pages of historic newspapers to the Library of Congress digital repository that houses the newspaper pages: Chronicling America (chroniclingamerica.loc.gov). Currently, 22 states have contributed newspaper pages to the repository. At the fingertips of the public (Chronicling America is an open-access repository – meaning free) is news, as it was unfolding, on the sinking of the Titanic, the Great Seattle Fire of 1889 or – a personal favorite – the first Carnegie Library. Or you can read about historic individuals such as Chief Seattle, Buffalo Bill or the Flapper girls. Stories come alive and context is created from these vessels of information.

And so, every Thursday and Friday morning you can find me in the basement of Suzzallo Library (on UW’s Seattle campus) where I perform a small, albeit important, part of the workflow in which newspaper pages are taken from microfilm all the way to what the end user sees on the Chronicling America website. I perform processing tasks on the newspaper pages, such as verifying page numbers (VPN) and optical character recognition (OCR) results. OCR consists of scanning the original newspaper page and converting the text to machine-encoded text, so that original pages can be archived as accurately as possible. The processing tasks must adhere to LOC standards and each state must follow very specific technical guidelines for processing pages. Not all of my work has been technical, however; a large part of my involvement with NDNP has been as an active participant in the access vs. preservation debate, a hot topic in the library field right now.

Do we preserve historic newspaper pages or do we digitize them? Who gets to decide what gets saved in its original form and what is discarded? Are people actually accessing original historic newspapers? These are just some of the questions I asked myself as I entered the preservation vs. access debate. As I first approached the conversation, what I saw was a very black and white issue. I read essays from those who were strictly in favor of preservation, arguing that we have already lost so many valuable historic newspapers that it is our duty to preserve those that remain. But then, there is the argument that newspapers take up space and are becoming increasingly inconvenient and expensive to house, making access the most practical solution. One of the reasons this debate is so tricky is that at the heart of the matter is a medium that was never intended for preservation, or access for that matter, in the first place. Publishers in the late 19th and early 20th century certainly didn’t think that librarians in 2013 would be making efforts to preserve their newspapers; this is evident right down to the medium itself: it tears easily, yellows over time and is generally difficult to preserve.

One of the first questions to pose when discussing this debate is why, with technology available to digitize historical documents, we would want to preserve historic newspapers in the first place. As expressed by my experiences with books, magazines and newspapers, I think there is a certain intrinsic value that can only come from interacting with an original document. An article I read on the subject described it like this: the extrinsic value of a historic document, such as the Declaration of Independence, exists in the information recorded on it; the intrinsic value, however, is the original format independent of the information recorded on it. Imagine if the Declaration of Independence were somehow damaged or destroyed. The impact would be profound and Americans might feel some sort of personal loss with such destruction. Sure, what is recorded on the Declaration of Independence would never be lost – as it can be found in any history book or through a quick Google search – but the value of the original would be gone forever. I believe the same case can be made for historic newspapers; imagine holding the original paper that contained headlines about the sinking of the Titanic. You could run your fingers over the headline and turn the pages in the very spot where someone in 1912 turned the pages. You could see the pictures and details on the page and be transported to that day in April of 1912. Does a computer screen provide that?

Having worked in print journalism, I witnessed many news sources switching to an online-only format; the reality is that it is possible (though it pains me to say) that future generations will grow up in a world where they’ll have no exposure to printed newspapers. These generations need to know about the advent of the printed newspaper and how this medium swept the nation and created context for the way news is reported today. Shouldn’t we preserve historic newspapers for those generations?

Conversely, while those who are pro-access certainly see the value in historic newspapers, they also see the logistical challenges that preserving newspapers creates: whose responsibility is it to decide what gets saved in original form and who pays the rising costs of doing so? Furthermore, as mentioned above, newspapers pose storage challenges for libraries that, more often than not, have budget and space issues to consider.

I had the opportunity to talk to Kate Leonard, Conservation Supervisor in the Special Collections department at UW Libraries, about this conversation and she brought up a few points that allowed me to look at the debate from a different angle. Kate and I agree on the tactile experience and how it is such a profound part of interacting with a medium; however, she also pointed out that access lets people find historic documents they would otherwise never find. Because some historic newspapers are rare and housed in research libraries across the country, I might not feasibly access an old copy of The Seattle Times in print were it not for digitization. By providing access, we expose individuals to information they may otherwise not have found or may have never even known was out there in the first place. This aspect of the debate has personally affected me; as I perform my work with NDNP, making OCR corrections here and there on 1908 issues of The Seattle Times, I’ve happened across articles about my new surroundings that have provided me with a rich picture of Washington State’s colorful history. I now know about Washington’s road to Statehood in 1889 or the Walla Walla Massacre of 1847 that later led to the Cayuse War between the Cayuse people and local Euro-American settlers. In fact, just the other day my colleague and I were saying that some articles we happen across make us feel like we aren’t so different from the men and women of the early 1900s. There was an article about Seattle’s terrible traffic, written in The Seattle Times’ 1908 paper, and the last time I checked the traffic in Seattle was still terrible and a topic of constant conversation among residents. Or there are the same sensationalist stories that the media decide are newsworthy enough to devote their attention to over other – often similar – stories, such as the Davis barroom murder trial of 1907, covered extensively in the Wenatchee Daily World.

Kate also brought to my attention an issue that came up recently in which The Reformer Dawn – the earliest known publication of what eventually became the Ellensburg Dawn, running from November 1893 to January 1894 – posed serious digitization issues. The paper is the size of a pamphlet and has been bound and stitched at the binding to prevent further damage to its already fragile pages and spine. The desire to digitize this paper proved to be dicey, as it would have required unstitching the binding to scan the pages. Thankfully those measures were not taken and Kate and her Special Collections team were able to take digital photos of the paper, which were later uploaded as TIFF files and added to the Chronicling America repository. The Reformer Dawn will also remain a part of WSL’s permanent digital collection. Because The Reformer Dawn is in danger of being housed in “dark archives” (a dungeon-like place where historic documents go to spend the end of their lives), this is yet another example of access giving individuals a chance to interact with documents they might otherwise never have had the opportunity to see.

Given the evidence of both preservation and access providing rich educational experiences for all users, I began to wonder why some present the debate as so black and white. The way I see it, there is so much gray area; a gray area in which we can provide both preservation and access. Some librarians and archivists suggest a model in which responsibility for both original and surrogate documents is distributed among institutions. And isn’t this the very purpose of a library in the first place: to preserve documents that provide the public with lasting value so that future generations can access them, be it in their original or surrogate form?

All of this leads to an increasingly important question: if we know now how badly we want to save historic newspapers of the past, what steps are we taking to preserve digital information of the present? After all, building and maintaining a digital repository is a completely different ballgame than preserving old newspaper pages. Each medium has its own benefits and drawbacks when it comes to preservation techniques but, unlike preserving newsprint, building a digital repository is an area of preservation in which archivists are still exploring and fine-tuning best practices. A digital repository is also very different to maintain because digital objects will always need a software environment to render them; newspapers, however, provide unmediated access to content. It is also important to consider that computer systems age much faster than data media; something new is always in the works and we are constantly upgrading.

Today, archivists are implementing a slew of preservation techniques for digital content. In the case of Washington’s involvement with NDNP, we are involved in a workflow that takes microfilm to transferable TIFF files and on through a series of processing tasks and quality control checks before we finally send the files, along with the microfilm, to the Library of Congress. LOC then uploads these files, and users can access the newspaper pages on Chronicling America. During the processing and quality control checks, we perform tasks such as text correction, cropping and de-skewing pages, and various other measures that will enable the end user to more accurately access pages and read articles. Furthermore, Washington State Library will maintain all of the files we create in its digital collection; making Washington State residents aware of this expanding digital collection is yet another step the library is taking towards providing access.

While I’d certainly never call myself a Luddite, it was a rather big leap to immerse myself in the digitization world. When I approached the project, I wondered if digitizing documents would make originals, at least over time, obsolete; as it turns out, librarians don’t want that at all. They simply want to make access just as important as preservation; they want to provide entry to the all-important gray area: an area where users find both preservation and access. And though I’ll take sipping coffee and dropping muffin crumbs over a daily print newspaper, the efforts LOC and NEH are making to keep historic newspapers available are nothing short of amazing. It is our duty as information professionals to provide access to documents that are rich in value and history, such as newspapers. Just as we make efforts today to save papers from the past, so too are we making efforts to preserve the news we see today on our computer screens, for tomorrow.

Breaking News! New titles for Washington NDNP!

Thursday, March 21st, 2013 Posted in Articles, Digital Collections, For Libraries, For the Public, State Library Collections


From the desk of Shawn Schollmeyer, NDNP Washington Coordinator

This week the Library of Congress uploaded the next set of our long-awaited newspaper titles for the National Digital Newspaper Program. Historic Washington state newspapers can now be searched and viewed on the Chronicling America website. The added benefit, besides being able to search early newspapers from Washington Territory and early statehood, is that each title also includes publication information and a short essay about the paper’s history. Take a scroll through this example from the Aberdeen Herald.

Among the titles added this month:

Aberdeen Herald, W.T., 1890-1917

Adams County News, Ritzville, 1898-1906

Columbia Courier, Kennewick, 1902-1905

Kennewick Courier, 1905-1914

Evening Statesman, Walla Walla, 1903-1910

Lynden Tribune, 1908-1922

Newport Miner, 1899-1922

Vancouver Independent, 1875-1910

Washington State Journal, Ritzville, 1906-1907

Wenatchee Daily World, 1905-1922

Seattle Star, 1899-02-27 to 1922-12-30

We are on the third and last grant cycle of this project, sponsored by the National Endowment for the Humanities and Library of Congress. Approximately two thirds of the states across the country are now participating, contributing over 6 million pages of newspaper content to date. In the West, Oregon and California are current participants, and over the next few years we should be seeing the contributions of our neighbors, Alaska and Idaho.

Over the next two years we’ll be adding:

Seattle Post-Intelligencer, 1876-1900     

Seattle Star, 1918-1922

Morning Olympian, 1876-1922

These newspapers, all in the public domain (pre-1922), are free for public use. Educators, historians, genealogists, students and other members of the public are welcome to use these images for their primary research, history presentations, and educational tools. We encourage you to share the great history of Washington and learn about the development of civics and industry across the great Pacific Northwest.

To learn more about the NDNP program, popular topics, valuable teaching resources (check out NEH’s EDSITEment! page), podcasts and videos, start with a look at the http://www.loc.gov/ndnp website and click on “NDNP Extras.”

Over 5.2 million pages strong… and counting

Tuesday, November 6th, 2012 Posted in Articles, Digital Collections, For Libraries, For the Public, News, State Library Collections, Technology and Resources


The Torch Bearer at the Library of Congress
Interior of the Library of Congress

From the futuristic desk of Shawn Schollmeyer.

With 100,000 pages contributed each two-year grant cycle from over 30 states, and with the aim of participation by all 50 states, the National Digital Newspaper Program (NDNP) is the biggest digital newspaper project in U.S. history. It is sponsored by the National Endowment for the Humanities (NEH) and the Library of Congress (LC). Each of those 5.2 million pages needs related lines of code and metadata along with the page images. Title, city, and date, as well as Optical Character Recognition (OCR) files that turn an image into machine-readable text, allow users to search newspaper content on the Chronicling America website.

That’s a lot of files! Who manages all these files? Fewer than a dozen people at the Library of Congress support the websites and wikis, upload files, and help project managers learn the NDNP digitization process. Here in Washington State, we rely on this handful of people to guide us on best practices for digitization and image standards for our participation in the program. In September, all the participating states gathered to meet our sponsors, advisors, and fellow awardees to discuss the great ways people are using the content from this project. At the end of the three-day conference, our heads were filled with practical knowledge of processes, resources, and exciting new ideas. While I was there I had the rare opportunity to meet the magicians behind the curtain…

Our main contact for the National Digital Newspaper Program in Washington, DC is Chris Ehrman. Nearly a librarian by birth (his parents are both librarians), Chris began his newspaper experience in the University of Utah Ski Archives, uploading photos and video of America’s favorite winter sport before moving on to the NDNP program in Montana. There he honed his technical expertise learning the selection and upload process for Montana’s newspaper collection, becoming a great candidate for the Library of Congress’ Digital Conversion Specialist position. Chris is our “go-to” man when we have questions about how to resolve the challenges of working with so many files and so much metadata. If the data checks out OK, Chris prepares the scripts to load files for the automatic ingestion process so the newspaper images will appear in the Chronicling America database. He also supports the LC’s NDNP website.

There are four Digital Conversion Specialists who evaluate and help load our submitted batches of files to the website. Missing pages, cataloging conflicts, or date misprints are among the situations that may flag a batch for further review.  These four take turns validating batches from all awardees for final approval in addition to their specialized tasks, which include validation tool support and digitizing from LC’s own historic newspaper collection.  Chris estimates that they see 150,000-180,000 pages per month, translating to about six terabytes. One of their biggest challenges is to keep the workflow moving and avoid bottlenecks in the system.
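To put Chris's estimate in perspective, here is a rough back-of-the-envelope sketch of what those figures imply per page. It assumes decimal units (1 terabyte = 1,000,000 megabytes) and that the six terabytes cover everything submitted for a page, not only the image; the function name is made up for illustration.

```python
# Rough estimate of average storage per newspaper page, based on the
# figures quoted above: 150,000-180,000 pages and about 6 TB per month.
# Assumption: decimal units, so 1 TB = 1,000,000 MB.
TERABYTES_PER_MONTH = 6
MB_PER_TB = 1_000_000


def mb_per_page(pages_per_month: int) -> float:
    """Average megabytes per page for a given monthly page count."""
    return TERABYTES_PER_MONTH * MB_PER_TB / pages_per_month


at_low_volume = mb_per_page(150_000)   # fewer pages, so more MB per page
at_high_volume = mb_per_page(180_000)  # more pages, so fewer MB per page

print(f"{at_high_volume:.1f} to {at_low_volume:.1f} MB per page")
# prints "33.3 to 40.0 MB per page"
```

In other words, each page accounts for somewhere around 33 to 40 megabytes on average, which helps explain why keeping the workflow free of bottlenecks is such a challenge.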

Robin Butterhof is another LC specialist. Friendly & energetic, Robin supports the NDNP wiki page that contains the technical specifications, trainings, tools, deliverables, and state by state project information. She is a woman of many talents, having held several different library jobs, including book publishing, reference librarian, non-profit work and consulting, all while attending classes as a library student. Excellent training for the many tasks she juggles daily at LC.

Chris, Shawn & Robin with “batch_wa_lacamas”
Pulling all the teams, awardees, conversion specialists, NEH contacts, and LC resources together is the NDNP Coordinator, Deborah Thomas. Deb has a long history of working with digital collections in our national library, most notably the American Memory project, a multimedia collection of American history and culture with over nine million items. In my short interview with the team, she really helped put the national project into context for me. One of the most significant challenges is managing “a sustainable collection of significant scale produced by many organizations,” which includes careful planning for maintaining access and managing the data and processes long term. She reminds us that “Digital objects are not just pictures. For newspapers, they are pictures of pages and machine-readable text from those pages and metadata that describes the pages and the relationships between pages.” In order to help people find what they’re looking for we need to figure out “how to make the cream rise to the top.” These millions of pages of newspapers would be pretty overwhelming to wade through without text search capabilities at the page level. Creating standards for metadata and text recognition software (OCR) is only a piece of making these pages accessible. Each state has its own workflow, software vendors, page- or article-level OCR, and file storage systems, and there are even multiple languages that need to be filtered and standardized.
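Deb's description of a digital object can be pictured with a small sketch. This is only an illustration, not the actual NDNP data format or the Chronicling America search engine: the field names, the `search_pages` helper, and the two sample pages (titles, dates, file names and text) are all invented for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class NewspaperPage:
    """One digital object: an image, its OCR text, and describing metadata.

    Field names are hypothetical, chosen only to mirror the description above.
    """
    title: str        # newspaper title
    city: str         # place of publication
    issue_date: date  # date of the issue
    page_number: int  # position of the page within the issue
    image_file: str   # the picture of the page (e.g. a TIFF file)
    ocr_text: str     # machine-readable text produced by OCR


def search_pages(pages: list[NewspaperPage], term: str) -> list[NewspaperPage]:
    """Return the pages whose OCR text contains the term, ignoring case."""
    needle = term.lower()
    return [page for page in pages if needle in page.ocr_text.lower()]


# Made-up sample pages, for illustration only.
pages = [
    NewspaperPage("Example Daily", "Exampleville", date(1908, 5, 1), 1,
                  "0001.tif", "City traffic grows worse every week"),
    NewspaperPage("Example Herald", "Sampleton", date(1907, 8, 28), 1,
                  "0002.tif", "Murder trial opens before a crowded court"),
]

for page in search_pages(pages, "TRIAL"):
    print(page.title, page.issue_date.isoformat(), "page", page.page_number)
# prints "Example Herald 1907-08-28 page 1"
```

The point of the sketch is that the image alone cannot be searched; it is the OCR text that makes a page findable, and the metadata that tells the reader which paper, place and date a hit belongs to.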

When I asked the team what they enjoy most about their work, Robin admitted she loves how “something wacky pops up every day,” referring to the many series of cartoons, entertaining articles and sometimes sensational headlines. Chris agreed and mentioned his favorites are the illustrations of the future, which led to discussion of Deb’s favorite article from the December 20, 1908, New-York Tribune, “Public Library of the Future.”

Unlike the library vision in the article, we may not be sending facsimiles of our newspapers and important manuscripts through pneumatic tubes to our Congressional Library, but we will be sending a dozen or so hard drives with thousands of files of newspaper pages to real people, the people I met in the James Madison Building. These are the people who will be helping us create the new digital libraries of a very real future where we can still have “a library in every hotel, train, trolley car and steamship!”