WA Secretary of State Blogs

Digitizing Newspapers: Part I – Source material

We began a post about changes we’d recently implemented in our post-processing of digital images of newspaper pages. Of course, we found this hard to talk about without delving first into the process of digitizing newspapers. So we’ve decided to cover the topic more thoroughly through a series of posts.

These posts are not meant to be a “Steps A-Z” type of tutorial but rather a discussion of things we consider when scanning and processing newspaper pages. Please feel free to add to the discussion, ask questions, or leave comments.

Part I – Considering your source material

Some things to consider:

1. Newspapers are difficult to organize. Newspapers have long been the historic record of a locale or group of people, so there are often a lot of them (in the sense of a sheer quantity of actual pages). Also, newspapers constantly evolve; they change owners, editors, names, publication dates, publication frequency, etc. Collating historic newspapers and the information about them can be as much work as (or more work than) capturing the images of the pages themselves.

2. Many newspapers are old (the ones we can legally digitize anyway). An obvious but very important consideration. Newspapers were documents that were not meant to last centuries.  Fading, tearing, foxing (i.e. stains) and ink bleedthrough are some of the many problems encountered when dealing with the pages themselves. We’ll talk more about how we try to combat problems posed by the quality of the originals in a later post.

3. Most historic newspapers aren’t paper anymore. Unfortunately (or maybe fortunately, depending on your perspective), when scanning newspapers, we rarely deal with the original pages. Nearly all of the NDNP material and most of our pioneer newspaper collection is scanned from microfilm (i.e. one type of microform), and often old microfilm. The quality of early microfilm (film created before microfilm standards and guidelines existed) varies wildly, because the original photograph of a newspaper page depended as much on the quality of the source material as on the equipment used and the skill of the photographer.

Another consideration is that the film we scan is often not even the original film but rather duplicate negatives (i.e. duplicate masters). So we run the risk of working with a bad reproduction from a perfectly fine master reel. Or the film itself, like those old negatives you store in a cardboard box in the basement, can be damaged or scratched. Needless to say, any problems with the original paper materials only compound with each generation away from the original and each time the image is reformatted (i.e. migrated from one medium or format to another).

And we haven’t even started scanning yet. Scanning is another form of reformatting, and it tends to magnify any of the problems mentioned above even further. Scanning transparent images has its own challenges. Film scanners require a more sensitive sensor, one that can capture the tonal values of a very small transparent image. Often, because the film image is so small and holds so much detail, scanners have to operate at their maximum levels, which results in artifacts and noise.

4. Newspapers are large (larger than your average document). Another obvious statement, but important when you consider the size of the page relative to the size of the smallest details and text. The problem becomes this: the larger the surface area of the page, the further away the camera head must be from the original; the further away the camera, the larger the reduction ratio (the size of the original in relation to the size of its image on film); and the larger the reduction ratio, the smaller the text on film and the harder it is to re-capture the detail during scanning.
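
To put some rough numbers on that chain of reasoning, here is a small sketch in Python. It is only arithmetic, not a description of our actual workflow: the page height, film image height, and scanner resolutions are made-up values chosen for illustration, and the function names are our own.

```python
# Rough arithmetic only. All sizes and resolutions below are hypothetical
# values picked for illustration, not measurements from a real reel.

def reduction_ratio(original_in: float, film_image_in: float) -> float:
    """How many times smaller the film image is than the original page."""
    return original_in / film_image_in


def effective_ppi(scan_ppi_on_film: float, ratio: float) -> float:
    """Pixels per inch of the ORIGINAL page that a scan of the film captures."""
    return scan_ppi_on_film / ratio


def film_ppi_needed(target_ppi_on_original: float, ratio: float) -> float:
    """Scanner resolution (measured on the film) needed to reach a target
    resolution measured on the original page."""
    return target_ppi_on_original * ratio


if __name__ == "__main__":
    # Assumed: a 25 inch tall page photographed as a 1.25 inch tall film image.
    ratio = reduction_ratio(25.0, 1.25)
    print("reduction ratio:", ratio)

    # Assumed: the film is scanned at 4000 ppi.
    print("effective ppi on the page:", effective_ppi(4000.0, ratio))

    # Assumed: we want the equivalent of 300 ppi on the original page.
    print("ppi needed on the film:", film_ppi_needed(300.0, ratio))
```

With these assumed numbers the reduction ratio is 20, so a 4000 ppi scan of the film is worth only 200 ppi on the original page, and matching 300 ppi on the page would take 6000 ppi on the film. A bigger page filmed onto the same frame pushes the ratio, and the demands on the scanner, even higher.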

And did I mention that we haven’t even started scanning yet?!? We’ll talk about the scanning process in our next post in this series.




3 Responses to “Digitizing Newspapers: Part I – Source material”

  1. Never thought it’s that difficult to produce digital images of the newspaper pages. What I had in mind was to put the paper on a scanner and scan. Now I know that it’s not as easy as I thought.

  2. lrobinson Says:

    Yes, exactly. Thanks for your comment.

    That is a common sentiment (and part of why it is hard to explain what is tough about newspaper). Just throw the pages on a copy stand and take a picture. Right?

    This is where point #3 is important. Historic newspaper pages are rarely paper. Most significant runs of newspaper images are on microfilm, and much of it is early, poorly filmed microfilm of old, poor-quality newspaper pages. Good film scanners that can handle reels of film images quickly and get good enough resolution to render decent OCR results are expensive machines.

    But, to be honest, the scanning of microfilm is not the hardest part (not by a long shot). It’s all the work before and after scanning (film testing, image processing, image collation). Add to that the cataloging and description of one of the most unwieldy types of serials (because, what’s the point of scanning the images if they can’t be browsed, searched and accessed?) and the sheer volume of pages, issues, and titles, and newspapers become a complicated (and sometimes underappreciated 😉 ) scanning project.

  3. This is a fairly sad article! Everyone gets so excited about capturing old news(papers) and other historical memorabilia, but who knew it was so difficult? Thank you for all your hard work! People will appreciate the newspapers being digitized, even if they don’t realize how it happened!