Digitizing Newspapers: Part I – Source material

March 10, 2009 Secretary of State's Office

We began a post about changes we’d recently implemented in our post-processing of digital images of newspaper pages. Of course, we found this hard to talk about without delving first into the process of digitizing newspapers. So we’ve decided to cover the topic more thoroughly through a series of posts.

These posts are not meant to be a “Steps A-Z” type of tutorial but rather a discussion of things we consider when scanning and processing newspaper pages. Please feel free to add to the discussion, ask questions, or leave comments.

Part I – Considering your source material

Some things to consider:

1. Newspapers are difficult to organize. Newspapers have long been the historic record of a locale or group of people, so there are often a lot of them (in the sense of sheer quantity of actual pages). Also, newspapers constantly evolve; they change owners, editors, names, publication dates, publication frequency, etc. Collating historic newspapers and the information about them can be as much work as capturing the images of the pages themselves, or more.

2. Many newspapers are old (the ones we can legally digitize anyway). An obvious but very important consideration. Newspapers were not meant to last centuries. Fading, tearing, foxing (age-related spots and stains) and ink bleedthrough are some of the many problems encountered when dealing with the pages themselves. We’ll talk more about how we try to combat problems posed by the quality of the originals in a later post.

3. Most historic newspapers aren’t paper anymore. Unfortunately (or maybe fortunately, depending on your perspective), when scanning newspapers we rarely deal with the original pages. Nearly all of the NDNP material and most of our pioneer newspaper collection is scanned from microfilm (one type of microform), and often old microfilm. The quality of early microfilm (film created before microfilm standards and guidelines existed) varies wildly, because the original photograph of a newspaper page depended as much on the quality of the source material as on the equipment used and the skill of the photographer.

Another consideration is that the film we scan is often not even the original film but rather a duplicate negative (i.e. a duplicate master). So we run the risk of working with a bad reproduction of a perfectly fine master reel. Or the film itself, like those old negatives you store in a cardboard box in the basement, can be damaged or scratched. Needless to say, any problems with the original paper materials are compounded with each generation away from the original and each time the image is reformatted (i.e. migrated from one medium or format to another).

And we haven’t even started scanning yet. Scanning is another form of reformatting, and it tends to magnify the problems mentioned above even further. Scanning transparent images also has its own challenges. Film scanners require a more sensitive sensor, one that can capture the tonal values of a very small transparent image. Often, because a film frame is so small and holds so much detail, scanners have to operate at the upper limit of their resolution, which results in artifacts and noise.

4. Newspapers are large (larger than your average document). Another obvious statement, but important when you consider the size of the page relative to the size of its smallest details and text. The problem is a chain of consequences: the larger the page, the farther the camera head must be from the original; the farther the camera, the larger the reduction ratio (the ratio of the size of the original to the size of its image on film); the larger the reduction ratio, the smaller the text on film; and the smaller the text, the harder it is to recapture the detail during scanning.
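
To make the reduction ratio concrete, here is a rough back-of-the-envelope sketch in Python. The numbers are illustrative assumptions, not figures from our own workflow: a page 24 inches tall, a 16x reduction ratio, 6-point type, and a target of 300 pixels per inch measured against the original page. The function names are made up for this example.

```python
# Back-of-the-envelope arithmetic for reduction ratios.
# All inputs below are assumed values chosen for illustration only.

POINTS_PER_INCH = 72  # typographic points in one inch


def film_image_size(original_inches: float, reduction_ratio: float) -> float:
    """Size on film of something that measures `original_inches` on paper."""
    return original_inches / reduction_ratio


def required_film_ppi(target_ppi_at_original: float, reduction_ratio: float) -> float:
    """Scanner resolution needed at the film plane to reach the target
    resolution relative to the original page."""
    return target_ppi_at_original * reduction_ratio


if __name__ == "__main__":
    page_height_in = 24.0   # assumed page height
    ratio = 16.0            # assumed reduction ratio (16x)
    type_size_pt = 6.0      # assumed size of the smallest type

    text_height_in = type_size_pt / POINTS_PER_INCH
    print("Page height on film (in):", film_image_size(page_height_in, ratio))
    print("6 pt text height on film (in):", film_image_size(text_height_in, ratio))
    print("Film ppi needed for 300 ppi at original:", required_film_ppi(300, ratio))
```

Under these assumptions the scanner has to resolve sixteen times more detail per inch of film than it would per inch of paper, which is why a larger page (and so a larger reduction ratio) makes small text harder to recover.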

And did I mention that we haven’t even started scanning yet?!? We’ll talk about the scanning process in our next post in this series.

From Our Corner

Washington Secretary of State Blog
