From the desk of Shawn Schollmeyer
The Library of Congress has once again found a fun new way to promote our digital newspaper collections: a new crowd-sourced tagging tool that helps capture the key names and titles of the WWI cartoons in our Washington digital newspapers. From 2008 to 2014, the Washington State Library participated in the National Digital Newspaper Program, contributing over 306,000 pages of Washington newspapers to the Chronicling America website.
One of the biggest advances in making a digital collection full-text searchable is Optical Character Recognition (OCR) software. It works well for printed text but does a poor job of capturing the handwritten and small text often used in cartoons and illustrations. The Beyond Words lab, launched this month by the Library of Congress, is a fun way to learn about the editorials and opinions of WWI by tagging these cartoons and illustrations, adding value to them as a primary resource for researchers, teachers and students. Give it a try and let us know what you think of it! We’ll be passing along feedback to the NDNP staff at the Library of Congress. And tell us more about these Sammies!
Crowdsourcing: Beyond Words
One of the first features on labs.loc.gov is Beyond Words, a website that invites the public to identify cartoons and photographs in historic newspapers and provide captions that will turn images into searchable data. This fun crowdsourcing program grows the data set of text available for researchers who use visualization, text analysis and other digital humanities methodologies to discover new knowledge from Chronicling America—the Library’s large collection of historic American newspapers. Beyond Words is available as a pilot project to help the Library of Congress learn more about what subsets of Library data researchers are interested in and to grow the Library’s capacity for crowdsourcing.
“What I like about crowdsourcing is it gives people a chance to discover hidden gems in the collection. You never know what you’ll find poking through old newspapers,” said Tong Wang, the IT specialist who created Beyond Words during a three-month pilot innovator-in-residence program.
Beyond Words will also generate public domain image galleries for scholarship and creative play. As this data set grows, educators, researchers and artists will be able to group image collections by time frame, such as identifying all historic cartoons appearing in World War I-era newspapers.