Here’s what we know:
- Researchers love newspapers.
- Libraries and archives love newspapers.
- North Carolina has produced a lot of newspapers.
- No, really. There. are. a. lot.
Well, we do know a little bit more than that, but those are the CliffsNotes of our newspaper story. Because we work with so many papers, we try to stay on top of what's happening with newspaper digitization in the state and around the country. We thought we'd write a few blog posts to share some of what we've seen and are seeing in that area, and to help get the word out that a lot is happening in this space in North Carolina.
So, why is digitizing and sharing newspapers online so tough?
There are a lot of them. We're saying it once more simply because sheer volume is the most costly factor in digitization and preservation. Take, for example, a weekly newspaper published from 1870 to 1920. That's over 2,500 issues. Say each issue is 8 pages long: now we're up to 20,000+ pages. And let's say there's one such paper in each of North Carolina's 100 counties. We're already at 2 million pages for the state, for only 50 years. That's an extremely conservative estimate, considering many counties had more than one paper. And we haven't even mentioned papers published by schools, companies, or ambitious individuals. Or dailies…
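If you want to check our math, here's a quick back-of-the-envelope sketch of that estimate. It assumes exactly 52 issues a year (no skipped or extra issues) and uses the state's present-day 100 counties, which is a simplification since several North Carolina counties were formed after 1870.

```python
# Back-of-the-envelope page count for one hypothetical weekly paper per county.
# Assumptions: 52 issues/year, 8 pages/issue, present-day count of 100 counties.
YEARS = 1920 - 1870          # 50 years of publication
ISSUES_PER_YEAR = 52         # weekly paper, no skipped issues (assumption)
PAGES_PER_ISSUE = 8
NC_COUNTIES = 100            # North Carolina's current number of counties

issues_per_paper = YEARS * ISSUES_PER_YEAR            # issues for one paper
pages_per_paper = issues_per_paper * PAGES_PER_ISSUE  # pages for one paper
statewide_pages = pages_per_paper * NC_COUNTIES       # one paper per county

print(f"{issues_per_paper:,} issues per paper")
print(f"{pages_per_paper:,} pages per paper")
print(f"{statewide_pages:,} pages statewide")
```

That works out to 2,600 issues and 20,800 pages per paper, or about 2.08 million pages statewide, before counting any county's second (or third) paper.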
By our estimate, digitizing just the microfilmed newspapers held in the North Carolina Collection at UNC-Chapel Hill would produce over 40 million pages, which means 40 million digitized images. That could be upwards of 180 TB of data. For storage ALONE (not including serving it up on the web, maintenance, or staff), you'd pay a paltry $6,000 per month*.
We kid you not.
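To see what those figures imply, here's a rough sketch that simply divides them out. It assumes decimal terabytes (10**12 bytes) and a flat monthly rate; the derived numbers (average image size, cost per terabyte) are our own arithmetic on the estimates above, not quoted prices from any storage provider.

```python
# Rough check on the storage estimate. Only PAGES, TOTAL_TB, and
# MONTHLY_COST_USD come from the estimate above; the rest is division.
PAGES = 40_000_000           # digitized page images
TOTAL_TB = 180               # estimated total data, in terabytes
MONTHLY_COST_USD = 6_000     # storage only: no web delivery, maintenance, or staff

BYTES_PER_TB = 10**12        # assumption: decimal terabytes

avg_image_mb = TOTAL_TB * BYTES_PER_TB / PAGES / 10**6  # average MB per page image
cost_per_tb_month = MONTHLY_COST_USD / TOTAL_TB         # implied USD per TB per month
annual_cost = MONTHLY_COST_USD * 12                     # storage cost for one year

print(f"~{avg_image_mb:.1f} MB per page image")
print(f"~${cost_per_tb_month:.2f} per TB per month")
print(f"${annual_cost:,} per year, storage alone")
```

In other words, that's roughly 4.5 MB per page image and $72,000 a year just to keep the files sitting on a disk somewhere.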
Besides quantity, the remaining challenges look petite. Broadsheet newspaper pages need a larger scanner than most institutions can afford, especially if the papers are bound. Tabloid-sized pages won't fit on typical flatbed scanners either, and we rarely recommend flatbeds for a project like this because they're just too slow.
Historic newspapers have a fairly uniform format, which is a plus, but they can also be fragile, friable, and fiddly. The more carefully you have to handle material during digitization, the more time you're going to need.
Having images of newspapers is really helpful: they're portable, compact, and easier to copy than the originals. But the true advantage of a digital version comes when it's full-text searchable. Full-text search across large quantities of files requires indexing and search software, plus enough IT infrastructure to run them.
While most newspapers published before 1923 can be safely shared online, those published in the years since can have attendant rights issues (pun intended). The massive changes in newspaper ownership over the last 20 years can make institutions wary about publishing a paper from 1924 or 1994.
Hopefully it's clearer now why more historic newspapers aren't yet freely available online. Daunting as they are, the challenges mentioned above are all surmountable with enough resources (money and expertise) and time. In our next blog post, we'll highlight where you can find historic North Carolina newspapers online right this very minute.
* We're quoting Amazon S3 storage pricing here, but YMMV.