Digital North Carolina Blog

This blog is maintained by the staff of the North Carolina Digital Heritage Center and features highlights from the collections at DigitalNC, an online library of primary sources from institutions across North Carolina.


Viewing entries tagged "howto"


Six Steps To Consider Before Scanning Vertical Files

Long, open filing cabinet drawer filled with red and manila files

Vertical files are groups of subject-based materials often compiled over time to help an organization’s staff with frequent reference questions or research. Like the example above at Shepard-Pruden Library in Edenton, NC, they’re typically housed in filing cabinets. They are a good place to store items that wouldn’t necessarily be cataloged or accessioned (individually and formally documented by the institution) but are valuable for research. Inside you might find photographs, clippings, family trees, pamphlets, handwritten notes – but because the contents accumulate over time you can find any number of surprises inside.

Vertical files are also the worst – for digitization that is. The same thing that makes them valuable for research – their convenience, their long term growth, and the variety of contents – makes them incredibly challenging to scan. If you’re interested in digitizing vertical files, we have suggestions! These have been compiled from our own experience at NCDHC along with the experiences of a number of our partners who kindly responded to a recent email asking for advice.

When facing full filing cabinets you may be tempted to dive in right at the beginning and get going, but we always suggest starting with a pilot project using a subset of materials. We can’t emphasize this enough! It’ll give you a sense of workflow, help you establish how you’re going to name and organize the scanned files, and uncover obstacles you didn’t anticipate. If it goes poorly, you can back out without losing a large investment. The suggestions below can be used for a pilot and for a full-fledged project.

Suggestion 1: Prep First, Thank Yourself Later

photocopies and papers from a manila folder spread on a wooden table

Here’s an example of a vertical file with newspaper clippings, letters, and publications about World War II. From the collection at Shepard-Pruden Library in Edenton.

Scan it all or be selective? Decide if you want to go from beginning to end or to be selective about what you’ll scan. There’s no right answer but each way has ups and downs. This decision will be subject to your users’ needs and your local resources.

Scan it all? Is there enough high value and unique content in your vertical files to warrant scanning everything? For example, some newspaper titles have been digitized in their entirety and are full text searchable (like those available at DigitalNC or Chronicling America) so you might decide that scanning clippings from those same papers is superfluous. As another example, many books published before 1924 in the United States are available online at the Internet Archive or through a simple search in your internet browser. If your files have a lot of book excerpts you may want to skip those.

Be selective? Being selective can be more time consuming and you may unintentionally miss items that would be of use, but it can be appropriate if you’re trying to simply scan items related to one or two topics or for a particular event. It is also a great option if you don’t have a lot of time or resources but want to help give access to high demand files.

First pass for organization. Go through the files in a first pass, during which you’ll assess the files’ contents, sort them in a way that will make scanning easier, and prepare the different formats for scanning. Here are some tasks to complete as you make a first pass:

  • Pull out items to be cataloged separately. As your organization’s collection strategies have changed over time you may find materials within the files that should be pulled out and cataloged or accessioned independently. For example, perhaps you are a library where pamphlets were previously stuck in vertical files but are now cataloged and put on the shelf. These can be pulled out and dealt with separately.
  • Weed. Take this opportunity to weed out duplicates and look for misfiled items.
  • Group items with copyright or privacy concerns. I talk about this in more detail below, but if you plan to put these files online, you may want to skip scanning items that would be too risky or unethical to share online. You could put them towards the back of each file with a divider that indicates they should be either reviewed in more detail or skipped while scanning.
  • Remove staples and fasteners. The caveat here is that you should only do this with materials you’re sure you’re ready to scan. One partner mentioned that staff had removed staples and paperclips from a large quantity of files, but then the project got stalled. Because the vertical files were still in use, this led to papers being misfiled, shuffled out of order, and lost.

    fingers holding microspatula and prying up staple legs

    Ashlie, an NCDHC staff member, uses a microspatula to pry up staple legs. License: CC0

  • Organize by type then by date. Within each file, organize individual items first by format of item, then by date. Group all of the photographs. Put like-sized publications together. Group all of the single page items, all of the clippings. Grouping like-sized items speeds up scanning because you spend less time adjusting the crop from one item to the next. Once you’ve grouped things by type, within each group put things in date order (if dates are available). This will help you when you make the scans available.

Suggestion 2: Prepare for the Digital Files

Unless you don’t have that many vertical files (or you have a LOT of time and help) think of each vertical file as a single unit. Here’s what I mean by this. If you have a vertical file about a popular local landmark called the “Turtle Log,” and it includes a few photos, some clippings, and a handwritten narrative, all of those scans would be kept together in a virtual group, folder, or album just as they are in real life. When you describe that group/folder/album either in an internal or online database, you’d describe the unit as a whole, rather than describing each individual photo, clipping, or narrative. This will save a ton of time.

yellow square with file folder and file names starting worldwarii01

With this in mind, you’ll want to think of a file naming scheme that will keep all of these digital groups organized. Thankfully, file systems mimic files in real life, with the use of folders. Make sure you have a consistent naming convention for files and folders that ensures everything sorts appropriately. On the right is a quick example of how you might decide to name your files. This example is very basic – you could choose to give more detail, include known dates. But note the numbers (01, 02, etc.) included that will make the files sort in order.
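If it helps to see the naming idea spelled out, here is a minimal sketch in Python. It is only an illustration, not a tool we use: the function names (`scan_filename`, `scan_paths`), the `.tif` extension, and the one-folder-per-vertical-file layout are assumptions made for the example. Only the `worldwarii01` pattern comes from the example image.

```python
def scan_filename(file_slug, sequence, extension="tif", width=2):
    """Build a name like 'worldwarii01.tif' with a zero-padded sequence number."""
    return f"{file_slug}{sequence:0{width}d}.{extension}"


def scan_paths(file_slug, item_count, extension="tif"):
    """List folder/file names for every scan in one vertical file.

    The padding width grows with the item count so that a file with
    more than 99 scans still sorts in order (001, 002, ... 120).
    """
    width = max(2, len(str(item_count)))
    return [
        f"{file_slug}/{scan_filename(file_slug, n, extension, width)}"
        for n in range(1, item_count + 1)
    ]


# Without padding, a plain alphabetical sort puts 10 before 2.
print(sorted(["worldwarii1", "worldwarii10", "worldwarii2"]))
# ['worldwarii1', 'worldwarii10', 'worldwarii2']

# With padding, alphabetical order and scanning order agree.
paths = scan_paths("worldwarii", 12)
print(paths[0])   # worldwarii/worldwarii01.tif
print(paths[-1])  # worldwarii/worldwarii12.tif
print(paths == sorted(paths))  # True
```

The point of the padding is simply that most file browsers sort names as text, so `02` has to be written with its leading zero to land before `10`.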

Suggestion 3: Determine How You’ll Work with Additions

If you intend to keep these vertical files active after scanning, you’ll need to figure out how to denote what’s been scanned and what you’ll do with new additions to the files. A light pencil mark or some other non-permanent note on the back of all scanned items can signal what’s been scanned. Decide if you have the time and staff to scan new additions before filing new donations or if you plan to do that wholesale at a later date. It might also be helpful to have a marker of some sort that you can insert into the file cabinets that lets researchers and staff know about files that have been removed for scanning and whether or not they can still request them.

Suggestion 4: You Should Scan these In House. Or You Shouldn’t.

I wish I could give a single way forward here, but like so many things the answer to whether or not to outsource scanning depends on your situation. Here are a few considerations for the two routes.

Scanning in house. This gives you a lot of flexibility. You can work on the project over time. For active files, they’ll be close at hand if needed. Your staff will gain experience scanning, if they don’t already have it.

Unless you can afford an overhead scanner or camera mount setup, scanning vertical files on a flatbed or other multifunction machine will make a very long process a lot longer. Sheetfed scanners can speed things up a little but only for extremely uniform, non-unique materials that are in good shape. Because the project is large, if you don’t have dedicated scanning staff (or even if you do) be prepared for the contents of multiple file cabinets to take years to scan. You may also need to hire new staff or reskill current staff to do this work, trading this for other duties they currently complete.

Outsourcing scanning. Outsourcing can mean a quicker outcome because the organization doing the scanning will have dedicated workflows and equipment for high volume output. If you don’t have digitization expertise on staff, their expertise can be helpful for avoiding pitfalls.

Unless you are working with an organization that typically scans special collections, the variety of formats can be a challenge and frequently increases the cost. Companies that specialize in corporate files will advertise attractively low prices for scanning but they are frequently used to working with homogeneous typing or copy paper. Be sure to interrogate them regarding their expertise, showing them examples and even asking for a quote after they scan a subset. Make sure they offer digital files of a quality and in file formats that you can use into the future.

Suggestion 5: Decide About Your Access Priorities and the Rest will Follow

As we’re fond of saying, digitization is the easy part. Even in a project of this size and complexity, the scanning and preparation of the digital files is more straightforward than what comes next. Here are confounding factors to take into account when you consider how you’ll provide access to the digital vertical files.

image of a page from a furniture guide with the term LaBarge highlighted

Full Text Search

Full text search greatly increases the usefulness of digital vertical files. It’s one of the most cited reasons for scanning them in the first place. To be able to search full text within a scanned document, you’ll need to run that digital file through software that recognizes the text and then either embeds it within the file or stores it separately. (Note that this will only happen with typewritten text – accurate automated recognition of handwriting isn’t widely available at this point.) Here are two different options:

  • Some institutions choose the PDF format for their vertical files. Software like Adobe Acrobat (not to be confused with the freely available Adobe Reader) will recognize text within a PDF. However, PDFs are made for easy transmission and sharing, not for longevity and quality. We recommend that you scan initially to a higher quality or lossless format, and then, if it fits your goals and resources, create derivatives like PDFs. The upside of PDFs is that many desktop and laptop computers can natively search across PDFs. This means you could have them searchable locally, say on a reading room computer, and not necessarily have to provide internet access.
  • Alternatively, you can use a system that can store both the text recognized in an image and the image itself and then link them together. Some library or museum catalogs will do this, or you’ll need a content management system. This means additional ongoing costs and the need for technological infrastructure and expertise. But with these types of systems you can provide full text search of your files on the internet.
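To make the second option a little more concrete, here is a minimal sketch of the idea of keeping recognized text separate from the image and linking the two. It assumes the text recognition has already been done by some other software. The scan names and snippets of text are invented for illustration, and `build_index` and `search` are hypothetical helper names, not part of any catalog or content management system.

```python
import re
from collections import defaultdict


def build_index(ocr_text_by_scan):
    """Map each word to the set of scan file names whose text contains it."""
    index = defaultdict(set)
    for scan_name, text in ocr_text_by_scan.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(scan_name)
    return index


def search(index, query):
    """Return the scans that contain every word in the query, sorted by name."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Invented sample: recognized text stored separately, keyed by scan file name.
ocr_text_by_scan = {
    "worldwarii01.tif": "Local men enlist after Pearl Harbor",
    "worldwarii02.tif": "Red Cross drive exceeds local goal",
    "turtlelog01.tif": "The Turtle Log remains a local landmark",
}

index = build_index(ocr_text_by_scan)
print(search(index, "red cross"))  # ['worldwarii02.tif']
print(search(index, "landmark"))   # ['turtlelog01.tif']
```

A real system does far more than this (ranking, phrase search, highlighting the term on the page), but the core link is the same: the searchable text points back to the image it came from.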

Copyright and Privacy

Copyright is one of the biggest confounding factors related to making vertical files accessible. Depending on how old your files are, it’s likely that there are materials in there that will be in copyright. If you want to post copyrighted materials on the internet, your organization will need to assess the amount of risk you’re willing to accept. Some items are riskier than others. Regardless, whoever is working on your vertical files will need training and the authority to determine what can and should go online and what should not. Here are a few resources to help get you started:

In addition to copyright, you should always consider privacy concerns. For some of the non-published items in your vertical files, the donations or additions were made with the expectation of local use by a single person or small group. Family history documents that discuss recent events are an example of the type of item you may judge to be too personal for broad consumption without the permission of the creator. There may be documents that share information about communities that would prefer they not be shared broadly. These are all good things to assess as you do your first pass.

Suggestion 6: Find Examples and Friends

Here are some examples of digitized vertical file collections online. These are large projects with a goodly number of staff and funding involved, so take that into account as you look. Note that the files are put online whole rather than breaking out individual items.

This first example comes from the Digital Collections of the University Libraries at UNC-Greensboro and showcases their “class folders.” UNC-G has done quite a bit with vertical files of various types, but this is a great example of folders that have a variety of items grouped by subject. These items are in a system called CONTENTdm, which is specifically designed to host special collections.

manilla bifold invitation to the 1898 commencement at the State Normal and Industrial College of NC

This invitation is one of a number of items in the vertical file entitled “Class of 1898.” You can see the item title at the top and a list of the different items inside on the right.

We’ve also done some vertical files at NCDHC, and you can take a look at an example here. This is from a large collection of vertical files shared by the Kinston Lenoir-County Public Library. Our system is called TIND.

screenshot in TIND of a newspaper clipping and a manuscript page, with thumbnails of other items on the left

This screenshot shows a large view of a newspaper clipping alongside a typewritten manuscript from the Sybil Hyatt Papers. To the left are thumbnails showing other items in this particular file.

Keep in mind that both of these systems are made for hosting large numbers of special collections items and, like a library or museum catalog, cost money and staff to maintain. While it’s outside of the scope of this article, you can take a look at another post we did regarding how to share your digital files.

For any digitization project, we heartily recommend trying to find friends and peers at area or regional organizations. Ask if they have vertical files or digitization projects (or both?!). A quick phone call or email can help you avoid duplication of effort, at the least, and may gain you advice or a collaboration. You can even choose to share staff or other resources, or collaboratively apply for funding. We also like to be friends! If you’ve made it this far and still want to digitize, but you have questions or would like additional advice, feel free to get in touch.


What Should You Do With Your Scanned Photos? What We Suggest for Libraries, Archives, and Museums

We frequently get asked by institutions “what should I do with my scanned photos/documents?” This is a great question but not an easy one – digitization/scanning is the easy part.

What these institutions are often asking is how they should keep track of the files they created during scanning (scans) and the information about what they scanned (metadata). In addition to tracking, they’d like to know what their options are for sharing the scans and metadata with an online audience.

When you see websites like ours with extensive collections of scans paired with metadata (like in the screenshot below), there’s usually a piece of software behind it that keeps track of the scans and the metadata and then matches them up for online display. That’s what a content management system (CMS) does, if you’ve heard that term before. The benefit of using a CMS is that it makes sure the scans and their metadata remain paired over time, and often allows users to do fun things like search, sort, and filter.

Color photograph of a woman in a WWI uniform.

Screenshot of an item on DigitalNC, as presented by a content management system called TIND.

There are different types of CMSs for different types of industries. This post focuses on options for cultural heritage institutions, because CMSs made for cultural heritage institutions generally address the things we care about most. They make sure metadata is shareable, that scans can be described really well, and that you can express one-to-many relationships (think: many scans linked up to a single metadata record).

If your institution is considering implementing a CMS, here are the very first steps we suggest considering.

First, Plan 

  • Decide on your goals. Do you want your scans to be available online? Or are you just looking for software that will manage your scans and metadata locally? Who will use the end product – your staff, your patrons/users, or both? Your answer will help guide where you go next.
  • Do some prep work. Like any other service your institution wants to maintain, figure out (1) how much money you have to spend both now and on an ongoing basis, (2) who will need to be involved in installation and support, and (3) what staff expertise you already have related to technology.
  • Talk to your administration and coworkers. What are their goals and needs for scanning and sharing those scans, if any? It’s a lot harder to implement a system if you don’t have the buy-in of others where you work. 
  • Be realistic. Start small and build up your capacity. We’ve never heard of someone saying “our first scanned collection was too small,” but we have heard a lot of people say “I bit off way more than we could chew.”

Options for Keeping Track of Scans Locally

If you just need to keep track of scans and metadata locally for staff use, you can do this easily with a spreadsheet and a really consistent file naming structure. The spreadsheet could include things like a title or description, maybe a physical location, any other helpful keywords or dates, and the file or folder names for the scans. Staff can search the spreadsheet for what they need, and then find the file or folder name so they can pull up the scans from storage.
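As a rough illustration of the spreadsheet approach, the sketch below searches a small inventory saved as CSV text. The column names (`title`, `location`, `keywords`, `dates`, `folder`) and the two sample rows are made up for the example, and `search_inventory` is a hypothetical helper. In practice, the search box in your spreadsheet program does the same job.

```python
import csv
import io

# Invented sample inventory; a real one would live in a .csv or spreadsheet file.
INVENTORY_CSV = """title,location,keywords,dates,folder
World War II,Cabinet 2 Drawer 3,"clippings; letters; veterans",1941-1945,worldwarii
Turtle Log,Cabinet 1 Drawer 1,"landmark; photographs",undated,turtlelog
"""


def search_inventory(csv_text, term):
    """Return every row in which any column contains the search term."""
    term = term.lower()
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if any(term in (value or "").lower() for value in row.values())
    ]


# Find the folder name to pull up from storage.
for row in search_inventory(INVENTORY_CSV, "landmark"):
    print(row["title"], "->", row["folder"])
# Turtle Log -> turtlelog
```

The `folder` column is what ties the description back to the scans, which is why the consistent file naming structure matters as much as the spreadsheet itself.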

If you’d like something more sophisticated for keeping track of scans and metadata locally for staff use, there are programs that allow you to tag and describe scans that live locally. If you search for “photo management software” or “photo organizing software” online you’ll discover a number of options. We’re not terribly familiar with these; just be sure that you can export whatever you put into the software before committing.

Options for Putting Scans Online

If you decide you’d like to put your scans online, here are some choices you can consider.

A Content Management System Already in Place

Examples Include: LibGuides (screenshot below), library catalogs, museum databases

Screenshot of a public library's LibGuide site.

Screenshot of a LibGuide with extensive information about North Carolina maps.

Typically Chosen By: Institutions who already have a CMS that they can stretch to serve their needs.

The Positive Side: You may be able to start sharing your scans faster because the CMS is already adopted and paid for by your institution and familiar to staff and online users. 

Possible Challenges: LibGuides, library catalogs, and museum databases do not always follow best practices and standards for digital collections. For example, they may not allow you to attach multiple scans to a single record, or they may not export your metadata in a structured way. In other words, you may be fitting a “square peg into a round hole.” In addition, if the features you want to use are secondary to the system’s main purpose, the vendor or developer could drop those features later. 

Recommended? Depending on your resources and needs this can be the best solution. Just be aware of the possible down sides mentioned above.

A Social Media or Photo Sharing Website

Examples Include: Facebook, Flickr (Screenshot below), Tumblr

Screenshot of a yearbook cover photo on Flickr

Screenshot of an item on Flickr.

Typically Chosen By: Private individuals, small organizations with limited technical staff, institutions seeking to engage with broad communities where those communities already congregate online.

The Positive Side: These reach broad, built-in audiences. There is frequently no cost up front.

Possible Challenges: These do not adhere to best practices and standards for digital collections, which can cause a lot of work later on. Sites like these can shut down or change their terms of service with little or no regard for or warning to users. Ads are displayed near your files, and your organization has no control over them. It’s frequently impossible or extremely difficult to get your files and metadata back out of these sites.

Recommended? Not recommended as the main mechanism for managing and storing your files and metadata. These sites are best used only for outreach and engagement.

Hosting your Content on DigitalNC.org

Typically Chosen By: Institutions of all sizes who prefer not to host their own software, possibly due to local IT limitations or as a result of strategic priorities;  institutions who would like their scans and metadata searchable alongside others from around the state.

The Positive Side: Your content reaches a broad, built-in audience. It would be searchable with similar digital collections from around North Carolina. Currently no cost to institutions.

Possible Challenges: We do the uploading and editing for you, and it takes place within a broader schedule. We’d ask you to create images and metadata that follow our standards before we could upload. (These could be positives, depending on your perspective.)

Recommended? Sure! Depending on your resources and needs this can be a great option.

A Content Management System Hosted by an External Company

Examples Include: CONTENTdm (screenshot below), hosted Islandora, ArtStor’s JSTOR Forum, Omeka.net, Past Perfect Online, or TIND (which is what we use, see screenshot at the beginning of this post)

Photograph of a man and boy with two dogs, along with metadata below it.

Screenshot of a hosted instance of CONTENTdm.

Typically Chosen By: Institutions of all sizes who prefer not to host their own software, possibly due to local IT limitations or as a result of strategic priorities.

The Positive Side: Many systems like these are built with best practices like consistency, standards, and integration with other systems. They will allow users to search your metadata, and often offer things like filtering, file downloading, and other desired user services. Your organization does not have to set up or maintain the software locally. You can establish a brand and dedicated site for your digital collections.

Possible Challenges: They require staff with specialized training in the system, and the ability to pay a vendor both initially and on an ongoing basis. You’re limited to the services or features the vendor chooses to offer.

Recommended? Sure! Depending on your resources this can be a great option.

Hosting Your own Content Management System

Examples Include: Self-hosted Islandora, Omeka (screenshot below), Samvera, Collective Access

Screenshot of a colorful campus map along with metadata.

Screenshot of a self-hosted instance of Omeka.

Typically Chosen By: Institutions with programmers on staff, dedicated IT support, and collections that require a lot of customization.

The Positive Side: Like the hosted systems above, these are also often built with best practices like standards and interoperability. They will allow users to search your metadata, and often offer things like filtering, image downloading, and other user services. When you host your own system you can frequently customize more features.

Possible Challenges: They require staff with specialized training, and a robust and flexible IT support infrastructure. They’re more time intensive and costly to maintain.

Recommended? Sure! Depending on your resources and needs this can be a great option.

Final Thoughts

In the end, there isn’t much that’s an “always wrong” choice. There are only choices that have different consequences down the road. We encourage people to choose the systems that adhere to digital collections best practices, because those best practices come from people who’ve made choices they regretted. In the end, it’s most important to choose a solution that meets your needs and fits the resources you have now and those you anticipate having in the future. Above all, always be sure that your scans and metadata are backed up and can be extracted from the system you choose!

Did we miss anything? Leave us a comment below.

If you’re considering one or more of these and have questions, get in touch. We’re happy to give you advice for what to ask a vendor or point you to similar institutions who may have already adopted what you’re considering.


Military and Veterans History on DigitalNC: Best Ways to Search

Group of Soldiers Posed with Firestone Officials, from the Gaston Museum of Art & History.

This Veterans Day, we thought we’d mention some best bets for finding and searching materials on DigitalNC related to military history. Some time periods and subjects have better representation than others, so we’ve focused on the five wars that have the most related materials.

Tip 1: Search by Subject

To isolate materials that are predominantly about a particular war, you can use the subject specific links listed below.

After you click on one of the links above, if you’d like to search within the results, type your search term in the search box at the top of the page, leaving “within results” selected (see screenshot at right).

You can also do a full text search that combines (1) your research interest (perhaps a name, a topic, or an event) with (2) the name of a particular war. This may yield a lot more results, depending on your research interest, but it could also zero in on your target faster. Here’s a link to an example that you can adapt for your own use.

Only interested in photographs? Try this search, which is limited to photos that contain the word “military” or “soldiers” as a subject.

Tip 2: Search by Date Range

Another tactic is to search or browse items that were created during a particular war. These don’t always have that war as a subject term, but they often deal with wartime issues or society regardless. We’ve listed date specific links here:

A list of alumni and students killed or missing in action, from the 1944 UNC-Chapel Hill Yackety Yack yearbook, page 12.

Keep in mind that doing a full text search will be ineffective about 98% of the time when it comes to handwritten items on our site, as most do not have transcripts. You may need to read through handwritten items pulled up in one of the searches above if you believe they may contain information you’re interested in.

Our partners have shared a lot of yearbooks on DigitalNC and, while they may not be the first thing that comes to mind for military history, many colleges and universities recognized students who served. Especially for the Vietnam, Korean, Gulf, and Afghan wars, yearbooks document campus reactions and protests. You currently can’t search across all of the yearbooks available on DigitalNC. However, if you’d like to browse through yearbooks published during a particular war, you can use this example link and just adjust the dates as needed. Currently, our site has high school yearbooks published up through the late 1960s, and college and university yearbooks and campus publications through 2015.

Tip 3: Newspapers!

Searching the student and community newspapers on DigitalNC can yield biographical information about soldiers, editorials expressing local opinions about America’s military action, as well as news and advertisements related to rationing and resources on the homefront.

The Newspapers Advanced Search is your friend here! You can target papers published during specific years. You can also narrow your search to specific newspaper titles.

Screenshot of the Newspapers Advanced Search page, with the search phrase “Red Cross” and limiting the results to papers published from 1914-1918.

We also wanted to call your attention to a couple of newspaper titles on DigitalNC that were published exclusively for service members or during one of these wars:

  • The Caduceus, published by the Base Hospital at Camp Greene (Charlotte, NC), 1918-1919
  • Cloudbuster, published at UNC-Chapel Hill to share news about the Navy pre-flight school held on campus, 1942-1945
  • The Home Front News, published by the Tarboro Rotary Club for servicemen from their city, 1943-1945
  • Hot Off the Hoover Rail, published by the community of Lawndale for servicemen from their city, 1942-1945
  • Trench and Camp, published by The Charlotte Observer for Camp Greene, 1917-1918

Bonus Resource: Wilson County’s Greatest Generation

One of the largest exhibits on our site is Wilson County’s Greatest Generation, an effort by the Wilson County Historical Association to document the service men and women of Wilson County, North Carolina who served in World War II. Documentation is organized by individual, and includes personal histories, photos, clippings, and other ephemera.

We hope this information can guide you through researching military history on DigitalNC. If you have any of your own tips or questions, please let us know by commenting below or contacting us.


Have Scans, Will Travel? Hosting Your Scans at DigitalNC

Moving Truck Transferring Family Possessions, from the Gaston County Museum of Art & History

The Digital Heritage Center does a lot of scanning on some really versatile machines. It’s one of the practical sides to our mission, and we take pride in being able to provide that service.

What is perhaps less well known is that we also help cultural heritage institutions publish items they’ve scanned themselves. Many cultural heritage institutions have flatbed or book scanners as well as willing staff and volunteers, but lack the technical infrastructure to host those scans for the public.

We’ve helped institutions …

  • who needed to migrate from ailing databases or systems they can no longer support,
  • who wanted to be able to full-text search their materials, a function they couldn’t fulfill through their current website,
  • who offered their digital files to on-site users, but who were seeking a broader audience.

When we start this conversation, here are some of the questions we ask:

  • Tell us about the original physical objects* – does your institution still have them? are there any rights or privacy concerns to sharing these online? what kind of subject matter is represented?
  • Tell us about the digital files – who originally created them? how many are there? where do they live? what file types? how are they organized? is this an ongoing project? do you have any metadata already?

If the files are a good fit for DigitalNC, they get transferred to hard drives, metadata is created or amended, and items appear on the site alongside the scans we create here at the Center. If you work at a cultural heritage institution eligible to work with the Center, have or are currently creating scans, and are interested in adding these to DigitalNC, contact us. We may be able to give them a home.

* If there were any. We can help with born-digital items as well.


Suggestions for Viewing Scrapbooks on DigitalNC

Even for those of us who work at the Digital Heritage Center, browsing scrapbooks or other printed items on DigitalNC can be frustrating. The viewer for a single item, which displays yearbooks, photographs, and short booklets pretty well, can be cumbersome for longer and larger items. Here are a few features that may not be immediately apparent but that we hope might help.

This is a screenshot of the viewer, showing the page of a scrapbook.

Item page in CONTENTdm

By default, about one third of the scrapbook page is showing (your screen may vary from mine). To the right, only a few thumbnails are visible at any one time. To move back and forth between pages, you’ll need to scroll through and click on each thumbnail one by one. If you want to see the full text for items, you have to toggle back and forth between tabs. So, what are your options?

Try Making the Scrapbook View Larger and Switching to “Content”

If you drag down the little toggle arrows at the bottom of the viewer, you’ll have more control over how much of the page is visible on your screen. You can also switch from “Thumbnails” to “Content” in the right-hand ribbon. This means more page links are visible at once, so you have to scroll less when moving from page to page.

Manipulate main CONTENTdm interface

Try “Page Flip View” for a Quick Browse

The second tip is to try Page Flip View. The button for Page Flip View is located above the page image:

Page Flip View Button

We use this option if we want to browse an item fast. Sometimes the image quality isn’t that good (I won’t go into why here). However, Page Flip View can be helpful if you want to get a quick sense of what’s inside a scrapbook, or if you’re looking for something in particular. Here’s what Page Flip View looks like on my screen:

Page Flip View

To move back and forth, just click on the page you’d like to turn.

Try “View PDF & Text” for a Better Layout

A favorite way to view scrapbooks and similar items is to click the View PDF & Text button, located right next to the Page Flip View button. View PDF & Text brings up an alternative view that takes advantage of a lot more screen real estate. See below.

Viewing PDF image and text

With this view, you’re able to see more of each page. Many more thumbnails are stretched out across the bottom of the screen, so you’ll scroll less. Full text (if it’s present) comes up on the left-hand side with each page. If you’ve searched for text, as above, and there are hits on the page, you’ll see the highlight right away instead of having to switch back and forth between tabs. You can hide the full text by using the button in the upper left, if you’d like even more of the main image to show.

We hope these tips are helpful. If you have any questions about the interface or what we’ve mentioned, let us know.


North Carolina Newspaper Digitization Part 3: This is How We Do It

Greensboro Daily News Ad, March 2, 1934

Like Jeopardy!, I want to tell you the answer before I get to the question.

Following a newspaper digitization and markup standard helps us plan for the future and makes it easier for us to work with vendors, open-source software, and other libraries and archives.

I say this up front, because when we explain how we digitize and share newspapers the frequent response is to ask why we do it the way we do. I think this is because our process is more labor intensive than people expect. It’s definitely not the only way, but we’re committed to this path for right now because it accommodates multiple formats (microfilm, print, born-digital), fits our current digitization capacity, and results in a system we think is flexible and extensible.

That standard I mentioned above comes out of the Library of Congress’ National Digital Newspaper Program (NDNP). All of our newspaper work is NDNP compliant, which means we follow that project’s recommendations for how to structure files, the type of metadata to assign to those files, and also the markup language that tells the computer where words are situated on each page (very helpful for full-text search).

I’ll give you a broad outline of our workflow and the tools we use. However, if you want more specific technical details, head over to our account on GitHub.

Screenshot of PaperBoy!

Let’s say one of our partners is interested in having us digitize a print newspaper. We’ll start by scanning each page separately on whichever machine works for the paper’s size. Because the NDNP standard requires page-level metadata, we’ve created a lightweight piece of software that helps us take care of some of that while we scan. Affectionately dubbed “PaperBoy,” this program allows the scanning technician to track page number, date, volume, issue, and edition for each shot. While it slows down scanning a little bit, it speeds up post-processing metadata work quite a lot.

Once the scanning’s complete, we process the files to create derivatives that serve different needs. We use ABBYY Recognition Server to get those multiple formats:

  1. a JPEG2000 image that’s excellent quality yet small in file size
  2. an XML file that includes computer-recognized text from the image along with coordinates that indicate the location of each word on that image
  3. a .pdf file that includes both the image and searchable text.
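
To make the second derivative more concrete, here is a minimal Python sketch of how word coordinates in that kind of XML file can be used to locate a search term on a page. The XML fragment is a simplified, hand-written stand-in modeled on ALTO-style String elements (CONTENT, HPOS, VPOS, WIDTH, HEIGHT). It is not actual ABBYY output, and real files carry an XML namespace and more layout nesting. The function name is ours, for illustration only.

```python
import xml.etree.ElementTree as ET

# Simplified, hand-written sample modeled on ALTO-style markup.
# Real files include an XML namespace and more layout elements.
SAMPLE_XML = """
<alto>
  <Layout>
    <Page ID="P1" WIDTH="2400" HEIGHT="3600">
      <TextLine>
        <String CONTENT="Greensboro" HPOS="120" VPOS="200" WIDTH="310" HEIGHT="48"/>
        <String CONTENT="Daily" HPOS="445" VPOS="200" WIDTH="140" HEIGHT="48"/>
        <String CONTENT="News" HPOS="600" VPOS="200" WIDTH="150" HEIGHT="48"/>
      </TextLine>
    </Page>
  </Layout>
</alto>
"""


def find_word_boxes(xml_text, term):
    """Return (x, y, width, height) boxes for each word matching term."""
    root = ET.fromstring(xml_text)
    boxes = []
    for word in root.iter("String"):
        # Case-insensitive match on the recognized text of each word.
        if word.get("CONTENT", "").lower() == term.lower():
            box = tuple(int(word.get(key)) for key in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
            boxes.append(box)
    return boxes


print(find_word_boxes(SAMPLE_XML, "daily"))
```

A viewer can draw a highlight rectangle over the page image at each returned box, which is roughly how search-term highlighting becomes possible.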

Now that we have the derivatives, we begin filling out a spreadsheet with page-level metadata. We first add the metadata created using PaperBoy and then we run through the scans page by page, correcting any mistakes found in the PaperBoy output and adding additional metadata. This also helps us quality control the scans and gives us a chance to find skipped pages.

How much metadata do we do? You can download a sample batch spreadsheet from GitHub, if you’re interested in the specifics, but it includes the PaperBoy output as well as fields like Title, our name (Digital Heritage Center) as batch-creators, and information about the print paper’s physical location. A lot of those fields stay the same across numerous scans or can be programmatically populated with a spreadsheet formula, to help make things go faster.

Once we have the spreadsheet and scans complete, scripts developed by our programmer (also available on GitHub) use those spreadsheets to figure out how to rearrange the files and metadata into packages structured just the way the NDNP standard likes them. The script breaks out each newspaper issue’s files into their own file folder, renaming and reorganizing the pages (if needed). The script also creates issue-level XML files, which tag along inside each folder. These XML files describe the issue and its relation to the batch, and include some administrative metadata about who created the files, etc.
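
As a rough illustration of the regrouping step (not the Center's actual scripts, which are on GitHub), the Python sketch below reads a tiny page-level spreadsheet and proposes an issue folder and new filename for each scan. The column names, the sample rows, and the folder naming (issue date plus a two-digit edition) are assumptions made for this example. It only plans the moves; it does not touch any files or write the issue-level XML.

```python
import csv
import io
from collections import defaultdict

# Hypothetical page-level spreadsheet; column names are illustrative only.
SHEET = """filename,date,edition,page
scan_0001.jp2,1934-03-02,1,1
scan_0002.jp2,1934-03-02,1,2
scan_0003.jp2,1934-03-09,1,1
"""


def plan_issue_folders(sheet_text):
    """Group page scans by issue and propose a new path for each file."""
    plan = defaultdict(list)
    for row in csv.DictReader(io.StringIO(sheet_text)):
        # Folder name: issue date plus two-digit edition (an assumed convention).
        folder = row["date"].replace("-", "") + f"{int(row['edition']):02d}"
        # Pages are renamed to a zero-padded sequence number within the issue.
        new_name = f"{int(row['page']):04d}.jp2"
        plan[folder].append((row["filename"], f"{folder}/{new_name}"))
    return dict(plan)


for folder, moves in plan_issue_folders(SHEET).items():
    print(folder, moves)
```

Keeping the plan separate from the actual file moves makes it easy to review the proposed structure before anything is renamed.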

Newspaper files before processing (left) and after (right).

The final steps are to load our NDNP-compliant batches into the software we use to present it online, and to quality control the metadata and scans.

If you think about it, newspapers have a helpfully consistent structure: date-driven volumes, issues, and editions. But there isn’t much else in the digital library world quite like them, so more common content management systems can leave something to be desired for both searching and viewing newspapers.  Because of this, and because there’s just so MUCH newspaper content, we use a standalone system for our newspapers: the Library of Congress’ open source newspaper viewer, ChronAm. It’s named as such because it also happens to be the one used for the NDNP’s online presence: the Chronicling America website.

While not perfect, this viewer does really well exploiting newspaper structure. It also allows you to zoom in and out while you skim and read, and it highlights your search terms (courtesy of those XML files created by ABBYY). Try it out on the North Carolina Newspapers portion of our site.

“Can’t you just scan the newspaper and put it online as a bunch of TIFs or JPGs?” Sure. That happens. But that brings me back around to the why question. We love newspapers (most of the time) and love making it as easy and intuitive to use them as we can. We think it’s important to exploit their newspapery-ness, because that’s how users think of and search them.

We also believe that standards like the one from NDNP are kind of like the rules of the road. While off-roading can be fun, driving en masse enables us to be interoperable and sustainable. Standards mean we have a baseline of shared understanding that gives us a boost when we decide we want to drive somewhere together.

This post’s bird’s eye view (perhaps a low-flying bird) doesn’t answer more specific questions you may be asking (“What resolution do you use when you scan?” “You didn’t explain METS/ALTO!”). I also tackled only our print newspaper procedure, because it’s the most labor intensive. When we work with digitized microfilm and born-digital papers the procedure is truncated but similar.

I hope this post as well as part 1 and part 2 of this series give you a sense of what’s involved in our newspaper digitization process and why we do it the way we do. As always, we’re happy to talk more. Just drop us a line.


North Carolina Newspaper Digitization Part 2: The State of the State

Sign pointing microfilm users to different online resources. Taken in Wilson Library's North Carolina Collection Reading Room, UNC-Chapel Hill.

[This post updated July 2017.]

Newspaper digitization is challenging for a number of reasons (refer to our previous post). Although we’re biased, if you’re interested in accessing North Carolina newspapers online you’re actually pretty lucky; North Carolina is positioned well ahead of many other states. Below we’ve listed, in descending order of size, all of the major historic online newspaper databases sponsored by North Carolina institutions that are on our radar.

Name: Newspapers.com
Dates: 1751-2000
Coverage: Statewide
Amount Online: 3,500,000+ pages
Details: The North Carolina Collection at UNC-Chapel Hill Library recently partnered with Newspapers.com to digitize millions of pages of North Carolina newspapers. These are accessible for free at the State Archives of North Carolina or UNC-Chapel Hill’s Library, or you can view them anywhere at newspapers.com for a monthly fee. As of July 2017, NC LIVE also makes these papers available to member libraries and their card holders. While there are other vendors out there with historic North Carolina newspapers, this is the most comprehensive to date.

Name: The North Carolina Digital Heritage Center
Dates: 1824-2013
Coverage: Statewide
Amount Online: 640,000+ pages
Details: Each year we receive LSTA funding from the State Library of North Carolina to digitize newspapers. Part of that funding goes toward papers on microfilm, for which we ask for title nominations from libraries and archives. We also digitize some newspapers from print (mostly college and university student newspapers) as well as small runs of community papers that have not been microfilmed.

Name: The University of North Carolina at Chapel Hill, National Digital Newspaper Program Grant Award
Dates: 1836-1922
Coverage: Statewide
Amount Online: 100,000+ pages
Details: UNC-Chapel Hill is currently in its second round of providing selected historic newspapers for digitization and sharing through the Library of Congress’ Chronicling America website. These issues are searchable along with a selection of titles from other states.

Name: University of North Carolina at Greensboro Library / Greensboro Museum
Dates: 1826-1946
Coverage: Town of Greensboro and surrounding area
Amount Online: 5,000+ issues
Details: The Greensboro Historical Newspapers collection includes a variety of papers from that area, including World War II military base papers.

Name: The State Archives of North Carolina
Dates: 1752-1890s
Coverage: Statewide
Amount Online: 4,000+ issues
Details: The State Archives of North Carolina actively preserves, microfilms, and digitizes newspapers. While most of these are not currently available online, they have shared some of the earliest on their website.

Name: East Carolina University Library
Dates: 1887-1915
Coverage: Town of Greenville and surrounding area
Amount Online: 1,800+ issues
Details: ECU’s Digital Collections include The Eastern Reflector, a community paper published in Greenville.

While more focused, college and university papers (especially earlier issues) often included local community news. In addition to those featured on DigitalNC, here’s a list of other school papers online:

This isn’t to say others aren’t scanning their local newspapers – we’ve heard of some local entities (businesses and libraries) working toward that goal. But this post was intended to list the largest, statewide, and (mostly) freely searchable endeavors. Know of others? Tell us.

In Part 3 of this Newspaper Digitization series, we’ll get technical and describe how we digitize newspapers here at the Digital Heritage Center.


Two related notes:

  1. Looking for a newspaper that isn’t online (yet)? Through your local public library, you can most likely borrow and view newspaper microfilm from the State Library of North Carolina. This Newspaper Locator may be helpful if you want to determine some of the titles published in a specific area.
  2. North Carolinians are heavily involved in efforts to preserve born-digital news. The Educopia Institute, located in Greensboro, is spearheading a conversation that brings in news producers and cultural heritage professionals to talk about our disappearing journalistic heritage. At their website you can learn more about the Memory Hole events and read a white paper on Newspaper Preservation.

North Carolina Newspaper Digitization Part 1: Why Isn’t It All Online Already?

Carrier boy with newspapers. 1965, Courtesy East Carolina University Digital Collections

Here’s what we know:

  1. Researchers love newspapers.
  2. Libraries and archives love newspapers.
  3. North Carolina has produced a lot of newspapers.
  4. No, really. There. are. a. lot.

Well, we do know a little bit more than that, but those are the CliffsNotes of our newspaper story. Because we work with so many papers, we try to stay on top of what’s happening with newspaper digitization in the state and around the country. We thought we’d write a few blog posts to share some of what we’ve seen and are seeing in that area, and to help get the word out that there’s a lot happening in this space in North Carolina.

So, why is digitizing and sharing newspapers online so tough?

Quantity

There are a lot of them. We’re saying it once more simply because it is the most costly factor in digitization and preservation. Let’s take, for example, a weekly newspaper published from 1870-1920. That’s over 2,500 issues. Say each issue is 8 pages long. Now we’re up to 20,000+ pages. And let’s say there’s one of those types of papers in every county. We’re already at 2 million pages for the state, for only 50 years. This is hugely conservative, considering many counties had more than one paper. And we didn’t even talk about papers published by schools, companies, or ambitious individuals. Or about dailies…
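
For anyone who wants to check the math or plug in their own numbers, the estimate above can be sketched in a few lines of Python. The figures come from the paragraph above; the county count of 100 is North Carolina's total, which the 2 million figure implies, and 52 issues a year is our assumption for a weekly.

```python
# Figures taken from the paragraph above; 52 issues a year assumes
# a weekly that never skipped an issue.
YEARS = 1920 - 1870       # 50 years of publication
ISSUES_PER_YEAR = 52      # one issue per week
PAGES_PER_ISSUE = 8
COUNTIES = 100            # North Carolina has 100 counties

issues = YEARS * ISSUES_PER_YEAR
pages_per_paper = issues * PAGES_PER_ISSUE
pages_statewide = pages_per_paper * COUNTIES

print(issues, pages_per_paper, pages_statewide)
```

Changing any one constant (say, a daily instead of a weekly) shows how quickly the page count grows.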

By our estimation, digitization of just the microfilmed newspapers located in the North Carolina Collection at UNC-Chapel Hill would result in over 40 million pages, which means 40 million digitized images. That could be upwards of 180 TB of data. For JUST storage (not including serving this up to the web, maintenance, staff) you’d pay a paltry $6,000 per month*.

We kid you not.
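
To see what those figures imply per page and per gigabyte, here is a small Python sketch that only re-derives values from the numbers quoted above. It uses decimal units (1 TB = 1,000 GB) as a simplifying assumption and does not reflect any vendor's current pricing.

```python
# Numbers quoted in the paragraph above.
PAGES = 40_000_000
TOTAL_TB = 180
MONTHLY_COST_USD = 6_000

# Decimal units assumed: 1 TB = 1,000 GB = 1,000,000 MB.
mb_per_page = TOTAL_TB * 1_000_000 / PAGES
usd_per_gb_month = MONTHLY_COST_USD / (TOTAL_TB * 1_000)
usd_per_year = MONTHLY_COST_USD * 12

print(f"{mb_per_page} MB per page")
print(f"${usd_per_gb_month:.4f} per GB per month")
print(f"${usd_per_year:,} per year for storage alone")
```

In other words, the estimate assumes an average of a few megabytes per page image, and the storage bill alone adds up to tens of thousands of dollars a year.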

Size

Beside quantity, the remaining challenges look petite. Broadsheet newspaper pages need a larger scanner than most institutions can afford, especially if the papers are bound. Tabloid sized pages won’t fit on typical flatbed scanners either, and we rarely recommend flatbeds for something like this because they’re just too slow.

Material

Although uniform, which is a plus, historic newspapers can be fragile, friable, and fiddly. The more carefully you have to handle material when you digitize, the more time you’re going to need.

Text-Heavy

Having images of newspapers is really helpful. Digital images are portable, physically compact, and easy to copy. But the true advantage of a digital version is when it’s full-text searchable. Full-text searchability across large quantities of files requires indexing and search software, and enough IT infrastructure to make that happen.

Rights

While most newspapers published before 1923 can be safely shared online, those published in the years since can have attendant rights issues (pun intended). The massive changes in newspaper ownership over the last 20 years can make institutions wary about publishing a paper from 1924 or 1994.

Oh My.

Hopefully it’s clearer now why more historic newspapers aren’t yet freely available online. Albeit daunting, the challenges mentioned above are all surmountable with enough resources (money and expertise) and time. In our next blog post we’ll highlight where you can find historic North Carolina newspapers online right this very minute.


* We’re quoting Amazon S3 storage here, but YMMV.


Planning a Digital Project that Works (Hint: Digitization is the Easy Part)

At the North Carolina Digital Heritage Center we work on digital projects with cultural heritage institutions around the state. We’ve been at it since 2010 and have completed projects with more than 180 different institutions. In most cases, we provide digital library services, but we also serve in an advisory role, sharing our thoughts and experiences with libraries and museums who are interested in developing their own digital projects. In these conversations, a lot of common themes emerge. There are plenty of guides online talking about best practices for digital projects, and we often refer our colleagues to these, but I thought it would be helpful to share a few essential steps in planning a digital project that I hope will help libraries avoid some of the pitfalls that can lead to incomplete or unsustainable projects.

1. Don’t Worry About Equipment or Specifications (Yet). We see this happen over and over again: a library wants to get started on a digital project and all of the questions we get are related to digitization: What scanner should we buy? What DPI should we scan at? These are important questions that need to be answered, but not at first. There’s no point talking about how materials will be digitized until you know what you’re going to do with the digital files.

2. Before You Do Anything, Figure Out How You’re Going to Get Your Content Online. If digitization is the easy part, this is the hard part. This is what prevents many libraries with limited resources from successfully completing digital projects on their own. Unless you’re scanning materials only for patrons to use in the building, you’re going to need to figure out how to share the digital images and metadata online. This requires access to a content management system (like CONTENTdm or Islandora), a catalog that enables the addition of images or other digitized content (like SirsiDynix Portfolio), a partnership with a non-profit hosting service (like the Internet Archive), or a willingness to share library materials on commercial sites (like Facebook or Flickr). Until you know how you’re going to do this, there’s no point in talking about scanning.

3. Before You Do Anything Else, Figure Out How You’re Going to Keep Your Content Online. You put a lot of work into finishing a digital project and getting everything successfully shared online. Naturally, you’re going to want to make sure that it stays online. It is important for librarians — and especially library administrators — to understand that digital projects require a regular ongoing commitment of resources and staff time. Like purchasing a house or a car, the biggest investment might come at the beginning, but there are going to be maintenance costs over time. This is why grant funding cannot be the only answer for funding digital projects. Grants will provide resources for a year or two, but your library has to be willing to assume ongoing costs for keeping the digital project updated and accessible.

4. If You Don’t Have Dedicated IT Support, Use Somebody Else’s. Small libraries and museums are often in a tough position with IT support. Either they have limited support or they have to rely on support from a larger agency (like county government) with many competing demands. Hosting your own digital project is going to require significant IT support. How much? It depends on how large and how complex your project is going to be, but as a rule of thumb I’d say that if you don’t have at least two full-time IT staff members who have experience with digital library projects and who have the time available to support your project, then you’ll need to look outside your institution for help.

5. There’s Nothing Wrong With Letting Somebody Else Host Your Collection. Without substantial IT support, digital projects used to be out of reach for smaller institutions. Not anymore! Many vendors now offer digital collection hosting services: OCLC hosts CONTENTdm collections for many libraries, Lyrasis hosts Islandora collections and facilitates projects with the Internet Archive, and there are a variety of companies that offer Omeka hosting. This is a great option for smaller institutions, enabling them to get a digital project online quickly without having to invest in servers or staff time. Of course, you’ll have to pay for these services, and they get more expensive the more content you post online, but it’s still likely to be much cheaper than trying to do everything yourself. Keep in mind that this is not just a problem that small libraries are grappling with. With the increasing availability of cloud-based servers, lots of companies are deciding to outsource hosting. Even Netflix does it.

6. Get Help. There’s a lot of help out there: use it. In North Carolina, we have a statewide digital library program and lots of outstanding digital library programs at universities and state agencies. There’s no reason for a smaller institution to go it alone. Established programs can provide lots of guidance and advice, and they may also be able to help with digitization, hosting, and funding.

7. Be Wary of Vendors Who Make it Sound Easy (Especially if They Haven’t Worked With Libraries Before). This is important to understand: digital library projects are complicated, but to somebody who hasn’t worked on one before, they can look pretty easy from the outside. “All you want to do is put some scans online?” says a local vendor eager to get your business. “No problem. We can do it way cheaper than that big company you got a quote from.” This almost never ends well. Vendors who haven’t worked with libraries rarely understand our concerns about metadata, the need to effectively search digitized content, and preservation. If it sounds too good (and too cheap) to be true, it usually is.

8. Metadata is More than Keywords. Although many digital collections include fantastic images, people will still find these by typing words into a search box. Good metadata will make it easier for patrons to discover, understand, and use the materials you put online. For some collections (like a box of unidentified photos), metadata can be a lot of work. For others (like a collection of postcards), it can be pretty straightforward. Before you start scanning anything, make sure you have a plan (and staff available) for describing the materials you’re planning to put online.

9. Plan to Share. Once you get your collection online, don’t keep it to yourself. More people will find and use your materials if you share your metadata. The Digital Public Library of America harvests and hosts metadata from libraries around the country (including North Carolina) and presents it in a simple, easy-to-use interface. This doesn’t replace your digital collection — links from the DPLA will lead users back to your website. Many libraries share digital collections information in their local catalogs, or with national resources like WorldCat. Figuring out how you’ll share your metadata beyond what you present online on your site should be a part of your planning process.

Now, once all those questions are answered and you have an achievable and sustainable plan in place (and know how you’re going to pay for it), it’s time to get down to the details and finally answer those questions about equipment and scanning. Good luck!



More Than Portraits: Possibilities High School Yearbooks have for Historical Research

As the school year comes to a close across the state, it seems like a good time to take a more in-depth look at the wealth of information that can be found in the more than 1,600 high school yearbooks that we have scanned and made accessible on DigitalNC in the past year.  While the most obvious use of these yearbooks is for genealogical purposes, they contain much more than just portraits and can tell a lot about the towns and time periods they come from.

As our high school yearbooks are only available through the year 1964, there is not a lot of integration of North Carolina schools evident in the yearbooks.  However, the yearbooks available in DigitalNC do come from both white and black schools, often in the same towns, dating back to the early 1900s.  This can allow comparison of how the schools operated and a view into life in segregated schools in North Carolina.  For example, in Tarboro, there was Tarboro High School, the white school, and Pattillo High School, the black school.  Our yearbooks from both cover the 1940s-1950s.

from 1949 Chapel Hill High School yearbook "Hillife"

In many of the yearbooks in the North Carolina High School Yearbooks collection there are extensive sections dedicated to both the clubs and the athletics at the school.  These sections, with many group portraits, action shots, and sometimes even added explanation, provide a glimpse into what extracurricular activities students participated in throughout the years.  For example, in the 1949 Chapel Hill High School yearbook there is a babysitter’s club pictured, and in the 1929 R.J. Reynolds High School Black and Gold yearbook, there is a photograph of the “Salemanship club”.  Beyond being interesting in its own right, this information shows how priorities for school age children and the responsibilities expected of them shift over time.

from 1929 R.J. Reynolds High School yearbook "Black and Gold"

Most of the yearbooks contain information on the teachers at the school and the courses and subjects they taught.  Again, like the clubs, this information provides insight into how subject emphasis in school has changed over time.  The page below from the 1963 Lion yearbook from P.W. Moore Junior-Senior High School in Elizabeth City includes photographs from classes that are not often seen anymore, including agriculture, typing, and guidance class.

Some of the classes offered at P.W. Moore Junior-Senior High School in 1963

The yearbooks also contain a lot of images of events that occurred at the schools.  A few weeks ago we pointed out the wonderful May Day images from across the decades.  Other events such as prom, homecoming, and school specific traditions are included in the yearbooks.  Below is a schedule of events from the 1941-1942 school year at Hickory High School.

1941-1942 Hickory High School schedule, from the "Hickory Log."

Current events of the day are also featured in these yearbooks.  For example, those published during World War II often have heavy patriotic themes and some, such as the High Point High School yearbook from 1945, have whole spreads dedicated to those lost from High Point, particularly fellow classmates, in the war.

Dedication page to those killed in World War II from High Point High School, from the 1945 Pemican

The advertising sections at the back of the yearbooks offer a glimpse at the businesses of the town the school is in, which can be particularly useful for small towns that may not have had their own city directories.  The listings usually include addresses for the businesses, and sometimes, as is the case in the 1960 Pittsboro High School yearbook, photographs of the businesses themselves.  These photographs can be the only images of businesses that shut down years ago.

 

City Electronics Shop ad in Pittsboro High School's 1960 The Dragonian

Henry’s Restaurant ad, in Pittsboro High School’s 1960 The Dragonian

C.E. Jones Co. Bridal Headquarters ad, in the Pittsboro High School 1960 The Dragonian

As graduation approaches for high-schoolers across the state, spend some time looking through our high school yearbook collection  and take a peek into life as a high school student fifty years or more ago.  If you know of high school yearbooks at a local institution in North Carolina that are not currently included in our collection, go here to learn more about how to get them included on DigitalNC.