Viewing entries tagged "behind the scenes"

Scanners and Content Management Systems in North Carolina Cultural Heritage Organizations

three adults sitting around a table; one with white gloves sits behind a laptop near an open scanner while the other two look on
An NCDHC staff member works with two individuals during an on-site scanning event.
  • This post shares information about scanners and content management or online platforms used by some North Carolina cultural heritage organizations.
  • The lists are current as of this post and they are not exhaustive.
  • For more information, get in touch with us.

People frequently ask us to recommend digitization equipment as well as content management systems* or ways to display their files online.  To help connect more people to their peers, we sent our partners a survey asking them the following:

  • List the make and model of any equipment you have that scans print materials, photographs, slides, and/or negatives.
  • What local or remotely hosted software does your organization use to keep track of and/or share your digital images?

Thanks to the 45 institutions who responded, we now have a great list on hand. If you contact us we can connect you directly with those who said they’d be happy to share experiences and information. 

Keep reading for lists of all of the equipment and software mentioned, in alphabetical order. If you work in a cultural heritage organization in NC and don’t see your digitization equipment and/or system mentioned below, leave a comment and we will add it.

*A content management system is software that will store and organize files, usually with functionality that helps people make use of the files like search, online display, etc.

Here’s the list of platforms mentioned:

Content Management Systems / Online Platforms

  • Alma Digital/ Primo VE
  • CONTENTdm
  • Cumulus
  • DigitalNC (which uses TIND, WordPress, and Open ONI)
  • DSpace**
  • Drupal
  • Ex Libris Alma Digital
  • Fedora + Hyrax**
  • Flickr
  • Internet Archive
  • Islandora
  • JSTOR Forum
  • KeepThinking Qi**
  • Laserfiche
  • LibGuides
  • Omeka
  • Pass It Down
  • PastPerfect
  • PTFS Knowvation
  • Quartex**
  • Re:discovery Proficio
  • WordPress

** Not represented in the survey responses but we know folks who use these.

Comments: Some of these are hosted by vendors; others are hosted by the organization. There are also sites listed here that might not be considered content management systems but that organizations use for online sharing. This list does not include social media sites where files might be shared, like Facebook, Instagram, and Twitter. It also doesn’t get into details for those who have built their own systems (typically very well-resourced institutions). If you’d like some more guidance about choosing, check out this post: What Should You Do With Your Scanned Photos?

Here’s the list of equipment mentioned:

Flatbed Scanners

  • Epson 10000XL, 11000XL, 12000XL
  • Epson DS-50000
  • Epson Perfection V19, V39, V370, V550
  • Epson Perfection V600, V700, V800
  • HP Scanjet G4050
  • HP ScanJet Pro 2500 f1

Large Format Sheet-fed Scanners

  • HP DesignJet T2500

Overhead Scanners / Book Scanners

  • Bookeye 3, 5
  • Czur ET18 Pro
  • Fujitsu ScanSnap 600
  • ST600 Book Scanner
  • Zeutschel OS 12000 A1, Q1

Overhead Camera Systems

  • Phase One iXH 100MP camera + digital back
  • Sony A7R IV camera with mount

Negative and Slide Scanners

  • Hasselblad Flextight X5
  • Nikon Super CoolScan 9000 ED
  • PowerSlide 5000
  • ZONOZ FS-3 22MP All-in-1 Film & Slide Converter Scanner w/Speed-Load Adapters for 35mm

Microfilm Scanners

  • ST ViewScan 3, 4

Multi-Function Devices

  • Epson WF-3540
  • Hewlett Packard Color LaserJet M476 copier with scanner
  • HP Officejet Pro X576dw
  • Konica Minolta bizhub 227 copier/scanner
  • Kyocera TASKalfa 4053ci
  • Savin IM 2500, MP 2004ex
  • Sharp MX-4071, MX-C304W
  • TASKalfa 3051ci
  • Xerox Documate 3220 desktop scanner
  • Xerox Workcentre 6655i, 7535

Comments: Some of these are staff use, some are available to the public, and some serve both groups. If you’re interested in what we use, take a look at this page: Scanning and Digitization Equipment


Mid-1800s Chatham County Superior Court Minute Docket Now Online

DigitalNC is proud to host the entire contents of a Chatham County Superior Court Minute Docket that spans from October 10, 1839 to December 3, 1866. This minute docket was provided by our partners at Chatham County Historical Association.

This minute docket is a primary source of legal cases from Chatham County, N.C. in the mid-1800s, including names of those who were called to court and what the disputes covered. Notably, this record was saved from the Chatham County Courthouse fire that occurred on March 25, 2010.

Also, you may be wondering what the object at the left corner of the docket image is; it’s a bone folder! We use bone folders to assist with digitization. In this case, you’ll find it gently holds back the pages that wouldn’t stay flat. You’ll also find weighted strings doing the same work on several other pages.

To look at the entire Chatham County Superior Court Minute Docket, click here. To learn more about the Chatham County Historical Association, you can view their homepage here.


DigitalNC from Home: Oral History Transcription

As all of us at the Digital Heritage Center carry on our work from home, we are continuing to utilize the time outside of our regular duties to enhance DigitalNC. One such project is adding transcriptions to our collection of North Carolina Oral Histories.

Transcriptions are the written text of audio files, which are, in our case, recordings of oral histories. The oral histories on DigitalNC vary in length, ranging from two minutes to two hours and beyond. Typing out transcriptions from scratch takes time, and a lot of it. To help us out, we use the transcription software Sonix. Once an audio file has been uploaded to Sonix, the software “listens” to it and creates text of what it heard.

Screenshot of a Sonix transcript without edits. The text reads "(speaker): Okay, actually, I'd rather you sit there cuz that swings squeak squeaks. I want to show you so you got to cut off. No, it's running and we'll look at that when we finished. I will okay. (speaker): Okay, do you cook collard? Yes, I do. I want you to sit there soon. It's Queen this you tell me from start to finish exactly how you cook your collards. Okay. I'll get my put my meat in the pot. Yes, but if I got a smoked meat I put that in there and then I put a little Lord in them. Then I put a little sugar and salt and red pepper."

An example of a Sonix transcript before editing. This transcript came out relatively coherent, but it needs speaker attributions and will be assessed for faithful transcription. For example, did the narrator say “Lord” or could it be “lard”?

Unfortunately, audio transcription software does not produce a faultless transcript. After Sonix creates the new text, we listen to the original audio and edit the errors. Edits include replacing or removing incorrectly heard words, adding in missed punctuation and paragraph spacing, and attributing the various speakers. We also remove speech fillers (think “um” and “er”) and note when speech is unclear with a bracketed question mark ([?]).

Editing also requires consistency. Here are some of the guidelines we follow to create dependable transcripts:

  • If the speaker does not stick to formal standards of grammar throughout the conversation, we do not correct it, but non-standard contractions are written out fully (for example, goin’ becomes going).
  • If one speaker talks over another, we try to put their words in the order that makes sense in the conversation.
  • If a speaker expresses laughter, we enter that into the text using brackets ([laughs]).

This is where transcription work gets tricky. These guidelines may prompt questions during the editing process such as, How much laughter is enough to allow for [laughs]? or, What if the speaker has a regional accent that represents much of their personality and culture as expressed in the recording and I would like to point to it through non-standard contractions? There are no hard and fast answers to either question. Both rely on what the transcriptionist feels is most appropriate to faithfully represent the narrator’s story. This makes the transcription a participatory product, not just an automatic copy.

Respect for both the narrator’s speech and intent is the primary focus for a transcriptionist. In a perfect world, the interviewer would ask the narrator to look over the final document to approve of the content. However, because the Digital Heritage Center obtains all of our oral histories through our partners, and because they were often recorded over 20 years ago, we are not able to consult either the interviewer or the narrator.

This leaves us to follow best practices, making sure to keep in mind our biases. Respecting the intersectionality of the narrator is an important dimension to this work. Many of the narrators in our Oral History Collection are Black and use African-American Vernacular English. Others speak with strong regional Southern dialects. As we draw up the final transcript, we have to take into consideration our own positionality and watch for editorializing and over-interpreting.

Screenshot of an edited Sonix transcription. The text reads Mary Lewis Deans: Tell me how you heard about it. Kermit Paris: I was working in the bakery in [?]. I don't know, just before Carolina Theater opened up [?], once before I used to live right there. We was on the railroad tracks when I heard it. And, sure enough, I reckon ten or fifteen minutes after then, some artillery had come down the train, I remember that, going north. Artillery and some tanks was going down. They had guards on the flat cars, I'd seen some soldiers on the flat cars at that time. I do remember that.

An example of a Sonix transcript after editing.

Why are we transcribing oral histories? Not only does adding text to the audio make the record accessible, but researchers are now able to scroll through interviews for relevant information without having to listen to the entire recording. The text within the transcripts is fully searchable when doing a full text search on DigitalNC, which makes them appear in many more searches than they would have with just a basic description. That being said, accessibility is a first step and we are looking forward to continually refining our transcripts and supplemental description work with an eye to equitability and transparency.

To take a look at all of the oral histories we have online, click here. And if you’re interested in glancing through the many oral histories with either original or newly made transcripts, click here.


DigitalNC Works from Home: Closed Captions

While at home, the NCDHC staff has been working on increasing accessibility for users through the addition of closed captions. Closed captions provide audiences with the text version of what is being spoken as well as relevant sound information–such as music, applause, and laughter–written out and synchronized with the audio of the video. Unlike open captions that are always present on a video, closed captions can be turned on and off by the viewer. The use of captions is not limited to those who have difficulty hearing; a large percentage of the population uses them for diverse reasons, including to help with focus, to retain information, or because they are in a sound-sensitive environment (e.g. a library).

Creating captions from scratch for videos, even short ones, can take several hours. To generate captions more quickly for a larger number of the moving images in our collection, we use Happy Scribe, an automatic subtitling tool. These autogenerated subtitles do not include sound description and are never 100% perfect, for reasons such as heavy accents, mumbling, poor sentence formatting, and the tool’s lack of knowledge of North Carolina history. Staff therefore double-check words and spelling, text spacing, punctuation, and the synchronization between text and audio; add sound description; and sometimes research North Carolina topics. To remain transparent to our users and those involved in the material, the content of the videos is not censored. We write exactly what we hear in the videos. Captioning is an ongoing process and we are doing our best to make sure that our closed captions are as accurate and easy to read as possible for all viewers.

Screenshot showing the different work areas of Happy Scribe, an automatic subtitling tool.

Different work areas of Happy Scribe

The image above shows what it is like to edit captions using Happy Scribe. The area to the left is used for changing the text and spacing. If we have too many characters per line, a box near the timestamps will turn red to alert us that we need to change the spacing or create a new caption so that the text is easier for the viewer to read.
 
The area along the bottom of the screen (grey strip with white boxes) is used to synchronize captions with the audio. The sound waves at the very bottom of the screen help us to accurately line up the text to audio. To move an entire box we can click and hold the middle of the white box and drag it where it needs to go. To change the length of time that the captions are on the screen we click on the beginning or end of the boxes to change their length. 

The right side of the screen shows us how the captions appear on the moving images. Below it there are several options to go forward or backwards in the video, edit subtitle limits (e.g. how many characters per line we will allow), and format the captions (color, size, font, alignment, and more).
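
The characters-per-line check described above can be pictured with a small sketch. This is only an illustration of the idea, not how Happy Scribe works internally; the limit of 42 characters and the function name are assumptions made for the example.

```python
# Hypothetical limit: the post does not say which value NCDHC uses.
MAX_CHARS_PER_LINE = 42


def overlong_lines(caption_text, limit=MAX_CHARS_PER_LINE):
    """Return (line_number, length) for each caption line longer than `limit`."""
    flagged = []
    for number, line in enumerate(caption_text.splitlines(), start=1):
        if len(line) > limit:
            flagged.append((number, len(line)))
    return flagged


# Made-up sample caption with one short line and one long line.
sample = "Okay.\n" + "This second line keeps going well past the limit we set."
for number, length in overlong_lines(sample):
    print(f"line {number} has {length} characters; consider splitting it")
```

A flagged line would then be split into two lines or moved into a new caption, which is the manual fix described above.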

Shows captioning in Happy Scribe, an automatic subtitle tool, at work.

Captioning in action

 

Once the autogenerated subtitles have been fixed and sound description added, the closed captions are uploaded to the video. In addition, a copy of the transcription is uploaded to the video’s record for those who do not wish to watch the whole video or want to quickly search for specific information. 

To turn captions on or off for a video, mouse over the video area and click on the icon box with “CC” (outlined by a yellow box in the picture below). A menu will pop up where you can click on “Captions” (outlined by the blue box) to turn closed captions on.

To view moving images on our website with closed captions, please click here.

To view all of the moving images on our website, please click here.


Moving Forward With Equitable Metadata: Changing Exclusive Terminology

To continue the steps taken to promote equal representation throughout DigitalNC’s collections, as initially brought up in the recent blog post We Can Do Better: Making Our Metadata More Equitable, the NCDHC staff is becoming more committed to inclusivity through changing exclusive terminology. For this update, we’re specifically looking at the gendered and presumptive terms used in the title and description metadata categories of our visual collections. These changes, while perhaps small in effort, are a big step towards reimagining how we can be better stewards of history, especially to those individuals who are brought into our collections without an identity.

As alluded to, many of the images we digitize and upload to DigitalNC come with little to no background information. This means that we have to decide how to both title and describe the people or events depicted. Unfortunately, inherent bias always presents complications when it comes to description. Language holds enormous power and influences how we perceive an image, whether we realize it or not. While we have to use words to make images searchable, we, in doing so, use our own subjective viewpoint to give that image meaning.

For example, here is an image along with the accompanying title and description on DigitalNC:

Image of a screenshot of a tin type photo in a paper frame of an unidentified adult. The description included partially reads "Tin type portrait in paper frame of a man."

Old title and description of what is now “Unidentified Adult”.

As you may have noticed, the title and description state that this individual is a man. If you were writing this description, would you also assume that this person is a man?

Here is another image on DigitalNC:

If you were to add a description of the individuals in this photo to illustrate their age, how would you phrase it? Do you see four children or do you see three children and one adult? Does the title of the image influence your opinion?

Here’s one final image:

Screenshot of a black and white group portrait of a group of adults, mostly in military uniform. The old title that accompanies the image is "Group of Military and Civilian Men Posed with Woman on Steps" and the old description is "Black and white group portrait of a group of men, mostly in military uniform, one woman in front."

Old title and description of what is now “Group of Military Personnel and Civilians Posed on Steps”.

From the caption, it is assumed that the woman is not in military uniform and is separate from the group in some way. Would you agree with this assumption? Do you think she might have a role in this photograph that is misrepresented in this title or description?

As you can tell, describing an image that everyone would agree to is tricky when given little to no context.

On top of that, we also have to make sure that these words are purposeful; purposeful means that, in addition to thinking about how you would search our collections, we have to acknowledge how the individuals or events in an image want to represent themselves. We may never know how these unidentified people would describe themselves or what gender they identify with, but deciding to use inclusive terminology is the most respectful way of making sure we are not misinterpreting, and therefore misaligning, the people or events in our collections.

In an effort to diminish the subjective viewpoint, here are the new changes you will be seeing in the visual collections on DigitalNC:

  • If family relationships are given in the information about the individuals in the image, gendered terms are used.
  • When there is no information given about the individuals in the image, gender neutral terms are used.
  • We will strive not to make assumptions about the individuals in the photograph when there is no contextual verification.

For example, in this photo of the Westbrook Family, we are aware of the names and relationships of the family members, so we have the description arranged as:

Left to right: Geneva Elenor (Woodall) Westbrook (wife of Eldridge Troy Westbrook); Geneva Louise Westbrook (Cox) (daughter); Mary Elizabeth Westbrook (Flowers) (daughter); Annie Maude Westbrook (daughter); Ivan Earl Westbrook (son).

We can indicate the individuals as mother, daughter, and son because this information was given with the photo.

However, in this photo to the left, no identifying information was given with the photo, therefore the title and description remain gender neutral:

A color photograph of two adults standing side by side, holding beverages, in front of an indoor fireplace.

This photo also illustrates the third point: assumptions without context. At first glance, an image like this could bring to mind a couple; because they are standing close together and could be presumed to be a man and a woman, one might assume they are married. With no way to prove that beyond our own biases, we choose to only note what is actually occurring in this photo.

So, are we losing anything in this process? Hopefully not much, if anything. Images will still be searchable by major subjects; you can browse through all of them here. Plenty of images are also attributed to collections and exhibits, such as the Asheville YWCA Photograph Collection, to make research easier. And, of course, you can always use the advanced search to rifle through all our images.

Subjectivity can’t be ruled out completely and we will still be making choices that will affect the viewing of an image. This update is a work in progress and you will no doubt see some inconsistencies on DigitalNC, but we hope that this explanation gives some insight into our equitable metadata mission. And if you ever see a familiar face or have information on an image, don’t hesitate to comment (there’s a comment box at the bottom of every image on DigitalNC) or reach out!


DigitalNC works from home: expanding photograph descriptions

As work from home continues for all of us at the Digital Heritage Center, we are getting the opportunity to dive into some long-shelved cleanup projects from our migration into the TIND content management system.

One that we are excited to work through right now is creating better, individualized descriptions for sets of photographs that previously were only described in a single record. In our previous content management system, CONTENTdm, there was a hierarchy built into the system that supported parent and child records that had different metadata. So for example, a batch of photographs that one wanted to title at the parent level as “Wilson, NC Businesses” could also have individual child records that had titles such as “Food Lion, 1975.”

screenshot of a content management system

 The object description (minimized here) is the child level record and applies only to the main image seen above, while the description is the parent level record and applies to every image on the right.

 

 View of a “compound object” photograph set in the new system – the separated out descriptions are mostly lost here.

When we moved into our new content management system, those individual titles were dropped down to a file description that did not appear in the main record and was not easy to view. As a result, we made the decision to break up those batches of photographs so that each one shows up individually in a search with its own set of metadata. That has required pulling down a spreadsheet of the parent level metadata, converting it to apply individually to each photograph, and re-uploading it into TIND. This has also allowed us to add useful metadata such as geolocation coordinates to images of particular places, which could be useful someday if we enable mapping technology in our content management system. While the work is a bit tedious, we believe it is broadening access to some really great photographs from our partners and making them easier to find on our site.
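
For readers curious what that conversion might look like, here is a minimal sketch. It assumes a spreadsheet exported as CSV in which each parent row carries its child titles in a single pipe-separated column. The column names (`title`, `contributor`, `file_descriptions`), the partner name, and the separator are all hypothetical; the actual TIND export layout is not described in this post.

```python
import csv
import io

# Hypothetical export: one parent row whose child titles share a single column.
PARENT_CSV = """title,contributor,file_descriptions
"Wilson, NC Businesses",Example Partner Library,"Food Lion, 1975|Tobacco warehouse"
"""


def split_parent_records(parent_rows):
    """Yield one record per photograph, copying the parent's shared fields."""
    for parent in parent_rows:
        for child_title in parent["file_descriptions"].split("|"):
            yield {
                "title": child_title.strip(),
                "contributor": parent["contributor"],
                "source_set": parent["title"],  # keep a pointer back to the old batch
            }


parents = csv.DictReader(io.StringIO(PARENT_CSV))
children = list(split_parent_records(parents))

# Write the per-photograph rows back out as CSV, ready for review before upload.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["title", "contributor", "source_set"])
writer.writeheader()
writer.writerows(children)
print(out.getvalue())
```

Each output row can then be enriched by hand (for example, with geolocation coordinates) before it is uploaded as its own record.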

Search results

Search results view for a group of photos now individually listed; previously, they were all grouped under one vague title.

Screenshot of a metadata record for a tobacco warehouse

This photograph now has more specific metadata describing it, including geo-coordinates, which makes it more useful to users.

Projects like this keep us busy working from home despite being a digitization shop – maintenance is always an important part of this work, and this unexpected time away from our scanners is giving us the ability to focus much more closely on our existing materials.

Want to see all our image collections in DigitalNC?  Visit Images of North Carolina here.


We Can Do Better: Making Our Metadata More Equitable

Over the last few months I’ve been working on a pilot project that looks at how NCDHC staff have portrayed women through metadata (the information that accompanies the images on DigitalNC) over time. This is a small step towards finding unconscious bias in our work and making our metadata more equitable. I’ve accumulated some interesting examples, and I thought I’d share them here.

Anyone who’s ever tried to trace a matrilineal line knows the frustration of women being referred to only in the context of marriage. This was the convention in historic American culture – you’ll see it in newspapers, books, correspondence – and special collections are no exception. It was pretty easy for me to start looking at bias in our metadata with a simple search on Mrs., which netted me over 2,000 results.

Screenshot of the top 3 search results on DigitalNC.org when searching "Mrs."

If you browse that search yourself, you’ll see how many records don’t include the woman’s first name. The information that’s been written on or passed down with a photograph often inherited that cultural bias towards a woman’s married state. When NCDHC staff set out to describe a photograph, if all we have is “Mrs. Lewis Dellinger” then that’s what gets transferred to our metadata. Even if we had time to do research to try to locate Mrs. Lewis Dellinger’s given name, in most cases we couldn’t be positive it was the correct identification. So there are a lot of records that can’t be improved given the reliable information we have on hand.

Still, after browsing through DigitalNC, I started seeing places where a simple and quick change could make a difference. Here’s one example:

Black and White Image of white woman smiling and facing the camera

A screenshot of how this record looked initially, with the photograph entitled “Governor Scott’s Wife.”

Unlike many individuals in our collection, I knew this woman’s name and identity would be easy to confirm. Jessie Rae Osborne Scott was a graduate of what is now UNC-Greensboro. She taught high school, helped run a farm, raised five children, and was active in a number of charities and social causes. Other verified photographs of her are available online because she also happened to marry a governor. That fact is notable, but I’ve amended the record so that her own name is foremost while retaining the information originally included with the photograph in the description. 

When I first searched our website for the word “wife” I received 221 results; “husband” yielded 54. Because of ingrained bias, even if a woman’s name is available in the metadata her relationship to the man or men in the picture is privileged instead. Conversely, unless the woman was particularly well known or the overt focus of a photograph, husbands aren’t named as such. Here’s an example: 

Black and white family portrait with the man seated and holding a young child, and a woman standing to his left.

This photograph is entitled “Eppie N. Clifton, wife Melissa Honeycutt, and daughter Mettie.”

Note that the man is mentioned first, and the woman and child are described in relation to him. Here’s how I amended the photo’s metadata:

Black and white family portrait with the man seated and holding a young child, and a woman standing to his left.

This photograph is entitled “Mettie, Eppie N. Clifton, and Melissa Honeycutt.” The Description reads “L-R Mettie (daughter), Eppie N. Clifton (husband), and Melissa Honeycutt (wife).”

In the updated version I’m just going left to right and taking each person in turn, communicating what was written on or with the photograph. Their family relationship is still given, so that information isn’t lost, but it’s recorded in a way that’s more equal across the group.

Here’s another example I found interesting:

Black and white photo of five family members standing in front of a house.

This photo is entitled “Eldridge Troy Westbrook family and home, Bentonville Township, N.C.”

Note that the house is named after the male head of household and his name is noted in the title, but he isn’t in the photo. (The original description we were given even mentions that “ETW was living at time of photo; he doesn’t just happen to be in photo.”) I don’t want to remove the entire name of the house – it might have been identified that way among those who lived in the area – but I can easily improve the equity shown to the individuals who are actually shown in the photo without losing any important information. See what you think. All I did was keep the surname, and move the male’s name down to the description. I also put the familial relationships in parentheses instead of having them precede each name. I think this might subtly shift how people see this photograph and those pictured within. To me they seem less like they’re just hanging around waiting for ETW to arrive.

To sum it up, here are the types of changes we will regularly make to help improve the equity of our metadata:

  • We’ll note the full known identity of all of the photograph’s subjects in the title, moving from left to right, as in the example above.
  • When a couple’s only known information is a surname, we’ll record the honorifics for individuals from left to right. (In other words, we won’t default to always placing Mr. first.) Example: Mrs. and Mr. Detweiler
  • If a familial relationship is recorded about those in the photograph, we’ll note that in parentheses within the description. We’ll give equal consideration to noting relationships of all genders. 
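
As a rough sketch of how the first and third rules could be applied consistently, the snippet below builds a title and description from a list of people given in left-to-right order. The function names and the data layout are assumptions made for illustration; in practice these edits are made by hand, record by record.

```python
def join_in_order(items):
    """Join items as 'A', 'A and B', or 'A, B, and C', preserving their order."""
    if len(items) <= 2:
        return " and ".join(items)
    return ", ".join(items[:-1]) + ", and " + items[-1]


def build_title(people):
    """Title: every known name, left to right."""
    return join_in_order([person["name"] for person in people])


def build_description(people):
    """Description: each name followed by its relationship in parentheses, if known."""
    labeled = [
        f'{person["name"]} ({person["relationship"]})'
        if person.get("relationship")
        else person["name"]
        for person in people
    ]
    return "L-R " + join_in_order(labeled) + "."


# The family portrait discussed earlier in this post, listed left to right.
portrait = [
    {"name": "Mettie", "relationship": "daughter"},
    {"name": "Eppie N. Clifton", "relationship": "husband"},
    {"name": "Melissa Honeycutt", "relationship": "wife"},
]
print(build_title(portrait))
print(build_description(portrait))
```

Because the order comes only from position in the photograph, no one is placed first by default.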

Why is this work worth doing? How we name things influences power. It changes who gets noticed in a crowd. It shifts who gets resources when they’re scarce. Every individual has a right to their own identity; we don’t believe that a woman who lived in a time when she was considered secondary because of her gender should endure the same condition today. Why should we sustain a bias that’s been proven to do harm to society as a whole?

I’m sure I’m not doing a perfect job. I’ll miss my own biases as I make corrections. But with just a few small changes researchers will be able to find people they might not have found in the past. Even more, people viewing these photographs won’t have social conventions keeping them from really seeing all of the individuals in the pictures.


Durham Urban Renewal Records Have Been Renewed

In the early days of the North Carolina Digital Heritage Center, we digitized thousands of records created during the Durham Urban Renewal Project. Recently, we revisited these records with the intention of making them more accessible and useful to our partners and the public.

The Durham Redevelopment Commission was established in 1958 with the intention of eliminating “urban blight” and improving the city’s infrastructure as more and more personal vehicles filled the city’s streets. Durham Urban Renewal targeted seven areas: one in Durham’s downtown district and six in historically Black neighborhoods, including Hayti and Cleveland-Holloway. The projects in these six neighborhoods impacted approximately 9,100, or 11.7%, of Durham citizens at the beginning of the project in 1961. Although the initial timetable for the project was ten years, the project efforts went on for nearly 15 years and were ultimately never completed. By the end of the urban renewal efforts, more than 4,000 households and 500 businesses had been razed and a new highway, NC 147, stretched through the heart of Durham.

A public library building, two stories tall with ornate columns.

Some structures included in the collection, such as the second home for the main branch of the Durham Public Library, outlived the urban renewal project and still stand today. This building is located at 311 East Main Street.

The Durham Urban Renewal Collection contains studies, reports, appraisals, property records, photographs, brochures, and clippings that span the nearly 20 years of urban renewal projects. These materials are artifacts of Durham before, during, and after urban renewal dramatically altered the city.

In an effort to make these materials as accessible and accurate as possible, we recently completed a major cleanup of the collection. Properties are now listed by complete street address. Many of the residential properties, and some commercial properties, were appraised more than once during the urban renewal process. We have consolidated all appraisals, photographs, and other records for individual properties into single listings, and the text in these records is full-text searchable. We also used historical maps of the city from the years of urban renewal to provide additional information for unaddressed or mislabeled appraisals and records. In addition to the changes made to improve accessibility by address, we made efforts to ensure that the names of property owners are complete, accurate, and consistent across the collection, so that records may be located more easily in searching by the owners’ names.
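
The consolidation step can be sketched in a few lines. This is an illustration under stated assumptions, not the workflow we actually used: the addresses and record labels are made up, and the normalization shown here (collapsing whitespace and standardizing capitalization) is far simpler than the map-based checking described above.

```python
from collections import defaultdict


def normalize_address(raw):
    """Collapse extra whitespace and standardize capitalization."""
    return " ".join(raw.split()).title()


def consolidate(records):
    """Group every record under the normalized address of its property."""
    listings = defaultdict(list)
    for record in records:
        listings[normalize_address(record["address"])].append(record["item"])
    return dict(listings)


# Made-up records: two spellings of the same address plus one other property.
records = [
    {"address": "101 example street", "item": "appraisal (first)"},
    {"address": "101  Example Street ", "item": "photograph"},
    {"address": "205 Sample Avenue", "item": "appraisal"},
]
for address, items in consolidate(records).items():
    print(address, "->", items)
```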

The materials in the Durham Urban Renewal Collection came from Durham County Library’s North Carolina Collection and are only a portion of the materials contributed by the library to date. To learn more about the Durham County Library, visit their website or partner page.


Digital Collections OCR: What it is, and what it isn’t.

  • “I can see the word on the page, but when I search for it, no matches are found.”
  • “This item is searchable. Why can’t I read it with a screen reader?”

We get a lot of great questions like the ones above: the answer to all of them, in some way, is “OCR.”

What OCR Is

Optical Character Recognition (OCR) is amazing technology; with OCR software we are able to search image files for groups of pixels that look like text, guess what that text might be, and save the output in a way that we can feed into our search indexing systems. Even better, we’re sometimes able to overlay that text output on top of an image so that we can show you where we think a word might appear.

At the North Carolina Digital Heritage Center, we scan and store digital heritage materials as images. When we notice that an image contains printed text (documents, posters, ledgers, scrapbooks, and more), we also run it through OCR software. Without OCR, text shown in images is “locked” inside them; with OCR we can leverage the power of full text search to help people discover relevant images a little better than before.
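To make the idea of feeding OCR output into a search index more concrete, here is a minimal sketch in Python. It is not the software NCDHC runs; the file names, the sample text, and the helper functions `tokenize`, `build_index`, and `search` are all hypothetical, and a real search system does far more than this.

```python
import re
from collections import defaultdict

# Hypothetical OCR output keyed by image file name (illustrative data only).
ocr_output = {
    "ledger_page_01.jpg": "Received of the county treasurer ten dollars",
    "poster_1923.jpg": "County Fair opens October ten",
}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of images whose OCR text contains it."""
    index = defaultdict(set)
    for image_name, text in pages.items():
        for word in tokenize(text):
            index[word].add(image_name)
    return index

def search(index, term):
    """Return a sorted list of images whose OCR text contains the term."""
    return sorted(index.get(term.lower(), set()))

index = build_index(ocr_output)
print(search(index, "county"))  # both images mention "county"
print(search(index, "fair"))    # only the poster mentions "fair"
```

The point of the sketch is that search only ever sees the words the OCR software produced. The image itself is never consulted at search time.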

What OCR Isn’t

No OCR method is without limitations. Whether OCR software can correctly “read” the text in an image depends on a few things:

The longer OCR takes, the better it is

The longer the OCR engine is allowed to puzzle over the pixels in an image, the better its output can be. At NCDHC we try to find the right balance between giving the OCR software enough time to produce useful results and scanning more materials: letting OCR take too long would significantly reduce the number of materials we’re able to add to DigitalNC each day.

OCR is less accurate with historic materials

Most of the materials we work with are difficult for OCR engines to interpret: compared with more modern materials, historic documents use fuzzier printing methods, display a lot of variation in letter forms, are deteriorating, or contain a mixture of printed and handwritten text.  All of these things are likely to confuse even the best OCR software, producing text output that can differ from what’s visible on the screen.
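Here is a small Python sketch of why a word you can plainly see on the page might return no matches. The misreading of “m” as “rn” is an assumed example, and the fuzzy lookup with the standard library’s `difflib` is only an illustration of how near-misses differ from exact matches, not a description of how search works on DigitalNC. The helper names `exact_search` and `near_matches` are hypothetical.

```python
import difflib

# Assumed example: the page prints "Durham Public Library", but the OCR
# engine misreads the "m" as "rn" (a hypothetical error for illustration).
printed_text = "Durham Public Library"
ocr_text = "Durharn Public Library"

ocr_words = ocr_text.lower().split()

def exact_search(term, words):
    """Return True only if the term appears exactly in the OCR words."""
    return term.lower() in words

def near_matches(term, words, cutoff=0.7):
    """Return OCR words that are similar, but not necessarily equal, to the term."""
    return difflib.get_close_matches(term.lower(), words, n=3, cutoff=cutoff)

print(exact_search("Durham", ocr_words))  # False: the index only knows "durharn"
print(near_matches("Durham", ocr_words))  # ['durharn']
```

A person reading the scan sees “Durham” without difficulty, but an exact-match search over the OCR text has nothing to find.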

OCR isn’t the same as a transcription

Without human intervention, it can be difficult for OCR software to interpret the layout of a document. By default, OCR software attempts to “read” an image from left to right. Even if it’s able to recognize all of the words on a page, it may not recognize the order in which the words were intended to be read; for example, the software might not be able to differentiate where one column ends and another begins in a newspaper clipping, or it might include the text of an advertisement in the middle of an article:

Example of OCR text challenges

In contrast, transcriptions represent the text in an image as it’s meant to be read, and require some amount of human labor to produce.
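The column problem described above can be sketched in a few lines of Python. The word positions, the two-column layout, and the known column boundary are all assumptions made for illustration; real OCR engines use far more sophisticated layout analysis, and the functions `naive_order` and `column_aware_order` are hypothetical.

```python
# Each tuple is (x, y, word): a hypothetical word position on a scanned page.
# The article sits in the left column (x < 100) and an advertisement in the
# right column (x >= 100). This layout is assumed for illustration.
words = [
    (0, 0, "Council"), (40, 0, "votes"), (100, 0, "SALE:"), (140, 0, "hats"),
    (0, 10, "on"), (40, 10, "budget"), (100, 10, "half"), (140, 10, "price"),
]

def naive_order(boxes):
    """Read top to bottom, left to right across the full page width."""
    ordered = sorted(boxes, key=lambda b: (b[1], b[0]))
    return " ".join(word for _, _, word in ordered)

def column_aware_order(boxes, column_boundary):
    """Read the whole left column first, then the whole right column."""
    left = [b for b in boxes if b[0] < column_boundary]
    right = [b for b in boxes if b[0] >= column_boundary]
    return naive_order(left) + " " + naive_order(right)

print(naive_order(words))
# Council votes SALE: hats on budget half price
print(column_aware_order(words, column_boundary=100))
# Council votes on budget SALE: hats half price
```

Every word is recognized correctly in both cases. Only the reading order changes, and that alone is enough to drop an advertisement into the middle of an article.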

Summary, and a look ahead

OCR is a fantastic tool that enhances the way users are able to interact with the images available in DigitalNC collections, but its limitations prevent it from producing full, traditionally readable transcriptions of image materials.

Even so, NCDHC looks forward to next-generation tools and methods for recognizing and searching for text within images. OCR software is constantly improving; the software we use today is faster and more accurate than it was five years ago, and OCR technology benefits from recent advances in machine learning and artificial intelligence.

If you have questions or concerns about searchable content on DigitalNC, or would like information on obtaining a copy of materials that is accessible to screen readers, please don’t hesitate to contact us.


2018’s Most Popular Items on DigitalNC.org

Today we’re taking a look at the most-viewed items on DigitalNC.org for 2018. Yearbooks and newspapers are the most populous and popular items on our site, so it’s no surprise that they took four of the five slots. What rose to the top and why? Take a look below.

#1 Pertelote Yearbook, 1981

Contributing Institution: Brevard College

This year our most viewed single item on DigitalNC was the 1981 Pertelote yearbook from Brevard College.

The Pertelote was popular due to the apprehension of a mailbombing suspect in October of this year and his ties to several North Carolina schools. Cesar Sayoc was a student at Brevard College in the 1980s and his photograph can be found in several locations within the 1981 yearbook, including this club photo from page 134.

A group photo of ten members of the Brevard College Canterbury Club

#2 The Outer Banks Fisherman

Contributing Institution: University of North Carolina at Chapel Hill

On a lighter note, the second most popular item on our site was a film from the early 1980s entitled “The Outer Banks Fisherman.” It features Freshwater Bass Champion Roland Martin fishing on the Outer Banks. This film had a few particular days of internet popularity when it was mentioned on a couple of North Carolina hunting and fishing forums.

Man in a yellow slicker fishing on the beach, smoking a pipe

#3 North Wilkesboro Journal-Patriot Newspaper, December 8, 1941

Contributing Institution: Wilkes County Public Library

The third most popular single item on DigitalNC was the December 8, 1941 issue of the North Wilkesboro Journal-Patriot newspaper. You can tell from this striking headline that it was published the day after the attack on Pearl Harbor during World War II. This paper generally received referrals via Google all year, but we’re not sure which search terms were leading users to this page so consistently.

#4 The Franklin Press and Highlands Maconian Newspaper, April 23, 1953, page 9

Contributing Institution: Fontana Regional Library

Many of our referrals come from Facebook, and that was the case with this fourth most popular item. It was featured in the Facebook Group “You May Be From Franklin NC If…” The original poster stated that Group members had looked for photos of the Old County Home over the years, and that they had recently uncovered this newspaper page which includes pictures of the Home’s state in 1953.

Top half of the April 23, 1953 Franklin Press and Highlands Maconian, page 9

#5 The Daily Tar Heel Newspaper, September 2, 1986

Contributing Institution: University of North Carolina at Chapel Hill

Facebook sharing also boosted this item’s rating, after the UNC-Chapel Hill University Archives asked for memories of the legal drinking age being raised to 21 in 1986 and the “send-off” on Franklin Street before the law came into effect. They shared a quote from a police officer as well as a link to the article below, which documents the damage and disgruntlement caused by the downtown party.

Top half of Daily Tar Heel front page from September 2, 1986, with photo of crowd on Franklin Street at night

 

Thanks for coming on our tour of the top DigitalNC items from this year. For the curious, we topped 4 million pageviews and 400K users in 2018! We’re looking forward to working with partners to share even more of North Carolina’s cultural heritage in 2019. 


About

This blog is maintained by the staff of the North Carolina Digital Heritage Center and features the latest news and highlights from the collections at DigitalNC, an online library of primary sources from organizations across North Carolina.
