What’s another word for Thesaurus?—Steven Wright, comedian
Story and photos by David Riecks
Who cares about keywords? Well, unless you have a, pardon the expression, photographic memory, you should. If you or your agent or a client ever wants to find a particular image from the thousands in your library, good keywords make it easy.
Since document files are made up of words, a researcher can do a search using some of those words to find the files closest to what is needed.
With photos, there are no words unless some have been assigned. A few aspects of an image can be deduced by automated means, such as the capture date and time or, possibly, the location from GPS information recorded by the digital camera. But that’s about it, unless you are interested in subscribing to one of the new “machine learning” options, which is a discussion for another day.
Simply digitizing or adding images to a database doesn’t add text information or clues that are needed to find those images. Only if captions, keywords and other descriptive information have been added as embedded metadata will images matching a search be easily found.
A successful image search depends entirely on the quality of the captions, keywords and other descriptive information that have been added. Misspellings and inconsistencies can result in images not being retrieved or, possibly, being lost in your image management database forever.
What Is a Controlled Vocabulary?
One way to improve search efficiency with text-based search databases is to use something called a controlled (or managed) vocabulary (CV). The primary use of a CV is to limit the number of terms used in order to improve communication between those searching for specific content and those describing it. In some photo management applications, a CV may be called a keyword catalog or hierarchy or, simply, structured keywords. Whatever the name, the idea is the same: an organized store of terms from which you add descriptive keywords to your images in a consistent manner.
Overview of Key Concepts:
- A “term” in a CV can be a single word or a phrase.
- Each term represents a single concept (for example: cats).
- In a simple hierarchical taxonomy, you typically try to represent each level with a single term.
- Depending on how the information is stored/arranged, you may need to designate a “preferred” term, and if there are others, designate them as “non-preferred.”
- The terms are arranged to go from broad concepts to more narrow ones. (Example: animals >> pets >> cats >> calico cats.)
Controlled Vocabulary Basics
The idea behind a CV is to reduce the number and variety of terms being used to describe what is going on or what appears within the image. For example, by using a CV you may be able to avoid the need to include synonyms (words that have the same meaning) and plural spellings while at the same time removing ambiguous homonyms (words that are pronounced or spelled the same way but have different meanings).
Once you begin using a CV, you should, in principle, only use terms from the list when keywording. If you are a photographer working alone, and a relevant term is missing from your existing CV, you might simply add the new term to the existing list. Within larger organizations, this task would typically fall to the image librarian or indexer who will decide which new terms should be formally accepted and then add them to the CV used by everyone.
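The rule of only using terms from the list can be sketched in a few lines of Python. This is a minimal illustration, not how any particular image management application works: the vocabulary contents and the function names (`build_lookup`, `normalize_keywords`) are assumptions made up for the example. Variant spellings are mapped to a preferred term, and anything not in the CV is set aside for review rather than silently added.

```python
# Hypothetical mini-vocabulary: each preferred term maps to its
# non-preferred variants (singular forms, synonyms).
CONTROLLED_VOCABULARY = {
    "cats": {"cat", "kitty", "kitties", "feline"},
    "dogs": {"dog", "canine", "canines"},
}


def build_lookup(vocabulary):
    """Map every accepted spelling (lowercased) to its preferred term."""
    lookup = {}
    for preferred, variants in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


def normalize_keywords(raw_keywords, vocabulary):
    """Return (preferred terms found, terms that are not in the CV)."""
    lookup = build_lookup(vocabulary)
    accepted, rejected = [], []
    for keyword in raw_keywords:
        cleaned = keyword.strip()
        preferred = lookup.get(cleaned.lower())
        if preferred is None:
            rejected.append(cleaned)      # candidate for review by the indexer
        elif preferred not in accepted:
            accepted.append(preferred)    # avoid duplicate preferred terms
    return accepted, rejected


accepted, rejected = normalize_keywords(
    ["Cat", "kitties", "dog ", "ferret"], CONTROLLED_VOCABULARY
)
print(accepted)  # ['cats', 'dogs']
print(rejected)  # ['ferret']
```

In this sketch the rejected list plays the role of the queue a librarian or indexer would review before formally accepting new terms.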
Why Use a Controlled Vocabulary?
CVs have a number of benefits:
- Consistency: they provide uniformity in spelling and ensure that the same set of broader and narrower terms is applied each time.
- Reveal relationships: their use can reveal semantic relationships among terms.
- Retrieval: they serve as a searching aid to gather together all related images.
- Translation: they provide a means for converting the natural language of users into a vocabulary that will provide more accurate retrieval.
What Are Hierarchical Keywords?
In a nutshell, hierarchical (structured) keywords are those that are part of an arrangement of terms. These terms are typically stored or represented in a treelike structure where the broader terms are near the base of the tree, and each “branch” represents a further narrowing of the subject matter.
For instance, “animals >> pets >> cats >> calico cats” goes from a broad subject area to a narrow one. You would expect to find “dogs” at the same level as “cats,” sharing the same broader terms: “animals >> pets >> dogs >> Labradors.” Using a CV stored in a hierarchical fashion ensures that these broader terms are included with the images a picture source sends to clients. Including the broader context makes the images easier for a client to find.
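As a rough sketch of how a hierarchy can supply the broader terms automatically, the Python below stores each term with a link to its immediate broader term and walks upward to build the full path. The `BROADER_TERM` table and the `with_broader_terms` function are hypothetical names for this illustration; real applications store their hierarchies in their own formats.

```python
# Hypothetical parent links: each term points to its immediate broader term.
BROADER_TERM = {
    "calico cats": "cats",
    "Labradors": "dogs",
    "cats": "pets",
    "dogs": "pets",
    "pets": "animals",
}


def with_broader_terms(term, broader=BROADER_TERM):
    """Return the full path from the broadest term down to `term`."""
    path = [term]
    while path[-1] in broader:
        parent = broader[path[-1]]
        if parent in path:  # guard against an accidental loop in the list
            break
        path.append(parent)
    return list(reversed(path))


print(" >> ".join(with_broader_terms("calico cats")))
# animals >> pets >> cats >> calico cats
```

Tagging an image with “calico cats” would then also yield “cats,” “pets” and “animals” without anyone having to remember to type them.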
Bear in mind that search results will only be as good as the information you have available to search on. While it may not be difficult to add keywords to an image, you are limited by the search mechanism, which has to determine whether a search term or phrase is satisfied by the keywords you enter. The problem with current keywording standards is that there is no syntax available to describe the semantics of a word. A term’s meaning depends on more than the word itself. For example, “turkey” is both a type of bird or poultry and the name of a country, and its meaning also depends on the words that modify it. If the photo shows a young turkey, then “young” should be added to the list of keywords. However, with present image management systems, all keywords have the same weight, and there is no interaction between terms.
Further, if the photo depicts an old man standing behind a meat-market counter selling the young turkey referred to above, the image might also have the keywords “old” and “man.” This presents a problem: in many systems that image is now a valid match for the search phrase “young man,” creating a false-positive result!
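The false positive is easy to reproduce. The sketch below assumes the simplest possible search, one that treats keywords as a flat, unweighted set and only checks that every query word is present; actual systems vary, and the function name `matches_all_words` is made up for the example.

```python
def matches_all_words(query, keywords):
    """Naive flat search: every query word must appear among the keywords."""
    keyword_set = {k.lower() for k in keywords}
    return all(word in keyword_set for word in query.lower().split())


# Keywords for the hypothetical meat-market photo described above.
meat_market_photo = ["turkey", "young", "old", "man", "meat market"]

print(matches_all_words("young turkey", meat_market_photo))  # True (intended)
print(matches_all_words("young man", meat_market_photo))     # True (false positive)
print(matches_all_words("young woman", meat_market_photo))   # False
```

Because nothing records that “young” modifies “turkey” and “old” modifies “man,” the search has no way to tell the two pairings apart.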
In addition, keywords are typically only applied using a single language (often English), and any searcher whose native language is not the default in the given system may have difficulty finding particular images.
Building a Controlled Vocabulary
A list of controlled keyword terms can be fairly simple, starting with only a couple of levels (e.g., animals: cats), or it can expand to four or more levels (e.g., animals: pets: cats: calico cats). If you are adding keyword terms to images in your own collection, for your own purposes, then a shallower list (fewer branches) will be easier to manage. If you are preparing images for a broader, undefined online audience, then anything you understand about the audience may help you to use terms that will make sense to them. If you are dealing with an academic audience of veterinarians, for example, then you might want to use a term like “canine,” but if the audience is more general, then grouping those images under the label “dogs” probably makes more sense.
If you are dealing with a specialized collection of images, your first step may be to consider “card sorting,” a technique used by web developers who devise navigation menus. Start with a stack of index cards, and put each keyword term or phrase on a single card. Keep going until you have written down all of the terms you can think of that describe the images in your collection or the subjects you typically photograph. After you have a few hundred, start shuffling through the deck and sort the cards into piles that make sense to you. After an initial sort, pick up one of the piles and consider whether there are any further subdivisions. Continue sorting until you can’t think of any other way to divide them up. Then go back and review the piles, thinking of a category heading under which each can be grouped.
At this point you might have some broad categories like animals, people, sports, buildings, travel and more. Begin by typing these into a word processor or spreadsheet file so you can organize them more easily. Alphabetize your list so that duplicates are easy to spot and remove. Go through your list to see if there are any terms that mean essentially the same thing and decide if they can be combined. As a simple means to visually organize them, try arranging them as an indented outline, with each narrower term placed beneath its broader term.
In a word processing document you can use the Tab key to indent the terms in a sublevel by one or more levels. If you are using a spreadsheet, like Excel, then assign column A to the broadest terms, column B to the subdivisions, etc. Some database programmers use the nomenclature of parent-child relationships to describe the levels of a hierarchy, such as one with four levels.
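To make the tab-indented layout concrete, here is a small Python sketch that reads such a list and prints each term with its parents. The sample list is hypothetical, built from terms used elsewhere in this article, and the parser assumes a well-formed file in which each level is indented by exactly one more tab than its parent.

```python
# Hypothetical tab-indented keyword list (one tab per level).
SAMPLE = (
    "animals\n"
    "\tpets\n"
    "\t\tcats\n"
    "\t\t\tcalico cats\n"
    "\t\tdogs\n"
    "\t\t\tLabradors\n"
    "people\n"
)


def parse_tab_indented(text):
    """Turn tab-indented lines into full paths, one list of terms per line."""
    paths = []
    stack = []  # terms on the current branch, indexed by depth
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        depth = len(line) - len(line.lstrip("\t"))  # count leading tabs
        term = line.strip()
        del stack[depth:]  # drop terms left over from the previous branch
        stack.append(term)
        paths.append(list(stack))
    return paths


for path in parse_tab_indented(SAMPLE):
    print(" >> ".join(path))
```

In parent-child terms, “animals” is the parent of “pets,” which is in turn the parent of “cats” and “dogs,” so the four-level path for a calico cat reads animals >> pets >> cats >> calico cats.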
Having this electronic or printed document saved (and backed up) for reference can help ensure that you always apply the same terms to similar images.
Once you progress past a certain level it may be time to use a specialized piece of software for managing your CV, especially if you haven’t decided on the specific image management application you intend to use. There are now nearly a dozen applications that will allow you to import or use these simple tab-separated text files like the one we have created above.
While it is tempting to simply start typing your keywords into the user interface—such as the Structured Keyword panel in Photo Mechanic, or the keyword panel in Adobe Bridge or Lightroom—investigate first to make sure there is a means of exporting to a file format you can later modify. If you cannot get the list back out in similar form, then you may have to do a lot of cutting and pasting; or worse yet, you might end up retyping all of the terms into another document file at some point in the future.
Adapt, Buy or Hire?
In the beginning, you may find it easiest to continue adding to your own list of terms. Depending on the number of images, you may get to a point where you feel you need a more extensive collection of terms. If so, see http://www.controlledvocabulary.com/examples.html for links to resources you might be able to adapt for free.
Those that are available for purchase may be worth considering too, especially if you don’t want to spend hours researching the relationships of various terms and analyzing semantics. While these off-the-shelf CVs may not meet all of your needs, they may well provide an excellent framework on which to add specialized terms to build your own.
Another option is to hire a librarian or specialist who is well versed in CV (or thesaurus) development to organize your set of terms into a coherent structure. If you are working in a company that has a librarian on staff, you may be able to enlist their help, or have them help you find an outside consultant.
Are We There Yet?
A CV is always a work in progress. As language evolves, word relationships change, and there will always be a need to add more terms. As your photography subject matter expands or specializes, you will find there are new ways to describe your images that you hadn’t thought of before. It may be best to develop or adopt a means of systematically updating your CV. At a minimum, consider maintaining a change log to note the terms you have added and when.
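A change log does not need to be elaborate. As one possible approach, the Python sketch below appends a dated row to a CSV file each time a term is added. The function name, the column layout, the filename in the comment and the sample date are all assumptions for illustration.

```python
import csv
import io
from datetime import date


def log_added_term(log_file, term, parent, added_on=None):
    """Append one row (date, new term, its broader term) to a CSV change log."""
    added_on = added_on or date.today()
    csv.writer(log_file).writerow([added_on.isoformat(), term, parent])


# Demonstrated with an in-memory file; a real log might be opened with
# open("cv_changelog.csv", "a", newline="") (hypothetical filename).
log = io.StringIO()
log_added_term(log, "tabby cats", "cats", added_on=date(2024, 1, 15))
print(log.getvalue())
```

A plain CSV file like this can be opened in any spreadsheet program, which keeps the log readable alongside the keyword list itself.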
David Riecks is a professional photographer who has traveled all over the world to photograph unique people and places. Riecks is a frequent speaker at industry conferences. He is a member of the IPTC Photo Metadata Working Group and was one of the principals behind the “Universal Digital Imaging Guidelines.” He founded “Controlled Vocabulary,” a resource to help others learn how best to build controlled vocabulary lists, thesauri, and keyword hierarchies for describing images in databases. Visit his websites at www.controlledvocabulary.com and www.riecks.com. His email: email@example.com