Why Should Humans Index Books?

Computers are so much faster than people at processing information. They can list instances of terms that appear in a book in a flash, and have the capability to perform full-text searches on electronic documents. So why on earth would anyone pay a human to index a book?

Language is very complex, so indexes prepared by computer or text searches are seriously lacking when compared to those written by a professional human indexer. The rest of this article points out some of the differences you’ll see between automatic indexes or searches and human-prepared ones.

Automatic indexes can’t distinguish between homographs (words that are spelled the same way but have different meanings). An example given by the Society of Indexers is searching on the Internet for the pop stars Madonna and Prince. You’d find millions of unwanted references to religious art and royalty.

Also, text searches will not handle synonyms (words that have the same meaning) properly. For instance, a recent index we worked on used the words “bidding” and “pricing” interchangeably. An automatic search would have listings for each with only the pages where that specific word was used. We humans were able to see that these words were used in discussions about determining pricing for clients, so we listed them all under “pricing” but also made an entry for “bidding” with a “See pricing” reference so that someone looking up “bidding” could find all of the relevant information.
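The difference can be sketched in a few lines of Python. This is only an illustration: the page text is invented, and the decision that "bidding" and "pricing" mean the same thing is supplied by a person, not discovered by the code.

```python
# Hypothetical page-by-page text; the wording is invented for illustration.
pages = {
    12: "Bidding on a new project starts with an estimate.",
    13: "Pricing for repeat clients can be simpler.",
    40: "Revisit your bidding strategy each year.",
}

def text_search(pages, term):
    """Return the pages where the exact term appears, as a search box would."""
    return sorted(p for p, text in pages.items() if term.lower() in text.lower())

def build_index(pages, preferred, synonyms):
    """Gather every synonym's pages under the preferred heading and
    add a 'See' cross-reference from each synonym."""
    locators = set()
    for term in [preferred, *synonyms]:
        locators.update(text_search(pages, term))
    index = {preferred: sorted(locators)}
    for term in synonyms:
        index[term] = f"See {preferred}"
    return index

# The synonym list is the human judgment call; the code only records it.
index = build_index(pages, "pricing", ["bidding"])
```

A plain search for "bidding" returns only pages 12 and 40, while the index entry for "pricing" covers pages 12, 13 and 40, and "bidding" points the reader there.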

Unlike humans, computers also can’t pick up concepts that are discussed without using your specific search term. So you might look up “dog grooming” but miss a discussion called “Clipping Your Pet’s Nails.” Computers can’t read the content of graphics, either, so you’d miss pictures that demonstrate brushing a dog’s hair but don’t have relevant captions.

Finally, the main problem with a search box or computer-generated index: it can’t tell whether a hit for a certain term gives you relevant information or just a passing mention. So what it really gives you is a concordance, a list of pages where a word appears in the text. There’s a big difference between a concordance and an index. Most of the terms we search for are not rare; they’re used very frequently, so they’d appear many times in a concordance. To illustrate this, suppose you’re interested in learning more about using Adobe Photoshop to edit images, so you look up “Photoshop” in a book on photography. You might find 100 places where that word was used, but many would lead you to sentences where the author says something like, “I like using Adobe Photoshop.” That’s all well and good, but is it really useful? Another example is in this article: in the third paragraph, I mention everyone’s favorite ’80s pop star, Madonna. But that paragraph isn’t really relevant to someone who wants to know more about her, say, her first Top 40 single or her favorite brand of corset.
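To make the distinction concrete, here is a minimal sketch of what a concordance is, assuming a book represented as a dictionary of page numbers and text. The sample sentences and the human-written entry are invented for illustration.

```python
import re
from collections import defaultdict

def concordance(pages):
    """Map every word to the sorted list of pages where it appears."""
    hits = defaultdict(set)
    for page, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            hits[word].add(page)
    return {word: sorted(nums) for word, nums in hits.items()}

# Hypothetical pages from a photography book.
pages = {
    5: "I like using Adobe Photoshop.",
    48: "To crop an image in Photoshop, choose the Crop tool and drag.",
    90: "My editor also uses Photoshop.",
}
conc = concordance(pages)

# A human-written entry (hypothetical) points only to the useful discussion.
human_index = {"Photoshop, cropping images in": [48]}
```

The concordance lists pages 5, 48 and 90 for "photoshop" because it counts every mention equally. Only a reader, or an indexer reading on the reader's behalf, can tell that page 48 is the one worth turning to.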

The point of an index is to help readers find the word they’re looking for and be taken to a useful discussion of it. So while a computer or search box can find that specific word really well, a human indexer can think from the reader’s point of view – what concept will they be searching for and what words would they use to find it? Writers put so much thought into carefully wording their books because they want to communicate the knowledge and experience they’ve worked so hard to obtain. The indexes for those books should be just as thoughtfully prepared so readers can get to that important information.

-by Danielle Easler, Indexer at BIM

photo credit: mrsdkrebs


How We Created an Index for Over 10,000 Pages of Content

At BIM we are used to indexing lots of pages every month. In fact, we usually index over 100,000 pages of content each year. However, when we received a phone call asking us to create an index for a single project that was over 10,000 pages, we knew that we were in for quite a ride!

You might wonder how an indexing team goes about writing an index for so much information. The following is a brief summary of the basic process that we used.

Working off an Index Shell

Usually when we are asked to write an index, we read through the information and decide which concepts to include in the index and how to word the index entries. We also decide whether some index entries should contain subentries and if so what they should be.

However, for this project we were given an index “shell” which contained most of the terms and concepts that we were to find within the text. Although it might seem that this would make it easier to index the information, it actually made the project more challenging. Instead of reading the paragraphs of information and forming phrases that communicate the idea expressed by the information, we had to work in reverse: reading the existing index entries and figuring out which information they applied to.

Training the Team

Because of how different this project was from conventional indexing, our indexing team had to be trained on how to index in this unusual way. To do this, we gathered some of the indexers together at Mojo Coworking in downtown Asheville, utilizing their conference room and giant monitor to instruct the indexers. Besides providing guidance on the indexing, we did some of the work together so that we could pose and answer questions as they came up. These initial meetings proved invaluable as they helped the team to think as one.

Ongoing Guidance

Despite our initial meetings, our team of seven indexers needed constant guidance throughout the project because we kept running across new terms and phrases that we hadn’t encountered in previous pages of content. Also, indexers would often suggest adding terms to the index or indexing certain concepts under existing entries. On top of that, the client would add new terms in the middle of the project!

To provide ongoing guidance and to ensure consistency in term usage, one of our staff, Christa, was assigned the job of Project Coordinator. Christa wrote up a total of eight pages’ worth of indexing guidelines, which she amended quite frequently.

Managing the Files

It goes without saying that managing over 10,000 pages of text is no easy task. What made file management even more challenging was the fact that the files given to us were high resolution and contained only one or two pages each! That meant somewhere around 8,000 heavy downloads. After downloading them to a central location, we then had to make the files available to our indexers, most of whom worked out of their homes.

We didn’t want our indexers to download so many individual files nor have to work on one or two pages and then open more files so frequently in order to continue work. To avoid all of that, we decided to compile files into batches of 25 to 100 pages each.
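The batching idea can be sketched as follows. This is not the tool we used; the file names, the page counts and the target of 50 pages per batch are all assumptions chosen for illustration.

```python
def make_batches(files, target_pages=50):
    """Group consecutive (name, page_count) files into batches.

    A batch is closed as soon as it holds at least target_pages pages.
    """
    batches, current, count = [], [], 0
    for name, n_pages in files:
        current.append(name)
        count += n_pages
        if count >= target_pages:
            batches.append(current)
            current, count = [], 0
    if current:
        batches.append(current)  # leftover files form a final, smaller batch
    return batches

# Hypothetical file list: 120 files alternating between one and two pages.
files = [(f"scan_{i:04d}.pdf", 1 + i % 2) for i in range(120)]
batches = make_batches(files)
```

Because the files stay in their original order, each batch covers a continuous run of pages, which keeps the page references easy to reconcile later. The last batch may fall short of the target, so in practice it might be folded into the one before it.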

We then opened up an account with Box.com. Box account owners can install a folder, called My Box Files, on their desktops and choose to sync the folder with a corresponding folder on the Box site.

Our File Manager, Ruti, was in charge of downloading files and compiling them into batches. She created Box folders labeled with each of our indexers’ names and installed them onto her desktop. After creating the batches, she dragged and dropped them into the indexers’ Box folders, all the while tracking which files were assigned to each indexer.

Once the indexers synced their Box folders, they didn’t have to actually go to the site to download the files. The files simply appeared in the indexers’ folders on their individual computers. Sometimes the indexers would be busy working on some files and, by the time they were finished, there were more files in their folders. Other times, if Ruti put the files in their folders at night, they would wake up in the morning and find the files on their computers, waiting for them. This process greatly streamlined the workflow.

Merging the Indexes

Once all of the indexers were finished with their pages, the individual indexes had to be merged. Using a special importation tool, we converted our Word index documents into a format that could be imported into Sky indexing software. The indexing software took care of sorting the various reference locators in numerical order.
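Conceptually, the merge step looks something like the sketch below. It is not how Sky Index works internally, which we can't speak to; the headings and page numbers are invented, and each indexer's work is assumed to be a simple mapping from heading to page numbers.

```python
from collections import defaultdict

def merge_indexes(*indexes):
    """Combine several heading -> page-number mappings into one index.

    Duplicate locators are dropped and pages are sorted as numbers,
    so 9 comes before 15 and 1203 (as text, "1203" would sort first).
    """
    merged = defaultdict(set)
    for index in indexes:
        for heading, locators in index.items():
            merged[heading].update(locators)
    # Headings are alphabetized without regard to capitalization.
    return {h: sorted(merged[h]) for h in sorted(merged, key=str.lower)}

# Hypothetical partial indexes from two indexers.
indexer_a = {"valves": [1203, 15], "gaskets": [88]}
indexer_b = {"valves": [9, 1203], "Pumps": [4410]}
merged = merge_indexes(indexer_a, indexer_b)
```

Here the two "valves" entries become one entry with pages 9, 15 and 1203. A merge like this only works when every indexer words the headings identically, which is why the shared guidelines mattered so much.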

After checking for errors and fixing them, the index was ready for delivery. The entire project was completed in just under four months, while we simultaneously handled indexing for our regular clients. Our hard-working indexers and well-planned strategy made it all possible.

Photo credit: Horia Varlan

To embed or not to embed an index in Word, InDesign or Quark Xpress

You want to get the index to the book that you are working on done super-fast, as in a few days. You also want to be able to reuse the index if you later publish another edition of the book. And since you plan on producing both a print and an eBook version of the book, you want the index to work in both.

If you are trying to achieve any or all of these objectives, then by now you have probably already heard about embedded indexing. Here’s the skinny on what it is and a few words on the advantages and disadvantages that you’ll want to consider.

What is embedded indexing?

Embedded indexing is the process of inserting hidden tags, which contain index entries, into the text of a book. This can be done in most publishing software such as InDesign, Quark Xpress, FrameMaker and even in Microsoft Word. The tags can be viewed by clicking on an option within the application, but obviously would not be viewable in the final output files, such as PDFs.

Advantages to embedded indexing

There are several advantages of embedding index entries into the text.

One benefit is that it allows indexing to start before the book is even finished. You can send the chapters to the indexer one at a time and she can embed the index entries and send the chapters back to you. Even if you edit the chapter, deleting some sentences and moving some paragraphs around, the index tags are deleted with the text that you delete and move with the text that you move. Also, since the index entries are embedded in the text, it doesn’t matter that you haven’t formatted the book yet or inserted all of the images. When the book has been copyedited and proofread and is ready to go, you generate the index, and the page number after each index entry will reference whichever page holds the corresponding hidden tag. The indexer will need to edit the index, but that should take only a few days, compared with the few weeks she would need if she started indexing toward the end of the production cycle.
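Here is a toy model of why embedded tags survive editing. It does not reflect how Word, InDesign or any other application stores its index markers; paragraphs are modeled as text paired with a list of tags, and pagination is faked by putting two paragraphs on each page.

```python
from collections import defaultdict

def generate_index(paragraphs, per_page=2):
    """Build heading -> page numbers from the tags each paragraph carries.

    Pagination is faked: every `per_page` paragraphs fill one page.
    """
    index = defaultdict(set)
    for position, (_text, tags) in enumerate(paragraphs):
        page = position // per_page + 1
        for tag in tags:
            index[tag].add(page)
    return {heading: sorted(pages) for heading, pages in sorted(index.items())}

# Hypothetical draft: each paragraph is (text, embedded index tags).
draft = [
    ("Intro to sauces...", ["sauces"]),
    ("Knife basics...", ["knives"]),
    ("Stock making...", ["stocks", "sauces"]),
    ("Plating...", []),
]
before = generate_index(draft)

# The author moves the knife paragraph to the end and deletes the last one.
edited = [draft[0], draft[2], draft[1]]
after = generate_index(edited)
```

Nobody retyped a page number after the edit. Because each tag travels with its paragraph, regenerating the index moves "knives" from page 1 to page 2 automatically.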

The other benefit is the point we mentioned at the beginning of this post: you can reuse the index if you wish to also produce the book as an eBook or in some other format. Since ePUB books don’t have page numbers per se, the index would also not have page numbers, but it would link directly to the text, taking the reader to the exact paragraph referenced by the index entry.

Disadvantages of embedded indexing

Despite the benefits, there are downsides to embedding indexes in publishing software. One is that it is harder for the indexer to create a good-quality index. Since she is receiving the chapters one (or a few) at a time (and turning them in that way too), she’s sort of indexing with tunnel vision. She can view the index for each chapter that she is working on by generating it for that specific file, but she cannot see the index entries for chapters that she has already indexed to see how each entry should be adjusted to blend with the others. If she were working from final PDFs, she could compare an index entry that she is writing to one that she created for a previous chapter, evaluate whether they relate to the same thing and then edit the old or the new entry so that they don’t conflict.

As a simple example, perhaps chapter 1 discusses “cooks” while chapter 7 uses the term “chefs”. Are they the same? Perhaps they are used to refer to the same profession in this book, or maybe there is a distinction. If they are the same, the reader should not find two separate entries (one for “cooks” and the other for “chefs”) with completely different indented subentries under each of them. Instead, the most commonly used term (we’ll say it’s “chefs”) should have all of the corresponding subentries and the other term (“cooks”) should have a cross-reference reading “See chefs.” If they are different, then both entries should have their own, distinct subentries. However, each entry should have a “See also” reference pointing to the other entry, since they are so closely related.
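The two outcomes can be sketched in code. The subentries and page numbers are invented, and the `same_concept` flag stands in for the indexer's judgment, which no function can make for her.

```python
def format_entry(heading, subentries, see_also=None):
    """Return the printed lines for one main entry and its subentries."""
    lines = [heading]
    for sub, pages in sorted(subentries.items()):
        lines.append(f"  {sub}, {', '.join(map(str, pages))}")
    if see_also:
        lines.append(f"  See also {see_also}")
    return lines

def related_entries(preferred, other, subentries, same_concept):
    """Lay out two related terms as one entry plus 'See', or two with 'See also'."""
    if same_concept:
        combined = {}
        for term in (preferred, other):
            for sub, pages in subentries[term].items():
                combined[sub] = sorted(set(combined.get(sub, [])) | set(pages))
        return format_entry(preferred, combined) + [f"{other}. See {preferred}"]
    return (format_entry(preferred, subentries[preferred], see_also=other)
            + format_entry(other, subentries[other], see_also=preferred))

# Hypothetical subentries gathered from chapters 1 and 7.
subentries = {
    "chefs": {"training": [12], "uniforms": [15]},
    "cooks": {"training": [140], "wages": [142]},
}
merged = related_entries("chefs", "cooks", subentries, same_concept=True)
distinct = related_entries("chefs", "cooks", subentries, same_concept=False)
```

Either layout is easy to produce once the decision is made. The hard part is making that decision, and it requires seeing both chapters' entries side by side.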

You can see how hard it would be to determine how to treat related entries in an embedded index. Obviously, there are much more complicated relationships and many synonymous or seemingly-synonymous terms throughout the text of most books, especially in highly technical works.

Editing the index, which needs to be done when the book is almost finished, is also much more difficult when the index has been embedded. Instead of being able to edit as she goes along, comparing one entry with another, the indexer must clean up what by now has surely gotten rather messy. The best-quality indexes are usually produced when the indexer can make many small adjustments during the indexing process, not by doing major edits all at once.

That being said, there are methods of saving index entries from previous chapters and then comparing them with new index entries upon creating them. Such an indexing system can help the indexer to avoid scattering information within the index and thus create a neater, more-organized index that needs less editing. But not all indexers know how to do this. (More on that in another post).

Making the decision

So what did you decide? To embed or not to embed? If you are still undecided or have more questions, click on the button below and I’ll gladly answer your questions (at no charge) about embedded indexing.