Search vs Index

What's special about an index?

First, let's agree on what an index is not. It isn't a list of words and the places where they're used. That's a concordance. A computer can make one much faster so you'd be mad to pay an indexer to do it.

An index is a structured list of concepts, not words, linked to the position of every significant discussion. It has to be constructed by the indexer on the basis of their understanding and subject knowledge: it can't simply be extracted from the text.

The indexer works by reading the text, analysing the subjects, imagining how the intended reader is likely to think of them (in many cases, this involves empathizing with a problem-oriented approach), providing alternative access routes and links to related subjects and applying vocabulary control to draw together everything on each subject. Compared with a word-list, an index has subentries (to save the reader time) and cross-references (to ensure they miss nothing) but, less obviously, it's selective (omitting mere passing mentions but highlighting significant discussion) and interpretative (catering for the reader's predictable queries, rather than relying on the terms the author selects at each point).

About word search

Searching for words works fine with Google when you just need something on the topic, not everything: to find a cheap flight to Rome, say, or the lyrics of a song, the meaning of 'albedo' or the symptoms of Lyme disease. It fails with a book because it overwhelms the reader with 'noise' yet still doesn't find everything important, and its performance actually falls off the closer you get to the main subject! There are examples at the end of this section that should convince you.

If your book is intended to be understood by people, it needs to be indexed by people. It has to be, because computers can't read, so they can't index. They can only look for words and phrases, which isn't how authors convey meaning or how readers apprehend it.

Three examples will show why simply searching for terms fails your readers:

  1. Finding the real subject

    With more and more non-fiction books being multi-author compilations, only a competent human indexer can link, say, 'farm waste pollution' in Chapter 3 with 'agricultural runoff problems' in Chapter 14. They link the two treatments because they are reading, not searching. Any keyword search must fail to retrieve one or the other, because none of the words are the same!

    That shouldn't surprise anyone; we all know we don't recognise that Hamlet's soliloquy is about suicide by looking for the word suicide; it isn't even there. Similarly, we can understand when something is called 'an egregious folly', without knowing exactly what the word 'egregious' means. Reading doesn't depend on word occurrences: it depends on following whole arguments, propositions and explanations.
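To make the point concrete, here is a minimal sketch in Python. The two passage texts, the `keyword_search` helper and the index heading 'agricultural pollution' are all invented for illustration; only the two quoted phrases and their chapter numbers come from the example above.

```python
# Hypothetical passages standing in for two chapters of a multi-author book.
passages = {
    "Chapter 3": "farm waste pollution is damaging local rivers",
    "Chapter 14": "agricultural runoff problems persist downstream",
}


def keyword_search(query, passages):
    """Return the locations whose text contains every word of the query."""
    words = query.lower().split()
    return [
        location
        for location, text in passages.items()
        if all(word in text.lower().split() for word in words)
    ]


# An index entry, as a person who has read both chapters might write it.
# The heading itself is an assumed label, not taken from any real index.
index = {"agricultural pollution": ["Chapter 3", "Chapter 14"]}

farm_hits = keyword_search("farm waste pollution", passages)
runoff_hits = keyword_search("agricultural runoff", passages)
index_hits = index["agricultural pollution"]

print(farm_hits)    # only Chapter 3
print(runoff_hits)  # only Chapter 14
print(index_hits)   # both chapters, because a reader linked them
```

Each keyword query finds its own chapter and misses the other, while the single index heading reaches both. The code can't build that heading for itself: someone had to read the two chapters and decide they were about the same thing.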

  2. Subdividing coverage

    A popular mathematics book recently contained 138 references to Isaac Newton or his work, spread over more than 50 pages (some 27% of the whole book). What use is a list of over 50 page numbers to the reader who looks up 'Newton', hoping to find the story of the apple, the differences between Newtonian dynamics and General Relativity or the details of Newton's feud with Robert Hooke? The index allowed access to each of these with no more than three location references beside each subentry.

    This is why we say any word search will perform worse the closer you get to the main subject. Important topics are likely to be mentioned more often, so they need to be analysed more carefully into sub-topics, not presented as a long, unwieldy list of page numbers or links.

  3. Vocabulary control

    In fact, keyword retrieval pulls off the difficult trick of retrieving too much while missing what might be crucial.

    Synonyms and hierarchies aren't occasional oddities; they're omnipresent. Ask yourself whether, looking for Society of Indexers members, you should start with 'UK' or 'British', or whether you'd find ASI members described as 'American', 'North American' or 'US'. All good writers use figures of speech and elliptical or allusive references to maintain interest. Several co-operating authors will rarely agree their terminology first.

    Looking for references to 'lions', you might miss the key anecdote where one is called a 'maneater', but you still get all the worthless occurrences like 'except lions' or 'as we saw when discussing lions' as well as the false drops: 'the defenders fought like lions'.

    If you're not convinced, just try reading any book and evaluating how many passages would be missed by searching for a term the reader might use but the author hasn't. Or search the PDF for the number of times a key word appears, and ask yourself if you'd be prepared to follow every location reference until you found what you wanted.
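The 'lions' case can be sketched the same way. In the Python below, the page numbers and the full sentences are invented for illustration (only the quoted fragments come from the example above), and `search` is a hypothetical whole-word matcher, not any particular product's search function.

```python
import re

# Hypothetical (page, sentence) pairs built around the phrases quoted above.
sentences = [
    (12, "The maneater stalked the camp for three nights."),
    (30, "All the big cats were counted, except lions."),
    (45, "As we saw when discussing lions, prey was scarce."),
    (78, "The defenders fought like lions."),
]


def search(term, sentences):
    """Return the pages on which the term appears as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [page for page, text in sentences if pattern.search(text)]


hits = search("lions", sentences)

# What an indexer might record: the one significant discussion,
# even though the word 'lions' never appears on that page.
index_entry = {"lions": [12]}

print(hits)                  # pages 30, 45 and 78: passing mentions and a false drop
print(index_entry["lions"])  # page 12: the anecdote the search missed
```

The search returns three pages, none of them worth the reader's time, and misses the one page that matters. The index entry is the reverse: one location, and it is the right one.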

Indexes are the Rolls-Royce solution for your readers, and they can work with any delivery format, because the intellectual operation of indexing is independent of markup or location indicators. All that said, the best of all worlds will often be a combination of searching the text and searching the index, the kind of enriched usability choice already beginning to emerge in a few cases; the future should deliver a fusion of several retrieval approaches, some of them impossible in a print medium.

