No, not embedded indexing

CUP-XML, Elsevier, OUP et al

(An edited version of Maureen MacGlashan's presentation at the Society of Indexers 2011 Keele Conference)

Introduction

True embedded indexing (i.e. indexing in which the index terms are embedded in the text, and the index itself generated from the text at the end of the process) is discussed elsewhere. This note looks at the alternative: the standalone, linked (or 'tagged') index approach now being adopted by a number of publishers, including CUP, Elsevier and OUP, with a number of smaller publishers following suit. The indexer supplies a separate index file, conventional except that page numbers are replaced by location codes. It is a quick introduction to help keep minds clear on exactly what is going on if a publisher asks us to prepare an index with 'tags' or 'codes' rather than page numbers.

Different publishers use different terminology for these 'tags'. For example, Elsevier currently call them 'IDs', CUP 'unique numbers'. The underlying system is the same, though there are differences of approach. The indexer will normally receive a PDF (though with CUP there is also a Word option). In essence, the working file is rather like the old galley proof (or today's web page) in that any page numbers it carries are irrelevant for indexing purposes, since they will not survive finalization of the text in the chosen medium or media. Indeed, for an e-publication they could vanish altogether, and they could vary between different print versions even where the text remains the same.

In the Elsevier and OUP systems, the link or 'tag' is supplied as a marginal entry on the PDF, and this is what the indexer must use. The indexer is not required to annotate the PDF file.

Elsevier/OUP

Elsevier IDs attach to:

chapters in the form c00005
paragraphs - p0010, p0020, p0030 etc
sections (or 'subchapters' in Elsevier parlance) - s0010, s0020 etc
figures - f0010 etc
tables - t0010 etc
boxes - b0010 etc
ordered list items - o0010 etc
unordered list items - u0010 etc
definition list items - d0010 etc
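
As a rough illustration of the list above, the following Python sketch maps an ID's prefix letter to the kind of element it points at. The prefix letters come from the list; the function name, the lookup table and the assumption that an ID is always one letter followed by a run of digits are ours, not Elsevier's specification.

```python
import re

# Prefix letters as listed above; the names on the right are just labels.
ELEMENT_KINDS = {
    "c": "chapter",
    "p": "paragraph",
    "s": "section",
    "f": "figure",
    "t": "table",
    "b": "box",
    "o": "ordered list item",
    "u": "unordered list item",
    "d": "definition list item",
}

# Assumption: one prefix letter followed by digits. The examples above show
# both four and five digits, so any run of digits is accepted here.
ID_PATTERN = re.compile(r"([cpsftboud])(\d+)")


def describe_locator(locator):
    """Return the kind of element an ID refers to, or None if unrecognized."""
    match = ID_PATTERN.fullmatch(locator.strip())
    if match is None:
        return None
    return ELEMENT_KINDS[match.group(1)]
```

A check of this kind could be run over a finished index file to catch mistyped locators before the index goes back to the publisher.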

OUP follow much the same system as Elsevier, with the same rules about ranges, but take a rather simpler approach to IDs. An OUP ID always starts with the chapter number (say 'C3'), followed by the section ('C3.52') and then the sub-section ('C3.52.1').
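
To make the structure of an OUP ID concrete, here is a minimal Python sketch that splits one into its numeric parts. It assumes the pattern is exactly the one in the examples ('C', a chapter number, then up to two further dot-separated numbers); the function name is hypothetical and real OUP files may contain other forms.

```python
import re

# Assumption: 'C' + chapter, optionally '.section', optionally '.sub-section'.
OUP_ID = re.compile(r"C(\d+)(?:\.(\d+))?(?:\.(\d+))?")


def parse_oup_id(locator):
    """Split an ID such as 'C3.52.1' into a tuple of integers, e.g. (3, 52, 1)."""
    match = OUP_ID.fullmatch(locator.strip())
    if match is None:
        raise ValueError(f"not a recognized ID: {locator!r}")
    # Drop the levels that are absent, so 'C3' gives (3,) and 'C3.52' gives (3, 52).
    return tuple(int(part) for part in match.groups() if part is not None)
```

Because the result is a tuple of integers, it can serve as a sort key that puts locators in text order, so that 'C3.9' sorts before 'C3.52'.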

With the approach adopted by Elsevier and OUP the range is integral to the ID. If you choose, for example, a paragraph ID, that is sufficient to cover the whole paragraph. Ditto, if you want to include a whole chapter as your 'locator'.

The CUP system

CUP have taken a different route. At an early stage in the process, the author's ms is sent to the setters for 'normalization' (the creation of the initial XML file). This is converted for the benefit of proof-readers, copy-editors, authors and indexers into either a PDF or a Word file (the Word file alternative being introduced some years ago to reflect the fact that this was an easier medium for authors, who are responsible for 80% of CUP indexes). The choice is ours, some of us preferring to work on a PDF, others on the Word version (as it emerges from the setters) into which they can insert the required IDs 'manually' or using a program such as WordEmbed or DEXembed.

In the CUP system, the indexer is given total freedom as to what number they use, provided it is unique. This is placed at the correct point (or points if it's a range) in the PDF or Word file supplied to the indexer, and used as the locator in the index. CUP still accept hard-copy mark-up, and also accept the use of paragraph or section numbers as unique numbers. If the paragraph option is available, it means that the indexing can begin as soon as the ms enters the production process rather than waiting for the arrival of the 'normalized' file, giving the indexer a real head start. (Using paragraph or section numbers has the advantage, of course, that these will remain constant whatever medium the publication appears in.)
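
Since the one firm rule is uniqueness, a simple consistency check before submission may be worthwhile. The Python sketch below is only an illustration under our own assumptions: the markers placed in the text and the locators used in the index have already been extracted into two plain lists, and each marker is meant to appear in the text once (how a range is marked is left to CUP's own instructions and is not modelled here). The function name is hypothetical.

```python
from collections import Counter


def check_unique_numbers(text_markers, index_locators):
    """Return (duplicates, missing).

    duplicates: markers that occur more than once in the text.
    missing: locators used in the index that have no marker in the text.
    """
    counts = Counter(text_markers)
    duplicates = sorted(marker for marker, n in counts.items() if n > 1)
    missing = sorted(set(index_locators) - set(counts))
    return duplicates, missing
```

Two empty lists mean that every marker in the text is unique and every locator in the index points at a marker that exists.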

For more information on the CUP system see:

Indexing, Cambridge University Press
James Lamb's CUP-XML Unique numbers and MS Word.

Other publishers provide the necessary information at time of commissioning.

Caveat

To repeat the point ad nauseam, these systems are not 'embedded indexing' since the index stands in its own right and the XML codes are inserted by the setters. Nor are any of them 'XML indexing', not even the CUP system commonly known as 'CUP-XML'. The indexer need know nothing whatsoever about XML to engage in this sort of indexing.

Author: Maureen MacGlashan
Date: March 2013