
Compound Words

Additional processing is performed on compound words (words containing hyphens or apostrophes) so that each portion of the compound can be examined as a candidate for keyword indexing. Only the first four parts of a compound word are considered; any additional parts are not indexed, in whole or in part. Compound words are processed for potential indexing according to the following steps:

1. The punctuation is removed, and the entire compound is considered as a candidate keyword according to the validation procedures described in the previous section. Example: by-product would be examined as byproduct.

2. Then, each individual part of the compound is considered separately as a candidate keyword. Example: by-product would also be examined as by and product.

3. If there is more than one punctuation character (as in seven-year-itch), then there will be three or more parts to deal with; these parts are also considered in groups of adjacent parts, joined without punctuation: first each sequential pair, then each sequential triplet, and so on.

To illustrate this process, seven-year-itch would yield, in order of examination, the following indexable keywords:

sevenyearitch [from step 1]
seven, year, and itch [from step 2]
sevenyear and yearitch [from step 3]
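
The three steps above can be sketched in Python as follows. This is only an illustration, not the product's actual code: the function name compound_candidates and the constant MAX_PARTS are hypothetical, the validation procedures from the previous section are left out, and two points the text does not spell out are assumed here (empty pieces produced by a leading or trailing punctuation mark are dropped, and step 1 joins only the first four parts).

```python
import re

MAX_PARTS = 4  # per the text: only the first four parts are considered


def compound_candidates(word):
    """Return candidate keywords for a compound word, in order of examination.

    Validation of each candidate (described in the previous section)
    is not performed here; this only generates the candidates.
    """
    # Split on hyphens and apostrophes. Dropping empty pieces is an assumption.
    parts = [p for p in re.split(r"[-']", word) if p][:MAX_PARTS]
    if len(parts) < 2:
        return parts  # not a compound, so no extra processing applies

    # Step 1: the whole compound with punctuation removed.
    candidates = ["".join(parts)]
    # Step 2: each individual part.
    candidates.extend(parts)
    # Step 3: each sequential pair, then each sequential triplet, and so on.
    # A group of all the parts would repeat step 1, so stop one short of that.
    for size in range(2, len(parts)):
        for start in range(len(parts) - size + 1):
            candidates.append("".join(parts[start:start + size]))
    return candidates


print(compound_candidates("seven-year-itch"))
```

For seven-year-itch this prints the six keywords listed above, in the same order. For a two-part compound such as by-product, step 3 adds nothing, because the only possible pair is the whole compound already produced by step 1.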