by

Suggestions for improvement are also included, and further

reader participation is welcomed.

(1) Esko Ukkonen points out that reference [U92a] would better have been

(2) Kangmin Fan finds two typos in Chapter 12:

And on p. 350, conjecture (3) should say "string *x[1..n]*" so that a length is specified;
while at the bottom of p. 355, the factor ** w_3** should be

(3) On page 91, in line 2 of the second paragraph under "Freedom from Backtracking",
the range should be *i-k+1..i*.

On p. 98, instead of ">" in equation (4.8), there should be "<".

On p. 100, Exercise 4.2.8, a stronger result actually holds: the factor "alpha" can be removed from both the time and space complexity expressions.

(4) In 2003 four papers have been published [KS03,KSPP03,KA03,SKPP03] that collectively
seem to establish the superiority of the suffix array over the suffix tree as a
data structure. Thus, if I were writing Chapter 5 today instead of in 2000/2001,
I believe I would take a completely different approach: presenting suffix arrays
as the main data structure, constructible and searchable in linear time, with
basically historical mention of suffix tree algorithms. It is striking that all
four of the papers mentioned above were inspired by Farach's linear time suffix
tree construction algorithm [F97], but while Farach's algorithm is very complicated,
at least three of the new suffix array algorithms turn out to be relatively simple --
in fact, the main algorithm in [KS03] takes scarcely more than a page to explain.
Of the four new algorithms, no fewer than three [KS03,KA03,KSPP03] are linear time
suffix array *construction* algorithms, while the fourth [SKPP03] permits *search* of
a suffix array in time linear in pattern length.

Also of great interest in this context is [AKO04], in which it is shown that "*every*
algorithm that uses a suffix tree as data structure can systematically be replaced
with an algorithm that uses an enhanced suffix array and solves the same problem in
the same time complexity".

(5) There is a *caveat* to this apparent advantage of suffix arrays, however. Experimental
evidence has been presented [ARSTY04] that the new linear-time algorithms provide
little or no advantage over algorithms such as Sadakane's [Sa98] that have supralinear
worst-case time but are nevertheless linear in cases of practical interest. On the
other hand, it may be that careful implementation of the new algorithms [LP04] can
make enough difference to be worthwhile. A recent study [PST05a] indicates however
that the recursion required by the linear-time algorithms may be the determining and
unavoidable factor in keeping them slower in practice than the direct algorithms based
on the Burrows-Wheeler transformation [BW94]. An even more recent survey [PST06] provides
an overview of all suffix array construction algorithms known as of June 2006, including
asymptotic time requirements, actual space requirements (in bytes), and comparisons of
time and space usage on strings that typically arise in practice.
As discussed further in [ARSTY04], it also seems that the use of suffix arrays (and of
course also suffix trees) for pattern-matching is not efficient except in cases where
many patterns need to be matched: the methods developed in [SKPP03] require preprocessing
that is much less efficient than the bit-map processing of **agrep**,

(6) In the displayed text on page 396, of course "approxi-mate" should be "approximate".

(7) On page 264, the summary of distance/LCS algorithms could have included mention of the "obverse" to the LCS problem introduced in

(8) Thierry Lecroq notes that on page 126, in the second paragraph of the proof of Theorem 5.2.6, "represenation" should be "representation".

(9) At the time of publication of the book, I was unaware of a paper [AKO02] that described an efficient and simple algorithm for computing all the repeats in a given string in time linear in the string length. Unlike the method described in Subsection 13.2.2, it requires only the construction of a single enhanced suffix array and thus constitutes a major improvement.

(10) Gerth Stolting Brodal points out that the binary sort, over which so much fuss was made on page 150, in fact does not work! I was trying to make some use of the property that no two elements in the array are distinct, but got it wrong. Jon Bentley [B88] would rightly be ashamed of me. I believe the easiest way to fix Algorithms 5.3.2 & 5.3.3 is to make the following changes to both of them: * in the first line, L := 1; R := n to L := 0; R := n + 1 * in the final line, L = M to R - L = 1 My apologies to all right-thinking computer scientists.

(11) On page 351 I have three times (in lines 5, 12 & 14) used u_R when I meant to use r_R.
Ten pages later, in the table at the middle of page 361, the strings *x, beta, gamma*
should be boldface: ** x, beta, gamma**.

(12) Interesting extensions of the approximate periods algorithms discussed in Section 13.4 have been made available for inspection and use at

(13) Exercise 1.2.11 has been solved! Shu Wang located the following interesting paper:

(14) We now turn to a collection of errors and recommendations for improvement kindly contributed by Thomas Mailund & Christian Pedersen of BRICS, an institute of the Universities of Aarhus & Aarlborg, Denmark. From October 2004 Mailund & Pedersen are giving a course "String Algorithms" based on my book. They have combed carefully through Chapters 5-9 and found many interesting things. I deal with their comments below, chapter-by-chapter (occasionally adding in one or two comments by others as they occur in page order):

(14.1) **Chapter 5**

(14.2) **Chapter 6**

(14.3) **Chapter 7**

(14.4) **Chapter 8**

(14.5) **Chapter 9**