# Solving the String Statistics Problem in Time *O*(*n*log *n*)

Technical Report, ALCOMFT-TR-02-55, ALCOM-FT, 12 pages, May 2002.

## Abstract

The string statistics problem consists of preprocessing a string of
length *n* such that given a query pattern of length *m*, the
maximum number of non-overlapping occurrences of the query pattern
in the string can be reported efficiently. Apostolico and Preparata
introduced the minimal augmented suffix tree (MAST) as a data
structure for the string statistics problem, and showed how to
construct the MAST in time *O*(*n*log^{2} *n*) and how it supports
queries in time *O*(*m*) for constant sized alphabets. A subsequent
theorem by Fraenkel and Simpson stating that a string has at most a
linear number of distinct squares implies that the MAST requires
space *O*(*n*). In this paper we improve the construction time for
the MAST to *O*(*n*log *n*) by extending the algorithm of Apostolico
and Preparata to exploit properties of efficient joining and
splitting of search trees together with a refined analysis.

**Online version**
alcomft-tr-02-55.pdf (173 Kb)

**BIBT**_{E}X entry

@techreport{alcomft-tr-02-55,
author = "Gerth St{\o}lting Brodal and Rune Bang Lyngs{\o} and Anna \"Ostlin and Christian N{\o}rgaard Storm Pedersen",
institution = "ALCOM-FT",
month = "May",
number = "ALCOMFT-TR-02-55",
pages = "12",
title = "Solving the String Statistics Problem in Time $O(n\log n)$",
year = "2002"
}