In * Proc. 22nd Annual ACM-SIAM Symposium on Discrete Algorithms*, pages 390-400, 2011.

We study the following problem: Given an array *A* storing *N* real
numbers, preprocess it to allow fast reporting of the *K* smallest
elements in the subarray *A*[*i*, *j*] in sorted order, for any triple
(*i*, *j*, *K*) with 1 ≤ *i* ≤ *j* ≤ *N* and 1 ≤ *K* ≤ *j* - *i* + 1.
We are interested in scenarios where the array *A* is large,
necessitating an I/O-efficient solution.

For a parameter *f* with 1 ≤ *f* ≤ log_{m} *n*, we construct a data
structure that uses *O*((*N*/*f*) log_{m} *n*) space and achieves a query
bound of *O*(log_{B} *N* + *f* *K*/*B*) I/Os, where *B* is the block size, *M*
is the size of the main memory, *n* := *N*/*B*, and *m* := *M*/*B*. Our main
contribution is to show that this solution is nearly optimal. To be
precise, we show that achieving a query bound of *O*(log^α *n* +
*f**K*/*B*) I/Os, for any constant α, requires Ω(*N*
(*f*^{-1}log_{M} *n*)(log(*f*^{-1} log_{M} *n*))) space, assuming *B* =
Ω(log *N*). For *M* ≥ *B*^{1+ε}, this is within a
loglog_{m} *n* factor of the upper bound. The lower bound assumes
indivisibility of records and holds even if we assume *K* is always
set to *j*-1+1.

We also show that it is the requirement that the *K* smallest
elements be reported in sorted order which makes the problem hard.
If the *K* smallest elements in the query range can be reported in
any order, then we can obtain a linear-size data structure with a
query bound of *O*(log_{B} *N* + *K*/*B*) I/Os.

Copyright © 2011 by the Association for Computer Machinery, Inc., and the Society for Industrial and Applied Mathematics.

**Online version**

soda11.pdf (241 Kb)

**DOI**

www.siam.org/proceedings/soda/2011/SODA11_031_afshanip.pdf

@inproceedings{soda11, author = "Peyman Afshani and Gerth St{\o}lting Brodal and Norbert Zeh", booktitle = "Proc. 22nd Annual ACM-SIAM Symposium on Discrete Algorithms", doi = "www.siam.org/proceedings/soda/2011/SODA11_031_afshanip.pdf", pages = "390-400", title = "Ordered and Unordered Top-K Range Reporting in Large Data Sets", year = "2011" }