UNIVERSITY OF AARHUS
DEPARTMENT OF COMPUTER SCIENCE

External Memory Algorithms and Data Structures (EMADS), Fall 2003

Lecturers · Time and place · Course description · Schedule · Literature · Projects · Evaluation · Prerequisites · Course language · Credits
External Memory Algorithms and Data Structures, Fall 2001, Fall 2000, Fall 1999

Lecturers

Gerth Stølting Brodal and Rolf Fagerberg.

Time and place

Monday 13.15-15.00 and Wednesday 11.15-12.00 (Weeks: 35 - 41 and 44 - 50) in Auditorium D4.

Course description

Computer systems usually have an entire hierarchy of memory levels, with each level having its own performance characteristics. Typical memory levels are: CPU registers, several levels of memory cache, main memory (RAM), and secondary memory (disk). Traditionally, the design of algorithms does not take this memory hierarchy into account, but assumes a model with a single level of main memory.

In an increasing number of problems, the amount of data to be processed is far too massive to fit into internal memory. Examples include data collections in astronomy, geology, meteorology, and finance, as well as web search engines, VLSI verification, computer graphics, and geographic information systems (GIS). In such applications, the amount of data may be measured in terabytes.

When dealing with data sets of sizes exceeding main memory, communication between the fast internal memory and the slow external memory is often the performance bottleneck, and the analysis of algorithms under the assumption of a single level of memory may be meaningless.

Instead, a more realistic measure of the efficiency of an algorithm is the number of I/O-operations performed between internal memory and disk. Algorithms designed to minimize this number are termed external memory algorithms.

In this course, we will study the design and analysis of efficient external memory algorithms and data structures. Different paradigms for efficiently solving problems in external memory will be presented, and a number of specific algorithms from areas like sorting and searching, computational geometry, strings, and graphs will be covered.

Schedule

WeekDateTopicsReading
3525/8 Introduction (slides: ps.gz).
I/O model.
Project 1 - Identifying Memory Hierarchies (slides: ps.gz | pdf).
[A97, Sect. 1-3.2]
27/8Sorting and Permuting (slides: ps.gz | pdf). [KL92]
361/9Lower bounds for sorting and permuting (slides: ps.gz | pdf).
Lower bounds by I/O-comparison trees (slides: ps.gz | pdf).
Optimal merging (slides: ps.gz | pdf).
[KL92, Sect. 1-3.2 (excl. p.8-9), Sect. 4.1-4.2]
3/9B-trees (slides: ps.gz | pdf). [BF00]
378/9 Amortized analysis of B-trees.
Trade-offs for external memory dictionaries (not in curriculum; slides: ps.gz | pdf).
Introduction to buffer trees (slides: ps.gz | pdf).
[BF03],[A03]
10/9Discussion of project 1. 
3815/9Buffer trees: basic structure, used as priority queues, and used as range trees. [A03]
17/9Buffer trees used as range trees and as segment trees.[A03]
3922/9External Memory Interval Trees.
Deadline for project 1.
[AV96]
24/9Weight Balanced B-trees.[AV96]
4029/9External Memory 3-Sided Orthogonal Range Searching.[ASV99]
1/10External Memory 4-Sided Orthogonal Range Searching.
Project 2 - Sorting.
[ASV99]
416/10 R-trees (slides: ps.gz | pdf).
Connected Components.
[AHVV02],[A83, Sect. 2.3],[ABW02]
8/10Connected Components (slides: ps.gz | pdf).
Minimum Spanning Forest.
[ABW02]
42-43Fall break
4427/10More algorithms for MST.[ABT00, Sections 1-2]
29/10DFS[BGVW00]
453/11BFS[MR99, Section 5.1],[MM02]
5/11Single Source Shortest Path (SSSP).
Deadline for project 2.
[KS96]
4610/11The String B-tree[FG99, Sect. 1, 2.1]
12/11String Sorting[AFGV97]
4717/11Project 3 - A GIS system
Suffix Array Construction
[KS03]
19/11Suffix Array Construction[KS03]
4824/11Cache oblivious algorithms: model, matrix multiplication, search trees (slides: ps.gz | pdf).
[FLPR99, Sect. 1-2],[BFJ02]
26/11Funnel-Sort[FLPR99, Sect. 4][BF02]
491/12Cache oblivious distribution sweeping[BF02]
3/12Cancelled ("Åben dag") 
508/12Streaming: counting distinct elements, point queries, range queries[CM04],[FM85]
10/12Beer & softdrinks 

Literature

  1. [A97] External-Memory Algorithms with Applications in Geographic Information Systems, Lars Arge. In Algorithmic Foundations of Geographic Information Systems, M. van Kreveld, J. Nievergelt, T. Roos and P. Widmayer (Eds.), LNCS Tutorial, Lecture Notes in Computer Science Vol. 1340, Springer-Verlag, 1997. (ps)
  2. [KL92] I/O-complexity of comparison and permutation problems. Mikael Knuden and Kirsten Larsen. Master Thesis, DAIMI, November 1992.
  3. [BF00] Amortized Analysis of (a,b)-Trees, Gerth Stølting Brodal and Rolf Fagerberg, Note, 2000. (ps.gz | pdf)
  4. [A03] The Buffer Tree: A Technique for Designing Batched External Data Structures, Lars Arge. Algorithmica, 37(1):1-24, 2003. (pdf)
  5. [AV96] Optimal Dynamic Interval Management in External Memory, Lars Arge and Jeffrey Scott Vitter. In Proceedings of the 37th Annual Symposium on Foundations of Computer Science, 1996, 560-569.
  6. [ASV99] On-Two Dimensional Indexability and Optimal Range Search Indexing, Lars Arge, Vasilis Samoladas, and Jeffrey Scott Vitter. In Proceedings of the Eighteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1999
  7. [AHVV02] Efficient Bulk Operations on Dynamic R-trees, Lars Arge, Klaus Hinrichs, Jan Vahrenhold and Jeff Vitter. Algorithmica, 33(1):104-128, 2002.
  8. [ABW02] A Functional Approach to External Graph Algorithms, J. Abello, A. L. Buchsbaum, and J. R. Westbrook. Algorithmica 32(3):437-458, 2002.
  9. [ABT00] On External Memory MST, SSSP and Multi-way Planar Graph Separation, L. Arge, G.S. Brodal, and L. Toma. In proceedings of 7th Scandinavian Workshop on Algorithm Theory (SWAT), 2000, LNCS 1851. Submitted to Journal of Algorithms.
  10. [MR99] I/O-Complexity of Graph Algorithms, Kameshwar Munagala and Abhiram Ranade. In Proceedings of the Tenth Annual ACM-SIAM Symposium on Discrete Algorithms, 687-694, 1999.
  11. [MM02] External-Memory Breadth-First Search with Sublinear I/O, Kurt Mehlhorn, Ulrich Meyer. Proc. 10th Ann. European Symposium on Algorithms (ESA), LNCS 2461, pp. 723-735, 2002.
  12. [BGVW00] On External Memory Graph Traversal, Adam L. Buchsbaum, Michael Goldwasser, Suresh Venkatasubramanian, and Jeffery Westbrook. Proceedings of the Eleventh Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 859-860, 2000.
  13. [KS96] Improved Algorithms and Data Structures for Solving Graph Problems in External Memory, Vijay Kumar, Eric J. Schwabe. In Proc. 8th IEEE Symposium on Parallel and Distributed Processing (SPDP), pp. 169-177, 1996.
  14. [FG99] The String B-tree: A New Data Structure for String Search in External Memory and its Applications, Paolo Ferragina, and Roberto Grossi. Journal of the ACM, 46(2), 1999, 236-280.
  15. [AFGV97] On Sorting Strings in External Memory, Lars Arge, Paolo Ferragina, Roberto Grossi, and Jeff Vitter. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing, 1997, 540-548.
  16. [KS03] Simple Linear Work Suffix Array Construction, J. Kärkkäinen and P. Sanders. In 30th International Colloquium on Automata, Languages and Programming, number 2719 in LNCS, pages 943-955. Springer ©, 2003.
  17. [FLPR99] Cache-Oblivious Algorithms, Matteo Frigo, Charles E. Leiserson, Harald Prokop, and Sridhar Ramachandran. In Proc. 40th Annual Symposium on Foundations of Computer Science, FOCS '99, October 17-18, 1999, New York, NY, USA.
  18. [BFJ02] Cache-Oblivious Search Trees via Trees of Small Height, Gerth Stølting Brodal, Rolf Fagerberg, and Riko Jacob. In Proc. 13th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 39-48, 2002.
  19. [BF02] Cache Oblivious Distribution Sweeping, Gerth Stølting Brodal and Rolf Fagerberg. In Proc. 29th International Colloquium on Automata, Languages, and Programming, volume 2380 of Lecture Notes in Computer Science, pages 426-438. Springer Verlag, Berlin, 2002.
  20. [FM85] Probabilistic Counting Algorithms for Data Base Applications, Philippe Flajolet and G. Nigel Martin. Journal of Computer and Sytem Sciences, 31, 182-209, 1985.
  21. [CM04] An improved data stream summary: The Count-Min sketch and its applications, G. Cormode and S. Muthukrishnan. Proceedings of the 6th Latin American Theoretical Informatics (LATIN), 2004. Also DIMACS TR 2003-20.

Additional literature

Links to additional resources

Projects

Newsgroup for the course:
daimi.emads
Evaluation
Implementation projects.
Prerequisites
dADS and dSøgOpt.
Course language
Danish or English.
Credits
10 ETCS.

Page maintained by
Gerth Stølting Brodal <gerth@cs.au.dk>.