Tutorials on Algorithms for Massive Data Sets
ICALP 2003 Satellite Event
Eindhoven, The Netherlands, June 29, 2003
* * * * The tutorials have been CANCELLED * * * *
Scope and Topics
Data sets in large applications are often too massive to fit
completely inside the computer's internal memory. The resulting data
transfer (or I/O) between fast internal memory and slower external
memory (such as disks) can be a major performance bottleneck. During
the last decade a major body of research has been devoted to the
development of efficient external memory algorithms, where the goal is
to exploit locality in order to reduce the I/O costs.
As a pre-workshop for ICALP 2003, S. Muthu Muthukrishnan and Erik
D. Demaine will give tutorials on recent research developments
within the following two areas dealing with massive data sets.
Algorithms for data streams
S. Muthu Muthukrishnan,
Rutgers University and AT&T Shannon Labs
Telecommunication networks generate massive streams of traffic
logs. Monitoring, auditing and mining such streams is an integral part
of network management. Research over the past few years in this area
has led to certain key insights. It is difficult to find a needle in
the haystack within the constraints faced in the network
application. Fortunately, several interesting network traffic
phenomena have the "few good terms" property, i.e., summarizing a
few trends in the traffic - deviants, wavelet terms,
histograms, correlations - is both feasible and insightful. Finally,
data quality problems in network data feeds present many new challenges,
and a principled approach to confronting data quality problems is an
implicit concern for data stream mining. In this tutorial, I will survey
the application, discuss these insights and present novel algorithmic
solutions that have been developed.
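To make the "few good terms" idea concrete, the following is a minimal sketch of one well-known small-space summary, the Misra-Gries frequent-items algorithm. It is offered only as an illustration and is not claimed to be among the algorithms covered in the tutorial; the function name `misra_gries` and the parameter `k` are assumptions chosen for this example. With at most k - 1 counters, every item that occurs more than n/k times in a stream of length n is guaranteed to remain in the summary, and each stored count is a lower bound on the true frequency.

```python
from typing import Dict, Hashable, Iterable


def misra_gries(stream: Iterable[Hashable], k: int) -> Dict[Hashable, int]:
    """Summarize a stream with at most k - 1 counters (illustrative sketch).

    Any item occurring more than n/k times in a stream of length n
    is guaranteed to appear in the returned dictionary. The stored
    counts never exceed the true frequencies.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    counters: Dict[Hashable, int] = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and
            # discard the ones that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


if __name__ == "__main__":
    # Hypothetical traffic log: source "a" dominates the stream.
    log = ["a", "b", "a", "c", "a", "d", "a"]
    print(misra_gries(log, k=2))
```

The summary uses memory proportional to k regardless of the stream length, which is the kind of constraint the abstract refers to; a second pass over the data (when available) can turn the candidates into exact counts.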
Cache-oblivious algorithms and data structures
Erik D. Demaine,
MIT
A recent direction in the design of cache-efficient and disk-efficient
algorithms and data structures is the notion of cache obliviousness,
introduced by Frigo, Leiserson, Prokop, and Ramachandran in 1999.
Cache-oblivious algorithms perform well on a multilevel
memory hierarchy without knowing any parameters of the hierarchy,
only knowing the existence of a hierarchy. Equivalently, a single
cache-oblivious algorithm is efficient on all memory hierarchies
simultaneously. While such results might seem impossible, a recent body
of work has developed cache-oblivious algorithms and data structures
that perform as well as, or nearly as well as, standard external-memory structures,
which require knowledge of the cache/memory size and block transfer size.
Here we describe several of these results with the intent of elucidating
the techniques behind their design.
Perhaps the most exciting of these results are the data structures,
general building blocks immediately leading to several algorithmic results.
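As a small illustration of the cache-oblivious style, the sketch below transposes a matrix by recursive divide and conquer: it repeatedly halves the longer dimension, so at some recursion depth the subproblem fits in each level of the memory hierarchy, without the code ever referring to a cache size or block size. This is a standard textbook example rather than a result taken from the tutorial itself. The names `transpose` and `BASE` are assumptions for this example; `BASE` only limits recursion overhead and is not tuned to any hardware parameter. Python lists of lists are not laid out contiguously in memory, so the code shows the recursion structure only and makes no performance claim.

```python
from typing import List

Matrix = List[List[int]]

# Cutoff that limits recursion overhead; it is NOT a cache parameter.
BASE = 4


def transpose(a: Matrix) -> Matrix:
    """Return the transpose of a rectangular matrix (illustrative sketch)."""
    rows = len(a)
    cols = len(a[0]) if rows else 0
    out: Matrix = [[0] * rows for _ in range(cols)]

    def rec(r0: int, r1: int, c0: int, c1: int) -> None:
        height, width = r1 - r0, c1 - c0
        if height <= BASE and width <= BASE:
            # Small block: copy elements directly.
            for i in range(r0, r1):
                for j in range(c0, c1):
                    out[j][i] = a[i][j]
        elif height >= width:
            # Split the longer dimension (rows) in half.
            mid = (r0 + r1) // 2
            rec(r0, mid, c0, c1)
            rec(mid, r1, c0, c1)
        else:
            # Split the longer dimension (columns) in half.
            mid = (c0 + c1) // 2
            rec(r0, r1, c0, mid)
            rec(r0, r1, mid, c1)

    rec(0, rows, 0, cols)
    return out


if __name__ == "__main__":
    print(transpose([[1, 2, 3], [4, 5, 6]]))
```

The same halving pattern, applied to data stored contiguously in a lower-level language, is what lets a single piece of code adapt to every level of a memory hierarchy at once.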
The two tutorials take place on Sunday, June 29, as a satellite event of
ICALP 2003 (June 30-July 4, 2003), at the Technische Universiteit Eindhoven,
The Netherlands. The tutorials are aimed at graduate students and researchers.
Partially supported by the Future and Emerging
Technologies programme of the EU under contract number IST-1999-14186.
Registration and travel information
Registration and travel information is available at the ICALP 2003 webpage.
- Lars Arge, Duke University
- Gerth Stølting Brodal (co-chair), University of Aarhus
- Erik D. Demaine, MIT
- Rolf Fagerberg (co-chair), University of Aarhus
- Paolo Ferragina, University of Pisa
- Giuseppe F. Italiano, University of Rome, "Tor Vergata"
- Ulrich Meyer, Max-Planck-Institut, Saarbrücken
- Rajeev Raman, University of Leicester
- Jeff Vitter, Purdue University
- Norbert Zeh, Dalhousie University
Page maintained by
Gerth Stølting Brodal