PhD Course on Streaming Algorithms (Spring 2011, Q4)
Messages
- 2/3 2011: Webpage created
Objectives
After attending this course, the participants will be
familiar with the state of the art of streaming algorithms/complexities and will have an overview of research
directions to pursue in this area.
Learning outcomes and competences
After attending this course, participants should be able
to design streaming algorithms using the
currently known techniques and
analyze their space complexities.
Contents
This course studies data stream algorithms, namely, algorithms that solve a problem by making one pass over the data set while using small memory. These algorithms are important in many application areas such as databases and networking, where data arrives at a high speed and there is no time and/or need to store it for offline processing. We will start with the classical work in this area, followed by a potpourri of recent developments.
Students will be introduced to many data streaming techniques and how they can be applied to solve various problems. The course will focus on the design and theoretical analysis of the algorithms.
Participants are expected to have a basic background in
randomized algorithm.
The course will be held in an informal lecture style, meaning
that the lecturers have an interactive teaching style that
involves the students in discussion and that students are
strongly encouraged to initiate discussions themselves, in
order to clarify questions they may have.
The evaluation will be based on a set of theoretical
exercises and/or individual presentations. The list of questions will be handed out in the middle of the course, and answers are to be
submitted by email at the end of the course. The grading is a
simple PASS or FAIL.
Detailed list of topics is available in the course plan below
(subject to adjustments as we go along).
Lecturers
Qin Zhang
(MADALGO) and
Lap-Kei Lee
(MADALGO).
Place and Time
Turing 014. Every Thursday during April 7 - May 26, 13:00-16:00 (except April 21, which is a holiday)
Course plan
1 |
|
Introduction: sampling, distinct elements, heavy hitters |
[LN], [Mcg], [BYJKST04] |
Qin Zhang |
Lessons 1-4 |
2 |
|
Introduction: frequency moments, graph algorithms, geometry |
[LN], [Mcg], [AMS96], [Guh09], [Ind06] |
3 |
|
Edit distance, Ulam distance |
[GJKK07], [EJ08], [AndThesis] |
4 |
|
Ulam distance. (Intro) lower bounds. |
[AndThesis] Chapter 5, [LN] |
5 |
|
Graph algorithms: Maximum matching, Spanners |
[Mcg05], [Bas08] |
Lap-Kei Lee |
|
6 |
|
Maximum matching (cont'), Sliding window: Basic counting, Lower bound (compressibility argument) |
[Mcg05], [DGIM02] |
7 |
|
New directions: Annotations, Distributed streaming |
[CCG09], [SLN09], [YiZ09], [CLLT10] |
Optional |
|
Lower bounds for frequent moments |
[BYJKS02] |
|
Literature
- [Mut] Data Streams: Algorithms and Applications by S. Muthukrishnan.
- [LN] Lecture notes by A. Chakrabarti and his students.
- [Mcg] Crash Course on Data Stream Algorithms [Part 1] [Part 2] by A. McGregor.
- [AMS96] The space complexity of approximating the frequency moments by N. Alon, Y. Matias and M. Szegedy
- [Ind06] Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation by P. Indyk
- [Guh09] Tight results for clustering and summarizing data streams by S. Guha
- [BYJKST04] Counting distinct elements in a data stream by Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar and L. Trevisan
- [BYJKS02] An information statistics approach to data stream and communication complexity by Z. Bar-Yossef, T. S. Jayram, R. Kumar and D. Sivakumar
- [GJKK07] Estimating the Sortedness of a Data Stream by P. Gopalan, T. S. Jayram, R. Krauthgamer and R. Kumar
- [EJ08] On distance to monotonicity and longest increasing subsequence of a data stream by F. Ergun and H. Jowhari
- [ABIW09] Efficient sketches for Earth-Mover Distance, with applications by A. Andoni, K. Do Ba, P. Indyk and D. Woodruff
- [Ind07] A Near Linear Time Constant Factor Approximation for Euclidean Bichromatic Matching (Cost) by P. Indyk
- [IW05] Optimal Approximations of the Frequency Moments of Data Streams by P. Indyk and D. Woodruff
- [KNW05] An Optimal Algorithm for the Distinct Elements Problem by Daniel M. Kane, J. Nelson and D. Woodruff
- [AndThesis] Nearest Neighbor Search: the Old, the New, and the Impossible by A. Andoni, PhD Thesis
- [FKMSZ04]
On Graph Problems in a Semi-Streaming Model
by J. Feigenbaum, S. Kannan, A. Mcgregor, S. Suri, and J. Zhang
- [Mcg05]
Finding Graph Matchings in Data Streams
by A. McGregor
- [FKMSZ05]
Graph Distances in the Data-Stream Model
by J. Feigenbaum, S. Kannan, A. Mcgregor, S. Suri, and J. Zhang
- [Bas08]
Streaming Algorithm for Graph Spanners - Single Pass and Constant Processing Time per Edge
by S. Baswana
- [DGIM02]
Maintaining Stream Statistics over Sliding Windows
by M. Datar, A. Gionis, P. Indyk, and R. Motwani
- [GiT02]
Distributed Streams Algorithms for Sliding Windows
by P. B. Gibbons and S. Tirthapura
- [CCG09]
Annotations in Data Streams
by A. Chakrabarti, G. Cormode, and A. McGregor
- [SLN09]
Best-Order Streaming Model
by A. D. Sarma, R. Lipton, and D. Nanongkai
- [CMY08]
Algorithms for Distributed Functional Monitoring
by G. Cormode, S. Muthukrishnan, and K. Yi
- [YiZ09]
Optimal Tracking of Distributed Heavy Hitters and Quantiles
by K. Yi and Q. Zhang
- [CLLT10]
Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window
by H. L. Chan, T. W. Lam, L. K. Lee, and H. F. Ting
Quarter
4th (Spring 2011)
Credits
5 ETCS points
Prerequisites
Course language
English
Compulsory programme
Homework hand-in / presentation
Evaluation
Based on mandatory hand-in / presentation and attendance, pass/fail