- 2/3 2011: Webpage created

After attending this course, the participants will be familiar with the state of the art of streaming algorithms/complexities and will have an overview of research directions to pursue in this area.

After attending this course, participants should be able
to *design* streaming algorithms using the
currently known techniques and
*analyze* their space complexities.

This course studies data stream algorithms, namely, algorithms that solve a problem by making one pass over the data set while using small memory. These algorithms are important in many application areas such as databases and networking, where data arrives at a high speed and there is no time and/or need to store it for offline processing. We will start with the classical work in this area, followed by a potpourri of recent developments.

Students will be introduced to many data streaming techniques and how they can be applied to solve various problems. The course will focus on the design and theoretical analysis of the algorithms.

Participants are expected to have a basic background in randomized algorithm.

The course will be held in an informal lecture style, meaning that the lecturers have an interactive teaching style that involves the students in discussion and that students are strongly encouraged to initiate discussions themselves, in order to clarify questions they may have.

The evaluation will be based on a set of theoretical exercises and/or individual presentations. The list of questions will be handed out in the middle of the course, and answers are to be submitted by email at the end of the course. The grading is a simple PASS or FAIL.

Detailed list of topics is available in the course plan below (subject to adjustments as we go along).

Qin Zhang (MADALGO) and Lap-Kei Lee (MADALGO).

Turing 014. Every Thursday during April 7 - May 26, 13:00-16:00 (except April 21, which is a holiday)

Week | Date | Content | Literature | Lecturer | Notes |
---|---|---|---|---|---|

1 | Introduction: sampling, distinct elements, heavy hitters | [LN], [Mcg], [BYJKST04] | Qin Zhang | Lessons 1-4 | |

2 | Introduction: frequency moments, graph algorithms, geometry | [LN], [Mcg], [AMS96], [Guh09], [Ind06] | |||

3 | Edit distance, Ulam distance | [GJKK07], [EJ08], [AndThesis] | |||

4 | Ulam distance. (Intro) lower bounds. | [AndThesis] Chapter 5, [LN] | |||

5 | Graph algorithms: Maximum matching, Spanners | [Mcg05], [Bas08] | Lap-Kei Lee | ||

6 | Maximum matching (cont'), Sliding window: Basic counting, Lower bound (compressibility argument) | [Mcg05], [DGIM02] | |||

7 | New directions: Annotations, Distributed streaming | [CCG09], [SLN09], [YiZ09], [CLLT10] | |||

Optional | Lower bounds for frequent moments | [BYJKS02] |

- [Mut] Data Streams: Algorithms and Applications by S. Muthukrishnan.
- [LN] Lecture notes by A. Chakrabarti and his students.
- [Mcg] Crash Course on Data Stream Algorithms [Part 1] [Part 2] by A. McGregor.
- [AMS96] The space complexity of approximating the frequency moments by N. Alon, Y. Matias and M. Szegedy
- [Ind06] Stable Distributions, Pseudorandom Generators, Embeddings and Data Stream Computation by P. Indyk
- [Guh09] Tight results for clustering and summarizing data streams by S. Guha
- [BYJKST04] Counting distinct elements in a data stream by Z. Bar-Yossef, T. S. Jayram, R. Kumar, D. Sivakumar and L. Trevisan
- [BYJKS02] An information statistics approach to data stream and communication complexity by Z. Bar-Yossef, T. S. Jayram, R. Kumar and D. Sivakumar
- [GJKK07] Estimating the Sortedness of a Data Stream by P. Gopalan, T. S. Jayram, R. Krauthgamer and R. Kumar
- [EJ08] On distance to monotonicity and longest increasing subsequence of a data stream by F. Ergun and H. Jowhari
- [ABIW09] Efficient sketches for Earth-Mover Distance, with applications by A. Andoni, K. Do Ba, P. Indyk and D. Woodruff
- [Ind07] A Near Linear Time Constant Factor Approximation for Euclidean Bichromatic Matching (Cost) by P. Indyk
- [IW05] Optimal Approximations of the Frequency Moments of Data Streams by P. Indyk and D. Woodruff
- [KNW05] An Optimal Algorithm for the Distinct Elements Problem by Daniel M. Kane, J. Nelson and D. Woodruff
- [AndThesis] Nearest Neighbor Search: the Old, the New, and the Impossible by A. Andoni, PhD Thesis
- [FKMSZ04] On Graph Problems in a Semi-Streaming Model by J. Feigenbaum, S. Kannan, A. Mcgregor, S. Suri, and J. Zhang
- [Mcg05] Finding Graph Matchings in Data Streams by A. McGregor
- [FKMSZ05] Graph Distances in the Data-Stream Model by J. Feigenbaum, S. Kannan, A. Mcgregor, S. Suri, and J. Zhang
- [Bas08] Streaming Algorithm for Graph Spanners - Single Pass and Constant Processing Time per Edge by S. Baswana
- [DGIM02] Maintaining Stream Statistics over Sliding Windows by M. Datar, A. Gionis, P. Indyk, and R. Motwani
- [GiT02] Distributed Streams Algorithms for Sliding Windows by P. B. Gibbons and S. Tirthapura
- [CCG09] Annotations in Data Streams by A. Chakrabarti, G. Cormode, and A. McGregor
- [SLN09] Best-Order Streaming Model by A. D. Sarma, R. Lipton, and D. Nanongkai
- [CMY08] Algorithms for Distributed Functional Monitoring by G. Cormode, S. Muthukrishnan, and K. Yi
- [YiZ09] Optimal Tracking of Distributed Heavy Hitters and Quantiles by K. Yi and Q. Zhang
- [CLLT10] Continuous Monitoring of Distributed Data Streams over a Time-based Sliding Window by H. L. Chan, T. W. Lam, L. K. Lee, and H. F. Ting

4th (Spring 2011)

5 ETCS points

English

Homework hand-in / presentation

Based on mandatory hand-in / presentation and attendance, pass/fail