Advanced Data Structures (Q1+Q2, 2015) - Projects

During the course you must handin three projects. The default projects are projects 1-3 below. These are all implementation projects and should be done in groups of 2-3 people. The projects should be documented in reports describing your theoretical considerations, implementation, experiment design and present your experimental results. The evaluation of the reports and implementation will be part of the final grade.

Alternatively, one or two (but not all three) of the projects 1-3 can be replaced by any of the theory projects 4-6 below. Theory projects must be documented by individual reports, i.e. no group reports (you can discuss the theory projects in groups, but the report must be indivual).

Please a.s.a.p. send the names of the people in your group to gerth@cs.au.dk.

Hints for implementation projects

Project 1 (Implementation)

The topic of this project is to compare the practical performance of Fibonacci heaps and binary heaps.

The project consists of the following tasks:

  1. Implement
    1. the Fibonnacci Heaps of Fredman and Tarjan,
    2. the binary heaps of Williams (Algorithm 232, CACM 21(4), 1964).
    As a minimum the following opeartions should be supported: MakeHeap, FindMin, Insert, DeleteMin, and DecreaseKey.
  2. Argue about the worst-case (as opposed to amortized) time bounds for each of the operations of Fibonnacci Heaps and binary heaps.
  3. Design and perform experiments where you try to measure the running time and number of comparisons of the operations for both priority queue implementations.
  4. Implement Dijkstra's algorithm for the single-source shortest path problem for a weighted graph with non-negative edge-weigths, such that is can switch between the two priority queue implementations.
  5. Describe families of graphs where Dijkstra's algorithm performs many respectively few DecreaseKey operations.
  6. Perform experiments on your implementation of Dijksta's algorithm - in particular, try to observe if the version using Fibonacci heaps achieves an improved performance because of the amortized O(1) time DecreaseKey operation.

The project should be documented in a report. The report should document your theoretical considerations, implementation, experiment design and present your experimental results.

Implementations should preferably be done in C or C++.

Project 2 (Implementation)

In this project you should select one of the below projects.

van Emde Boas Trees

The topic of this project is to compare the practical performance of van Emde Boas Trees with the priority queue implementations from Project 1 and with Red-Black search trees.

The project consists of the following tasks:

  1. Implement van Emde Boas trees based on the description in the note van Emde Boas trees (lecture note by Gudmund S. Frandsen). As a minimum the following opeartions should be supported: FindMin, Insert, DeleteMin, Delete, PredecessorSearch
  2. Design and perform experiments where you compare a van Emde Boas tree based priority queue with the priority queues from Project 1.
  3. Design and perform experiments where you compare a van Emde Boas based search tree with red-black trees, using e.g. the implementation from Chapter 13 of Robert Sedgewick's book "Algorithms in C++, Parts 1-4 (Fundamental Algorithms, Data Structures, Sorting, Searching)" - the code is available at Robert Sedgewick's homepage.

Implementations should preferably be done in C or C++. Elements stored should be 24 bit integers.

Range Minimum Queries

The topic of this project is the trade-off between the space usage and query time of different range minimum queries.

The project consists of the following tasks:

  1. Implement some of the approaches covered in the lecture on "Discrete Range Searching" using space O(nlog n) words and O(n) words.
  2. Measure the the query time and the total space usage of your data structures (e.g. by keeing simple counters for the total space allocated).
  3. (Optional) Implement and experiment with some of the ideas from the paper by Johannes Fischer: Optimal Succinctness for Range Minimum Queries, LATIN 2010: 158-16. This structure achieves 2n+o(n) bits and O(1) query time. What is the overhead in the o(n) term?

Implementations should preferably be done in C or C++. Elements stored should be integers.

Advanced Data Structures 2013

Project 3 (Implementation)

In this project you should select one of the below possible projects. Alternatively, if you can define another project related to the course content, you are also welcome to do so. If you wish to do another project than one of the topics listed below, this must be approved before starting the project. Please send the suggestion for an alternative project to gerth@cs.au.dk for approval.

Retroactive search tree

In this project retroactive search trees should be implemented - both partially and fully retroactive search trees. The update operations to the (non-retroactive) search tree should be Insert(x) and Delete(x), and the query operation should be Pred(x) that returns the largest element ≤x stored in the tree. The tasks of the project are to:

Unified dictionaries

In this project the unified dictionary data structure from [BCDI07] should be implemented and compared with other dictionary data structures. The tasks of the project are to:

Functional queues

In this project you should do an experimental study of a functional data structure. Implementations should be done in the lazy functional language Haskell (see www.haskell.org) using the ghc compiler. The tasks of the project are to:

Note: Since Haskell is a lazy functional language, the timing experiments might not be as expected, since the functions called might actually never have been evaluated. To ensure that functions always get evaluated, ensure that you somehow output a result using the result of your computation.

Project 4 (Theory)

Exponential search

Exponential search traditionally requires 2log d+O(1) comparisons, where d is the number of elements in the range from the finger where the search starts to the position of the searched key. Describe how to perform an exponential search with fewer comparisons. How few comparisons can you perform the search with?

Merging binary heaps

Assume binary heaps are represented by pointers. Describe how to merge two binary heaps of size n and m respectively in polylogarithmic time. What is the best merging time you can achieve?

Finger searches in treaps

Prove that the expected number of nodes in a treap on the path from the finger to the searched key is O(log d).

Counting unique elements in subarrays

Assume given an array A[1..n] of n elements and k pairs (i1,j1)...(ik,jk). Describe an algorithm that for each of the k pairs (i,j) computes the number of unique elements in A[i..j]. State the running time of your algorithm.

Optional: 1) Consider the static version of the problem where the array A has to be preprocessed so that queries can be answered online. State space, preprocessing time, and query time of your solution. 2) Consider the dynamic version of the problem where the entries of array A are updated online interleaved with queries. State space, update time, and query time of your solution.

Project 5 (Theory)

Range minimim queries in two dimensions

Let A be a two dimensional array of size n x m. Describe how to preprocess A into a data structure that given a rectangular query range given by (i1,i2,j1,j2) reports the minimum element in A[i1..i2]x[j1..j2]. State the space, preprocessing time, and query time of your data structure.

Optional: What is the best space bound you can achieve with O(1) query time? What is the best query time you can achieve with O(nm) space?

Nearest common ancestors under leaf insertions and deletions

Describe a data structure to maintain a tree T under the insertion and deletion of leafs (to insert a new leaf we are given a pointer to the parent node), while supporting nearest common ancestor queries of two arbitrary nodes in T. State the update and query time of your data structure.

What are the best update and query bounds you can achieve?

Inplace merging

Describe an algorithm that given an array containing n+m elements x1x2...xny1y2...ym, where x1x2≤...≤xn and y1y2≤...≤ym, inplace merges the two sorted sequences. State the running time of your algorithm.

What is the best running time you can achieve?

Prefix counting

Let A be an array of size n, where each A[i] is either 0 or 1. Describe a data structure that supports updating an entry of A to either 0 or 1, and the query sum2(i) that returns (A[1]+A[2]+...+A[i]) mod 2, ie. decides if the sum of the first i entries of A is odd or even.

What is the best query time and update time you can achieve? A lower bound of Ω(log n/log log n) is known for the operations on the RAM.

Optional: Generalize the construction to return the sum A[1]+A[2]+...+A[i], ie. the number of entries equal one among the first i entries.

Project 6 (Theory)

Fully retroactive queues

Prove that fully retroactive queues must require time at least time Ω(log m/loglog m) on a RAM. Hint: give a reduction from the parity problem (considered in Project 5) to the fully retroactive queue problem ([Fredman & Saks, STOC 1989]).

Array initialization

Describe a data structure supporting the following three operations in worst-case constant time:

The operations may use a procedure alloc(n) that in worst-case constant time returns an uninitialized array of length n, where each cell contains O(log n) bits, i.e., all entries of the allocated array contain random information.

Functional lists with index lookup

Describe a (strict) functional data structure supporting the operations push(x), pop(), and lookup(d), i.e. a functional stack supporting the lookup of the dth element. A normal list will support the operations in time O(1), O(1) and O(d) time respectively, whereas a balanced tree will achieve O(log n) time for all three operations, where n is the size of the list. Can you achieve the best of both worlds?

Maxiphobic heaps

In the description of maxiphobic heaps [Okasaki 2005, Section 3] it is stated:

Finally, note that "size" in maxiphobic heaps can be interpreted as either number of nodes or height of the tree. Either interpretation leads to a successful solution

Describe and analyse a solution of maxiphobic heaps based on height.