Advanced Data Structures (Q1+Q2, 2015) - Projects

During the course you must handin three projects. The default projects are projects 1-3 below. These are all implementation projects and should be done in groups of 2-3 people. The projects should be documented in reports describing your theoretical considerations, implementation, experiment design and present your experimental results. The evaluation of the reports and implementation will be part of the final grade.

Alternatively, one or two (but not all three) of the projects 1-3 can be replaced by any of the theory projects 4-6 below. Theory projects must be documented by individual reports, i.e. no group reports (you can discuss the theory projects in groups, but the report must be indivual).

Please a.s.a.p. send the names of the people in your group to gerth@cs.au.dk.

Hints for implementation projects

There are quite a number of UNIX commands to time the execution of a program, among these time and timex. Read the man page for their usage. In some shells (e.g. csh), there is also a built-in version of time - read the man page for the shell for information on this. There are also corresponding C library routines called times, getrusage, and gettimeofday (which also have man pages). The availability may differ from machine to machine, though. It is part of the project to consider how to best time the execution of your programs.
Computing random numbers is expensive.
Make graphs of your results, e.g. using gnuplot.
Run the program a suitable number of times for each input size and use the average running time. This should compensate fluctuations due to other processes.
To perform several different executions, it is useful to have a script or meta-program that performs all the executions.
Try to write programs that automatically checks if your algorithms and data structures work correctly.

Project 1 (Implementation)

The topic of this project is to compare the practical performance of Fibonacci heaps and binary heaps.

The project consists of the following tasks:

Implement
1. the Fibonnacci Heaps of Fredman and Tarjan,
2. the binary heaps of Williams (Algorithm 232, CACM 21(4), 1964).
As a minimum the following opeartions should be supported: MakeHeap, FindMin, Insert, DeleteMin, and DecreaseKey.
Argue about the worst-case (as opposed to amortized) time bounds for each of the operations of Fibonnacci Heaps and binary heaps.
Design and perform experiments where you try to measure the running time and number of comparisons of the operations for both priority queue implementations.
Implement Dijkstra's algorithm for the single-source shortest path problem for a weighted graph with non-negative edge-weigths, such that is can switch between the two priority queue implementations.
Describe families of graphs where Dijkstra's algorithm performs many respectively few DecreaseKey operations.
Perform experiments on your implementation of Dijksta's algorithm - in particular, try to observe if the version using Fibonacci heaps achieves an improved performance because of the amortized O(1) time DecreaseKey operation.

The project should be documented in a report. The report should document your theoretical considerations, implementation, experiment design and present your experimental results.

Implementations should preferably be done in C or C++.

Project 2 (Implementation)

In this project you should select one of the below projects.

van Emde Boas Trees

The topic of this project is to compare the practical performance of van Emde Boas Trees with the priority queue implementations from Project 1 and with Red-Black search trees.

The project consists of the following tasks:

Implement van Emde Boas trees based on the description in the note van Emde Boas trees (lecture note by Gudmund S. Frandsen). As a minimum the following opeartions should be supported: FindMin, Insert, DeleteMin, Delete, PredecessorSearch
Design and perform experiments where you compare a van Emde Boas tree based priority queue with the priority queues from Project 1.
Design and perform experiments where you compare a van Emde Boas based search tree with red-black trees, using e.g. the implementation from Chapter 13 of Robert Sedgewick's book "Algorithms in C++, Parts 1-4 (Fundamental Algorithms, Data Structures, Sorting, Searching)" - the code is available at Robert Sedgewick's homepage.

Implementations should preferably be done in C or C++. Elements stored should be 24 bit integers.

Range Minimum Queries

The topic of this project is the trade-off between the space usage and query time of different range minimum queries.

The project consists of the following tasks:

Implement some of the approaches covered in the lecture on "Discrete Range Searching" using space O(nlog n) words and O(n) words.
Measure the the query time and the total space usage of your data structures (e.g. by keeing simple counters for the total space allocated).
(Optional) Implement and experiment with some of the ideas from the paper by Johannes Fischer: Optimal Succinctness for Range Minimum Queries, LATIN 2010: 158-16. This structure achieves 2n+o(n) bits and O(1) query time. What is the overhead in the o(n) term?

Implementations should preferably be done in C or C++. Elements stored should be integers.

Project 3 (Implementation)

In this project you should select one of the below possible projects. Alternatively, if you can define another project related to the course content, you are also welcome to do so. If you wish to do another project than one of the topics listed below, this must be approved before starting the project. Please send the suggestion for an alternative project to gerth@cs.au.dk for approval.

Retroactive search tree

In this project retroactive search trees should be implemented - both partially and fully retroactive search trees. The update operations to the (non-retroactive) search tree should be Insert(x) and Delete(x), and the query operation should be Pred(x) that returns the largest element ≤x stored in the tree. The tasks of the project are to:

Define an appropriate interface to a partially and a fully retroactive search tree.
Implement a search tree, a partially retroactive search tree, and a fully retroactive search tree [DIL, Section 4].
Test if your retroactive solutions are correct by comparing them with simple rollback solutions.
Compare experimentally the performance of the different data structures.
Compare experimentally the performance of the fully retroactive search tree with a simple rollback solution. What are the thresholds where the complicated solution is superior to the rollback solution?

Unified dictionaries

In this project the unified dictionary data structure from [BCDI07] should be implemented and compared with other dictionary data structures. The tasks of the project are to:

Design a set of different experiments for the unified data structure of [BCDI07] with various distances to recently accessed elements.
Implement the unified data structure from [BCDI07].
Experimentally compare the performance of the unified data structure with the red-black trees and optionally also with splay trees.

Functional queues

In this project you should do an experimental study of a functional data structure. Implementations should be done in the lazy functional language Haskell (see www.haskell.org) using the ghc compiler. The tasks of the project are to:

Setup a framework for repeatedly running and timing a Haskell program (does not need to be implemented in Haskell).
Implement various queue implementations in Haskell:
- Using a standard Haskell list as a queue.
- Using a pair of lists (left-part,right-part) with an amortized O(1) time guarantee provided the same queue will never be the argument to repeated queue operations (i.e. expensive operations cannot be repeated).
- Using O(1) lists with a worst-case guarantee of O(1) time per enqueue and dequeue operation (if executed strictly).
- Using a pair of lists (left-part,right-part), exploiting the properties of lazy evaluation of list concatenation to guarantee amortized O(1) per queue operation.
Design and perform experiments comparing the different implementations. The experiments should cover both the case where a queue is only used once as an argument to a queue operation (i.e. expensive operations cannot be repeated) and the case where queues can be used in the general way functional programming allows (i.e. the case where the same queue can be used as an argument an unlimited number of times and an expensive operation can be repeated arbitrarily often).

Note: Since Haskell is a lazy functional language, the timing experiments might not be as expected, since the functions called might actually never have been evaluated. To ensure that functions always get evaluated, ensure that you somehow output a result using the result of your computation.

Project 4 (Theory)

Exponential search

Exponential search traditionally requires 2log d+O(1) comparisons, where d is the number of elements in the range from the finger where the search starts to the position of the searched key. Describe how to perform an exponential search with fewer comparisons. How few comparisons can you perform the search with?

Merging binary heaps

Assume binary heaps are represented by pointers. Describe how to merge two binary heaps of size n and m respectively in polylogarithmic time. What is the best merging time you can achieve?

Finger searches in treaps

Prove that the expected number of nodes in a treap on the path from the finger to the searched key is O(log d).

Counting unique elements in subarrays

Assume given an array A[1..n] of n elements and k pairs (i₁,j₁)...(i_k,j_k). Describe an algorithm that for each of the k pairs (i,j) computes the number of unique elements in A[i..j]. State the running time of your algorithm.

Optional: 1) Consider the static version of the problem where the array A has to be preprocessed so that queries can be answered online. State space, preprocessing time, and query time of your solution. 2) Consider the dynamic version of the problem where the entries of array A are updated online interleaved with queries. State space, update time, and query time of your solution.

Project 5 (Theory)

Range minimim queries in two dimensions

Let A be a two dimensional array of size n x m. Describe how to preprocess A into a data structure that given a rectangular query range given by (i₁,i₂,j₁,j₂) reports the minimum element in A[i₁..i₂]x[j₁..j₂]. State the space, preprocessing time, and query time of your data structure.

Optional: What is the best space bound you can achieve with O(1) query time? What is the best query time you can achieve with O(nm) space?

Nearest common ancestors under leaf insertions and deletions

Describe a data structure to maintain a tree T under the insertion and deletion of leafs (to insert a new leaf we are given a pointer to the parent node), while supporting nearest common ancestor queries of two arbitrary nodes in T. State the update and query time of your data structure.

What are the best update and query bounds you can achieve?

Inplace merging

Describe an algorithm that given an array containing n+m elements x₁x₂...x_ny₁y₂...y_m, where x₁≤x₂≤...≤x_n and y₁≤y₂≤...≤y_m, inplace merges the two sorted sequences. State the running time of your algorithm.

What is the best running time you can achieve?

Prefix counting

Let A be an array of size n, where each A[i] is either 0 or 1. Describe a data structure that supports updating an entry of A to either 0 or 1, and the query sum₂(i) that returns (A[1]+A[2]+...+A[i]) mod 2, ie. decides if the sum of the first i entries of A is odd or even.

What is the best query time and update time you can achieve? A lower bound of Ω(log n/log log n) is known for the operations on the RAM.

Optional: Generalize the construction to return the sum A[1]+A[2]+...+A[i], ie. the number of entries equal one among the first i entries.

Project 6 (Theory)

Fully retroactive queues

Prove that fully retroactive queues must require time at least time Ω(log m/loglog m) on a RAM. Hint: give a reduction from the parity problem (considered in Project 5) to the fully retroactive queue problem ([Fredman & Saks, STOC 1989]).

Array initialization

Describe a data structure supporting the following three operations in worst-case constant time:

Init(n) create a "list" L of length n with all entries set to nil.
Set(i,v) set the ith entry of L to v.
Get(i) return the ith entry of L.

The operations may use a procedure alloc(n) that in worst-case constant time returns an uninitialized array of length n, where each cell contains O(log n) bits, i.e., all entries of the allocated array contain random information.

Functional lists with index lookup

Describe a (strict) functional data structure supporting the operations push(x), pop(), and lookup(d), i.e. a functional stack supporting the lookup of the dth element. A normal list will support the operations in time O(1), O(1) and O(d) time respectively, whereas a balanced tree will achieve O(log n) time for all three operations, where n is the size of the list. Can you achieve the best of both worlds?

Maxiphobic heaps

In the description of maxiphobic heaps [Okasaki 2005, Section 3] it is stated:

Finally, note that "size" in maxiphobic heaps can be interpreted as either number of nodes or height of the tree. Either interpretation leads to a successful solution

Describe and analyse a solution of maxiphobic heaps based on height.