Introduction to Programming and Scientific Applications (Spring 2018)

Final project

The goal of the final project is to do some concrete Python implementation and documenting it in a short report. The project can be done in groups with 1-3 students. There are three deliverables for the final project that should be handed in:

  1. Python code. Either plain Python 3 code in one or more .py files, or Python 3 code embedded into a Jupyter file.
  2. A report - maximum 4 pages. The front page must contain the information listed below (the front page does not count towards the page count).
  3. Visualization of data. E.g. matplotlib output. The plots should be included in an appendix to the report, that does not count towards the page count, or alternatively embedded in a Jupyter file.

Evaluation of handin

The project deliverables will be scored according to the following rubric:

Deliverable Not present
0%
Many major shortcomings
25%
Some major weaknesses
50%
Minor weaknesses
75%
Solid
100%
Code
50%
No code, or code where no part computes the required Some elementary parts of the code work, but significant parts do not work Most essential parts of the code work, but the code does not completely solve the problem addressed Overall good code, but space for improvement, e.g. missing documentation, code could be made clearer Complete code, well structured, good use of modules, clear variable names, good documentation, PEP8 compliant
Visualization
25%
No visualization Very few or very unclear visualizations Some visualizations illustrating basic idea Overall good visualization but some details could have been more clear Clear visualizations documenting the problem being solved and supporting the conclusions drawn
Report
25%
No report Very unpolished report, missing structure, no clear conclusions, major misunderstandings The basic ideas are described but details are missing Overall good report but for some aspects the level of detail/precision could be higher Solid report, containing reflection upon the implementation done, discussion of the Python modules used, the design choices made, discussion of the structure of the code, what were the hardest parts, what is missing, and discussion of testing and visualization

Project report

The project report must contain the following front page (not counted in the 4 pages). Make sure you read and understand the note on plagiarism.

Project descriptions

You can choose any of the below projects as your final project.

Project I - Geometric medians

Motivation: Assume you want to build a drone tower from where to deliver parcels to costumers, and you know the destinations (points) where to deliver parcels. What is the optimal placement of the tower, if each drone can at most carry one parcel and needs to return to the tower between each delivery?

The geometric median of a point set S = { p1, ..., pn } in the plane is a point q (not necessarily a point in S), that minimizes the sum of distances to the points, i.e. minimizes ∑i=1..n |pi - q|, where |p - q| = ((px - qx)2 + (py - qy)2)0.5.

  1. Create a function geometric_median that given a list of points computes the geometric median of the point set.
    Hint. Use the minimize function from the module scipy.optimize.
  2. Make representative visualizations that show both the input point sets and the computed geometric medians. Examples should include point sets with two and three points, and a random point set.
  3. Use the matplotlib plotting function contour to plot a contour map to illustrate that it is the correct minimum that mas been found (the z-value for a point (x, y) in the contour map is the sum of the distances from (x, y) to all input points).

Next we want to find two points q1 and q2 such that the perpendicular bisector of q1 and q2 partitions the point set into the points closest to q1 and those closest to q2. We want to find two points q1 and q2 such that the sum of distances to the nearest point of q1 and q2 is minimized, i.e. ∑i=1..n min(|pi - q1|, |pi - q2|) is minimized.

To solve this problem one essentially has to consider all possible bisection lines, and to find the geometric median on both sides of the bisector, e.g. using the algorithm from question a). Assuming no three points are on a line, it is sufficient to consider all n(n+1)/2 bisector lines that go through two input points, and for each bisector line to consider the two cases where the two points on the line both are considered to be to the left or to the right of the line.

  1. Create a function two_geometric_medians that give a list of points p1, ..., pn computes two points q1 and q2 minimizing ∑i=1..n min(|pi - q1|, |pi - q2|). It can be assumed that no three points are on a line.
  2. Visualize your solution, showing imput points, q1 and q2, and the perpendicular bisector between q1 and q2.
  3. [optional] Extend your solution to handle the case where three or more points can be on a line. E.g. what is your solution if you have four points on a line?
  4. [optional] Try to use some real geometric data, e.g. addresses stored in a file and using the geocoder module to convert addresses to longitude & latitude coordinates (be aware that there are usage limitations).
    > import geocoder
    > geocoder.google('aarhus universitet nordre ringgade 1').latlng
    [56.1681384, 10.2030118]
    
    Use the pyproj library to convert longitude & latitude to a points in a plane. For Denmark a good approximation is to use the UTM32 projection.

Project II - Portfolio optimization

We consider the problem of choosing a long term stock portfolio, given a set of stocks and their price over some period under risk aversion parameter γ > 0.

Assume there are m stocks to be considered. The portfolio will be represented by a vector w ∈ ℝm, such that ∑i=1..m wi = 1. If wi > 0, you use a fraction wi of your total money to buy the i'th stock, while wi < 0 represent shorting that stock. In both cases we assume the stock is bought/shorted for the entire period.

Let pj,i represent the price of the i'th stock at time step j. If there are n time steps, then p ∈ ℝn×m is a matrix.

Let rj,i represent the fractional reward of stock i at time step j, i.e. rj,i = (pj,i - pj-1,i) / pj-1,i for j > 1.

We make the (unrealistic) assumption that we can model r by a random variable, distributed as a multivariate Gaussian with estimated means,

μ = E[r] ≃ 1/n · ∑j=1..n rj

and estimated variances,

Σ = E[(r - μ)(r - μ)T] ≃ 1/n · ∑j=1..n [(rj - μ)(rj - μ)T]

The distribution of returns using some w is then

Rw = Nw, σw2)
μw = wTμ
σw2 = wTΣw

Now, we want to maximize for a balance between high return μw and low risk σw2. This leads to the following optimization problem, where we ant to find the value w* of w maximizing the following expresion:

maximize     wTμ - γwTΣw

subject to     ∑i=1..m wi = 1

where γ controls the balance between risk and return. A high value of γ indicate we are willing to take low risk and vise versa.

In this project you should find w* for different values of γ and using real stock values of your choice. The project consists of the following three questions.

  1. We need a module for collecting stock values, for this we will use the module googlefinance. Using this you should write a function get_prices([stock1, ..., stockk], step_size, period, exchange), that returns a tuple (stocks, p), where p[j, i] represents the opening price of stock i at time step j and stocks[i] is the name of the i'th stock. Make a plot of p, where each stock is labeled with its name, e.g. MSFT or GOGL. You should use at least five stocks.
  2. Calculate r, μ and Σ using the formulas above and the p calculated in question (a). Plot the probability density function (pdf) of the return of each stock.
    Hint. The method normpdf from the module matplotlib.mlab might become convenient.
  3. Solve the optimization problem defined above for different values of gamma, e.g. gammas = (np.arange(10) / 5) + 1, and plot the pdf of each solution to a single plot with appropriate legends. Finally create a scatter plot of how w* changes as γ changes: For each value of γ plot the fraction of each stock in the portfolio.

Project III - Open topic

You can also define your own project. There are some aspects that needs to be addressed.

  • Some minimal amount of Python 3 code must be produced.
  • Use at least two different Python modules.
  • Some data must be visualized, e.g using matplotlib, but any visualization module or domain specific library, like for chemistry, can be used.
You need to get your project description approved by sending an e-mail to gerth@cs.au.dk.