Solved-Assignment 3- Solution


Questions

  1. non-credit. Download the coalescent simulator msms from http://www.mabs.at/ewing/msms/ and learn how to simulate samples with specific values of the parameters $n$, $\theta$, $\rho$ (sample size, scaled mutation rate, and scaled recombination rate). Note that you do not need to specify the population size if you work with scaled parameters.

  1. non-credit. Download and obtain an academic license for the Gurobi ILP solver, and read their quick start manual to solve ILPs.

  1. Given a binary SNP matrix that is taken from a population evolving with recombination and mutation, describe an ILP to eliminate a minimum number of mutations so that the remaining matrix admits a perfect phylogeny.

  1. Using the msms simulator, generate 100 simulations for each setting with $n = 100$, $\theta = 40$, and $\rho \in \{0, 1, 20, 40\}$. In each case, solve the ILP described in the problem above and report the fraction of SNPs that had to be deleted to construct a perfect phylogeny. Can that fraction be used as a statistic to predict the scaled recombination rate $\rho$ from the data?
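As a starting point for the two questions above, the following is a minimal sketch that reads simulator output and lists the column pairs that rule out a perfect phylogeny. It assumes the output is in the ms text format (replicates separated by `//` lines, haplotypes as rows of 0/1 characters) and uses the four-gamete test as the pairwise compatibility criterion; the function names `parse_ms_samples` and `conflicting_pairs` are hypothetical. One possible ILP then introduces a binary deletion variable $d_j$ per column, minimizes $\sum_j d_j$, and adds a constraint $d_i + d_j \ge 1$ for every conflicting pair $(i, j)$.

```python
from itertools import combinations


def parse_ms_samples(text):
    """Split ms-format output into one 0/1 matrix (list of rows) per replicate."""
    matrices = []
    # Everything before the first '//' is the command line and seeds.
    for block in text.split("//")[1:]:
        rows = []
        for line in block.splitlines():
            s = line.strip()
            # Haplotype lines consist only of the characters 0 and 1.
            if s and set(s) <= {"0", "1"}:
                rows.append([int(c) for c in s])
        matrices.append(rows)
    return matrices


def conflicting_pairs(matrix):
    """Return column pairs (i, j), i < j, that show all four gametes 00, 01, 10, 11."""
    n_cols = len(matrix[0]) if matrix else 0
    pairs = []
    for i, j in combinations(range(n_cols), 2):
        gametes = {(row[i], row[j]) for row in matrix}
        if len(gametes) == 4:
            pairs.append((i, j))
    return pairs
```

The pair list can be passed to any ILP solver; the fraction of deleted SNPs is then the optimal objective value divided by the number of columns. If the ancestral state is taken as known (0 in ms output), a three-gamete test on 01, 10, 11 would replace the four-gamete check.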

  1. Given a complete graph $G_n$ on $n$ vertices, and real weights $w(u, v)$ on edges, the weight of a path (an ordered chain of vertices) is given by the sum of the weights of the edges in the path. A path is simple when no vertex is reused in the path. Our goal is to find the heaviest simple path (with maximum weight).

    1. Describe an ILP for solving the heaviest path problem. For extra credit, describe an LP formulation (no integer constraints) for the same problem, or give arguments for why that might be hard.

    1. Describe a simulated annealing formulation, or any other stochastic iterative optimization formulation, for the same problem. Use pseudo-code, but be precise.

    1. Describe a greedy algorithm, or any fast heuristic, for the problem. The final solution is not guaranteed to be optimal, but the algorithm must be guaranteed to finish 'fast'.
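To make the two heuristic sub-questions concrete, here is a minimal sketch of one possible greedy heuristic and one possible simulated annealing scheme. Everything in it is an assumption rather than a prescribed solution: the graph is taken to be undirected and stored as a symmetric matrix `w` with vertices `0..n-1`, a single vertex counts as a path of weight 0, and the move set (insert, delete, reverse) and the parameters `steps`, `t0`, and `cooling` are illustrative choices.

```python
import math
import random
from collections import deque


def path_weight(w, path):
    """Sum of edge weights along consecutive vertices of the path."""
    return sum(w[a][b] for a, b in zip(path, path[1:]))


def greedy_path(w):
    """Start from the heaviest edge and extend either end while the gain is positive."""
    n = len(w)
    if n < 2:
        return list(range(n))
    u, v = max(((a, b) for a in range(n) for b in range(a + 1, n)),
               key=lambda e: w[e[0]][e[1]])
    if w[u][v] <= 0:
        return [u]  # no edge beats a single-vertex path of weight 0
    path, unused = deque([u, v]), set(range(n)) - {u, v}
    while unused:
        head = max(unused, key=lambda x: w[x][path[0]])
        tail = max(unused, key=lambda x: w[path[-1]][x])
        gain_head, gain_tail = w[head][path[0]], w[path[-1]][tail]
        if max(gain_head, gain_tail) <= 0:
            break
        if gain_head > gain_tail:
            path.appendleft(head)
            unused.discard(head)
        else:
            path.append(tail)
            unused.discard(tail)
    return list(path)


def anneal_path(w, steps=20000, t0=5.0, cooling=0.9995, seed=0):
    """Simulated annealing over simple paths; returns the best path seen."""
    rng = random.Random(seed)
    n = len(w)
    path, cur = [rng.randrange(n)], 0
    best, best_w, t = path[:], cur, t0
    for _ in range(steps):
        cand = path[:]
        move = rng.choice(("insert", "delete", "reverse"))
        if move == "insert" and len(cand) < n:
            used = set(cand)
            v = rng.choice([x for x in range(n) if x not in used])
            cand.insert(rng.randint(0, len(cand)), v)
        elif move == "delete" and len(cand) > 1:
            cand.pop(rng.randrange(len(cand)))
        elif move == "reverse" and len(cand) > 2:
            i, j = sorted(rng.sample(range(len(cand)), 2))
            cand[i:j + 1] = cand[i:j + 1][::-1]
        cand_w = path_weight(w, cand)
        delta = cand_w - cur
        # Always accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            path, cur = cand, cand_w
            if cur > best_w:
                best, best_w = path[:], cur
        t = max(t * cooling, 1e-9)  # geometric cooling, floored to avoid division by zero
    return best
```

The greedy routine runs in $O(n^2)$ time. The annealing loop recomputes the path weight from scratch at each step for clarity; an implementation meant for large $n$ would update the weight incrementally from the edges a move touches.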

  1. Generate a random weighted graph $G_n$ with $n$ vertices as follows:

    (a) Choose an ordered subset of $k = \min\{n, 2\lceil \log_2 n \rceil\}$ vertices $P = \{x_1, x_2, \ldots, x_k\}$, where $x_i \neq x_j$ for all $i \neq j$, and $1 \le x_i \le n$ for all $i$.

    (b) For each consecutive pair $x_i, x_{i+1} \in P$, set $w(x_i, x_{i+1}) = 10$ with probability $\frac{1}{2}$, and $w(x_i, x_{i+1}) = 3$ otherwise.

    (c) For each $1 \le u \neq v \le n$ such that $u, v$ are not consecutive in $P$, set $w(u, v)$ to be a random integer between $-10$ and $+3$.

    Output examples of $G_n$ for $n = 3, 5, 10$.
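The generation procedure above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the graph is treated as undirected (symmetric weights), vertices are numbered `0..n-1` instead of `1..n`, the random integer range is read as inclusive on both ends, and the diagonal is filled with 0 as a placeholder since the graph has no self-loops. The name `generate_graph` and the `seed` parameter are hypothetical.

```python
import math
import random


def generate_graph(n, seed=None):
    """Return (planted_path, w) where w is a symmetric n x n weight matrix."""
    rng = random.Random(seed)
    k = min(n, 2 * math.ceil(math.log2(n)))
    # Step (a): an ordered subset of k distinct vertices.
    planted = rng.sample(range(n), k)
    w = [[None] * n for _ in range(n)]
    # Step (b): consecutive vertices of P get weight 10 or 3 with equal probability.
    for a, b in zip(planted, planted[1:]):
        w[a][b] = w[b][a] = 10 if rng.random() < 0.5 else 3
    # Step (c): every remaining pair gets a random integer in [-10, 3].
    for u in range(n):
        w[u][u] = 0  # placeholder: no self-loops in the graph
        for v in range(u + 1, n):
            if w[u][v] is None:
                w[u][v] = w[v][u] = rng.randint(-10, 3)
    return planted, w
```

For example, `generate_graph(10, seed=1)` plants a path on $k = 8$ vertices. A dense matrix needs $O(n^2)$ memory, so very large $n$ would call for generating weights lazily, for instance from a hash of the vertex pair.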

  1. Implement the algorithms from Q1(b) and Q1(c) to solve instances of $G_n$. Do not use any knowledge of the simulated data (such as edge weight values) in your code. Generate many instances of $G_n$ with increasing values of $n$ up to $n = 100000$ (or whatever is possible on your computers). Run your code on these examples, and give scatter plots of (a) the running time as a function of $n$ and (b) the weight of the maximum path computed for each example.

  1. Given a SNP matrix with $n$ rows and a fraction $f$, design and implement an algorithm to identify a set $C$ of columns such that there is a subset of at least $f n$ rows on which the matrix restricted to the columns in $C$ is all ones, and such that $|C|$ is maximized. Run your code on the matrix provided, for $f \in \{0.3, 0.5, 0.7\}$. [Note: while there is a similarity to the clique-finding problem, we are talking about a SNP matrix here.]
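One way to approach the last question is an exact depth-first search over column sets, pruning any branch whose supporting rows drop below $f n$. The sketch below is only an illustration: it assumes the matrix is a list of 0/1 rows, stores the supporting rows of each column as a Python integer bitmask, and takes exponential time in the worst case, so a large matrix may need a heuristic instead. The name `largest_column_set` is hypothetical.

```python
import math


def largest_column_set(matrix, f):
    """Return a largest set of columns that are all ones on at least f*n common rows."""
    n = len(matrix)
    m = len(matrix[0]) if n else 0
    need = math.ceil(f * n - 1e-9)  # small tolerance against floating-point error
    # rows_of[j] is a bitmask with bit i set when row i has a 1 in column j.
    rows_of = [sum(1 << i for i in range(n) if matrix[i][j]) for j in range(m)]

    def support(mask):
        return bin(mask).count("1")

    # Only columns that are frequent on their own can belong to a solution.
    cols = [j for j in range(m) if support(rows_of[j]) >= need]
    cols.sort(key=lambda j: -support(rows_of[j]))
    best = []

    def extend(start, chosen, rows):
        nonlocal best
        if len(chosen) > len(best):
            best = chosen[:]
        for idx in range(start, len(cols)):
            # Bound: even taking every remaining column cannot beat the best so far.
            if len(chosen) + (len(cols) - idx) <= len(best):
                break
            new_rows = rows & rows_of[cols[idx]]
            if support(new_rows) >= need:
                chosen.append(cols[idx])
                extend(idx + 1, chosen, new_rows)
                chosen.pop()

    extend(0, [], (1 << n) - 1)
    return sorted(best)
```

Calling `largest_column_set(matrix, f)` for each of the three values of $f$ gives the column sets to report; the supporting rows can be recovered by intersecting the rows of the returned columns.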