Dynamic Programming

As in the previous chapter about greedy algorithms, we are interested in solving optimization problems. Recall that in this kind of problem, we have

  • a space of feasible solutions, i.e. answers satisfying the constraints of the problem, and

  • a function associating a weight (or score, or error…) to each feasible solution.

The goal is then to find a solution whose weight is minimum (or maximum): this is called an optimal solution. There might be several optimal solutions and we usually just want to find any one of them.

We have seen that greedy algorithms provide simple and efficient solutions to some problems. However, we have also seen that the greedy approach fails to provide an optimal solution for other problems. In this chapter, we will see another algorithmic technique called dynamic programming, which somehow generalizes the greedy approach and allows us to solve more problems.

Change-making problem revisited

In the previous chapter, we have seen that for some coin sets like \(\{1,2,5,10,20,50\}\), finding the minimum number of coins that add up to a given amount of money (the change-making problem) can be solved optimally with a greedy algorithm. The greedy strategy is simply to iteratively pick the largest coin value which is not greater than the remaining amount to give back. However, for some other coin sets like \(\{1,2,5,10,20,40,50\}\) the greedy approach may fail to find an optimal solution. For example, to make change for 86, the greedy method uses the coins \(\{50,20,10,5,1\}\) (5 coins) while the optimal solution is \(\{40,40,5,1\}\) (4 coins). We will see how to solve this problem with a method called dynamic programming.
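To make the failure concrete, here is a small Python sketch of the greedy strategy from the previous chapter (an illustration; the function name is ours):

def greedy_change(coin_set, value):
    # greedy strategy: always take the largest coin not greater than the remaining amount
    coins = []
    for c in sorted(coin_set, reverse=True):
        while c <= value:
            coins.append(c)
            value -= c
    return coins

print(greedy_change({1, 2, 5, 10, 20, 40, 50}, 86))  # [50, 20, 10, 5, 1]: 5 coins instead of the optimal 4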

Recursive top-down approach

To simplify the problem, let's consider an even simpler coin set: \(\{1,4,5\}\). Note that the greedy approach doesn't work on this coin set: consider, for example, the change proposed for the value 8 (greedy gives \(\{5,1,1,1\}\): 4 coins, while \(\{4,4\}\) uses only 2). Let's try to better understand the problem.

If we start from the value 8, as our coin set contains 3 different values, we have 3 possibilities:

  1. We give a coin of value 1 and we still have to make change on 7.

  2. We give a coin of value 4 and we still have to make change on 4.

  3. We give a coin of value 5 and we still have to make change on 3.

Assume that we chose option 1: we gave a coin of value 1 and we still have to make change on 7. We again have 3 possibilities:

  1. We give a coin of value 1 and we still have to make change on 6.

  2. We give a coin of value 4 and we still have to make change on 3.

  3. We give a coin of value 5 and we still have to make change on 2.

And so on. If we draw all these possibilities, we obtain the following figure depicting all possible ways to decompose the value 8 with the coin set \(C=\{1,4,5\}\).

../_images/change_making_ex1.svg

Fig. 30 We start from value 8: this is represented by the root node. At any step, we consider each coin value \(c\): if \(c\) is smaller than or equal to the current node value, i.e. to the amount of change we still have to make, this creates a new possibility and we create a new child of the current node whose value is equal to the current value minus the coin value \(c\). When we reach a node with a value equal to 0, we have a feasible solution. The weight of this solution is equal to the depth of this node in the tree. In this tree, the weight of an optimal solution is equal to the smallest depth among all leaf nodes.

If we try to understand the problem in a recursive manner, let \(w(n)\) be the weight of an optimal solution to make change for the value \(n\). We can observe that:

  1. we need 0 coins to make change for the value 0: the base case is thus \(w(0)=0\); and

  2. in the general case, \(n>0\), for a coin \(c\) of value smaller than or equal to \(n\), assume that, by recurrence, we know \(w(n-c)\), the number of coins needed to make change for the value \(n-c\) (i.e. the coin \(c\) is part of the solution). Then \(w(n)\) is equal to 1 plus the minimum of \(w(n-c)\) over all possible coin values \(c\).

This means that we can indeed compute the weight \(w(n)\) of an optimal solution to make the change for a value \(n\) with the following formula:

\[\begin{split}w(n) = \begin{cases} 0 \textrm{ if } n=0 \\ 1 + \min_{c \in C, c \leq n}w(n-c) \textrm{ otherwise.} \end{cases}\end{split}\]

In the previous example, the general case of the recurrence equation is thus: \(w(n)=1+\min(w(n-1), w(n-4), w(n-5))\). This can simply be implemented as:

def change_making_cost(coin_list, value):
    if value == 0:
        return 0
    else:
        min_cost = infinity
        for c in coin_list:
            if c <= value:
                var v =  change_making_cost(coin_list, value - c)
                min_cost = min(min_cost, v)
        return 1 + min_cost

Note that this pseudo-code only returns the cost of an optimal solution; we will see later how to get the actual solution.
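For reference, a direct Python translation of this pseudo-code could look as follows (a sketch; float('inf') plays the role of infinity):

def change_making_cost(coin_list, value):
    # naive recursion: minimum number of coins from coin_list summing to value
    if value == 0:
        return 0
    min_cost = float('inf')
    for c in coin_list:
        if c <= value:
            min_cost = min(min_cost, change_making_cost(coin_list, value - c))
    return 1 + min_cost

print(change_making_cost([1, 4, 5], 8))  # 2: two coins of value 4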

This algorithm works, however its time complexity is high. It's not easy to get a tight bound on its complexity, but we can try to find an upper bound. For a given value \(n\) and a list of coins of size \(k\), the recursion depth is at most equal to \(n\) (if we have a coin of value 1) and there are at most \(k\) recursive calls at each step. The overall worst case time complexity is thus \(O(k^n)\): it is exponential with respect to \(n\) and it is thus not usable in practice.

To better understand what is happening, let’s look back at Fig. 30. The problem is that we are indeed computing the solutions to the same sub-problems over and over again: we have overlapping sub-problems. For example, the computation of the sub-problem for making change for the value 2 appears 6 times.

Recursive top-down approach with memoization

We can avoid re-computing the same solutions several times by using a lookup array to store the solutions of sub-problems. This memoization strategy, used to solve problems with overlapping sub-problems, is called dynamic programming. Any time we need the solution of a sub-problem, we first look in the array: if the result for this sub-problem has already been computed, we return the memoized result; otherwise we do the work normally. The procedure is then split into two parts:

  • the entry point which initializes the lookup array;

  • the actual procedure which computes the result recursively and uses the lookup array.

This gives:

def change_making_cost_memoized(coin_list, value, lookup_array=None):

    # initialization, executed only by the first call
    if lookup_array is None:
        lookup_array = new array of size value + 1                    # => lookup array to store sub-problems results
        for i = 0 to value + 1:                                       # => at start we don't know the sub-problems' solutions
            lookup_array[i] = None

    # actual result computation
    if lookup_array[value] is not None:                               # => check if memoized solution exists
        return lookup_array[value]
    elif value == 0:
        return 0
    else:
        min_cost = infinity
        for c in coin_list:
            if c <= value:
                cost = change_making_cost_memoized(coin_list, value - c, lookup_array)
                min_cost = min(cost, min_cost)
        lookup_array[value] = min_cost + 1                            # => memoize the solution in the lookup array
        return min_cost + 1
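As a side note, in Python the lookup array can be replaced by the standard functools.lru_cache decorator, which memoizes the recursive calls automatically (a sketch, equivalent to the pseudo-code above):

from functools import lru_cache

def change_making_cost_memoized(coin_list, value):
    coins = tuple(coin_list)  # cached functions need hashable arguments

    @lru_cache(maxsize=None)  # each sub-problem is solved at most once
    def cost(v):
        if v == 0:
            return 0
        return 1 + min((cost(v - c) for c in coins if c <= v),
                       default=float('inf'))

    return cost(value)

print(change_making_cost_memoized([1, 4, 5], 8))  # 2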

Now, for a given value \(n\) and a list of coins of size \(k\), we can only hit the general case of the algorithm once for each number between 0 and \(n\), which means that, in the worst case, the time complexity of change_making_cost_memoized is \(O(nk)\). A possible call tree for change_making_cost_memoized called on the value 8 is shown below:

../_images/change_making_ex1_memoize.svg

Fig. 31 In this call tree, the sub-problems that were solved thanks to memoization (use of saved result from a previous call) are depicted in green.

Iterative bottom-up approach with memoization

The recursive approach seen above works, but it's not very elegant and the recursive calls will generate a significant computation overhead in practice. We can simplify all this by iteratively computing the solutions to the sub-problems, starting from the smallest one.

def change_making_cost_bottom_up(coin_set, value):
    var cost_array = new array of size value + 1    # => stores solutions to all sub-problems
    cost_array[0] = 0                               # => initialization: we need 0 coins to make the value 0
    for i = 1 to value + 1:                         # => for all sub-problems
        min_cost = infinity
        for c in coin_set:
            if c <= i:
                cost = cost_array[i - c]    # => use previously computed solutions to smaller sub-problems
                min_cost = min(min_cost, cost)
        cost_array[i] = 1 + min_cost                # => store solution to current sub-problem
    return cost_array[value]                        # => return solution to the actual problem

Here the solutions to the sub-problems are constructed iteratively starting from the smallest until we reach the complete problem. For a given value \(n\) and a list of coins of size \(k\), the time complexity is \(\Theta(nk)\).
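A direct Python translation of this bottom-up pseudo-code could be (a sketch):

def change_making_cost_bottom_up(coin_list, value):
    # cost_array[i] holds the optimal cost for the sub-problem of value i
    cost_array = [0] * (value + 1)
    for i in range(1, value + 1):  # solve all sub-problems, smallest first
        cost_array[i] = 1 + min((cost_array[i - c] for c in coin_list if c <= i),
                                default=float('inf'))
    return cost_array[value]

print(change_making_cost_bottom_up([1, 4, 5], 13))  # 3, e.g. {4, 4, 5}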

Let's see how this works with the coin set \(\{1,4,5\}\) and the value 13. The following figure depicts how the cost array is filled along the main for loop:

../_images/change_making_dyn_ex.svg

Fig. 32 At each iteration \(i\), the value of the \(i\)-th cell (in blue) of the cost array is computed. Its value is obtained by adding 1 to the minimum of the values of the cells \(i-1\), \(i-4\), \(i-5\) (yellow). At the end of the algorithm, we have the optimal cost for the value 13 which is equal to 3.

Sub-problem graph

If we look at how the algorithm accesses the values of the cost array in Fig. 32, we see a clear pattern drawn by the blue and yellow cells, showing how the algorithm uses the solutions of the sub-problems to solve the current problem. This pattern can be summarized in a so-called sub-problem graph:

../_images/change_making_dyn_graph_ex.svg

Fig. 33 Sub-problem graph for the coin set \(\{1,4,5\}\) and the value 6. Each node represents a sub-problem. An arrow between two sub-problems \(x\) and \(y\) indicates that the solution of sub-problem \(x\) depends on the solution of sub-problem \(y\).

Somehow, the sub-problem graph is a compressed version of the call tree graph of the naïve recursive version of the algorithm (see Fig. 30) where identical sub-problems have been merged. The sub-problem graph is useful to understand the dependencies between sub-problems. The number of arrows is also a good indication of the time complexity of the algorithm.

Note also that the direction of the arrows in the sub-problem graph constrains the processing order of the sub-problems. In a top-down implementation (recursive algorithm), the sub-problems are processed in an order that is compatible with the direction of the arrows: this is called a topological order. In a bottom-up implementation (iterative algorithm), the sub-problems are processed in reverse topological order. With a top-down implementation the ordering is natural: it is done automatically by the function call stack. But when we write a bottom-up implementation, we have to choose in which order to process the sub-problems, and we must ensure that, when processing a given sub-problem, all the sub-problems on which it depends have already been processed.

The Fibonacci sequence is defined as follows:

\[\begin{split}\forall n\in\mathbb{N}, fib(n) = \begin{cases}1 \textrm{ if } n\leq 1 \\ fib(n-1) + fib(n-2) \textrm{ otherwise.} \end{cases}\end{split}\]
  1. Propose and implement a naive recursive algorithm for computing \(fib(n)\) for a given number \(n\) in the python notebook.

  2. Draw the call tree of \(fib(5)\).

  3. The time complexity of this naive algorithm is O().

  4. Draw the sub-problem graph for \(fib(5)\).

  5. Propose and implement an optimized iterative algorithm for computing \(fib(n)\) for a given number \(n\) in the python notebook.

  6. The time complexity of this optimized algorithm is O().

Consider the following problem: given a positive integer \(n\), find the number of different ways, denoted \(A(n)\), to write \(n\) as a sum whose terms are 1, 3, or 4.

Example. We have \(A(5)=6\) as:

\[\begin{split}\begin{align} 5 & = 1 + 1 + 1 + 1 + 1 \\ & = 1 + 1 + 3\\ & = 1 + 3 + 1\\ & = 3 + 1 + 1\\ & = 1 + 4\\ & = 4 + 1 \end{align}\end{split}\]

Note that the order of the additions matters: \(1 + 1 + 3\) and \(1 + 3 + 1\) are 2 different ways of writing 5.

In order to write \(A(n)\) as a recurrence equation, notice that:

  • Base case: \(A(0)=1\): there is a single way to write 0 (write nothing)

  • General case: for a value \(n>0\), assume that \(k\) is a value in \(\{1,3,4\}\) such that \(k \leq n\): if we choose \(k\) as the first element in the decomposition (we want to write \(n = k + \ldots\)), then there are \(A(n-k)\) ways to complete this decomposition. In order to get \(A(n)\), we must then consider every possible first choice \(k\) and add their respective numbers of decompositions \(A(n-k)\).

This gives:

\[\begin{split}A(n) = \begin{cases} 1 \textrm{ if } n=0\\ \sum_{k\in\{1,3,4\}, k\leq n} A(n-k) \textrm{ otherwise.}\end{cases}\end{split}\]
  1. Propose and implement a naïve recursive algorithm to compute \(A(n)\) in the python notebook.

  2. The time complexity of this naive algorithm is O().

  3. Draw the sub-problem graph of \(A(6)\).

  4. Propose and implement an optimized iterative algorithm for computing \(A(n)\) for a given number \(n\) in the python notebook.

  5. The time complexity of this optimized algorithm is O().

Reconstructing a solution

The efficient algorithm we have proposed for the change-making problem computes the cost of an optimal solution, but it does not compute an actual solution. Fortunately, this issue can be solved efficiently with a small modification of the algorithm. The idea is the following:

  1. At each step of the algorithm, store the choice that led to the optimal cost for the given sub-problem.

  2. At the end of the algorithm, go up the list of choices starting from the end.

For the change-making problem, this gives:

def change_making_cost_bottom_up(coin_list, value):
    var cost_array = new array of size value + 1
    var choice_array = new array of size value + 1  # => New array to store the choices
    cost_array[0] = 0
    choice_array[0] = None                          # => Special value indicating: there is no choice made here
    for i = 1 to value + 1:
        min_cost = infinity
        min_cost_choice = None                      # => The choice giving the min cost
        for c in coin_list:
            if c <= i:
                cost =  cost_array[i - c]
                if cost < min_cost:
                    min_cost = cost
                    min_cost_choice = c             # => New best choice found, remember it
        cost_array[i] = 1 + min_cost
        choice_array[i] = min_cost_choice           # => Store the best choice found

    # Reconstructing the solution
    var solution = []
    var current_subproblem = value                  # => The sub-problem currently considered: final one at first
    var c = choice_array[current_subproblem]        # => Choice that led to the current sub-problem
    while c is not None:
        solution.insert_back(c)                     # => Add the choice to the solution
        current_subproblem -= c                     # => Go to the previous sub-problem
        c = choice_array[current_subproblem]        # => Choice that led to the new sub-problem

    return cost_array[value], solution              # => Return cost and solution

The reconstruction of the solution is illustrated in the following figure:

../_images/change_making_dyn_backtracking.svg

Fig. 34 cost_array and choice_array for the coin set \(\{1,4,5\}\) and the value 13. For each sub-problem, the cost array indicates the cost of an optimal solution, and the choice array indicates from which sub-problem this cost was deduced. For example, the sub-problem for the value 6 has an optimal cost of 2 and the choice that led to this cost is 1, meaning that it was obtained by adding the coin of value 1 to an optimal solution of the sub-problem for the value 5 (6-1). Note that this choice is not necessarily unique: in this example, another possible choice was 5, as adding the coin 5 to an optimal solution for the value 1 also produces a solution of cost 2.

The final solution for this example is then constructed by following the choices that gave the solution of the last problem in reverse order:

  • the optimal cost for 13 was obtained with the coin 4, meaning that it was deduced from the sub-problem for 9 (13-4)

  • the optimal cost for 9 was obtained with the coin 4, meaning that it was deduced from the sub-problem for 5 (9-4)

  • the optimal cost for 5 was obtained with the coin 5, meaning that it was deduced from the sub-problem for 0 (5-5)

  • the sub-problem for 0 is solved trivially without any coin: the solution is thus \(\{4, 4, 5\}\).
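In Python, the cost computation and the reconstruction could be written as follows (a sketch):

def change_making_bottom_up(coin_list, value):
    # cost_array[i]: optimal cost for value i; choice_array[i]: coin used to reach it
    cost_array = [0] + [float('inf')] * value
    choice_array = [None] * (value + 1)
    for i in range(1, value + 1):
        for c in coin_list:
            if c <= i and 1 + cost_array[i - c] < cost_array[i]:
                cost_array[i] = 1 + cost_array[i - c]
                choice_array[i] = c  # remember the best choice for value i
    # walk back through the choices, starting from the complete problem
    solution, i = [], value
    while choice_array[i] is not None:
        solution.append(choice_array[i])
        i -= choice_array[i]
    return cost_array[value], solution

print(change_making_bottom_up([1, 4, 5], 13))  # (3, [4, 4, 5])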

You work in a rod factory. The foundry produces long rods of length \(n\) that can be cut before being sold to consumers. The price of the cut rods depends on their length, but there is no strict proportional relationship between length and price; some lengths are more in demand than others, which increases the price.

Rod lengths are always an integral number of centimeters. We assume that we know, for \(i=1,\ldots,n\), the price \(p_i\) for a rod of length \(i\) centimeters. For example a possible price table for \(n=8\) is:

| length \(i\)  | 1 | 2 | 3 | 4 | 5  | 6  | 7  | 8  |
| price \(p_i\) | 1 | 5 | 8 | 9 | 10 | 17 | 17 | 20 |

The rod cutting problem is to find how to cut a rod in order to maximize the expected profit. We denote by \(E(n)\) the optimal expected profit for a rod of length \(n\) and by \(C(n)\) the solution leading to \(E(n)\).

For example, with the price table given above, the maximum profit \(E(8)\) is 22, obtained by cutting the rod in two pieces: \(C(8)=\{2,6\}\).

We can write \(E(n)\) as a recurrence equation by noticing that:

  • \(E(0)=0\): a rod of length 0 costs 0.

  • if we choose to first cut the rod at a length \(k\leq n\), then we can construct a solution of price \(p_k + E(n-k)\) (the optimal price for cutting a rod of size \(n-k\) plus the price of a rod of length \(k\)). We can thus consider every possible cut length \(k\leq n\) and take the one that maximizes \(p_k + E(n-k)\).

We thus have:

\[\begin{split}E(n) = \begin{cases}0 \textrm{ if } n=0\\ \max_{1 \leq k\leq n} (p_k + E(n-k)) \textrm{ otherwise.}\end{cases}\end{split}\]
  1. Propose and implement a naïve recursive algorithm to compute \(E(n)\) in the python notebook.

  2. Draw the sub-problem graph for \(E(5)\).

  3. Propose an optimized iterative algorithm for computing \(E(n)\) for a given number \(n\).

  4. The time complexity of this optimized algorithm is O().

  5. Modify your optimized algorithm to also return an optimal solution \(C(n)\).

  6. Implement the optimized algorithm in the python notebook.

You are a professional robber planning to rob houses along a street. Each house has a certain amount of money stashed; the only constraint stopping you from robbing all of them is that adjacent houses have connected security systems which will automatically contact the police if two adjacent houses are broken into on the same night.

Given an array \(a\) of non-negative integers representing the amount of money in each house, determine the maximum amount of money you can rob tonight without alerting the police.

We denote by \(M(i)\) the maximum amount of money you can rob within the \(i\) first houses. We can notice that:

  • \(M(1)=a[0]\): if there is a single house, we can simply rob it;

  • \(M(2)=\max(a[0], a[1])\): if there are two houses, we can rob only one of them, so we choose the one with the maximum amount of money;

  • In the general case, to compute \(M(n)\) with \(n>2\), we can note that we have two possibilities:

    1. either we rob the \(n\)-th house, in which case we can't rob the \(n-1\)-th house: the maximum amount of money we can rob is then \(a[n-1]+M(n-2)\) (the maximum amount of money we can make in the \(n-2\) first houses plus the amount of money in the \(n\)-th house);

    2. or we don't rob the \(n\)-th house, in which case the maximum amount of money we can rob is \(M(n-1)\) (the maximum amount of money we can make in the \(n-1\) first houses).

  1. Give a recurrence equation defining \(M(i)\) and implement a naive recursive algorithm to compute \(M(n)\) in the python notebook.

  2. Draw the sub-problem graph for \(M(5)\).

  3. Propose an optimized iterative algorithm for computing \(M(i)\) for all \(i\) between 0 and \(n\).

  4. The time complexity of this optimized algorithm is O().

  5. Modify your optimized algorithm to return the list of houses giving the maximum amount of money.

  6. Implement the optimized algorithm in the python notebook.

Optional: What is the solution if the houses are in a circle?

Let \(a\) be an array of numbers of size \(n\). A subsequence of \(a\) is an array composed of elements of \(a\) taken in order (we can remove elements from \(a\) but we cannot change their order). For example, [3,6,2,7] is a subsequence of the array [0,3,1,6,2,2,7], but [1,3,2,7] is not.

The longest increasing subsequence problem is to find the longest subsequence \(s\) of \(a\) such that for any \(i<j\), \(s[i]<s[j]\).

We denote by \(L(i)\) the length of the longest increasing subsequence ending at index \(i\). Note that, for any indices \(i<n\), if \(a[i]<a[n]\), then the longest increasing subsequence ending at index \(i\) can be extended with the element \(a[n]\), giving an increasing subsequence of length \(L(i) + 1\) ending at index \(n\). By considering all possible indices \(i<n\), we can find \(L(n)\).

Note that, contrary to the previous problems, the final optimal solution is not necessarily in the last cell of \(L\): we have to look for the maximum value in \(L\) to get the length of the longest increasing subsequence.

  1. Give a recurrence equation defining \(L(i)\).

  2. Draw the sub-problem graph for \(L(5)\).

  3. Propose an optimized iterative algorithm for computing \(L(i)\) for all \(i\) between 0 and \(n\).

  4. The time complexity of this optimized algorithm is O().

  5. Modify your optimized algorithm to return the longest subsequence.

  6. Implement the optimized algorithm in the python notebook.

Longest common subsequence

In the previous sections, we have seen an algorithmic method called dynamic programming for solving optimization problems. The general approach is:

  1. Write the solution as a recurrence equation involving overlapping sub-problems.

  2. Deduce the graph of sub-problems from the recurrence equation.

  3. Write an algorithm computing the cost of an optimal solution by iteratively (bottom-up) or recursively (top-down) computing the cost of sub-problems and memoizing the results in a look-up array.

  4. Reconstruct the final solution by tracing back, in reverse order, the choices that gave the optimal cost.

In all the examples we have seen so far, there was a single variable involved in the recurrence: we talk about 1d dynamic programming. We will now see more complex examples and exercises involving 2 input variables.

Recall that a subsequence of an array \(A\) is an array composed of elements of \(A\) taken in order (we can remove elements from \(A\) but we cannot change their order). For example, [3,6,2,7] is a subsequence of the array [0,3,1,6,2,2,7], but [1,3,2,7] is not. In the longest common subsequence problem, we have two arrays \(A\) and \(B\) and the goal is to find the largest array \(C\) that is a subsequence of both \(A\) and \(B\). In the following, \(LCS(A, B)\) denotes a longest common subsequence of the arrays \(A\) and \(B\). The longest common subsequence problem is frequently used to compare two strings, for example in text editors or in revision control systems such as Git.

For example, consider the two arrays [abcd] and [acbad]. They have:

  • 5 length-2 common subsequences: [ab], [ac], [ad], [bd], and [cd];

  • 2 length-3 common subsequences: [abd] and [acd];

  • and no common subsequence of length 4 or more.

So [abd] and [acd] are both optimal solutions to the problem.

Recurrence equation

It is, at first, not obvious how the longest common subsequence problem can be formulated in a recursive manner. Let's see how this can be done using two properties of this problem based on the notion of prefix.

For a given array \(A\) of size \(n\), let \(A_k\), with \(k\leq n\), be the prefix of size \(k\) of \(A\), i.e. \(A_k\) is the array composed of the first \(k\) elements of \(A\).

For example, with \(A=[abdbc]\), we have:

  • \(A_0=[]\);

  • \(A_1=[a]\);

  • \(A_2=[ab]\);

  • \(A_3=[abd]\);

  • \(A_4=[abdb]\);

  • \(A_5=A=[abdbc]\);

Given an array \(A\) and an element \(e\), we denote by \(A\hat{}e\) the array composed of the elements of \(A\) and \(e\) appended at the end. For example \([abdbcd]\hat{}e\) is the array \([abdbcde]\).

Same ending property: If two arrays \(A\) and \(B\) end with the same element \(e\), then \(e\) is the last element of a longest common subsequence of \(A\) and \(B\). Formally, for any two arrays \(X\) and \(Y\), and any element \(e\):

\[LCS(X\hat{}e, Y\hat{}e) = LCS(X,Y)\hat{}e\]

For example, \(LCS([abdcab], [dccdab])\) is equal to \(LCS([abdca], [dccda])\hat{}b\) which is also equal to \(LCS([abdc], [dccd])\hat{}ab\) (and we cannot go any further as the ending characters do not match).

Different ending property: If two arrays \(A\) and \(B\), of respective sizes \(n_A\) and \(n_B\), end with different elements, then a longest common subsequence of \(A\) and \(B\) is either:

  • a longest common subsequence of \(A_{n_A -1}\) (\(A\) minus its last element) and \(B\), or

  • a longest common subsequence of \(A\) and \(B_{n_B -1}\) (\(B\) minus its last element).

Formally, for any two arrays \(X\) and \(Y\), and any two distinct elements \(e\) and \(f\):

\[LCS(X\hat{}e, Y\hat{}f) = \textrm{the longest array among } \{LCS(X, Y\hat{}f), LCS(X\hat{}e, Y)\}.\]

The idea is that, as \(e\) and \(f\) are different, \(e\) and \(f\) cannot both be the last element of the longest common subsequence: we thus have to consider both cases. Note that this does not mean that \(e\) or \(f\) is in the longest common subsequence: maybe neither of them is.

For example, \(LCS([abdcab], [dccdabd])\) is either equal to \(LCS([abdca], [dccdabd])\) or to \(LCS([abdcab], [dccdab])\) (either of the two if both have equal length).

Recurrence equation: The two properties above suggest a recurrence equation based on the prefix length of the two input arrays. Let \(A=[a_1a_2\ldots{}a_n]\) and \(B=[b_1b_2\ldots{}b_m]\). We then have:

\[\begin{split}LCS(A_i, B_j) = \begin{cases} [\ ] \textrm{ if } i=0 \textrm{ or } j=0\\ LCS(A_{i-1}, B_{j-1})\hat{}a_i \textrm{ if } a_i = b_j \\ max\{LCS(A_{i-1}, B_{j}), LCS(A_{i}, B_{j-1})\} \textrm{ otherwise.} \end{cases}\end{split}\]

Naïve recursive algorithm

This gives the following naïve recursive algorithm:

def LCS(A,B):
    if A.length == 0 or B.length == 0:
        return []
    elif A[A.length - 1] == B[B.length - 1]:
        var lcs_rec = LCS(A[:A.length - 1], B[:B.length - 1])    # => drop the common last element from both arrays
        return lcs_rec.insert_back(A[A.length - 1])
    else:
        var lcs_rec1 = LCS(A[:A.length - 1], B)                  # => A minus its last element
        var lcs_rec2 = LCS(A, B[:B.length - 1])                  # => B minus its last element
        if lcs_rec1.length >= lcs_rec2.length:
            return lcs_rec1
        else:
            return lcs_rec2

Assume that the size of the array \(A\) is \(n\) and that the size of the array \(B\) is \(m\). The time complexity of the naïve longest common subsequence algorithm \(LCS(A,B)\) is O()

Implement the naive recursive longest common subsequence algorithm in the python notebook.

Sub-problem graph

In the previous sections, we only considered problems with a single input variable: the recursion was done on the size of this variable, which meant that a simple array was enough to store the results of the sub-problems. Now we have two variables in the recurrence equation; in order to better understand the situation, we can look at the sub-problem graph. The following figure shows the sub-problem graph for the computation of the longest common subsequence of two arrays of size 3.

../_images/lcs_graph_rotated.svg

Fig. 35 In the longest common subsequence problem, each sub-problem is characterized by two numbers: the prefix size in the first array and the prefix size in the second array. We can see that any problem \((i,j)\) depends on the sub-problems \((i-1,j-1)\) (Same ending property), \((i-1,j)\) and \((i,j-1)\) (Different ending property). This leads to the graph on the left. If we rotate this graph, we can see that the sub-problems can be organized as a 2d array whose width is equal to the length of one of the arrays and whose height is equal to the length of the other array.

The results of the sub-problems can thus be stored in a 2d array. The next question is the order of resolution of the sub-problems: from the sub-problem graph we must find an order of traversal of the array such that a sub-problem is visited after all the sub-problems it depends on. Looking at the sub-problem graph, a possible order is simply to iterate over the array from the top left, line by line (or column by column).

Dynamic programming solution

We now have all the required ingredients to produce an efficient dynamic programming algorithm for the longest common subsequence problem. As we did before, we will use two arrays to store information about the sub-problems: the first one stores the cost of an optimal solution, the second one stores the choice made to obtain this cost. For a sub-problem \((i,j)\), choices will be encoded as follows:

  • \((-1, -1)\) if the solution was deduced from the sub-problem \((i-1, j-1)\) (both prefix arrays end with the same element);

  • \((-1, 0)\) if the solution was deduced from the sub-problem \((i-1, j)\) (prefix arrays end with different elements and the solution for \((i-1, j)\) is better than the one for \((i, j-1)\));

  • \((0, -1)\) if the solution was deduced from the sub-problem \((i, j-1)\) (prefix arrays end with different elements and the solution for \((i, j-1)\) is better than the one for \((i-1, j)\)).

This gives:

def LCS_dynamic(A,B):
    score_array = 2d array of size (A.length + 1) x (B.length + 1)
    choice_array = 2d array of size (A.length + 1) x (B.length + 1)

    # sub-problems costs and choices
    for i = 0 to A.length + 1:
        for j = 0 to B.length + 1:
            if i == 0 or j == 0:                                     # => one of the two prefix arrays is empty: LCS is empty
                score_array[i, j] = 0
                choice_array[i, j] = None
            elif A[i - 1] == B[j - 1]:                               # => the two prefix arrays end with the same element: it belongs to the LCS
                score_array[i, j] = score_array[i - 1, j - 1] + 1
                choice_array[i, j] = (-1, -1)
            else:                                                    # => different last elements
                if score_array[i - 1, j] > score_array[i, j - 1]:    # => LCS is obtained by removing the last element of the first prefix array
                    score_array[i, j] = score_array[i - 1, j]
                    choice_array[i, j] = (-1, 0)
                else:
                    score_array[i, j] = score_array[i, j - 1]        # => LCS is obtained by removing the last element of the second prefix array
                    choice_array[i, j] = (0, -1)


    # solution reconstruction
    var position = (A.length, B.length)                              # => current position in the score array
    var solution = array of size score_array[position]               # => array to store the actual solution
    var i = solution.length - 1                                      # => current position in the solution array (solution is constructed in reverse order)
    while i != -1:                                                   # => while solution is not complete
        var c = choice_array[position]
        if c == (-1,-1):                                             # => current choice increased the solution size
            solution[i] = A[position[0] - 1]
            i -= 1
        position += c                                                # => move to previous sub-problem

    return solution

Let’s see how this algorithm works on a small example. Let \(A=[adccab]\) and \(B=[bacd]\). In the first part of the algorithm we construct the score array and the choice array:

../_images/LCS_ex.svg

Fig. 36 Evolution of the score and choice arrays. In each cell \((i,j)\) we indicate the cost of the optimal solution for the sub-problem \(LCS(A_i, B_j)\) and we put a blue arrow indicating from which sub-problem this cost was deduced (the choice). If another, equally good choice was possible, we indicate it with a grey arrow. Choices for cells of score 0 are not indicated.

Then we can reconstruct a solution by tracing back the choices made:

../_images/LCS_ex2.svg

Fig. 37 We start from the last cell of the choice array and we follow the blue arrows. Each time we find a diagonal arrow this means that an element was added to the longest common subsequence: this element is thus stored in the solution. We stop when the length of the solution is equal to the optimal score found in the last cell.
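For reference, a compact Python translation of this algorithm could be (a sketch; here the choices are recomputed from the score array during the traceback instead of being stored):

def lcs_dynamic(A, B):
    n, m = len(A), len(B)
    # score[i][j]: length of a longest common subsequence of the prefixes A_i and B_j
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if A[i - 1] == B[j - 1]:
                score[i][j] = score[i - 1][j - 1] + 1
            else:
                score[i][j] = max(score[i - 1][j], score[i][j - 1])
    # trace back from the complete problem (n, m), collecting matched elements
    solution, i, j = [], n, m
    while i > 0 and j > 0:
        if A[i - 1] == B[j - 1]:
            solution.append(A[i - 1])
            i, j = i - 1, j - 1
        elif score[i - 1][j] >= score[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return solution[::-1]  # the solution was built backwards

print(lcs_dynamic("adccab", "bacd"))  # ['a', 'd']: one solution of length 2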

Assume that the size of the array \(A\) is \(n\) and that the size of the array \(B\) is \(m\). The time complexity of the optimized longest common subsequence algorithm \(LCS_{dynamic}(A,B)\) is Θ()

Implement the optimized longest common subsequence algorithm in the python notebook.

Exercises

The edit distance is a way to measure the similarity between two strings \(A\) and \(B\). The edit distance is based on three basic string transformations:

  • Deletion: removes a single character of a string; for example, "abc" => "ac" is a deletion of the second character.

  • Insertion: inserts a single character into a string; for example, "abc" => "abdc" is an insertion of "d" at the 3rd position.

  • Substitution: replaces a single character of a string by another one; for example, "abc" => "adc" is a substitution of the second character "b" by the character "d".

The edit distance between the strings \(A\) and \(B\), denoted by \(D(A,B)\), is defined as the minimum number of basic transformations required to transform \(A\) into \(B\). The edit distance is often used to compare nearly equal strings, for example to ignore typos (typing errors, bad spelling…).

For example, the edit distance between "intention" and "execution" is 5; it is obtained by the following basic transformations:

  • “intention” => “ntention”: delete first character;

  • “ntention” => “etention”: substitute first character with “e”;

  • “etention” => “exention”: substitute second character with “x”;

  • “exention” => “execntion”: insert character “c” at position 4;

  • “execntion” => “execution”: substitute fifth character with “u”.

Such a transformation is often represented by an array where the transformed strings are aligned:

| i | n | t | e | * | n | t | i | o | n |
| * | e | x | e | c | u | t | i | o | n |

Here, a star (*) denotes a gap in the alignment. We see that there was: a deletion of the first character, substitutions on the second and third characters, a match on the fourth, an insertion on the fifth, a substitution on the sixth, and finally only matches.

Let \(A="a_1a_2\ldots{}a_n"\) and \(B="b_1b_2\ldots{}b_m"\). Remember that \(A_i\) denotes the prefix string of \(A\) of length \(i\) (it is the substring composed of the first \(i\) characters of \(A\)). We will formulate the edit distance between \(A\) and \(B\) as a recursive function \(D(A_i,B_j)\) on the prefix lengths \(i\) and \(j\). To do so, observe that:

  • If \(i=0\), then \(A_i\) is the empty string and we must insert all the characters of \(B_j\) into \(A_i\). \(D(A_0,B_j)\) is then equal to \(j\), the length of \(B_j\).

  • If \(j=0\), then \(B_j\) is the empty string and we must delete all the characters of \(A_i\). \(D(A_i,B_0)\) is then equal to \(i\), the length of \(A_i\).

  • If \(i,j>0\) and \(a_i=b_j\), the two strings end with the same character, then this is a match and we can forget this character: \(D(A_i,B_j)\) is then equal to \(D(A_{i-1},B_{j-1})\).

  • Otherwise, \(i,j>0\) and \(a_i\neq b_j\), then there are three possibilities:

    1. Delete the last character of \(A_i\) which leads to a cost equal to \(D(A_{i-1},B_j) + 1\); or

    2. Insert the last character of \(B_j\) in \(A\) which leads to a cost equal to \(D(A_i,B_{j-1}) + 1\); or

    3. Substitute the last character of \(A_i\) by the last character of \(B_j\) which leads to a cost equal to \(D(A_{i-1},B_{j-1}) + 1\).

This gives:

\[\begin{split}D(A_i,B_j) = \begin{cases} j \textrm { if } i=0\\ i \textrm { if } j=0\\ D(A_{i-1},B_{j-1}) \textrm{ if } a_i=b_j\\ 1 + \min\{D(A_{i-1},B_j), D(A_i,B_{j-1}), D(A_{i-1},B_{j-1})\} \textrm{ otherwise.} \end{cases}\end{split}\]
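To illustrate this recurrence before the exercises (which ask for an optimized iterative version), a naive recursive Python sketch could be:

def edit_distance(A, B):
    # naive recursion on the prefix lengths i and j
    def d(i, j):
        if i == 0:
            return j                     # insert all characters of B_j
        if j == 0:
            return i                     # delete all characters of A_i
        if A[i - 1] == B[j - 1]:
            return d(i - 1, j - 1)       # match: no transformation needed
        return 1 + min(d(i - 1, j),      # deletion
                       d(i, j - 1),      # insertion
                       d(i - 1, j - 1))  # substitution
    return d(len(A), len(B))

print(edit_distance("intention", "execution"))  # 5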
  1. Draw the sub-problem graph for \(D(A,B)\) with \(A\) a string of length 4 and \(B\) a string of length 3.

  2. Propose an optimized iterative algorithm for computing \(D(A,B)\).

  3. The time complexity of this optimized algorithm is O().

  4. Implement the optimized algorithm in the python notebook.

You are the millionth visitor to the local supermarket. To celebrate, the store manager allows you to take the equivalent of your weight in merchandise for free. You weigh \(m\) kg. You can choose merchandise among a set of \(n\) items \(x_1,\ldots,x_{n}\); each item is characterized by a price \(p_i\) and a weight \(w_i\). All the weights are integral numbers given in kg. The smart person in you wants to determine which set of items you should take in order to maximize your profit. This problem is called the 0-1 knapsack problem.

For example, imagine that you weigh 70 kg and that the weights and prices of the available items are given in the following table:

| item number | 1  | 2  | 3  | 4  | 5  | 6  | 7 | 8 |
| weight      | 50 | 45 | 35 | 20 | 17 | 12 | 4 | 2 |
| price       | 38 | 36 | 35 | 10 | 12 | 8  | 3 | 4 |

Some possible solutions are:

  • items {1, 4}, total weight 70, total price 48

  • items {2, 4, 7}, total weight 69, total price 49

  • items {2, 5, 7, 8}, total weight 68, total price 55

We can model the problem as a recursive function \(M(w, i)\) which is equal to the maximal price that can be attained with a total item weight less than or equal to \(w\) and items taken in the subset \(x_1,\ldots,x_i\) (only the first \(i\) items are considered). To do so, note that:

  • With 0 items, the maximal price achievable is 0 whatever the weight limit \(w\) is: \(M(w,0)=0\).

  • If the weight of the \(i\)-th item is greater than the maximal weight \(w\), then the \(i\)-th item cannot be part of the optimal solution with this weight limit and the optimal price is equal to the optimal price obtained with the same weight limit without this item: \(M(w,i)=M(w,i-1)\) if \(w_i>w\).

  • If the weight of the \(i\)-th item is less than or equal to the maximal weight \(w\), then there are two possibilities:

    1. Either the \(i\)-th item is chosen and we can construct a solution made of this item and the optimal solution for the subproblem where the weight limit \(w\) is decreased by the item weight \(w_i\) and the \(i\)-th item is removed from the item set: \(M(w-w_{i}, i-1) + p_{i}\);

    2. Or the \(i\)-th item is not chosen, we are then in the same situation as when item \(i\) cannot be selected due to its weight and we have to consider the sub-problem with the same weight limit without this item: \(M(w,i)=M(w,i-1)\).

This gives:

\[\begin{split}M(w, i) = \begin{cases} 0 \textrm{ if } i=0\\ M(w,i-1) \textrm{ if } w_i > w \\ \max(M(w-w_{i}, i-1) + p_{i}, M(w,i-1)) \textrm{ otherwise.} \end{cases}\end{split}\]
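As an illustration of this recurrence (the exercises below ask for an optimized iterative version), a naive recursive Python sketch could be:

def knapsack(weights, prices, w, i):
    # maximal price using the first i items with weight limit w (naive recursion)
    if i == 0:
        return 0
    if weights[i - 1] > w:  # the i-th item is too heavy: skip it
        return knapsack(weights, prices, w, i - 1)
    return max(knapsack(weights, prices, w - weights[i - 1], i - 1) + prices[i - 1],
               knapsack(weights, prices, w, i - 1))

# small instance of exercise 1 below: the optimal price is 7 (items 1 and 3)
print(knapsack([4, 2, 2, 2, 1], [4, 2, 3, 1, 1], 6, 5))  # 7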
  1. Draw the sub-problem graph for the following problem instance: maximum weight \(m=6\), number of items \(n=5\), prices \(p=[4,2,3,1,1]\), and weights \(w=[4,2,2,2,1]\). Note that not all possible pairs (maximum weight, number of items) appear in the graph.

  2. Propose an optimized iterative algorithm for computing \(M(m, n)\). Note that, even if we do not require the solution to every sub-problem, we can still compute the whole 2d cost array.

  3. The time complexity of this optimized algorithm is O().

  4. Modify your optimized algorithm to return the set of items giving the optimal price.

  5. Implement the optimized algorithm in the python notebook.

You are a professor of archaeology exploring a Peruvian temple. When entering the treasure room, you see a precious golden idol and you immediately notice the trap: the idol lies on a scale. If you take it, the scale will measure the change of weight and activate a deadly mechanism. To take the idol safely, you must replace it by something of the exact same weight.

../_images/raiders-of-the-last-ark-1200.jpg

Fig. 38 A professor of archaeology during a work mission.

The idol weighs \(p\) kg, with \(p\) an integer (fortunately, it is written on the back of the idol!). On the ground, close to the hostel, you notice a set of well calibrated stones \(s_1,\ldots,s_n\), each stone \(s_i\) weighing \(m_i\) kg, with \(m_i\) an integer. Your goal is to determine if there exists a set of stones whose total weight is exactly equal to the weight of the idol \(p\).

We denote by \(S(i,k)\) the boolean function stating whether it is possible to make the weight \(k\leq p\) with the stones \(s_1,\ldots,s_i\). Our first goal is to give a recursive definition of \(S(i,k)\).

  1. Establish the base case (what if we have no stone?).

  2. Establish the general case (at each step we can choose to include or not the \(i\)-th stone in the solution).

  3. Draw the sub-problem graph for the following problem instance: idol weight \(p=6\), number of stones \(n=5\), stone weights \(m=[4,2,2,2,1]\).

  4. Propose an optimized iterative algorithm for computing \(S(n, p)\).

  5. The time complexity of this optimized algorithm is O().

  6. Modify your optimized algorithm to print the set of stones if a solution exists.

  7. Implement the optimized algorithm in the python notebook.

Assume that we have a sequence of matrices \(A_1,\ldots,A_n\) such that the number of columns in any matrix \(A_i\) is equal to the number of rows in the matrix \(A_{i+1}\), and we want to compute the product of all the matrices:

\[A_1\times{}A_2\times{}A_3\ldots{}\times{}A_n.\]

Matrix multiplication is not commutative but it is associative, which means that the above product can be computed in many different orders, depending on how we place the parentheses, for example:

  • \((\ldots((A_1\times{}A_2)\times{}A_3)\times{}\ldots{})\times{}A_n\), or

  • \((A_1\times{}A_2)\times{}(A_3\times{}A_4)\times{}\ldots{}\times{}A_n\), or

  • \(((A_1\times{}A_2)\times{}A_3)\times{}A_4\times{}\ldots{}\times{}A_n\)

The chosen ordering can have a huge impact on the number of operations required to compute the whole product. For example, assume that we have 3 matrices \(A_1\), \(A_2\), and \(A_3\) of respective sizes \(10 \times 100\), \(100 \times 5\), and \(5 \times 50\). Remember that multiplying a \(n\times k\) matrix with a \(k\times m\) matrix requires \(n\times k \times m\) scalar multiplications (with the naïve matrix multiplication algorithm). Now let's see how many scalar multiplications are required to compute the product \(A_1 \times A_2 \times A_3\) with the different possible parenthesizations:

  • \(((A_1\times{}A_2)\times{}A_3)\): we perform \(10*100*5=5000\) scalar multiplications to compute \((A_1\times{}A_2)\), the result is a new matrix \(M\) of size \(10\times{}5\). Then multiplying \(M\) with \(A_3\) requires \(10*5*50=2500\) scalar multiplications. The total number of multiplications is then 7500.

  • \(A_1\times{}(A_2\times{}A_3)\): we perform \(100*5*50=25000\) scalar multiplications to compute \((A_2\times{}A_3)\), the result is a new matrix \(M\) of size \(100\times{}50\). Then multiplying \(A_1\) with \(M\) requires \(10*100*50=50000\) scalar multiplications. The total number of multiplications is then 75000.

The first parenthesization thus requires 10 times fewer scalar multiplications than the second one.

The matrix chain multiplication problem is to find the parenthesization of the product \(A_1\times{}A_2\times{}A_3\ldots{}\times{}A_n\) that minimizes the number of scalar multiplications required to compute this product. Note that we are not interested in computing the product itself: we just want to find where to place the parentheses.

For a matrix \(A_i\), we denote the number of rows by \(p_{i-1}\) and the number of columns by \(p_i\) (which is also the number of rows of the matrix \(A_{i+1}\)). For any \(i,j\) with \(1\leq i \leq j\leq n\), let \(C(i,j)\) be the minimal number of scalar multiplications required to compute the product of the matrices \(i\) to \(j\): \(A_i\times{}\ldots{}\times{}A_j\). In order to find a recursive formulation of \(C(i,j)\), note that:

  • If \(i=j\), then we have a single matrix and we don’t have to perform any multiplication.

  • Otherwise, if \(i<j\), then we can split the product \(A_i\times{}\ldots{}\times{}A_j\) at any position \(k\) with \(i\leq k<j\): \((A_i\times{}\ldots{}A_k)\times{}(A_{k+1}\times{}\ldots{}A_j)\). This split at position \(k\) leads to a number of scalar multiplications equal to \(C(i,k) + C(k+1,j) + p_{i-1}*p_k*p_j\): the optimal number of multiplications required to compute the left and right parts of the split, plus the number of multiplications required to multiply the results of those two parts together. We can thus find the optimal split by looking at all possible split positions \(k\).

This gives:

\[\begin{split}C(i,j) = \begin{cases} 0 \textrm{ if } i=j \\ \min_{i\leq k<j} \{C(i,k) + C(k+1,j) + p_{i-1}*p_k*p_j\} \textrm{ otherwise.} \end{cases}\end{split}\]

Note that, in all the previous problems, the parameters of the recurrence equation represented the last item considered in the sub-problem: we were always considering all items from the first one to this last one. In this problem, on the contrary, the two parameters of the recurrence equation represent an interval of items considered in the sub-problem. The optimal cost for the whole problem is thus given by \(C(1,n)\), the minimal cost for the whole sequence of matrices.
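To make the recurrence concrete (the exercises below ask for the optimized iterative version), a naive recursive Python sketch could be:

def matrix_chain_cost(p, i, j):
    # minimal number of scalar multiplications for A_i x ... x A_j,
    # where A_k has p[k-1] rows and p[k] columns (naive recursion)
    if i == j:
        return 0
    return min(matrix_chain_cost(p, i, k) + matrix_chain_cost(p, k + 1, j)
               + p[i - 1] * p[k] * p[j]
               for k in range(i, j))

# the 3-matrix example above: sizes 10x100, 100x5, and 5x50
print(matrix_chain_cost([10, 100, 5, 50], 1, 3))  # 7500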

  1. Draw the sub-problem graph for \(C(1,4)\). Note that the sub-problem indices can be organized in an upper triangular square matrix of size \(n\). Think about a way to iterate over the sub-problems such that a sub-problem is visited after all the sub-problems it depends on.

  2. Propose an optimized iterative algorithm for computing \(C(1, n)\).

  3. The time complexity of this optimized algorithm is O().

  4. Modify your optimized algorithm to print the optimal parenthesization.

  5. Implement the optimized algorithm in the python notebook.