Lecture 015 - Dynamic Programming

The word "Dynamic Programming" was made to avoid using the word "research".

"An interesting question is, 'Where did the name, dynamic programming, come from?' The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I'm not using the term lightly; I'm using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence. You can imagine how he felt, then, about the term, mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word, ‘programming.' I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying—I thought, let's kill two birds with one stone. Let's take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it's impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible. This, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities".

Richard Bellman ("Eye of the Hurricane: An autobiography", World Scientific, 1984)

Dynamic programming: avoid recalculating smaller instances of the problem by reusing their solutions.

Fibonacci without Dynamic Programming

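For concreteness, a minimal Python sketch of the naive recursion, which recomputes the same subproblems many times:

```python
def fib(n):
    # Naive recursion: fib(n - 1) and fib(n - 2) repeatedly recompute
    # the same smaller instances, giving exponential work.
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)
```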

Fibonacci with Dynamic Programming

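A minimal Python sketch of the memoized (top-down) version, which stores each result so no instance is solved twice:

```python
def fib_dp(n, memo=None):
    # Top-down dynamic programming: remember the result of each call in
    # a table so every fib(i) is computed at most once (linear work).
    if memo is None:
        memo = {}
    if n <= 1:
        return n
    if n not in memo:
        memo[n] = fib_dp(n - 1, memo) + fib_dp(n - 2, memo)
    return memo[n]
```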

Most problems that can be tackled with dynamic programming are optimization or decision problems.

top-down approach: starts at the root(s) of the DAG, recurses, and remembers solutions to subproblems (memoization).

bottom-up approach: starts at the leaves of the DAG and processes the DAG in level order.

scheduling approach: (not a general technique) find the shortest path in the DAG, where the weights on edges are defined in some problem-specific way.

Dynamic programming technique:

  1. Is it a decision or optimization problem?
  2. Define a solution recursively (inductively) by composing solutions to smaller problems.
  3. Identify any sharing in the recursive calls, i.e. calls that use the same arguments.
  4. Model the sharing as a DAG, and calculate the work and span of the computation based on the DAG.
  5. Decide on an implementation strategy: either bottom-up, top-down, or possibly shortest paths.

There are many problems with efficient dynamic programming solutions. Here we list just some of them.

Subset Sums

The subset sum (SS) problem: given a set of numbers S and a target k, is there a subset of S that sums to k?

SS is NP-hard.

But as long as k is polynomial in |S|, the work is also polynomial in |S|. We will try to find a pseudo-polynomial work (or time) solution.

Why do we say the SS algorithm we described is pseudo-polynomial? The size of the subset sum problem is defined to be the number of bits needed to represent the input. Therefore, the contribution of k to the input size is \log k, but the work is O(k|S|) = O(2^{\log k}|S|), which is exponential in the input size. That is, the complexity of the algorithm is measured with respect to the length of the input (in bits) and not the numeric value of the input.

An O(2^n) work top-down approach: for each element, first consider the case where it is included in the subset, then the case where it is not.

Same algorithm without duplicated computation

Recursive Subset Sum

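A minimal Python sketch of the recursive formulation, assuming the elements of S are non-negative integers:

```python
def ss(S, k):
    # Is there a subset of the list S that sums to k?
    # Each element is either included (k shrinks by S[0]) or excluded,
    # giving two recursive calls per element and O(2^n) work.
    if k == 0:
        return True
    if not S or k < 0:
        return False
    return ss(S[1:], k - S[0]) or ss(S[1:], k)
```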

We can improve work by observing that there is a large amount of sharing of subproblems.

Recursive Subset Sum (Indexed)

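A minimal Python sketch of the indexed variant; the recursion is over a position i into S and the remaining target j, and a memo table keyed by (i, j) realizes the sharing that gives the work bound below:

```python
def ss_indexed(S, k):
    # Indexed variant: arguments are (i, j) with i a position in S and j
    # the remaining target, so there are at most (|S| + 1)(k + 1) distinct
    # subproblems; the memo table shares them.
    memo = {}

    def rec(i, j):
        if j == 0:
            return True
        if i == len(S) or j < 0:
            return False
        if (i, j) not in memo:
            memo[(i, j)] = rec(i + 1, j - S[i]) or rec(i + 1, j)
        return memo[(i, j)]

    return rec(0, k)
```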

Complexity: notice there are |S| many levels and each level has at most k+1 vertices, giving us O(k|S|) work and O(|S|) span.

Minimum Edit Distance

Minimum Edit Distance: given a character set \Sigma, what is the minimum number of insertions and deletions of single characters required to transform a string S \in \Sigma^* into a string T \in \Sigma^*?

The algorithm used in the Unix diff utility was invented and implemented by Eugene Myers, who also was one of the key people involved in the decoding of the human genome at Celera.

Recursive MED

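A minimal Python sketch of the recursive formulation, using only insertions and deletions as in the definition above:

```python
def med(S, T):
    # Minimum number of single-character insertions and deletions
    # needed to transform S into T (exponential work without sharing).
    if not S:
        return len(T)        # insert all remaining characters of T
    if not T:
        return len(S)        # delete all remaining characters of S
    if S[0] == T[0]:
        return med(S[1:], T[1:])
    # Either delete S[0] or insert T[0] at the front of S.
    return 1 + min(med(S[1:], T), med(S, T[1:]))
```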

An example MED instance with sharing.

Complexity: notice there are |S| + 1 possible inputs for the first argument and |T| + 1 possible inputs for the second argument, giving O(|S||T|) vertices and hence O(|S||T|) work. Since each recursive call removes an element from S or from T, the longest path has length O(|S| + |T|), giving O(|S| + |T|) span.

As in subset sum, we can again replace the lists used in MED with integer indices pointing to our current position in each sequence. This gives the following variant of the MED algorithm:

Recursive MED (Indexed)

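A minimal Python sketch of the indexed variant; positions i and j replace the lists, so the distinct arguments are exactly the (|S| + 1)(|T| + 1) index pairs, ready to be shared by memoization or a bottom-up table:

```python
def med_indexed(S, T):
    # Indexed variant: (i, j) are positions in S and T; the recursion
    # itself is unchanged, but its arguments are now small integers.
    def rec(i, j):
        if i == len(S):
            return len(T) - j
        if j == len(T):
            return len(S) - i
        if S[i] == T[j]:
            return rec(i + 1, j + 1)
        return 1 + min(rec(i + 1, j), rec(i, j + 1))

    return rec(0, 0)
```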

Optimal Binary Search Trees

Optimal Binary Search Tree (OBST): given an ordered set of keys S and a probability function p : S \to [0, 1], find the tree minimizing the expected access cost below, where Trees(S) is the set of all BSTs on S and d(s, T) is the depth of the key s in the tree T (the root has depth 1):

\min_{T \in Trees(S)} \left(\sum_{s \in S} d(s, T) \cdot p(s)\right)

Example of OBST

optimal substructure property: observe that each subtree of the root of an optimal BST is itself an optimal binary search tree on the keys it contains. This property is sometimes a clue that either a greedy or dynamic programming algorithm might apply.

Greedily constructing the tree by putting the highest-probability key at the root does not yield an optimal solution.

Say we have the optimal tree; how do we calculate its cost?

Calculate cost of the subtree

The cost of a subtree is the probability of accessing that subtree (that is, any key in its left subtree, at its root, or in its right subtree) plus the costs of its left and right subtrees.

Recursive Optimal Binary Search Tree

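A minimal Python sketch of the recursive formulation; here S is an ordered list of keys and p is assumed to be a dict mapping each key to its probability:

```python
def obst_cost(S, p):
    # Cost of an optimal BST over the ordered key list S.
    # Try every key as the root; the cost of a subtree is the total
    # probability of its keys plus the costs of its two subtrees.
    if not S:
        return 0.0
    total = sum(p[s] for s in S)
    return total + min(
        obst_cost(S[:r], p) + obst_cost(S[r + 1:], p)
        for r in range(len(S))
    )
```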

To avoid O(2^n) cost, we share recursive solutions. Observe that any subtree (whether balanced or not) of a BST on S contains the keys of a contiguous subsequence of S, so we can count the number of possible arguments to OBST by counting the contiguous subsequences. There are n(n+1)/2 \in O(n^2) of them, so the DAG has O(n^2) vertices, and the longest path has length O(n).

Each vertex (a recursive call, not counting its subcalls) costs O(n) work and O(\log n) span for taking the minimum over the possible roots. So the total algorithm is O(n^3) work and O(n \log n) span.

Recursive Optimal Binary Search Tree (indexed)

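A minimal Python sketch of the indexed variant; a subproblem is a contiguous range of S, prefix sums give its total probability, and a memo table shares the O(n^2) distinct ranges:

```python
def obst_cost_indexed(S, p):
    # Indexed variant: a subproblem is the contiguous range S[i:j].
    # There are O(n^2) such ranges; each takes O(n) work to combine,
    # giving the O(n^3) total work discussed above.
    prefix = [0.0]                       # prefix[i] = sum of p over S[:i]
    for s in S:
        prefix.append(prefix[-1] + p[s])
    memo = {}

    def rec(i, j):
        if i == j:
            return 0.0
        if (i, j) not in memo:
            memo[(i, j)] = (prefix[j] - prefix[i]) + min(
                rec(i, r) + rec(r + 1, j) for r in range(i, j)
            )
        return memo[(i, j)]

    return rec(0, len(S))
```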

This example of the optimal BST is one of several applications of dynamic programming based on trying all binary trees and determining an optimal tree given some cost criteria. Another such problem is the matrix chain product problem.

Implementing Dynamic Programming

Bottom-Up Method

For example, consider the Minimum Edit Distance problem for the two strings S = tcat and T = atc.

Each vertex represents the current comparison. We go from the bottom right to the top left, removing letters from the end.

The best way to schedule the bottom-up method is to start from the base cases at the top left and work toward the bottom right.

pebbling: computing a vertex places a pebble on it, clearing one of the dependencies its parent is waiting on; a parent can execute once all of its dependencies have been pebbled.

This algorithm pebbles the DAG diagonally and stores the result of each vertex in a table M. Because the table is indexed by two integers, it can be represented by an array, which allows constant work random access. Each call to diagonals processes one diagonal and updates the table M. The size of the diagonals grows and then shrinks. We note that the index calculations are tricky.

Bottom up MED diagram

Bottom up MED

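One way to render the bottom-up method in Python; here M[i][j] holds the distance between the prefixes S[:i] and T[:j], and the table is filled diagonal by diagonal so that, in a parallel setting, all cells of one diagonal could be pebbled at once:

```python
def med_bottom_up(S, T):
    # Bottom-up MED: M[i][j] is the edit distance between S[:i] and T[:j].
    # Cells on diagonal d = i + j depend only on diagonals d - 1 and d - 2,
    # so the table is filled one diagonal at a time.
    n, m = len(S), len(T)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(n + m + 1):
        for i in range(max(0, d - m), min(n, d) + 1):
            j = d - i
            if i == 0:
                M[i][j] = j                      # insert the first j chars of T
            elif j == 0:
                M[i][j] = i                      # delete the first i chars of S
            elif S[i - 1] == T[j - 1]:
                M[i][j] = M[i - 1][j - 1]
            else:
                M[i][j] = 1 + min(M[i - 1][j], M[i][j - 1])
    return M[n][m]
```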

Top-Down Method

Implementing Memoization: we want fast lookup and store operations. Balanced search trees (require a total order on the keys), hash tables (require an effective hash function), and indexing (set up surrogate integer values to represent the input values) are all good choices.

memo stores results of the function f in a table. If the given argument a is found in the table, return the stored result; otherwise, compute f on a, store the result, and return it.
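A minimal sketch of this idea in Python, in a functional style where the table is passed in and returned (the interface is illustrative, not the lecture's exact signature):

```python
def memo(f, table, a):
    # If a is already in the table, return the table and the stored
    # result; otherwise compute f(a), extend the table with the result,
    # and return the new table together with the result.
    if a in table:
        return table, table[a]
    result = f(a)
    return {**table, a: result}, result
```

For example, `table, r = memo(lambda n: n * n, {}, 5)` computes and stores the square of 5, while a second call with the returned table simply looks it up.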

MED: The pseudo-code below uses memoization to achieve sharing of solutions to subproblems.

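A Python sketch of this idea in a purely functional style: the memo table, keyed by the index pair (i, j), is threaded through the calls (copied here for clarity; a real implementation would use a persistent table), so each subproblem is solved at most once:

```python
def med_memo(S, T):
    # Top-down MED with memoization: the table maps (i, j) to the edit
    # distance of the suffixes S[i:] and T[j:], and is passed in and
    # returned by every call rather than mutated in place.
    def rec(table, i, j):
        if (i, j) in table:
            return table, table[(i, j)]
        if i == len(S):
            result = len(T) - j
        elif j == len(T):
            result = len(S) - i
        elif S[i] == T[j]:
            table, result = rec(table, i + 1, j + 1)
        else:
            table, r1 = rec(table, i + 1, j)
            table, r2 = rec(table, i, j + 1)
            result = 1 + min(r1, r2)
        return {**table, (i, j): result}, result

    return rec({}, 0, 0)[1]
```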

The above code is purely functional, but threading the memo table through the computation makes it inherently sequential. We could instead:

- use hidden state to implement the memo function so that the memo table is used implicitly,
- use concurrent hash tables to store the results so that parallel calls might be in flight at the same time, and
- use synchronization variables to make sure that no function is computed more than once.

Strategies

Tips:
