# Lecture 015 - Dynamic Programming

The name "dynamic programming" was chosen partly to avoid the word "research", as Bellman recounts:

"An interesting question is, 'Where did the name, dynamic programming, come from?' The 1950s were not good years for mathematical research. We had a very interesting gentleman in Washington named Wilson. He was Secretary of Defense, and he actually had a pathological fear and hatred of the word, research. I'm not using the term lightly; I'm using it precisely. His face would suffuse, he would turn red, and he would get violent if people used the term, research, in his presence. You can imagine how he felt, then, about the term, mathematical. The RAND Corporation was employed by the Air Force, and the Air Force had Wilson as its boss, essentially. Hence, I felt I had to do something to shield Wilson and the Air Force from the fact that I was really doing mathematics inside the RAND Corporation. What title, what name, could I choose? In the first place I was interested in planning, in decision making, in thinking. But planning, is not a good word for various reasons. I decided therefore to use the word, ‘programming.' I wanted to get across the idea that this was dynamic, this was multistage, this was time-varying—I thought, let's kill two birds with one stone. Let's take a word that has an absolutely precise meaning, namely dynamic, in the classical physical sense. It also has a very interesting property as an adjective, and that is it's impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible. Thus, I thought dynamic programming was a good name. It was something not even a Congressman could object to. So I used it as an umbrella for my activities".

— Richard Bellman, *Eye of the Hurricane: An Autobiography* (World Scientific, 1984)

Dynamic programming: avoid recomputing solutions to smaller instances of the problem.

• divide-and-conquer: tree diagram

• dynamic programming: Directed Acyclic Graph (DAG) dependency (work is sum of vertices, span is heaviest vertex-weighted path)

Most problems that can be tackled with dynamic programming are optimization or decision problems.

top-down approach: starts at the root(s) of the DAG, recurses, and remembers solutions to subproblems (memoization).

• elegant

• only compute needed subproblems
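
As a minimal illustration (a Python sketch, not part of the original notes), memoized top-down Fibonacci computes each subproblem at most once:

```python
# Top-down Fibonacci with memoization: each subproblem fib(i) is
# computed once and cached, so only the needed subproblems are
# evaluated and the work drops from exponential to O(n).
def fib_memo(n, memo=None):
    if memo is None:
        memo = {}
    if n <= 1:
        return n
    if n not in memo:
        memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]
```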

bottom-up approach: starts at the leaves of the DAG and processes the DAG in level order traversal.

• easier to parallelize

• more space efficient

• always evaluates every subproblem, whether or not the final answer needs it
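
The same recurrence processed bottom-up, in dependency order (again a Python sketch), shows the space savings: only the last two values need to be kept, but every subproblem from $0$ to $n$ is evaluated:

```python
# Bottom-up Fibonacci: process subproblems from the leaves upward.
# Only the two most recent values are retained, so space is O(1),
# but all subproblems 0..n are evaluated.
def fib_bottom_up(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a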

scheduling approach: (not a general technique) find the shortest path in the DAG, where the weights on the edges are defined in some problem-specific way.

Dynamic programming technique:

1. Is it a decision or optimization problem?
2. Define a solution recursively (inductively) by composing the solution to smaller problems.
3. Identify any sharing in the recursive calls, i.e. calls that use the same arguments.
4. Model the sharing as a DAG, and calculate the work and span of the computation based on the DAG.
5. Decide on an implementation strategy: either bottom-up, top-down, or possibly shortest paths.

There are many problems with efficient dynamic programming solutions. Here we list just some of them.

• Fibonacci numbers

• Using only addition, compute $\binom{n}{k}$ in $O(nk)$ work

• Edit distance between two strings

• Edit distance between multiple strings

• Longest common subsequence

• Maximum weight common subsequence

• Can two strings $S_1$ and $S_2$ be interleaved into $S_3$?

• Longest palindrome

• Longest increasing subsequence

• Sequence alignment for genome or protein sequences

• Subset sum

• Knapsack problem (with and without repetitions)

• Weighted interval scheduling

• Line breaking in paragraphs

• Break text into words when all the spaces have been removed

• Chain matrix product

• Maximum value for parenthesizing $x_1/x_2/\cdots/x_n$ for positive rational numbers

• Cutting a string at given locations to minimize cost (cutting a string of length $n$ costs $n$)

• All shortest paths

• Find maximum independent set in trees

• Smallest vertex cover on a tree

• Optimal BST

• Probability of generating exactly $k$ heads with $n$ biased coin tosses

• Triangulate a convex polygon while minimizing the length of the added edges

• Cutting squares of given sizes out of a grid

• Change making

• Box stacking

• Segmented least squares problem

• Counting Boolean parenthesization – given true, false, or, and, xor, count how many parenthesizations evaluate to true

• Balanced partition – given a set of integers up to $k$, determine the most balanced two-way partition

• Largest common subtree
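
As one concrete instance from this list, binomial coefficients can be computed with additions only via Pascal's rule $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$. A Python sketch (an illustration, not from the notes):

```python
# Pascal's rule computed bottom-up with additions only: row i of the
# triangle holds C(i, 0) .. C(i, i). Building n full rows as done here
# is O(n^2) work; truncating each row after k+1 entries would give the
# O(nk) bound, since entries past column k are never needed.
def binomial(n, k):
    row = [1]  # row 0: C(0, 0) = 1
    for _ in range(n):
        # each inner entry is the sum of the two entries above it
        row = [1] + [row[j - 1] + row[j] for j in range(1, len(row))] + [1]
    return row[k] if 0 <= k <= n else 0
```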

## Subset Sums

The subset sum (SS) problem: given a set of numbers $S$ and a target $k$, does some subset of $S$ sum to exactly $k$?

SS is NP-hard.

But... as long as $k$ is polynomial in $|S|$, the work is also polynomial in $|S|$. We will find a pseudo-polynomial work (or time) solution.

Why do we say the SS algorithm we described is pseudo-polynomial? The size of the subset sum problem is defined to be the number of bits needed to represent the input. Therefore, the input size of $k$ is $\log k$, but the work is $O(2^{\log k}|S|)$, which is exponential in the input size. That is, the complexity of the algorithm is measured with respect to the length of the input (in bits) and not the numeric value of the input.

We can improve work by observing that there is a large amount of sharing of subproblems.

Complexity: notice there are $|S|$ levels and each level has at most $k+1$ vertices, giving $O(k|S|)$ work and $O(|S|)$ span.
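
A bottom-up sketch in Python (an illustration, not the notes' code) makes the $O(k|S|)$ bound concrete:

```python
# Bottom-up subset sum: reachable[j] is True iff some subset of the
# elements processed so far sums to j. There are |S| levels of at
# most k+1 entries each, giving O(k|S|) work.
def subset_sum(S, k):
    reachable = [False] * (k + 1)
    reachable[0] = True  # the empty subset sums to 0
    for x in S:
        # scan downward so each element is used at most once
        for j in range(k, x - 1, -1):
            if reachable[j - x]:
                reachable[j] = True
    return reachable[k]
```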

## Minimum Edit Distance

Minimum Edit Distance: given a character set $\Sigma$, what is the minimum number of insertions and deletions of single characters required to transform a string $S \in \Sigma^*$ into a string $T \in \Sigma^*$?

The algorithm used in the Unix diff utility was invented and implemented by Eugene Myers, who was also one of the key people involved in decoding the human genome at Celera.

Complexity: notice there are $|S| + 1$ possible inputs to the first argument and $|T| + 1$ possible inputs to the second argument, giving $O(|S||T|)$ work. Since each recursive call removes an element from either $S$ or $T$, the depth is $O(|S| + |T|)$, giving $O(|S| + |T|)$ span.

As in subset sum we can again replace the lists used in MED with integer indices pointing to where in the sequence we are currently at. This gives the following variant of the MED algorithm:
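
A Python reconstruction of this indexed variant (a sketch, since the notes' own code is not reproduced here), memoizing on the pair of indices:

```python
from functools import lru_cache

# Index-based MED: med(i, j) is the minimum number of single-character
# insertions and deletions needed to turn the suffix S[i:] into T[j:].
# Memoizing on (i, j) leaves only O(|S||T|) distinct subproblems.
def min_edit_distance(S, T):
    @lru_cache(maxsize=None)
    def med(i, j):
        if i == len(S):
            return len(T) - j          # insert the rest of T
        if j == len(T):
            return len(S) - i          # delete the rest of S
        if S[i] == T[j]:
            return med(i + 1, j + 1)   # matching characters are free
        return 1 + min(med(i + 1, j),  # delete S[i]
                       med(i, j + 1))  # insert T[j]
    return med(0, 0)
```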

## Optimal Binary Search Trees

Optimal Binary Search Tree (OBST): given an ordered set of keys $S$ and a probability function $p : S \to [0, 1]$, find the BST minimizing the expected access cost (where $Trees(S)$ is the set of all BSTs on $S$ and $d(s, T)$ is the depth of the key $s$ in the tree $T$; the root has depth $1$):

$$\min_{T \in Trees(S)} \left(\sum_{s \in S} d(s, T) \cdot p(s)\right)$$

optimal substructure property: Observe now that each subtree of the root is an optimal binary search tree. This property is sometimes a clue that either a greedy or dynamic programming algorithm might apply.

Greedily constructing the tree by placing the highest-probability key at the root does not yield an optimal solution.

Say we have the optimal tree, how do we calculate its cost?

The cost of a subtree is the probability of accessing it (any key in the left subtree, the root, or the right subtree) plus the costs of its left and right subtrees.

To avoid $O(2^n)$ cost, we share recursive solutions. Observe that any subtree (whether balanced or not) of a BST on $S$ contains the keys of a contiguous subsequence of $S$, so we can count the possible arguments to OBST by counting contiguous subsequences. There are $n(n+1)/2 \in O(n^2)$ vertices, and the longest path has length $O(n)$.

Each vertex (a recursive call, not including its subcalls) costs $O(n)$ work and $O(\log n)$ span when taking the minimum. So the total algorithm is $O(n^3)$ work and $O(n \log n)$ span.
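
A bottom-up Python sketch of the recurrence (an illustration under the cost definition above, not the notes' code):

```python
# OBST: cost[i][j] is the optimal expected cost over keys i..j-1.
# Each range contributes its total probability (every key sits one
# level deeper than in its subtree) plus the best split over all
# choices of root r. O(n^2) ranges times an O(n) minimum = O(n^3) work.
def obst_cost(p):
    n = len(p)
    pre = [0.0] * (n + 1)  # prefix sums: range probability in O(1)
    for i in range(n):
        pre[i + 1] = pre[i] + p[i]
    cost = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):          # ranges in increasing size
        for i in range(n - length + 1):
            j = i + length
            cost[i][j] = pre[j] - pre[i] + min(
                cost[i][r] + cost[r + 1][j] for r in range(i, j))
    return cost[0][n]
```

For example, with access probabilities `[0.2, 0.6, 0.2]` the optimal tree roots the middle key, for expected cost $0.6 \cdot 1 + 0.2 \cdot 2 + 0.2 \cdot 2 = 1.4$.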

This example of the optimal BST is one of several applications of dynamic programming based on trying all binary trees and determining an optimal tree given some cost criteria. Another such problem is the matrix chain product problem.

## Implementing Dynamic Programming

### Bottom-Up Method

For example, consider the Minimum Edit Distance problem for the two strings $S = \textit{tcat}$ and $T = \textit{atc}$.

The best way to schedule the bottom-up method is from the top-left base cases toward the bottom-right corner.

pebbling: a vertex becomes ready to execute once all of its dependencies have been evaluated (cleared).

This algorithm pebbles the DAG diagonally and stores the result of each vertex in a table $M$. Because the table is indexed by two integers, it can be represented by an array, which allows constant work random access. Each call to diagonals processes one diagonal and updates the table $M$. The size of the diagonals grows and then shrinks. We note that the index calculations are tricky.
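
A Python sketch of this diagonal schedule for MED (a reconstruction of the pebbling order described above, not the notes' code):

```python
# Bottom-up MED table filled by anti-diagonals (cells with i + j = d).
# Every cell on a diagonal depends only on cells of earlier diagonals,
# so all cells of one diagonal could be pebbled in parallel.
# M[i][j] holds the edit distance between the prefixes S[:i] and T[:j].
def med_diagonal(S, T):
    n, m = len(S), len(T)
    M = [[0] * (m + 1) for _ in range(n + 1)]
    for d in range(n + m + 1):
        # valid cells on diagonal d: the tricky index calculation
        for i in range(max(0, d - m), min(n, d) + 1):
            j = d - i
            if i == 0:
                M[i][j] = j                    # insert all of T[:j]
            elif j == 0:
                M[i][j] = i                    # delete all of S[:i]
            elif S[i - 1] == T[j - 1]:
                M[i][j] = M[i - 1][j - 1]      # match: no edit needed
            else:
                M[i][j] = 1 + min(M[i - 1][j], M[i][j - 1])
    return M[n][m]
```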

### Top-Down Method

Implementing memoization: we want fast lookup and store. Balanced search trees (require a total order on the keys), hash tables (require an effective hash function), or indexing (set up surrogate integer values to represent the input values) are good choices.

Purely functional memoized code threads the memo table through the computation, which makes it inherently sequential. But we could:

• use hidden state to implement the memo function so that the memo table is used implicitly

• use concurrent hash tables to store the results so that parallel calls might be in flight at the same time, and

• use synchronization variables to make sure that no function is computed more than once.
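
The hidden-state option can be sketched in Python (illustrative, not the notes' code): the memo table lives in a closure, so callers use the memoized function without ever seeing the table.

```python
# Memoization via hidden state: the hash table is captured in a
# closure, giving fast lookup and store while keeping it invisible
# to callers. Arguments must be hashable.
def memoize(f):
    table = {}  # argument tuple -> result
    def wrapped(*args):
        if args not in table:       # compute each input at most once
            table[args] = f(*args)
        return table[args]
    return wrapped

@memoize
def fib(n):
    return n if n <= 1 else fib(n - 1) + fib(n - 2)
```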

### Strategies

Tips:

• assume we have access to a constant (read-only) data structure of $O(n)$ size

• a subproblem can only have an $O(1)$ representation (usually indices), but may share the data structure with other subproblems (think of subproblems as oracles that we can call many times to do some search; once we find the optimal search result, only $O(1)$ space is needed to return the right answer)

• we can partition the problems as:

• an interval split into left/right intervals: best matrix multiplication parenthesization, hashtag, gorillas game (with an additional parameter for the target character)
• an interval split into left/right intervals and a middle value: colored frog village, OBST
• shrink the interval by 1 from the left or right: picking numbers from interval ends game, subset sum
• two intervals, shrink one of them by 1 from the right: edit distance
• two subproblems joined at the root: shortest path in a grid where you only walk up or right
• number of steps and where to go: Bellman-Ford
• convolution: seam carving
• we can define indices as:

• tuples representing the interval of the data structure we should care about
• an index separating the portion of the data structure we should care about
