# Review 003

## Heaps

Binary Heap:

• shape property: all levels have 2 leafs except for last level

• heap property: parents are bigger/smaller than children

• Calculation: heap can be implemented using sequence

• left i = 2i + 1
• right i = 2i + 2
• parent i = ceil(i/2)-1
• (out of range means no such element)
• Implementation

• insertion: add new key to end of sequence, and swap up
• deletion: removing root, replace and swap children down
• Bound: same as balanced BST but faster in practice when maximum size is known (otherwise need dynamically sized array)

Leftist Rank Lemma: rank of the root node is at most $\log_2 (n+1)$.

## DP

Tips:

• assume we have access to a constant data structure that has $O(n)$ size

• a subproblem can only has $O(1)$ representation (usually indices), but share the data structure with other subproblems (Think subproblems as oracles that we can call many times to do some search. Once we find the optimal search result, only $O(1)$ is needed to return the right answer)

• we can partition the problems as

• a interval split into left/right intervals: best matrix multiplication paraphrasis, hashtag, gorillas game (with additional parameter as target character)
• a interval split into left/right intervals and a middle value, colored frog village, OBST
• shrink interval by 1 from left or right: picking number from interval end game, subset sum
• two intervals, shrink one of them by 1 from right: edit distance
• two subtree to get to the root: shortest path in grid where you only walk up or right
• number of steps and where to go: Bellman-Ford
• convolution: seam carving
• we can define indices as:

• tuples representing interval in data structure we should care about
• index separating only a portion of data structure we should care about

## MST

Algorithms:

• Prim's: priority queue like Dijkstra $O(m \log n)$ work span

• Kruskal's: add shortest edge with sort, cycle detection with union find $O(m \log n)$ work span

• Boruvka's: $O(m \log n)$ work (expectation)

• Tree Contraction: $O(\log^3 n)$ span (expectation)
• Star Contraction: $O(\log^2 n)$ span (expectation)

Edge Contraction:

• Maximum Vertex Matching: pick maximum edges that don't share endpoint, which can be solved in $O(|E|\sqrt{|V|})$ work.

• Randomized Parallel Vertex Macthing: adding vertex bridge, $O(k\log |E|)$ rounds W.H.P, $O(|E|)$ work, $O(\log^2 |E|)$ span

• $1/8$ for cycle: select all edges if (1) the edge is HEAD (2) all nearby edges are TAIL
• $1/4$ for cycle: each vertex $u$ pick one of its neighbor $v \in (V - \{u\})$, select when overlap
• $1/3$ for cycle: elect te edge if it is local maximum (random assign float number)

Star Partition: flip coin on vertex, HEAD or TAIL but no neighbor star is star ($O(|E|+|V|)$ work $O(\log |V|)$ span, contract $1/4$)

Star Contract:

• enumerable: $O(n)$ work

• non-enumerable: $O((|V|+|E|)\log |V|)$ work (non-isolated $W(n, |E|) \leq W(n', |E|) + O(n + |E|) \implies O(|V| + |E| \log |V|)$, isolated add additional $O(|V|\log|V|)$)

• $O(\log^2|V|)$ span, assuming

• base: $O(|V|)$ work and $O(1)$ span

• expand: $O(|V|+|E|)$ work and $O((|V| + |E|)\log|E|)$ span

## Graph

Dijkstra's Algorithm: $O(m + n \log n)$

A* Algorithm: $h(u) \leq \delta(u, v) + h(v)$, $h(v) \leq \delta(v, t)$ (optional), $h(t) = 0$

Bellman-Ford: $O(mn\log n)$ work, $O(n \log n)$ span with table, $O(mn)$ work, $O(n \log n)$ span with sequence

• negative cycle: if the $|V|$-th round is still updating shortest path, $2|V|$ for negative to propagate

Johnson's: $O(mn \log n)$, $O(m \log n)$, update edge weight with $w'(u, v) = w(u, v) + p(u) - p(v)$.

BFS: $O(d\log n)$ span and $O(m+n)$ work

DFS: tree-based $O((m+n)\log n)$, array sequence and adjacency sequences $O(m+n)$

• In undirected graph, there is no forward edge and no cross edge. Only tree edge and back edge are possible.

Algorithm:

1. we first sort the entire graph in topological order using decreasingFinish using DFSAll. This topological order is also the topological order of strongly connected components
2. We transpose the graph (flip every edge)
3. We start many instances of DFSReach in transposed graph with topological order until graph is traversed.

Graph Strategies: adding vertices, adding edges, adding positive, negative, zero weights to edges, run algorithm twice, transpose the graph

## Sets, Table, Recurrence, Probability, Metrics

With augmented table we can implement reduceVal : T -> V given associative function f : V * V -> V and identity I in $O(1)$ work.

W(n) = \alpha W\left(\frac{n}{\beta}\right) + O\left(f(n)\right)

The parent layer has work: $f(n)$. The children layer has work: $\alpha f(\frac{n}{\beta})$. We compare them to find out whether it is root dominated or leaf dominated or balanced.

For leaf-dominated: there are $\alpha^{\log_\beta n}$ leaves, giving $O((\alpha^{\log_\beta n})\times f(1)) = O(n^{\log_\beta \alpha})$ cost of the tree.

For balanced: there are $\sum_{i = 1}^{\log_\beta n} \alpha^{i}$ nodes, giving $O(\sum_{i = 1}^{\log_\beta n} \alpha^{i} f(n/\alpha^i))$ cost of the tree.

Markov Bound: for non-negative random variable $X \geq 0 \land a > 0 \implies Pr\{X \geq a\} \leq \frac{E[X]}{a}$. Conditional Probability: $Pr\{A \cap B\} = Pr\{A\} \cdot Pr\{B | A\} = Pr\{B\} \cdot Pr\{A | B\}$

Performance: $R^*$: troughput of fast sequential baseline. $R_1$: throughput on $1$ processor. $R_P$: throughput on $P$ processors.$T^*$: time of fastest sequential. $T_P$: time of parallel on $P$ processors. Work-Efficiency: $R_1 / R^*$. Self-Speedup: $R_P / R_1$, $T_1 / T_P$. Parallel throughput: $R_P$. Overhead: $T_1 / T^*$

## Algorithm Design

### Randomization

Designing a randomized algorithm needs to be done in terms of the expected bounds, rather than looking at the worst-case or best-case. This means considering all cases of whatever random bits you are using (so all possibilities for a sequence of coinflips or random pivot selections). It can often be useful to know how to write randomized algorithms that are based on old ones, such as quicksort and quickselect which were discussed in lecture.

While not explicitly on the problem set, high probability bounds are fair game for this midterm, so know how to write a good high-probability bound proof. Remember that in 15-210, we want to show that the probability of termination in $O(k \log n)$ rounds is at least $1 - \dfrac 1{n^k}$ for any choice of $k$.

### BSTs/Treaps

Make sure you have a good understanding of the properties - the BST key ordering property and the treap/heap priority property. Remember that you do not have control over the priorities, so if you need to take advantage of any sorted information, it'd likely have to be stored in the keys.

Know how to use the BST functions such as join/joinMid and split. Many problems dealing with implementing a BST function will use these functions. You might want to think about how to implement things such as union/intersection/difference using these functions as practice. Keep in mind that it is possible to store additional information in each node (this is how Tables and Augmented Tables are implemented, we just add more information into the nodes).

### Augmented Tables

For the algorithm design problems, it is often simpler to think about tables as key -> value pairs rather than as a treap.

For these types of problems, you have to define what the structure of your table looks like. This include your key, value, augmenting function, and identity. Don't forget the identity as well!

These problems typically also involve staging, where you have one phase where you build a table based on given data, say a sequence, and the next phase is answering queries in logarithmic time. Queries are usually answered by splitting the table and using reduceVal. Some problems also require supporting an update function - this really restricts what values/augmenting function can be. We'll look at augmenting functions more in a bit. For these types of problems, you have to define what the structure of your table looks like. This include your key, value, augmenting function, and identity. Don't forget the identity as well!

These problems typically also involve staging, where you have one phase where you build a table based on given data, say a sequence, and the next phase is answering queries in logarithmic time. Queries are usually answered by splitting the table and using reduceVal. Some problems also require supporting an update function - this really restricts what values/augmenting function can be. We'll look at this in a bit.

Keys - A property of keys is that they are the only thing we can control that will be sorted (we don't get to control the treap priorities) and have a total ordering (less, greater, equal). We also typically query the table by splitting, which means it'll split on the keys. Therefore, whatever we pick as the key should typically match what the query type is, or the query type can easily be converted such that splitting makes sense. If you see problems involving performing an aggregate function on an interval of a sequence, typically keys will be the index, so that you can split once to get the left and right side, or split twice to get an interval. For the interval stabbing problems in recitation, the keys were usually the coordinate of the endpoint. They'll often be representative of something spatial or geometric, because they are total ordered, and splitting usually gets a range or interval.

Values and Augmenting Function - They should generally be picked to answer our specific problem. One thing to note is that the reduced value of a table is the augmenting function applied on all values of the table. You do not get to specify which side or subset of values you're applying it on. In order to get the reduceVal of the left side of the table, we have to split at a key to get (L, x, R), and then call reduceVal on the entire left table L. If the problem was to get the maximum number in a subsequence, then the key would be index, value would be the number at that index, and augmenting function would be Int.max, for example. If you find yourself making a complex value or augmenting function, pause and reflect on if it is overcomplicated. For the most part, values and especially augmenting functions are simple to state.

If the problem requires updates, such as changing the element at an index or adding a new element, values cannot be defined as an aggregate function! Let's take for example the maximum number in a subsequence problem, where we also have to support an update (i, x)operation. Your value cannot be anything like "the maximum value associated with a key smaller than i". If we do an update, you could potentially have to update all values with keys greater than it, which could be $O(n)$ time. Generally, if you see yourself trying to do some aggregate function for values, see if you can do it with the augmenting function instead - this is what it's for!

### Graph Search and Shortest Paths

You haven't finished the lectures on this topic, but here are some tips for these sorts of problems once we get there:

I think it is really important to have a strong conceptual understanding of when you can use each algorithm and what they do. Here are some highlights:

DFS is used on unweighted graphs. DFS will not find the shortest paths to other nodes; however, the nature of DFS can help us calculate a lot of different properties that shortest path algorithms can't do. A lot of the design is in the visit, revisit, and finish functions. Remember that when you "finish" a node and backtrack up the DFS tree, that this is not a revisit case - revisits only occur when you are trying to DFS onto an already visited node. Reviewing how we use DFS and visit times to detect cycles, DFS numbers to store some interesting properties from the result of our subtree DFS, and how DFS is used in Kosaraju's for Strongly Connected Components would be helpful.

BFS is also used on unweighted graphs, and it will find the shortest path to other nodes.

Dijkstra's is used on weighted graphs and is completely sequential. It'll find the shortest paths to other nodes, provided that edge weights are positive. Dijkstra's cannot be used on any graph with negative weights, since it relies on the assumption that once you visit the node, you have found the best path to that node. A lot of design here is changing the priority function, which will affect which nodes we visit next and properties that we can get. Bellman-Ford can be used on graphs with negative edge weights, and it can also detect cycles at the end of the n iterations by checking to see if any of the distances change.

Johnson's can be used for all pairs shortest paths (or n Dijkstra's calls if the edge weights are positive)

Knowing and understanding the cost bounds are also important. Often times, the cost bounds provided for the problem can give a hint to what algorithm is expected. For example, DFS and Dijkstra's are the sequential algorithms, so if you see the same cost bound for both work and span, it's likely these two. The different algorithms also have unique cost bounds. Do learn the cost bounds for both non-enumerable and enumerable graphs, often the enumerable versions are a log factor less in cost.

Many problems will also involve graph transformations - either constructing a graph or transforming a graph so that you can run a graph search/shortest path algorithm, so be on the lookout for problems where you can't just run an algorithm on the original graph. An example in lecture was the currency exchange problem of finding the maximum product path, where we took the negative log of each edge. Another common instance is where we may make multiple copies of a graph, and then only link some certain edges between the graphs that would match the constraint of the problem. This is where understanding the core requirements of each algorithm can also be really helpful.

## Other Algorithm Design Tips

Brute Force - We want to generate all possible outputs that the problem can have, and pick either the most optimal or the correct output, depending on the type of problem. We are looking for optimal if the problem has some metric to optimize by, such as the contiguous subsequence with maximum sum, and otherwise we look for correctness if there is only one correct answer, such as generating all possible permutations of a sequence and picking the one that is sorted in order to sort a sequence. Brute force is typically very work inefficient, but it is also highly parallel because we can "check" each of the possible outputs in parallel.

One thing to note is that brute force solutions can provide insights to how to achieve a more efficient algorithm. From the brute force solution, we can often try to identify what part(s) of the algorithm are inefficient and try to improve on that redundant computation. For instance, the brute force algorithm for MCSS involved checking every subsequence, and then summing up each subsequence to find the one with the maximum sum. This was $O(n^3)$. Now, how did we improve upon this bound? We noticed that a lot of summations were re-used, which allowed us to both create the "running sum" iterative solution as well as the divide-and-conquer version. One other thing to note is that "optimized" solutions can sometimes contain elements of brute force. Trying all the possible solutions isn't always a bad thing!

Divide and Conquer - We should have base case(s) and split the problem into multiple (most commonly two, but not necessarily two) parts. We recursively solve each of the parts, and then do some computation to "combine" all of the results into an answer for our overall algorithm. One question you should ask while designing an algorithm is, given the answer to each of my parts, can I use those answers to solve my original problem? Perhaps not. If that is the case, then remember that we need to strengthen the problem - which means that we need to solve a more complex problem using divide and conquer, and then finally, using the results of our solution to the more complex problem, we can solve our original instance.

Contraction - We also need base case(s), and the idea is to "contract" / combine elements to form a smaller input, recursively solve that smaller input, and then expand our result from the recursive subproblem to get the answer for our original instance. An example of a contraction algorithm is how scan is implemented. You can think of contraction as basically divide and conquer but you only have one subproblem.

Reductions - A problem X reduces to problem Y if we can solve an instance of problem X using a black-box implementation for problem Y. This involves converting our input to problem X into an input to problem Y, solving problem Y on that input, and then use the output in order to get the correct output for the original problem. When converting the input to problem X to input for problem Y, we must satisfy all the necessary preconditions and assumptions for Problem Y.

Reductions can help us prove lower bounds. We can intuitively think of reductions as saying that "Problem X is at least as difficult as problem Y". Let's say that problem $X$ is known to have $\Omega(f(n))$ cost on an input of size $n$. Suppose that converting our input to an input of problem Y, as well as transforming back to a solution for problem $X$, also costs at most $O(f(n))$. Let the algorithm for problem Y cost $O(g(n))$. This means that we have a solution for problem X that runs in $O(f(n)) + O(g(n))$, which then implies that the cost of problem $Y$ is also in $\Omega(f(n))$, because otherwise we would have a faster algorithm for problem X. For example, an efficient reduction from sorting, which has $\Omega(n \lg n)$, to another problem means that the other problem must also have $\Omega(n \lg n)$. Notice the key distinction in the word efficient, in this example meaning that the transformations must be in at most $O(n \lg n)$. If for example the transformation was in $O(n^2)$, then we have an $O(n^2) + \text{work of problem Y}$ which tells us nothing about the lower bound of problem Y, because there is no contradiction.

Problem solving tip: Can I break the problem down to more manageable parts, and solve each part?

Algorithms are naturally a sequence of steps (well, in this class, some of these steps can be done in parallel). If we can separate our problem into different parts, then now we have multiple smaller and easier problems to solve.

For example, in brute force algorithms, we know we have to generate all possible outputs, and then check each output to see which one is the correct or the best. This naturally breaks down the problem into two parts. We can typically do the generation part by something like generating all possible subsequences, for example, and then we want to check which one best fits the problem. Perhaps we could map each output to a corresponding metric and then use reduce to pick the best one, or use a filter to take only the ones that fit.

Sometimes, our expected output can also help us break down the problem. For example, if we're asked to return a sequence of tuples, it might be useful to first find the first element of each tuple, and then find the corresponding second elements, then third elements, and so on and so forth. Understanding the requirements of what's expected to be returned can provide a pathway for how your algorithm can operate.

Problem solving tip: Have I seen a problem similar to this before?

Many problems are going to end up looking similar to a problem previously solved in lecture, recitation, homework, etc.. If there is a similar problem, ask what intuition can you draw from it? Some problems also sneakily are related to each other but may not appear so at first glance.

Problem Solving Tip: Changing the problem to obtain intuition

Another trick is seeing if we can change our original problem, either by adding or relaxing certain constraints. In particular, we want to do this such that it becomes easier to solve and hopefully provide us with some intuition.

Problem Solving Tip: Looking at the cost bounds

Often times, the cost bounds given in the problem can provide a hint as to what is expected. This may come into play later in the semester when we introduce more algorithms with different cost bounds, but for example, an $O(\lg^2 n)$ span algorithm may indicate that you can do $\lg n$ levels of a divide and conquer algorithm, with each one taking $O(\lg n)$ which would permit usage of functions like scan and filter. An $O(n \lg n)$ work and $O(\lg^2 n)$ span algorithm may suggest that sorting could be involved, because that is the cost of sorting.

Table of Content