# Lecture 099 - Graphs, PFS, BFS, DFS

## Graphs

Undirected Graph: an undirected graph can be represented as a directed graph in which each edge is replaced by two edges in opposite directions.

Transpose: the transpose $G^T$ of a directed graph $G$ is obtained by flipping the direction of every edge.
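
Both ideas can be sketched in a few lines of Python. This assumes a graph is given as a set of `(u, v)` edge pairs. The function names are illustrative and do not come from the course library.

```python
def undirected_to_directed(edges):
    """Replace each undirected edge {u, v} by the directed edges (u, v) and (v, u)."""
    directed = set()
    for u, v in edges:
        directed.add((u, v))
        directed.add((v, u))
    return directed


def transpose(edges):
    """Flip the direction of every directed edge: (u, v) -> (v, u)."""
    return {(v, u) for (u, v) in edges}


print(sorted(undirected_to_directed([("a", "b"), ("b", "c")])))
# [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
print(sorted(transpose({("a", "b"), ("b", "c")})))
# [('b', 'a'), ('c', 'b')]
```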

Representation of Visited Set: because the visited set is used sequentially, we can represent it ephemerally (e.g., as an ephemeral boolean sequence), so that update and nth each take constant work and span.

Persistent Data Structure: a data structure whose operations produce new data without modifying the original input. Think of the const keyword in C++, where all inputs are immutable.

• safe for parallelism, since no data can be modified and therefore only concurrent reads can happen

• update and inject require $\Omega(|a|)$ work with an array-sequence implementation, or $\Omega(\log |a|)$ work with a tree-sequence implementation, which makes them slow.

Ephemeral Data Structure: a data structure whose operations modify the original data structure in place, either returning nothing or returning a data structure while invalidating the original.

• sequential algorithms can use ephemeral data structures safely.

• you should not reuse a data structure after passing it to an ephemeral data structure's operation.
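
The contrast can be illustrated with Python lists standing in for array sequences. This is an illustration of the two styles only, not the course's STSequence implementation; both helper names are hypothetical.

```python
def persistent_update(seq, i, x):
    """Persistent update: copy, then modify the copy. O(n) work; seq is unchanged."""
    new = list(seq)   # the Theta(n) copy is what keeps the old version valid
    new[i] = x
    return new


def ephemeral_update(seq, i, x):
    """Ephemeral update: modify in place. O(1) work; the old version is gone."""
    seq[i] = x
    return seq


a = [0, 0, 0]
b = persistent_update(a, 1, 7)
print(a, b)          # [0, 0, 0] [0, 7, 0]
c = ephemeral_update(a, 1, 7)
print(a, c, a is c)  # [0, 7, 0] [0, 7, 0] True
```

After the ephemeral call, `a` and `c` are the same object, which is why the input should not be reused as if it still held its old contents.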

Ephemeral Sequences: for a sequence of length $n$

• update: $O(1)$ work span

• inject: $O(n)$ work, $O(\log n)$ span

• ninject: $O(n)$ work, $O(1)$ span

STSequence is persistent, but it uses benign effects internally. How is this possible? The cost bounds above hold only if you always use the latest version of the data structure. Accessing an earlier version still works, but with different (worse) cost bounds.

## Graph Search

There are three kinds of graph search: Priority-First Search (PFS), Breadth-First Search (BFS), and Depth-First Search (DFS).

Source: starting vertex of search

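Out-neighbors: the out-neighbors of vertex $v$ in graph $G$ are denoted $N_G^+(v)$. For a set of vertices $X$, $N_G^+(X)$ is the union of the out-neighbors of the vertices in $X$.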

Frontier Set: the set of unvisited out-neighbors $N_G^+(X) \setminus X$, where $X$ is the visited set.

Generic graph search from a single source: initialize the visited set $X$ to empty and the frontier $F$ to $\{s\}$. In each round, choose a (possibly singleton) set $U \subseteq F$, depending on the specific algorithm, visit it, and add it to $X$.

    graphSearch (G, s) =
    let
      explore X F =
        if |F| = 0 then X
        else let
          choose U ⊆ F          (* choose unvisited vertices to visit *)
          visit U
          X = X ∪ U             (* update visited set *)
          F = N_G^+(X) - X      (* update frontier *)
        in
          explore X F
        end
    in
      explore {} {s}
    end
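
Here is a Python transcription of the generic search above, assuming an adjacency dict (vertex to list of out-neighbors). The `choose` parameter is a hypothetical hook that stands in for "choose U in F". Recomputing $N_G^+(X) \setminus X$ from scratch each round mirrors the pseudocode literally and is not efficient.

```python
def graph_search(graph, s, choose):
    """Generic graph search from source s.

    graph  : dict mapping each vertex to a list of out-neighbors.
    choose : function returning a non-empty subset U of the frontier F
             (hypothetical hook; different choices give BFS, DFS, PFS, ...).
    Returns the visited set X, i.e. all vertices reachable from s.
    """
    X = set()
    F = {s}
    while F:
        U = choose(F)
        X |= U                                    # visit U and add it to X
        # new frontier: out-neighbors of the visited set, minus visited vertices
        F = {v for u in X for v in graph.get(u, ())} - X
    return X


G = {"s": ["a", "b"], "a": ["c"], "b": [], "c": [], "d": ["s"]}
print(sorted(graph_search(G, "s", lambda F: set(F))))    # ['a', 'b', 'c', 's']
print(sorted(graph_search(G, "s", lambda F: {min(F)})))  # ['a', 'b', 'c', 's']
```

Choosing all of `F` at once visits whole layers, as BFS does, while choosing one vertex at a time gives a sequential search. In both cases `d` is never reached, because there is no path from `s` to it.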


Note that the algorithm above does not necessarily visit every vertex in the graph: vertices with no path from $s$ are never reached.

The graph search above is generic: depending on how we choose $U$, we obtain BFS, DFS, PFS, and so on. When $U$ may contain several vertices, as in BFS, they can be visited in parallel, whereas DFS visits one vertex at a time and must be sequential.

Graph Reachability Problem: for a graph $G = (V, E)$ and vertex $v \in V$, return all vertices $U \subseteq V$ that are reachable from $v$. Graph search solves reachability.

For an undirected graph, graph reachability is the same as finding the connected component that contains $v$. This graph-search approach is sequential; we can instead find connected components in parallel using graph contraction.

## Priority-First Search (PFS)

PFS is used to implement Breadth-First Search, Dijkstra’s algorithm, and Prim’s algorithm.

Options for picking the set of vertices $U$ to visit:

• the highest priority vertex, breaking ties arbitrarily

• all highest priority vertices, or

• all vertices close to being the highest priority, perhaps the top $k$ (this is beam search)

PFS is a greedy algorithm: in each round it commits to the vertices that currently have the highest priority.

BFS can be used for:

• finding shortest (unweighted) paths

• determining whether a graph is bipartite

• bounding the diameter of an undirected graph

• partitioning graphs

• finding augmenting paths in Ford-Fulkerson's algorithm

Distance: the distance $\delta_G(s, v)$ from $s$ to $v$ is the length of the shortest path from $s$ to $v$.

Here is a sequential BFS:

    BFSReach (G = (V, E)) s = let
      explore X F i =
        if |F| = 0 then (X, i)
        else let
          (* choose the next vertex u whose depth j is smallest in F *)
          (u, j) = argmin_{(u, k) in F} k
          X = X ∪ {u}                     (* mark u as visited *)
          F = F - {(u, j)}
          (* add out-neighbors of u that are neither visited nor already in F, at depth j + 1 *)
          F = F ∪ {(v, j + 1) : v in N_G^+(u), v not in X, (v, _) not in F}
        in explore X F j end
    in explore {} {(s, 0)} 0 end


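Unlike DFS, whose frontier is implicit in the recursion stack, sequential BFS stores the frontier $F$ explicitly and always removes a vertex of smallest depth, so $F$ acts as a priority queue. To keep the data structures simple, we can merge the two membership checks (against $X$ and against $F$) into one visited marker that is set when a vertex enters $F$. We can also store $F$ as a FIFO queue: vertices enter $F$ in nondecreasing order of depth, so a queue behaves like the required priority queue while supporting push and pop in $O(1)$.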

Cost: each vertex is pushed at most once ($|V|$ pushes), and we check whether a neighbor has been visited at most $|E|$ times. With $O(1)$ visited checks (e.g., an array of flags), sequential BFS therefore does $O(|V| + |E|)$ work.
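
A minimal Python sketch of sequential BFS, using `collections.deque` as the FIFO queue. It follows the merged-marker idea: a vertex is marked when it is enqueued, so it is never added twice. Like the pseudocode, it returns the visited vertices (here in visit order) and the depth of the last vertex visited.

```python
from collections import deque


def bfs_reach(graph, s):
    """Sequential BFS from s; returns (visit order, largest depth reached).

    graph: dict mapping each vertex to a list of out-neighbors.
    """
    seen = {s}               # merged marker for "in X or in F"
    order = []
    last_depth = 0
    queue = deque([(s, 0)])  # FIFO queue acts as a priority queue on depth
    while queue:
        u, j = queue.popleft()           # a vertex of smallest depth
        order.append(u)                  # visit u
        last_depth = j
        for v in graph.get(u, ()):
            if v not in seen:            # skip vertices already visited or queued
                seen.add(v)
                queue.append((v, j + 1))
    return order, last_depth


G = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
print(bfs_reach(G, "s"))  # (['s', 'a', 'b', 'c'], 2)
```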

Parallel BFS:

    BFSReach (G = (V, E), s) = let
      explore X F i =
        if |F| = 0 then (X, i - 1)
        else let
          X = X ∪ F              (* visit the entire frontier at once *)
          F = N_G^+(F) - X       (* next frontier: unvisited out-neighbors *)
        in explore X F (i + 1) end
    in explore {} {s} 0 end


The algorithm is the same for directed and undirected graphs. In both cases, when we visit a vertex $v$, neither $v$ itself nor its parent is added to $F$ again, since both are already in $X$.

Cost: the algorithm requires $O(m \log n)$ work and $O(d \log^2 n)$ span, where $n = |V|$, $m = |E|$, and $d$ is the largest distance from the source to any reachable vertex.
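
The level-synchronous structure of the parallel algorithm can be mirrored in Python. The set comprehension that builds the next frontier is the step a parallel implementation would run in parallel. Here it runs sequentially, so this sketch shows the structure, not the parallel cost.

```python
def parallel_bfs_reach(graph, s):
    """Level-by-level BFS, mirroring the parallel pseudocode.

    Each round visits the whole frontier at once. Returns (X, i - 1):
    the visited set and the largest distance d from s.
    """
    X, F, i = set(), {s}, 0
    while F:
        X = X | F                                          # X = X ∪ F
        F = {v for u in F for v in graph.get(u, ())} - X   # F = N+(F) - X
        i += 1
    return X, i - 1


G = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
X, d = parallel_bfs_reach(G, "s")
print(sorted(X), d)  # ['a', 'b', 'c', 's'] 2
```

The number of loop iterations is $d + 1$, which is where the factor $d$ in the span bound comes from.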

// TODO: Bounding Cost using Aggregation: https://www.diderot.one/courses/136/books/580/chapter/8115#atom-590479

We can also store distances in $X$: instead of a set, $X$ becomes a table mapping each visited vertex $v$ to $\delta_G(s, v)$.
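
A small Python sketch of this variant of level-synchronous BFS, assuming an adjacency dict. $X$ is a dict, and every vertex visited in round $i$ is recorded with distance $i$.

```python
def bfs_distance_table(graph, s):
    """Level-synchronous BFS where X maps each visited vertex v to delta(s, v)."""
    X, F, i = {}, {s}, 0
    while F:
        for v in F:
            X[v] = i                      # every vertex in round i is at distance i
        F = {w for v in F for w in graph.get(v, ()) if w not in X}
        i += 1
    return X


G = {"s": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
print(sorted(bfs_distance_table(G, "s").items()))
# [('a', 1), ('b', 1), ('c', 2), ('s', 0)]
```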

To compute a shortest-path tree (a tree in which the path from $s$ to any vertex $v$ is a shortest path, so distances can be read off by following the tree), we can use either of these algorithms:

• BFS Tree with Sequence

• Unweighted Shortest Paths

BFS Tree with Sequence: compute shortest paths from the source $s$ to every reachable vertex $u$ in $G$. The result is a sequence of parent pointers: following the pointers from any $u$ traces a shortest path from $u$ back to $s$.

1. Visit the frontier layer.
2. Collect all edges from the frontier layer to unvisited vertices (the next layer), written as (to, from) pairs.
3. Flatten those edges into a single sequence.
4. For each to vertex, keep at most one (to, from) edge by using Seq.inject to write from as the parent of to.
5. Update the frontier to contain the newly reached, previously unvisited vertices.
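
The five steps can be sketched in Python, assuming vertices are numbered `0..n-1` and the graph is a list of adjacency lists. `inject` below is a hypothetical sequential stand-in for Seq.inject. When several writes hit the same index, the last one wins here; any single winner yields a valid BFS tree.

```python
def inject(seq, updates):
    """Hypothetical stand-in for Seq.inject: apply (index, value) writes in order."""
    out = list(seq)
    for i, x in updates:
        out[i] = x
    return out


def bfs_tree(graph, s):
    """Return parent[v] for every vertex (None if unreachable; parent[s] = s)."""
    parent = [None] * len(graph)
    parent[s] = s
    frontier = [s]
    while frontier:
        # steps 2-3: flattened (to, from) edges from the frontier to unvisited vertices
        edges = [(v, u) for u in frontier for v in graph[u] if parent[v] is None]
        # step 4: each newly reached vertex keeps exactly one parent
        parent = inject(parent, edges)
        # step 5: the new frontier is the set of newly reached vertices
        frontier = sorted({v for v, _ in edges})
    return parent


def path_to(parent, v):
    """Follow parent pointers from v back to the source, then reverse."""
    if parent[v] is None:
        return None
    path = [v]
    while parent[v] != v:
        v = parent[v]
        path.append(v)
    return path[::-1]


G = [[1, 2], [3], [3], []]
P = bfs_tree(G, 0)
print(P)              # [0, 0, 0, 2]
print(path_to(P, 3))  # [0, 2, 3]
```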

// TODO: https://www.diderot.one/courses/136/books/582/chapter/8151#atom-591585

## Depth-First Search (DFS)

DFS can be used to:

• find cycles in a graph

• topologically sort a DAG

• find strongly connected components

• test whether a graph is bi-connected

    DFSReach (G, s) = let
      DFS (X, v) =
        if v in X then X
        else iterate DFS (X ∪ {v}) (N_G^+(v))
    in DFS ({}, s) end
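
A recursive Python sketch of DFSReach, assuming an adjacency dict. It returns the vertices in the order they are first visited, and the set of them is $X$. It uses recursion for readability, so very deep graphs may exceed Python's default recursion limit.

```python
def dfs_reach(graph, s):
    """Recursive DFS from s; returns the reachable vertices in visit order."""
    X = set()
    order = []

    def dfs(v):
        if v in X:                    # already visited: return X unchanged
            return
        X.add(v)                      # X ∪ {v}
        order.append(v)
        for w in graph.get(v, ()):    # iterate DFS over out-neighbors of v
            dfs(w)

    dfs(s)
    return order


G = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
print(dfs_reach(G, "a"))  # ['a', 'b', 'd', 'c']
```

The search goes all the way down to `d` before backtracking to `c`; BFS would visit `c` before `d`.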


### Generic DFS

We now describe a generic DFS.

In generic DFS, visit, finish, and revisit are user-defined functions that update a user-defined state $\Sigma$. $X$ is a boolean sequence indicating whether each vertex has been visited.

If a vertex's out-neighbors include its parent (as in an undirected graph), DFS revisits the parent. But since the parent is already marked as visited, it is not expanded again.
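
A Python sketch of generic DFS over all vertices (DFSAll). The callback signatures `(sigma, v) -> sigma` are an assumption made for this sketch. Vertices are `0..n-1`, and `X` is a boolean list, matching the ephemeral-sequence representation.

```python
def generic_dfs_all(graph, visit, finish, revisit, sigma):
    """Generic DFS from every vertex.

    graph   : list; graph[v] = out-neighbors of vertex v (vertices 0..n-1)
    visit   : (sigma, v) -> sigma, called when v is first reached
    finish  : (sigma, v) -> sigma, called after all of v's out-neighbors are done
    revisit : (sigma, v) -> sigma, called when DFS reaches an already-visited v
    """
    X = [False] * len(graph)          # boolean sequence of visited flags

    def dfs(sigma, v):
        if X[v]:
            return revisit(sigma, v)
        X[v] = True
        sigma = visit(sigma, v)
        for w in graph[v]:
            sigma = dfs(sigma, w)
        return finish(sigma, v)

    for v in range(len(graph)):       # DFSAll: start a DFS from every vertex
        sigma = dfs(sigma, v)
    return sigma


G = [[1, 2], [3], [3], []]
log = generic_dfs_all(
    G,
    visit=lambda s, v: s + [("visit", v)],
    finish=lambda s, v: s + [("finish", v)],
    revisit=lambda s, v: s,
    sigma=[],
)
print(log)
# [('visit', 0), ('visit', 1), ('visit', 3), ('finish', 3), ('finish', 1),
#  ('visit', 2), ('finish', 2), ('finish', 0)]
```

Because DFSAll also calls `dfs` on vertices that were already visited, revisit is called exactly $|E|$ times in total. On this graph, that is 4.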

### DFS Numbers and Edge Classification

DFS Numbers: while running DFS, we can label each vertex with the time at which we first visit it (visit) and the time at which we finish it (finish). These two numbers associated with a vertex are called its DFS numbers.

In addition, we can classify the edges of the original graph into:

• tree edge: an edge $(u, v)$ that DFS follows when it first visits $v$

• back edge: an edge $(u, v)$ where $v$ has already been visited and is an ancestor of $u$

• forward edge: an edge $(u, v)$ where $v$ has already been visited and is a descendant of $u$

• cross edge: an edge $(u, v)$ where $v$ has already been visited and is neither an ancestor nor a descendant of $u$

In an undirected graph, there are no forward or cross edges; only tree edges and back edges are possible.

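We can also classify edges just by looking at DFS numbers: for example, $v$ is a descendant of $u$ exactly when $\text{visit}(u) < \text{visit}(v)$ and $\text{finish}(v) < \text{finish}(u)$.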
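
A Python sketch that computes DFS numbers with a single clock and then classifies each edge of a simple directed graph by comparing intervals. Telling a tree edge from a forward edge also needs the DFS-tree parent, which this sketch records during the search. Both function names are illustrative.

```python
def dfs_numbers(graph):
    """DFS from every vertex (dict order); return visit/finish times and tree parents."""
    visit, finish, parent = {}, {}, {}
    clock = 0

    def dfs(u):
        nonlocal clock
        visit[u] = clock
        clock += 1
        for v in graph[u]:
            if v not in visit:
                parent[v] = u             # (u, v) is a tree edge
                dfs(v)
        finish[u] = clock
        clock += 1

    for s in graph:
        if s not in visit:
            dfs(s)
    return visit, finish, parent


def classify_edge(u, v, visit, finish, parent):
    """Classify edge (u, v) using DFS numbers plus the tree parent of v."""
    if parent.get(v) == u:
        return "tree"
    if visit[v] <= visit[u] and finish[u] <= finish[v]:
        return "back"        # v is an ancestor of u (a self-loop counts as back)
    if visit[u] < visit[v] and finish[v] < finish[u]:
        return "forward"     # v is a proper descendant of u, reached another way
    return "cross"           # v finished before u was first visited


G = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
vis, fin, par = dfs_numbers(G)
for u in G:
    for v in G[u]:
        print(u, v, classify_edge(u, v, vis, fin, par))
# a b tree / a c forward / b c tree / c a back / d c cross
```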

### Costs

For DFSAll, which runs DFS from every vertex of a graph with $n = |V|$ vertices and $m = |E|$ edges:

• we make $|E| + |V| = m + n$ calls to DFS, where $|E|$ calls come from DFS itself (one per edge) and $|V|$ come from DFSAll (one per vertex)

• visit and finish are each called $|V|$ times

• revisit is called $(|E| + |V|) - |V| = |E|$ times

• we check whether $v$ is in $X$ $|E| + |V|$ times

• we insert $v$ into $X$ $|V|$ times

• with a tree-based implementation of sets and an adjacency-table representation of graphs, each of these operations takes $O(\log |V|)$ work; assuming the user-defined functions take $O(\log n)$ work, the total work is $O((m + n) \log n)$

• with ephemeral array sequences for $X$ and adjacency sequences for the graph, each operation takes $O(1)$ work; assuming the user-defined functions take $O(1)$ work, the total work is $O(m + n)$

### Cycle Detection

From the edge classification, we know:

• forward edge: does not create a cycle

• cross edge: does not create a cycle

• back edge: creates a cycle

So a directed graph has a cycle if and only if DFS finds a back edge. There are two ways to find one:

1. generate DFS numbers and check for back edges
2. check for back edges directly, using a flag and a stack of ancestors
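
Here is a Python sketch of the second method for directed graphs, assuming an adjacency dict that has every vertex as a key. `on_stack` holds the current DFS ancestors. Reaching a vertex that is still on the stack means the edge is a back edge.

```python
def has_cycle_directed(graph):
    """Return True iff the directed graph has a cycle (i.e. DFS finds a back edge)."""
    visited, on_stack = set(), set()

    def dfs(u):
        visited.add(u)
        on_stack.add(u)
        for v in graph[u]:
            if v in on_stack:
                return True          # back edge: v is an ancestor of u
            if v not in visited and dfs(v):
                return True
        on_stack.discard(u)          # u is finished, so no longer an ancestor
        return False

    return any(u not in visited and dfs(u) for u in graph)


print(has_cycle_directed({"a": ["b"], "b": ["c"], "c": ["a"]}))       # True
print(has_cycle_directed({"a": ["b", "c"], "b": ["c"], "c": []}))     # False
```

In the second graph, `a -> c` is a forward edge: `c` has already finished when it is seen from `a`, so no cycle is reported.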

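Applying this cycle-detection algorithm directly to undirected graphs does not work: each undirected edge is stored in both directions, so the edge from a child back to its parent looks like a back edge, and every edge would form a cycle of length two. So we additionally need to make sure the back edge does not go to the direct ancestor (i.e., the parent).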
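
Here is a Python sketch of the undirected version. It assumes a simple graph (no parallel edges) stored as an adjacency dict that lists each edge in both directions. The only change from the directed test is that the edge back to the parent is skipped.

```python
def has_cycle_undirected(graph):
    """Return True iff the simple undirected graph has a cycle."""
    visited = set()

    def dfs(u, parent):
        visited.add(u)
        for v in graph[u]:
            if v == parent:
                continue                 # the tree edge seen from the other side
            if v in visited or dfs(v, u):
                return True              # back edge to a non-parent ancestor
        return False

    return any(u not in visited and dfs(u, None) for u in graph)


triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(has_cycle_undirected(triangle), has_cycle_undirected(path))  # True False
```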

### Topological Sort

The reachability relation of a Directed Acyclic Graph (DAG), where $a \leq b$ means $a$ can reach $b$, obeys:

• transitivity: $a \leq b \land b \leq c \implies a \leq c$, since reachability is transitive

• antisymmetry: $a \leq b \land b \leq a \implies a = b$, i.e., no two distinct vertices reach each other; this corresponds to having no cycles

A partial order allows unordered pairs (neither $a \leq b$ nor $b \leq a$ holds); this is the case when neither vertex can reach the other.

To get a total order, there are two choices:

1. You can always pick a subset of vertices that is totally ordered.
2. Assign an arbitrary order to unordered elements (this is what we use for topological sort).

Topological Sort: a total ordering $v_1, \dots, v_n$ of the vertices of a DAG such that $(\forall (v_i, v_j) \in E)(i < j)$. A graph may have many possible topological sorts.

If we order the vertices from highest to lowest finish time (decreasingFinish), we obtain a topologically sorted sequence.
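
A Python sketch of decreasingFinish, assuming an adjacency dict with every vertex as a key. A vertex is appended to `finished` only after all of its out-neighbors have finished. Reversing that list therefore places every edge $(u, v)$ with $u$ before $v$ when the graph is a DAG.

```python
def decreasing_finish(graph):
    """Return the vertices ordered from highest to lowest DFS finish time."""
    visited, finished = set(), []

    def dfs(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs(v)
        finished.append(u)        # u finishes after all of its descendants

    for u in graph:
        if u not in visited:
            dfs(u)
    return finished[::-1]


G = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(decreasing_finish(G))  # ['a', 'c', 'b', 'd']
```

`['a', 'b', 'c', 'd']` would also be a valid topological sort; which one comes out depends on the order in which DFS explores neighbors.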

### Strongly Connected Components (SCC)

Strongly Connected Graph: a directed graph is strongly connected if every vertex can reach (not necessarily by a direct edge) every other vertex.

Strongly Connected Component: a subgraph $H$ of $G$ that is strongly connected and maximal (i.e., adding any more vertices and edges of $G$ to $H$ would break the strong connectivity of $H$).

Component DAG: the graph obtained by contracting each strongly connected component into a single vertex and eliminating duplicate edges between components.

Strongly Connected Components (SCC) problem: find the strongly connected components of a graph and return them in topological order of the component DAG.

For example, we need to return $[\{c, f, d\}, \{a\}, \{e, b\}]$ for the example graph.

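Algorithm:

1. Order the vertices of the entire graph by decreasing finish time using decreasingFinish. Since $G$ may have cycles, this is not a topological sort of $G$ itself, but the first vertex of each strongly connected component appears in this order in the topological order of the component DAG.
2. Transpose the graph (flip every edge).
3. Run DFSReach on the transposed graph from each vertex in that order, sharing one visited set, until every vertex has been visited; each run that reaches new vertices yields one component.

In pseudocode:

    SCC (G = (V, E)) = let
      F = decreasingFinish G
      G^T = transpose G
      SCCOne ((X, comps), v) = let
        (X', comp) = DFSReach G^T (X, v)
      in
        if |comp| = 0 then (X', comps)   (* v already belongs to an earlier component *)
        else (X', comp :: comps)
      end
      (_, comps) = iterate SCCOne ({}, []) F
    in
      reverse comps   (* :: prepends, so reverse to return components in topological order *)
    end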
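
A self-contained Python sketch of the three steps, assuming an adjacency dict with every vertex as a key. This two-pass approach is commonly known as Kosaraju's algorithm. Components are appended to the output list as they are found, so no final reversal is needed.

```python
def scc(graph):
    """Strongly connected components, returned in topological order of the component DAG."""
    # Step 1: order vertices by decreasing finish time on G.
    visited, finished = set(), []

    def dfs_finish(u):
        visited.add(u)
        for v in graph[u]:
            if v not in visited:
                dfs_finish(v)
        finished.append(u)

    for u in graph:
        if u not in visited:
            dfs_finish(u)
    order = finished[::-1]

    # Step 2: transpose G.
    gt = {u: [] for u in graph}
    for u in graph:
        for v in graph[u]:
            gt[v].append(u)

    # Step 3: DFSReach on G^T in that order, threading one visited set X.
    X, comps = set(), []

    def dfs_reach(u, comp):
        X.add(u)
        comp.add(u)
        for v in gt[u]:
            if v not in X:
                dfs_reach(v, comp)

    for v in order:
        if v not in X:               # skip vertices of earlier components
            comp = set()
            dfs_reach(v, comp)
            comps.append(comp)
    return comps


G = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": ["e"], "e": ["d"]}
print([sorted(c) for c in scc(G)])  # [['a', 'b', 'c'], ['d', 'e']]
```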


### Parallel DFS

Making DFS parallel is hard. Depth-first search is known to be P-complete, a class of computations that can be done in polynomial work but are widely believed not to admit a polylogarithmic span algorithm. A detailed discussion of this topic is beyond the scope of this book, but it provides evidence that DFS is unlikely to be highly parallel.

Advantages of DFS:

• the BFS frontier can consume a lot of memory

• for some very large graphs (that can't fit into memory, e.g., in robot motion planning), computing the frontier is infeasible

• DFS has better data locality
