Lecture 099 - Graphs, PFS, BFS, DFS

Graphs

Undirected Graph: can be represented as a directed graph in which each edge is replaced with two edges in opposite directions.

Transpose: flip the direction of every edge.
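
The two definitions above can be sketched in Python as follows. This is a minimal illustration, assuming a graph is given as a collection of vertex pairs; the helper names to_directed, transpose, and adjacency are hypothetical and not part of the lecture.

```python
from collections import defaultdict

def to_directed(undirected_edges):
    """Replace each undirected edge {u, v} with the two arcs (u, v) and (v, u)."""
    arcs = set()
    for u, v in undirected_edges:
        arcs.add((u, v))
        arcs.add((v, u))
    return arcs

def transpose(arcs):
    """Flip the direction of every arc."""
    return {(v, u) for (u, v) in arcs}

def adjacency(arcs):
    """Build an out-neighbor table: vertex -> set of out-neighbors."""
    table = defaultdict(set)
    for u, v in arcs:
        table[u].add(v)
        table.setdefault(v, set())  # make sure sinks also appear as keys
    return dict(table)
```

Transposing the directed form of an undirected graph gives back the same arc set, since every arc already has its reverse.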

Example: enumerable graph representation

Representation of Visited Set: because the visited set is used sequentially (single-threaded), we can represent it ephemerally, with update and nth both taking constant work and span.

Ephemeral and Single-Threaded Sequences

Persistent Data Structure: a data structure whose operations all produce new data without modifying the original input. Think of the const keyword in C++, where all inputs are immutable.

Ephemeral Data Structure: a data structure whose operations either modify the original structure in place without returning anything, or return a new structure but invalidate the original.

Ephemeral Sequences: for a sequence of length n

STSequence is persistent, but it uses benign effects internally. How is this possible? The cost bounds above hold only if you use the latest version of the data structure; they change when you access an earlier version.
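
To make the persistent/ephemeral distinction concrete, here is a small Python sketch. It uses tuples and lists as stand-ins and does not model STSequence itself; the function names are hypothetical.

```python
def persistent_update(seq, i, x):
    """Return a new tuple with position i set to x; O(n) work, input untouched."""
    return seq[:i] + (x,) + seq[i + 1:]

def ephemeral_update(seq, i, x):
    """Overwrite position i of a list in place; O(1) work, the old version is gone."""
    seq[i] = x
    return seq
```

After persistent_update both the old and the new version remain usable, while ephemeral_update leaves only the latest version, which is all a sequential visited set needs.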

Graph Search

There are Priority-First Search (PFS), Breadth-First Search (BFS), and Depth-First Search (DFS).

Source: starting vertex of search

Out-neighbors: the out-neighbors of vertex v in graph G are written N_G^+(v).

Frontier Set: the set of unvisited out-neighbors N_{G}^+(X) - X, where X is the visited set.

Generic graph search from a single source: initialize the frontier F with the source vertex s; whenever vertices are visited, add them to X. U is a (potentially singleton) subset chosen from F, depending on the specific algorithm.

graphSearch (G, s) =
  let
    explore X F =
      if |F| = 0 then X
      else let
        choose U \subseteq F such that |U| >= 1 (* pick vertices from the frontier *)
        visit U
        X = X \cup U (* update the visited set *)
        F = N_G^+(X) - X (* update the frontier *)
      in
        explore X F
      end
  in
    explore {} {s}
  end
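
The generic search above can be sketched in Python roughly as follows. The assumptions are that the graph is a dict mapping every vertex to a set of out-neighbors, and that choose is a hypothetical callback returning a non-empty subset of the frontier. The frontier is recomputed from scratch each round to mirror the pseudocode, which is not efficient.

```python
def graph_search(graph, source, choose):
    """Generic search: returns the set of vertices visited from `source`."""
    visited = set()
    frontier = {source}
    while frontier:
        chosen = choose(frontier)   # U: a non-empty subset of the frontier
        visited |= chosen           # X = X union U
        # F = N_G^+(X) - X: out-neighbors of the visited set, minus the visited set.
        frontier = {w for v in visited for w in graph[v]} - visited
    return visited
```

Passing `lambda f: set(f)` visits the whole frontier each round (BFS-like), while `lambda f: {min(f)}` visits one vertex at a time; both reach the same set of vertices.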

Note that the algorithm above does not necessarily visit every vertex in the graph: vertices with no path from s are never visited.

The graph search above is generic: depending on how U is chosen, we obtain BFS, DFS, PFS, and so on. Because U can be a set, BFS can be parallel, but DFS must be sequential.

Graph Reachability Problem: for a graph G = (V, E) and a vertex v \in V, return all vertices U \subseteq V that are reachable from v. Graph search solves reachability.

For an undirected graph, graph reachability is the same as finding the connected component that contains v. This algorithm is sequential, however; we can do it in parallel using graph contraction.

Priority-First Search (PFS)

Used to implement Breadth-First Search, Dijkstra’s algorithm and Prim’s algorithm.

Options to pick set of vertices U to visit

PFS is a greedy algorithm

Breadth-First Search (BFS)

BFS can be used:

Distance: the distance \delta_G(s, v) from s to v is the length of the shortest path connecting s to v.

Here is a sequential BFS:

BFSReach (G = (V, E)) s = let
  explore X F i =
    if |F| = 0 then (X, i)
    else let
      (u, j) = argmin_{(u, k) in F} (k) (* choose the frontier vertex u with the smallest depth j *)
      X = append(X, u) (* mark vertex u as visited *)
      F = remove(F, (u, j))
      F = append(F, (v, j+1) | v in N_G^+(u) and v not in X and (v, _) not in F) (* add the unvisited out-neighbors of u to the frontier *)
    in explore X F j end
  in explore {} [(s, 0)] 0 end

Unlike DFS, to keep the BFS data structures simple (i.e. to merge the visited set X and the frontier F), we need a priority queue that supports push and pop in O(1). Because vertices are added in non-decreasing depth order, a FIFO queue suffices.

Cost: since we do at most |V| pushes and check whether a neighbor is visited at most |E| times, sequential BFS takes O(|V| + |E|) work.
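
A sequential BFS along these lines might look like the Python sketch below. It assumes the graph is a dict from each vertex to a list of out-neighbors and uses collections.deque as the FIFO queue; the function name bfs_reach is illustrative only.

```python
from collections import deque

def bfs_reach(graph, source):
    """Sequential BFS. Returns (visit order, depth of each reached vertex)."""
    depth = {source: 0}        # doubles as "visited or already in the frontier"
    queue = deque([source])    # FIFO order keeps depths non-decreasing
    order = []
    while queue:
        u = queue.popleft()    # vertex with the smallest depth
        order.append(u)
        for v in graph[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return order, depth
```

Each vertex is pushed at most once and each edge is examined once, which matches the O(|V| + |E|) work bound.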

Parallel BFS

Parallel BFS:

BFSReach (G = (V, E), s) = let
  explore X F i =
    if |F| = 0 then (X, i - 1)
    else let
      X = X \cup F (* visit the whole frontier at once *)
      F = N_G^+(F) - X (* next frontier: unvisited out-neighbors of F *)
    in explore X F (i + 1) end
  in explore {} {s} 0 end
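
The round-by-round structure can be sketched in Python as below. The code itself runs sequentially; it only shows where the parallelism would be (the union over the frontier). The graph is assumed to be a dict from each vertex to a set of out-neighbors, and bfs_layers is a hypothetical name.

```python
def bfs_layers(graph, source):
    """Visit the whole frontier per round; returns (visited, largest depth reached)."""
    visited = set()
    frontier = {source}
    rounds = 0
    while frontier:
        visited |= frontier
        # N_G^+(F) - X: the union over the frontier could be computed in parallel.
        frontier = set().union(*(graph[v] for v in frontier)) - visited
        rounds += 1
    return visited, rounds - 1
```

The number of rounds is one more than the largest distance d, which is why d appears in the span bound.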

Example: directed graph BFS

Directed graphs need no special treatment. In both the directed and the undirected case, when we visit vertex v we do not add v itself or its parent to F, because both are already in X.

Cost: the algorithm requires O(m \log n) work and O(d \log^2 n) span, where n = |V|, m = |E|, and d is the largest distance from the source vertex to any reachable vertex.

// TODO: Bounding Cost using Aggregation: https://www.diderot.one/courses/136/books/580/chapter/8115#atom-590479

We can also store a distance in X:

Storing distance using BFS

To compute a shortest-path tree (from which the distance from s to v can be recovered by following the path in the tree), we can choose either of the following algorithms:

BFS Tree with Sequence: computes shortest paths from a source v to every reachable vertex u in graph G. The result is a flattened representation of the paths (a sequence of parent vertices) that can be followed from any u back to v.

  1. Visit the frontier layer.
  2. Get all the edges from the next layer to the frontier layer.
  3. Flatten those edges.
  4. For each "to" vertex, select at most one (to, from) edge using Seq.inject and write it to the output.
  5. Update the frontier layer to contain the unvisited vertices.
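
The five steps above can be sketched sequentially in Python. Here a dict plays the role of the parent sequence, and "first writer wins" stands in for the conflict resolution that Seq.inject performs; that choice and the names bfs_tree and path_to_source are assumptions made for illustration.

```python
def bfs_tree(graph, source):
    """Return parent[v] for every reached vertex v (the source is its own parent)."""
    parent = {source: source}
    frontier = [source]
    while frontier:
        # Steps 2-3: flatten all (to, from) edges leaving the frontier.
        edges = [(v, u) for u in frontier for v in graph[u]]
        next_frontier = []
        for to, frm in edges:
            # Step 4: keep at most one parent per unvisited vertex.
            if to not in parent:
                parent[to] = frm
                next_frontier.append(to)
        frontier = next_frontier   # Step 5: the newly visited vertices
    return parent

def path_to_source(parent, v):
    """Follow parent pointers from v back to the source."""
    path = [v]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path
```

The length of the path returned by path_to_source, minus one, is the BFS distance from the source to v.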

BFS Tree with Sequence

BFS Tree Example

BFS Tree Implementation

BFS Tree Trace: X_i[v] = u means vertex v's parent is u.

Cost of BFS Tree with Sequence

// TODO: https://www.diderot.one/courses/136/books/582/chapter/8151#atom-591585

Depth-First Search (DFS)

DFS can be used:

DFSReach (G, s) = let
  DFS (X, v) =
    if v in X then X
    else iterate DFS (append(X, v)) N_G^+(v)
  in DFS ({}, s) end
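
A direct Python transcription of DFSReach might look like this. It is a sketch that assumes the graph is a dict from each vertex to a list of out-neighbors; it uses recursion, so very deep graphs would hit Python's recursion limit.

```python
def dfs_reach(graph, source):
    """Return the vertices reachable from `source`, in first-visit order."""
    visited = []     # visit order
    seen = set()     # constant-time membership test for X

    def dfs(v):
        if v in seen:            # "if v in X then X"
            return
        seen.add(v)
        visited.append(v)
        for w in graph[v]:       # iterate DFS over the out-neighbors of v
            dfs(w)

    dfs(source)
    return visited
```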

DFS Example

Generic DFS

Here we present a generic DFS:

Generic DFS: DFS visits one reachable component, DFSall visits all reachable components.

In generic DFS, visit, finish, and revisit are user-defined functions that modify a user-defined structure \Sigma. X is a boolean sequence denoting whether each vertex has been visited.

If your out-neighbors include your parent, you may revisit your parent. But since the parent is already marked as visited, it will not be expanded again.

DFS Numbers and Edge Classification

DFS Numbers: while running the algorithm, we can record for each vertex the time at which we first visit it (visit) and the time at which we finish visiting it (finish). These two numbers associated with a vertex are called its DFS numbers.
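
One way to compute DFS numbers is sketched below in Python, using a single counter that ticks on every visit and every finish. The function name and the optional roots parameter are assumptions for illustration; the graph is a dict from each vertex to a list of out-neighbors.

```python
def dfs_numbers(graph, roots=None):
    """Return (visit, finish) timestamps for every vertex reached from `roots`."""
    visit, finish = {}, {}
    clock = 0

    def dfs(v):
        nonlocal clock
        visit[v] = clock         # first time we see v
        clock += 1
        for w in graph[v]:
            if w not in visit:
                dfs(w)
        finish[v] = clock        # all out-neighbors of v are done
        clock += 1

    for r in (graph if roots is None else roots):
        if r not in visit:
            dfs(r)
    return visit, finish
```

For the graph a -> b, a -> c, b -> c, the intervals nest: c's interval [2, 3] lies inside b's [1, 4], which lies inside a's [0, 5].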

DFS Numbers

tree, back, forward, and cross edge, in unordered view

tree, back, forward, and cross edge. All edges appear in the original graph, we just classify them. Unlabeled edges in black are tree edges.

In addition, we can classify the edge in original graph into:

In an undirected graph, there are no forward edges and no cross edges; only tree edges and back edges are possible.

In addition, we can just look at DFS numbers to classify edges:

Classify Edges with DFS Number
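
As a sketch of classifying edges by DFS numbers, the Python code below compares the visit/finish intervals of the two endpoints. DFS numbers alone cannot separate tree edges from forward edges, so this version also records each vertex's DFS parent; that extra map and the function name are assumptions of this example.

```python
def classify_edges(graph):
    """Label every edge (u, v) as 'tree', 'forward', 'back', or 'cross'."""
    visit, finish, parent = {}, {}, {}
    clock = 0

    def dfs(v):
        nonlocal clock
        visit[v] = clock
        clock += 1
        for w in graph[v]:
            if w not in visit:
                parent[w] = v
                dfs(w)
        finish[v] = clock
        clock += 1

    for r in graph:
        if r not in visit:
            dfs(r)

    labels = {}
    for u in graph:
        for v in graph[u]:
            if visit[u] < visit[v] and finish[v] < finish[u]:
                # v is a descendant of u; the parent map separates tree from forward.
                labels[(u, v)] = "tree" if parent.get(v) == u else "forward"
            elif visit[v] <= visit[u] and finish[u] <= finish[v]:
                labels[(u, v)] = "back"     # v is an ancestor of u (or u itself)
            else:
                labels[(u, v)] = "cross"    # v finished before u was first visited
    return labels
```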

Costs

Costs:

Cycle Detection

From classification, we know:

To find a cycle, there are two methods:

  1. Generate DFS numbers and check for a back edge.
  2. Check for a back edge directly using a flag and an ancestor stack.
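
The second method can be sketched for directed graphs as follows. This Python version keeps the ancestors of the current vertex in a set rather than a stack and returns a boolean instead of setting a flag; both are simplifications chosen for this example, not the lecture's generic-DFS formulation.

```python
def has_cycle(graph):
    """Return True if the directed graph contains a cycle (i.e. a back edge)."""
    visited = set()
    ancestors = set()    # vertices on the current DFS path

    def dfs(v):
        visited.add(v)
        ancestors.add(v)
        for w in graph[v]:
            if w in ancestors:              # back edge to an ancestor: cycle found
                return True
            if w not in visited and dfs(w):
                return True
        ancestors.remove(v)                 # finish v: it is no longer an ancestor
        return False

    return any(dfs(r) for r in graph if r not in visited)
```

A self-loop counts as a cycle here, because the vertex is its own ancestor while it is being explored.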

Directed Graph Only: Cycle Detection using Generic DFS

Applying the cycle-detection algorithm to undirected graphs does not work directly, because it will find that every edge forms a cycle of length two. So we additionally need to make sure that a back edge does not go to the direct ancestor (i.e. the parent).

Topological Sort

Partial vs Total Order

Directed Acyclic Graph (DAG) obeys:

A partial order allows incomparable elements (neither a \leq b nor b \leq a holds); this is the case when neither of two vertices is reachable from the other.

To reach a total order, you have two choices:

  1. You can always pick a subset of vertices that satisfies a total order.
  2. Assign an arbitrary order to unordered elements (this is what we use for topological sort).

Topological Sort: a total ordering v_1, ..., v_n of the vertices of a DAG such that (\forall (v_i, v_j) \in E)(i < j). A graph may have many possible topological sorts.

If we order vertices from highest finish time to lowest finish time (decreasingFinish), we obtain a topological sorted sequence.
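
A minimal Python sketch of this idea records vertices in the order they finish and reverses the list at the end. It assumes the input is a DAG given as a dict from each vertex to a list of out-neighbors; the function name is illustrative.

```python
def topological_sort(graph):
    """Order the vertices of a DAG by decreasing DFS finish time."""
    visited = set()
    finished = []        # vertices in increasing finish order

    def dfs(v):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                dfs(w)
        finished.append(v)   # v finishes after all its descendants

    for r in graph:
        if r not in visited:
            dfs(r)
    return finished[::-1]    # decreasing finish time
```

Because a vertex finishes only after everything reachable from it has finished, every edge (u, v) of a DAG has u before v in the returned list.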

Strongly Connected Components (SCC)

Strongly Connected Graph: a directed graph is strongly connected if all vertices can reach (not necessarily directly connect to) each other.

Strongly Connected Component: a subgraph H of G that is strongly connected and maximal (i.e. adding any more vertices or edges from G to H would break the strong connectivity of H).

Component DAG: the graph obtained by contracting each strongly connected component into a single vertex and eliminating duplicate edges between components.

Strongly Connected Components (left) and Component DAG (right)

Strongly Connected Components (SCC) problem: find the strongly connected components of a graph and return them in topological order.

For example, we need to return [\{c, f, d\}, \{a\}, \{e, b\}] for above graph.

Algorithm:

  1. First sort the entire graph in topological order using decreasingFinish. This order is also a topological order of the strongly connected components.
  2. Transpose the graph (flip every edge).
  3. Run DFSReach repeatedly on the transposed graph, starting from vertices in that topological order, until the whole graph is traversed.

SCC (G = (V, E)) = let
  F = decreasingFinish G
  G^T = transpose G

  SCCOne ((X, comps), v) = let
    (X', comp) = DFSReach G^T (X, v)
  in
    (X', comp::comps) (* here: you can check for empty comp if you want *)
  end
in
  iterate SCCOne ({}, []) F
end
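
The three steps can be sketched in Python as below. This is an illustrative version, not a transcription of the pseudocode: it appends each component to the result list in the order it is discovered (the pseudocode prepends), and it assumes every vertex appears as a key of the graph dict.

```python
def strongly_connected_components(graph):
    """SCC sketch: decreasing finish order on G, then DFS on the transpose G^T."""
    # Step 1: vertices of G in increasing finish order (reversed later).
    visited, finished = set(), []

    def finish_order(v):
        visited.add(v)
        for w in graph[v]:
            if w not in visited:
                finish_order(w)
        finished.append(v)

    for r in graph:
        if r not in visited:
            finish_order(r)

    # Step 2: transpose the graph.
    transposed = {v: [] for v in graph}
    for u in graph:
        for v in graph[u]:
            transposed[v].append(u)

    # Step 3: each DFS on G^T from an unvisited vertex collects one component.
    seen, comps = set(), []

    def collect(v, comp):
        seen.add(v)
        comp.add(v)
        for w in transposed[v]:
            if w not in seen:
                collect(w, comp)

    for v in reversed(finished):     # decreasing finish time
        if v not in seen:
            comp = set()
            collect(v, comp)
            comps.append(comp)
    return comps
```

In the transposed graph, a search started inside a component cannot escape into components that have not been collected yet, which is why each call to collect returns exactly one component.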

SCC Example

Parallel DFS

Making DFS parallel is hard. Depth-first search is known to be P-complete, a class of computations that can be done in polynomial work but are widely believed not to admit a polylogarithmic span algorithm. A detailed discussion of this topic is beyond the scope of this book, but it provides evidence that DFS is unlikely to be highly parallel.

Why DFS is good:
