In the C programming language, parallel programs on 32 cores typically achieve a 10x to 30x speedup. Sorting is highly parallelizable. In Parallel ML, algorithms on 72 cores typically achieve a 10x to 65x speedup. Python is roughly 100x slower than C, but ML is only about 2x slower than C, and ML outperforms Java, Go, and Haskell.
Designing a good parallel algorithm involves identifying independent tasks. Because work captures the total energy consumption, we often design a good sequential algorithm first.
Functional programming for parallelism is easier than programming in, say, CUDA, but performance might be slightly worse.
We use the MPL (MaPLe) compiler, CMU's own research compiler.
So in this class we do:
Proofs: about running time, often involving probability
Interfaces: abstract models of problems and solutions
Reductions: searching for the best algorithm by combining known algorithms
Specifications:
Algorithm Specification: describes what the algorithm should do, but not how it achieves it.
Cost Specification: specifies the cost an algorithm should achieve (in work and/or span).
Abstract Data Type (ADT) Specification: specifies what operations the data type supports, how they behave, and what they cost.
Problems:
Algorithm Problem: the challenge of implementing an algorithm that meets an algorithm specification within a certain cost
Data Structure Problem: the challenge of implementing a data structure that meets an ADT specification within a certain cost (a collection of algorithm problems)
Implementations:
Algorithm: a piece of code that solves an Algorithm Problem
Data Structure: a piece of code that solves a Data Structure Problem
To put it simply, an ADT is a logical description while a data structure is concrete. An ADT is the logical picture of the data and the operations that manipulate its component elements; a data structure is the actual representation of the data in the implementation, together with the algorithms that manipulate the data elements. The ADT lives at the logical level and the data structure at the implementation level.
Benefits of having a specification:
we don't need to care about the implementation, which may be complicated
we can swap in a better implementation without breaking things
it gives us a standard for comparing implementations
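As an illustration of this split in SML (my own example, not from the course notes): an ADT specification plays the role of a signature, while a data structure plays the role of a structure implementing it.

```sml
(* The ADT specification as a signature: which operations exist and their types. *)
signature STACK =
sig
  type 'a stack
  val empty : 'a stack
  val push  : 'a * 'a stack -> 'a stack
  val pop   : 'a stack -> ('a * 'a stack) option
end

(* One concrete data structure satisfying it: a list-backed stack where every
   operation is O(1) work. *)
structure ListStack : STACK =
struct
  type 'a stack = 'a list
  val empty = []
  fun push (x, s) = x :: s
  fun pop [] = NONE
    | pop (x :: s) = SOME (x, s)
end
```

A different structure could satisfy the same signature with different costs, which is exactly what a cost specification pins down.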
Nucleotide: the basic building block of nucleic acid polymers such as DNA and RNA; nucleotides bind together to form the double helix. Components include:
a nitrogenous base, one of Adenine, Cytosine, Guanine, or Thymine (Uracil in RNA)
a 5-carbon sugar molecule
one or more phosphate groups
We distinguish nucleotides based on nitrogenous base (A, C, G, T)
This problem is particularly interesting for sequencing the human genome, since genomes can only be read piece by piece (10 to 1000 base pairs at a time, compared to over three billion pairs in total) due to lab constraints.
Techniques of Reading Molecules:
Primer Walking: construct a special molecule called a primer and read the first ~1000 pairs; the most recently read portion is then used to construct a primer for the next part.
Fragments: break long sequences into fragments and use Primer Walking on each.
Shotgun Method: make copies of the sequence, break them into fragments that are read individually, and then try to assemble them back together.
Note that there is no easy way to know whether repeated fragments are actual repetitions in the sequence or a product of the shotgun method itself. The "double-barrel shotgun method" cuts DNA sections long enough to span the repeated region; by reading the two ends of a long section and knowing approximately how far apart the ends are, we get information about the repeats.
Substring: a contiguous piece of a superstring
Superstring: given a string s, a superstring of s is any string that contains s as a substring; we can construct one by adding characters to the left or right of s.
\Sigma^*: the set of all strings over the alphabet \Sigma (including the empty string)
\Sigma^+: the set of all strings over the alphabet \Sigma (excluding the empty string)
Shortest Superstring (SS) Problem: output the shortest string that includes all input fragments as substrings.
By Occam's razor, the shortest superstring is more likely to be the right answer, but there is not enough information to determine the right answer for certain. For DNA, \Sigma = \{a, c, g, t\}. Also, fragment readings may contain errors in real laboratory settings; errors can be addressed by scoring overlaps differently.
Example from GeeksforGeeks:
Input: arr[] = {"geeks", "quiz", "for"}
Output: geeksquizfor
Input: arr[] = {"catg", "ctaagt", "gcta", "ttca", "atgcatc"}
Output: gctaagttcatgcatc
Snippets: fragments that are not substrings of other fragments.
Only snippets are important: we can safely delete non-snippet fragments because they are already contained in other fragments.
Snippets cannot start at the same position or end at the same position in the original sequence (if two fragments started at the same position, one would be a prefix, hence a substring, of the other). The snippets, ordered by starting point (equivalently, by ending point), therefore form a strict total order.
From the above observation, we only need to try all possible orderings (permutations) of the snippets.
One brute-force solution (with O(n!) work) is to try every permutation of the snippets, join adjacent snippets using their maximum overlap, and return the shortest result.
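A minimal sequential sketch of this brute-force idea in Standard ML (the helper names are mine, and the input is assumed to already consist of distinct snippets):

```sml
(* Brute-force sketch: try every permutation, join adjacent snippets with
   their maximum overlap, and keep the shortest superstring found.
   The O(n!) permutations dominate the cost. *)

(* largest k such that the last k characters of s equal the first k of t *)
fun overlap (s, t) =
  let
    val n = Int.min (String.size s, String.size t)
    fun check k =
      String.extract (s, String.size s - k, NONE) = String.substring (t, 0, k)
    fun loop (k, best) =
      if k > n then best else loop (k + 1, if check k then k else best)
  in
    loop (1, 0)
  end

(* append t to s, dropping the overlapping prefix of t *)
fun join (s, t) = s ^ String.extract (t, overlap (s, t), NONE)

(* all permutations of a list of distinct elements *)
fun permutations [] = [[]]
  | permutations xs =
      List.concat
        (List.map
           (fn x => List.map (fn p => x :: p)
                             (permutations (List.filter (fn y => y <> x) xs)))
           xs)

(* shortest superstring over all permutations *)
fun bruteForceSS snippets =
  let
    fun glue [] = ""
      | glue (x :: rest) = List.foldl (fn (t, s) => join (s, t)) x rest
    val candidates = List.map glue (permutations snippets)
  in
    List.foldl (fn (c, best) =>
                  if String.size c < String.size best then c else best)
               (hd candidates) candidates
  end

(* e.g. bruteForceSS ["catt", "gagtat", "tagg", "tta", "gga"]
   returns a shortest superstring of these snippets (length 15) *)
```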
Checking the overlap between strings s and t can be achieved with the following algorithm, which is O(|s| \cdot |t|) in work and O(\lg |s| + \lg |t|) \subseteq O(\lg (|s| + |t|) + \lg (|s| + |t|)) = O(\lg (|s| + |t|)) in span if the character comparisons are combined with a tree sum.
s = "KANSAS"
t = "SASHIMI"
In this example, the maximum overlap is 3 ("SAS").
1. Check if the last character of s ("S") = the first character of t ("S"): true (cost = 1)
2. Check if the last 2 characters of s ("AS") = the first 2 characters of t ("SA"): false (cost = 2)
3. Check if the last 3 characters of s ("SAS") = the first 3 characters of t ("SAS"): true (cost = 3)
4. Check if the last 4 characters of s ("NSAS") = the first 4 characters of t ("SASH"): false (cost = 4)
5. Check if the last 5 characters of s ("ANSAS") = the first 5 characters of t ("SASHI"): false (cost = 5)
6. Check if the last 6 characters of s ("KANSAS") = the first 6 characters of t ("SASHIM"): false (cost = 6)
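A sketch of this check in Standard ML that tries every candidate overlap length and takes the maximum; here tabulate and reduce are sequential list stand-ins, but with a parallel sequence library (and parallel character comparisons) they would give the span bound stated above.

```sml
(* Sequential stand-ins; with a parallel library, tabulate forks all the
   checks at once and reduce combines the results as a tree. *)
fun tabulate f n = List.tabulate (n, f)
fun reduce f b xs = List.foldl f b xs

(* largest k such that the last k characters of s equal the first k of t *)
fun overlap (s, t) =
  let
    val n = Int.min (String.size s, String.size t)
    (* matches k = k if the length-k suffix/prefix agree, else 0 *)
    fun matches k =
      if String.extract (s, String.size s - k, NONE) = String.substring (t, 0, k)
      then k else 0
  in
    (* try all k = 1 .. n and take the maximum *)
    reduce Int.max 0 (tabulate (fn i => matches (i + 1)) n)
  end

val three = overlap ("KANSAS", "SASHIMI")   (* = 3, as in the worked example *)
```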
If we assume one string is a superstring of the other, then we can iterate from index i = 0 to |s| (assuming s is the shorter string) to get complexity O(\min(|s|, |t|))
But in fact, you can do overlap checking in O(n) work with Ukkonen's (suffix tree) algorithm for a constant-size alphabet, and in O(n \log n) work in the general case using trees.
Staging: we can also calculate the overlaps beforehand and store them in a dictionary for easier access.
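A small staging sketch (the names are mine, and the overlap function from the sketch above is assumed to be in scope): compute the overlap of every ordered pair of snippets once, then look results up instead of recomputing them.

```sml
(* Precompute overlap (s_i, s_j) for every ordered pair of snippets. *)
fun overlapTable snippets =
  let
    val snips = Vector.fromList snippets
    val n = Vector.length snips
  in
    Vector.tabulate (n, fn i =>
      Vector.tabulate (n, fn j =>
        if i = j then 0
        else overlap (Vector.sub (snips, i), Vector.sub (snips, j))))
  end

(* constant-time lookup after the one-time precomputation *)
fun lookupOverlap (table, i, j) = Vector.sub (Vector.sub (table, i), j)
```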
// TODO: cost analysis of staging: https://www.diderot.one/courses/136/books/578/chapter/8091#atom-589865
This problem is actually NP-hard. It can be reduced to the (asymmetric) Traveling Salesperson Problem (TSP) as follows: make a vertex for each snippet and give the directed edge from s_i to s_j weight overlap(s_i, s_j). A maximum-weight Hamiltonian path visits every snippet once while finding the most overlap possible, which corresponds to the shortest superstring.
Greedy Approximation: repeatedly join the pair of snippets with the largest overlap until a single string remains. The resulting solution is conjectured to be a 2-approximation.
// TODO: greedy algorithm for SS: https://www.diderot.one/courses/136/books/578/chapter/8091#atom-589886
Example for the greedy algorithm: say we have snippets {catt, gagtat, tagg, tta, gga}. We do the following:
1. join tagg and gga to obtain tagga (overlap is 2)
2. join catt and tta to obtain catta (overlap is 2)
3. join gagtat and tagga to obtain gagtatagga (overlap is 1)
4. join gagtatagga and catta to obtain gagtataggacatta (overlap is 0)
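A sequential sketch of this greedy strategy (my code; it assumes the overlap and join helpers from the brute-force sketch above are in scope):

```sml
(* Greedy sketch: repeatedly find the pair of distinct snippets with the
   largest overlap, join them, and recurse until one string remains.
   Assumes the snippets are distinct strings. *)
fun greedySS [] = ""
  | greedySS [s] = s
  | greedySS snippets =
      let
        (* all ordered pairs (s, t) with s <> t, scored by overlap *)
        val pairs =
          List.concat
            (List.map (fn s => List.map (fn t => (overlap (s, t), s, t))
                                        (List.filter (fn t => t <> s) snippets))
                      snippets)
        (* keep the pair with the maximum overlap *)
        val best = List.foldl (fn (p as (ov, _, _), q as (ov', _, _)) =>
                                  if ov > ov' then p else q)
                              (hd pairs) pairs
        val (_, s, t) = best
        val rest = List.filter (fn x => x <> s andalso x <> t) snippets
      in
        greedySS (join (s, t) :: rest)
      end

(* greedySS ["catt", "gagtat", "tagg", "tta", "gga"] produces
   "gagtataggacatta", matching the worked example (the join order may differ). *)
```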
// TODO: cost analysis of greedy: https://www.diderot.one/courses/136/books/578/chapter/8091#atom-589891
Nested Parallelism: fork-join parallelism, where a parent task suspends until all of its child tasks finish and join.
Nested parallel computations can be modeled by dependence graphs, which are Directed Acyclic Graphs (DAGs):
Work: number of nodes
Span: length of longest path
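As an illustration (not course code), a fork-join sum over a vector; par is a sequential stand-in for a parallel primitive such as Primitives.par:

```sml
(* par is a stand-in: with a real parallel runtime the two branches may run
   on different cores; here they simply run one after the other. *)
fun par (f, g) = (f (), g ())

(* Sum of v[lo .. hi-1] by divide and conquer.
   Work: O(n), roughly one DAG node per element.
   Span: O(lg n), since the range is halved along the longest path. *)
fun sumRange (v, lo, hi) =
  if hi - lo = 0 then 0
  else if hi - lo = 1 then Vector.sub (v, lo)
  else
    let
      val mid = lo + (hi - lo) div 2
      val (a, b) = par (fn () => sumRange (v, lo, mid),
                        fn () => sumRange (v, mid, hi))
    in
      a + b
    end

val fifteen = sumRange (Vector.fromList [1, 2, 3, 4, 5], 0, 5)
```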
Functional Algorithms: no side effects; good for safe parallelism and abstraction
Benign Effects: side effects that can't be observed by the caller, such as storing intermediate values.
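A tiny illustration of a benign effect (my own example, not from the notes): the function caches its most recent result in a local ref, but callers only ever see the returned values, so the effect cannot be observed from outside.

```sml
local
  (* cache of the most recent (argument, result) pair; invisible to callers *)
  val cache : (int * int) option ref = ref NONE
in
  fun cachedSquare n =
    case !cache of
      SOME (m, r) => if m = n then r
                     else let val r = n * n in cache := SOME (n, r); r end
    | NONE => let val r = n * n in cache := SOME (n, r); r end
end
```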
SPARC: toy language to describe algorithms and data structures.
Function vs. Algorithm/Mapping: an algorithm is a general idea of how to solve a problem, while a function is the actual code that implements an algorithm. A (programming) function is more than a mathematical function (a mapping): it also specifies the mechanism by which the output is generated from the input.
Heisenbug: the term was coined in the early 80s to refer to a type of bug that "disappears" when you try to pinpoint or study it and "appears" when you stop studying it. It is named after the famous Heisenberg uncertainty principle: the act of observing the bug changes its behavior.
Race conditions cannot occur in pure computation.
In functional programming, all data is persistent and inputs are never modified (new values are effectively copies of the originals). The extra memory use is handled automatically by garbage collection or by the compiler.
Granularity: the size of the smallest tasks that are executed without parallelism. If we do not control granularity, a parallel algorithm may perform worse than a sequential one due to overhead (e.g., Primitives.par is expensive). We often control granularity by setting a threshold on the input size.
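A granularity-control sketch (the names and the threshold value are assumptions, not course code): below the grain threshold we run a plain sequential loop instead of paying the overhead of forking.

```sml
fun par (f, g) = (f (), g ())          (* sequential stand-in, as above *)

val grain = 1000                        (* placeholder; tuned by experiment *)

(* sequential base case: a plain loop over v[lo .. hi-1] *)
fun sumSeq (v, lo, hi) =
  let fun loop (i, acc) = if i >= hi then acc
                          else loop (i + 1, acc + Vector.sub (v, i))
  in loop (lo, 0) end

(* fork only when the range is larger than the grain size *)
fun sumGran (v, lo, hi) =
  if hi - lo <= grain then sumSeq (v, lo, hi)
  else
    let
      val mid = lo + (hi - lo) div 2
      val (a, b) = par (fn () => sumGran (v, lo, mid),
                        fn () => sumGran (v, mid, hi))
    in
      a + b
    end
```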
Lambda Calculus: the first general-purpose "programming language"
can easily implement recursion, conditions, and datatypes
inherently parallel
Lambda calculus consists of expressions e, each taking one of the following three forms:
variable: such as x, y, z
lambda abstraction: (\lambda x . e) where x is a variable representing the function argument and e is an expression representing the body of the function (possibly containing x)
application: (e_1 e_2) where e_1 and e_2 are expressions; e_2 is the input to e_1.
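One concrete way to represent these three forms (my own representation, not from the course) is a small SML datatype:

```sml
(* lambda-calculus terms *)
datatype term =
    Var of string           (* variable:            x              *)
  | Lam of string * term    (* lambda abstraction:  (lambda x . e) *)
  | App of term * term      (* application:         (e1 e2)        *)

(* the identity function applied to y:  (lambda x . x) y *)
val example = App (Lam ("x", Var "x"), Var "y")
```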
Beta Reduction: if the left side of an application is a lambda abstraction, then beta reduction "applies the function" by making the following transformation: (\lambda x . e_1) e_2 \to e_1[e_2/x], that is, every occurrence of x in the body e_1 is replaced by e_2.
Computation is essentially beta reduction until there is nothing left to reduce.
Normal Form: an expression that has nothing left to reduce.
It is possible for an expression to never reduce to normal form, since lambda calculus can loop forever (as it must be able to, being Turing complete). For example, (\lambda x . x x)(\lambda x . x x) beta-reduces to itself indefinitely.
The order of evaluation matters. The two most prominent orders adopted by programming languages are called "call-by-value" and "call-by-need".
Call by Value (parallel): beta reduction is applied to (\lambda x . e_1) e_2 only if e_2 is a value (e.g., SML, Caml, OCaml).
Call by Need/Name (sequential): beta reduction is applied to (\lambda x . e_1) e_2 even if e_2 is not a value. If during beta reduction e_2 is copied into each occurrence of x in the body, the reduction order is called call-by-name; if e_2 is shared, it is called call-by-need. (This enables lazy evaluation, as in Haskell.)
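For example (an illustrative reduction, not from the notes): consider (\lambda x . x + x) (1 + 2). Call-by-value first reduces the argument to the value 3 and then substitutes, so 1 + 2 is evaluated once. Call-by-name substitutes the unevaluated argument, giving (1 + 2) + (1 + 2), which evaluates it twice; call-by-need substitutes a shared reference to 1 + 2, so it is still evaluated only once.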
Since neither reduction order reduces inside a lambda abstraction, neither reduces expressions all the way to normal form; instead they reduce to what is called weak head normal form.
Call by Value is parallel because e_1 and e_2 can be evaluated in parallel in an application (e_1 e_2).
// QUESTION: is call by need and call by name differ by whether we modify value or we
// QUESTION: why is call by need sequential, since it can apply beta reduction with more flexibility (more cases even if not value) (why only the first subexpression can be evaluated)
// QUESTION: are these equivalent in term of evaluated result?
syntax: the structure of the program itself
semantics: what the program computes
operational semantics: how algorithms compute
cost semantics: how algorithms compute and what the computational complexity is
syntactic sugar: syntax that makes it easier to read or write code without adding any real power
In SPARC, every closed expression (one with no undefined, i.e. free, variables) evaluates to a value or runs forever.
// TODO: finish at https://www.diderot.one/courses/136/books/578/chapter/8074