Lecture 001

Functions are functions.

In C, parallel programs on 32 cores usually achieve a speedup between 10x and 30x. Sorting is very parallelizable. In Parallel ML, algorithms on 72 cores usually achieve a speedup between 10x and 65x. Python is about 100x slower than C, but ML is only about 2x slower than C, and ML outperforms Java, Go, and Haskell.

Designing a good parallel algorithm involves identifying independent tasks. Because work captures the total consumption of energy, we often design a good sequential algorithm first.

Functional programming for parallelism is easier than programming in, say, CUDA, but performance might be slightly worse.

We use the MPL (MaPLe) compiler, CMU's own research compiler.

So in this class we do

Specification, Problem, and Implementation




Put simply, an ADT is a logical description and a data structure is a concrete one. An ADT describes the data and the operations that manipulate its elements at the logical level. A data structure is the actual representation of the data, together with the algorithms that manipulate it, at the implementation level.

Benefits of having a specification:

Shortest Superstring (SS) Problem

Nucleotide: the basic building block of nucleic acid polymers such as DNA and RNA. Nucleotides bind together to form the double helix. Components include:

We distinguish nucleotides by their nitrogenous base (A, C, G, T).

This problem is particularly interesting for sequencing the human genome, since genomes can only be read piece by piece (10 to 1000 base pairs at a time, compared to over three billion pairs in total) due to lab constraints.

Techniques of Reading Molecules:

Note that there is no easy way to know whether repeated fragments are actual repetitions in the sequence or a product of the shotgun method itself. The "double-barrel shotgun method" cuts DNA sections long enough to span the repeated section. By reading the two ends of a long section and knowing approximately how far apart the two ends are, we get information about repeats.

Problem Description

Substring: a contiguous piece of a string.

Superstring: given a string s, a superstring of s is constructed by adding characters to the left or right of s.

\Sigma^*: the set of all possible strings over the character set \Sigma (including the empty string)

\Sigma^+: the set of all possible strings over the character set \Sigma (excluding the empty string)

Shortest Superstring (SS) Problem: output the shortest string that includes all input fragments as substrings.

By Occam's razor, the shortest superstring is the most likely to be the right answer, but there is not enough information to determine the right answer with certainty. For DNA, \Sigma = \{a, c, g, t\}. Fragment readings may also contain errors in real laboratory settings. Errors can be addressed by scoring overlaps differently.

Example from geeksforgeeks.

Input:  arr[] = {"geeks", "quiz", "for"}
Output: geeksquizfor

Input:  arr[] = {"catg", "ctaagt", "gcta", "ttca", "atgcatc"}
Output: gctaagttcatgcatc


Snippets: fragments that are not substrings of other fragments.

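A fragment that is a substring of another fragment is automatically covered by any superstring of that other fragment, so we only need to try all possible permutations of the snippets.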
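The filtering step can be sketched in Python as follows. The function name `snippets` and the sample fragments are illustrative assumptions, not part of the lecture.

```python
def snippets(fragments):
    """Return the fragments that are not substrings of any other fragment."""
    # Drop exact duplicates first (keeping order), so that two equal
    # fragments do not eliminate each other.
    unique = list(dict.fromkeys(fragments))
    return [f for f in unique
            if not any(f != g and f in g for g in unique)]


# Hypothetical input: "atg" is inside "catg", and "ct" is inside "gcta".
print(snippets(["catg", "atg", "gcta", "ct", "atgcatc"]))
```

This simple version checks every pair of fragments, so it is quadratic in the number of fragments.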

Bruteforce Solution

One solution is: (with O(n!) complexity)

  1. try all possible permutations
  2. remove overlaps
  3. pick shortest result (maximum overlaps) of all permutations
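The three steps above can be sketched in Python. This is a minimal illustration, assuming the input is already reduced to snippets; the names `overlap` and `shortest_superstring_brute_force` are hypothetical.

```python
from itertools import permutations


def overlap(s, t):
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def shortest_superstring_brute_force(snips):
    """Try every ordering, merge neighbours by overlap, keep the shortest."""
    if not snips:
        return ""
    best = None
    for order in permutations(snips):
        merged = order[0]
        for nxt in order[1:]:
            # Step 2: remove the overlap when appending the next snippet.
            merged += nxt[overlap(merged, nxt):]
        # Step 3: keep the shortest result seen so far.
        if best is None or len(merged) < len(best):
            best = merged
    return best


result = shortest_superstring_brute_force(
    ["catg", "ctaagt", "gcta", "ttca", "atgcatc"])
print(result, len(result))
```

With n snippets the loop visits n! orderings, so this is only usable for very small inputs.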

The overlap between strings s and t can be computed with the algorithm below, which takes O(|s| \cdot |t|) work and O(\lg |s| + \lg |t|) \subseteq O(\lg (|s| + |t|) + \lg (|s| + |t|)) = O(\lg (|s| + |t|)) span if the comparisons are combined with a tree sum.

s = "KANSAS", and t begins with "SASHIM"
In this example, the maximum overlap is 3 ("SAS").
  1. Check if the last character of s ("S") = the first character of t ("S"): true (cost = 1)
  2. Check if the last 2 characters of s ("AS") = the first 2 characters of t ("SA"): false (cost = 2)
  3. Check if the last 3 characters of s ("SAS") = the first 3 characters of t ("SAS"): true (cost = 3)
  4. Check if the last 4 characters of s ("NSAS") = the first 4 characters of t ("SASH"): false (cost = 4)
  5. Check if the last 5 characters of s ("ANSAS") = the first 5 characters of t ("SASHI"): false (cost = 5)
  6. Check if the last 6 characters of s ("KANSAS") = the first 6 characters of t ("SASHIM"): false (cost = 6)
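The checks above can be written as a short sequential Python sketch. The function name `overlap` is an assumption, and since only the first six characters of t appear in the steps, the example uses just that prefix.

```python
def overlap(s, t):
    """Length of the longest suffix of s that is also a prefix of t."""
    best = 0
    for k in range(1, min(len(s), len(t)) + 1):
        # Comparing k characters costs k, matching the per-step costs above.
        if s[-k:] == t[:k]:
            best = k
    return best


# Only the prefix of t used in the steps is known here.
print(overlap("KANSAS", "SASHIM"))  # 3, from "SAS"
```

The k checks are independent of each other, which is why they could all run in parallel and be combined with a maximum.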

If we assume one string is a superstring of the other, then we can iterate from index i = 0 to |s| (assuming s is the shorter string) to get complexity O(\min(|s|, |t|))

In fact, overlap checking can be done in O(n) with Ukkonen's suffix tree algorithm for a constant-size alphabet, and in O(n \log n) in the general case.

Staging: We can also calculate all overlaps beforehand and store them in a dictionary for easier access.
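A minimal Python sketch of staging, assuming a plain dictionary keyed by ordered pairs of snippets. The names `overlap` and `stage_overlaps` are hypothetical.

```python
from itertools import permutations


def overlap(s, t):
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def stage_overlaps(snips):
    """Precompute overlap(s, t) for every ordered pair of distinct snippets."""
    return {(s, t): overlap(s, t) for s, t in permutations(snips, 2)}


table = stage_overlaps(["catt", "gagtat", "tagg", "tta", "gga"])
print(table[("tagg", "gga")])  # 2
```

After staging, each later overlap query is a dictionary lookup instead of a fresh string comparison.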

// TODO: cost analysis of staging: https://www.diderot.one/courses/136/books/578/chapter/8091#atom-589865


Reduce SS to TSP


This problem is NP-hard. It can be reduced to TSP as follows:

  1. Each string in SS becomes a vertex in TSP.
  2. The overlap between two strings determines the weight of the edge between them in TSP: weight(s_i, s_j) = -overlap(s_i, s_j), or equivalently weight(s_i, s_j) = |s_j| - overlap(s_i, s_j). The first finds the Hamiltonian cycle with maximum total overlap, and the second finds the one that adds the fewest characters.
  3. Add a dummy vertex so that the start and end are not overlapped when the path is closed into a cycle.

The TSP solution then gives the maximum possible total overlap.
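The edge weights of the reduction can be sketched in Python using the |s_j| - overlap(s_i, s_j) form. Representing the dummy vertex by the empty string is an assumption made here for convenience, and the names `weight`, `DUMMY`, and `tour_cost` are hypothetical.

```python
def overlap(s, t):
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def weight(s, t):
    """Edge weight |t| - overlap(s, t): characters added by placing t after s."""
    return len(t) - overlap(s, t)


# Assumption: the empty string acts as the dummy vertex. It overlaps nothing,
# so weight(DUMMY, s) = |s| and weight(s, DUMMY) = 0.
DUMMY = ""


def tour_cost(order):
    """Cost of the cycle DUMMY -> order[0] -> ... -> order[-1] -> DUMMY."""
    cycle = [DUMMY, *order, DUMMY]
    return sum(weight(a, b) for a, b in zip(cycle, cycle[1:]))


order = ["gcta", "ctaagt", "ttca", "catg", "atgcatc"]
print(tour_cost(order))  # 16, the length of "gctaagttcatgcatc"
```

Under this weighting, the cost of a tour equals the length of the superstring obtained by merging the strings in that order, so a minimum-cost tour corresponds to a shortest superstring.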

Approximation Algorithms

Greedy Approximation

  1. pick minimum edge
  2. add it to the path
  3. contract its endpoints
  4. repeat until there is a cycle

The greedy solution is conjectured to be a 2-approximation. The proven bounds are larger constant factors.

// TODO: greedy algorithm for SS: https://www.diderot.one/courses/136/books/578/chapter/8091#atom-589886



Example for greedy algorithm: all arcs with weight 0 are omitted for simplicity
Say we have snippets: {catt, gagtat, tagg, tta, gga}, we do the following:

  1. join tagg and gga to obtain tagga (overlap is 2)
  2. join catt and tta to obtain catta (overlap is 2)
  3. join gagtat and tagga to obtain gagtatagga (overlap is 1)
  4. join gagtatagga and catta to obtain gagtataggacatta (overlap is 0)
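A minimal Python sketch of the greedy joining process on snippets, assuming we repeatedly join the ordered pair with the largest overlap. The names `overlap` and `greedy_superstring` are hypothetical. Several pairs in this example tie at overlap 2, and this sketch breaks ties by scan order, so its joins need not match the ones listed above.

```python
def overlap(s, t):
    """Length of the longest suffix of s that is also a prefix of t."""
    for k in range(min(len(s), len(t)), 0, -1):
        if s.endswith(t[:k]):
            return k
    return 0


def greedy_superstring(snips):
    """Repeatedly join the ordered pair with maximum overlap."""
    pool = list(snips)
    while len(pool) > 1:
        best_k, best_i, best_j = -1, None, None
        for i, s in enumerate(pool):
            for j, t in enumerate(pool):
                if i != j:
                    k = overlap(s, t)
                    if k > best_k:  # first maximum wins on ties
                        best_k, best_i, best_j = k, i, j
        joined = pool[best_i] + pool[best_j][best_k:]
        # Replace the two joined strings by their join.
        pool = [p for idx, p in enumerate(pool) if idx not in (best_i, best_j)]
        pool.append(joined)
    return pool[0] if pool else ""


result = greedy_superstring(["catt", "gagtat", "tagg", "tta", "gga"])
print(result, len(result))
```

Whatever the tie-breaking, the result contains every snippet as a substring. Only its length depends on which ties are taken.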

// TODO: cost analysis of greedy: https://www.diderot.one/courses/136/books/578/chapter/8091#atom-589891

Functional Programming Language

Nested Parallelism: fork-join parallelism, where a parent suspends until all of its children finish and join.

Functional Algorithms: no side effect, good for safe parallelism and abstractions

Benign Effects: side effects that can't be observed by caller, such as storing intermediate values.

SPARC: toy language to describe algorithms and data structures.

Function vs. Algorithm/Mapping: An algorithm is a general idea of how to solve a problem, and a function is the actual code that implements an algorithm. A function is more than a mathematical function (a mapping): it also specifies the mechanism by which the output is generated from the input.

Heisenbug: The term Heisenbug was coined in the early 80s to refer to a type of bug that “disappears” when you try to pinpoint or study it and “appears” when you stop studying it. They are named after the famous Heisenberg uncertainty principle: if you observe one bug, you will lose information about another bug.

Race conditions cannot occur in pure computation.

In functional programming, all data is persistent and no input is modified (results are new copies rather than changes to the original). Unused memory is automatically handled by garbage collection or by the compiler.

Granularity: the size of the smallest tasks that are executed without parallelism. If we do not control granularity, a parallel algorithm may perform worse than a sequential one due to overhead (e.g. Primitives.par is expensive). We often control granularity by setting a threshold on input size.
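Threshold-based granularity control can be sketched in Python. This is only an illustration of the structure: the names `GRAIN` and `chunked_sum` and the threshold value are hypothetical, and Python threads stand in for the parallel primitives used in the course, so no real speedup should be expected from this version.

```python
from concurrent.futures import ThreadPoolExecutor

GRAIN = 10_000  # hypothetical threshold; in practice it is tuned by measurement


def chunked_sum(xs, grain=GRAIN):
    """Sum xs, handing out work only in pieces of at least `grain` elements."""
    if len(xs) <= grain:
        # Small input: the scheduling overhead would dominate, so stay sequential.
        return sum(xs)
    chunks = [xs[i:i + grain] for i in range(0, len(xs), grain)]
    with ThreadPoolExecutor() as pool:
        # Each task sums one chunk sequentially; partial sums are then combined.
        return sum(pool.map(sum, chunks))


print(chunked_sum(list(range(100_000))) == sum(range(100_000)))
```

A smaller `grain` creates more tasks and more overhead, while a larger one leaves fewer tasks to run at the same time.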

Lambda Calculus

Lambda Calculus: first general purpose "programming language"

Lambda calculus consists of expressions e, each of which takes one of the following three forms: a variable x, a lambda abstraction (\lambda x . e), or an application (e_1 e_2).

Beta Reduction: if in an application, the left is a lambda abstraction, then beta reduction "applies the function" by making the following transformation

(\lambda x . e_1) e_2 \to e_1 [x / e_2]

Here e_1 [x / e_2] denotes e_1 with every free occurrence of x replaced by e_2.
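As a small worked example of this rule (the expression is chosen here for illustration), applying a self-application function to the identity function takes two beta reductions:

```latex
(\lambda x . x \, x) (\lambda y . y)
  \to (x \, x) [x / (\lambda y . y)] = (\lambda y . y) (\lambda y . y)
  \to y [y / (\lambda y . y)] = \lambda y . y
```

The final expression is a lambda abstraction that is not applied to anything, so no further beta reduction is possible.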

Computation is essentially beta reduction until there is nothing left to reduce.

Normal Form: an expression that has nothing left to reduce.

An expression may never reduce to normal form, since lambda calculus expressions can loop forever (as expected of a Turing-complete language).
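A standard example of an expression that never reaches normal form is self-application applied to itself, where each beta reduction reproduces the original expression:

```latex
(\lambda x . x \, x) (\lambda x . x \, x)
  \to (x \, x) [x / (\lambda x . x \, x)] = (\lambda x . x \, x) (\lambda x . x \, x)
  \to \cdots
```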

Order of operation matters. The two most prominent orders adopted by programming languages are called "call-by-value" and "call-by-need".

Since neither reduction order reduces inside a lambda abstraction, neither of them reduces expressions to normal form. Instead they reduce to what is called weak head normal form.

Call-by-value is parallel because e_1 and e_2 can be evaluated in parallel in an application (e_1 e_2).

// QUESTION: is call by need and call by name differ by whether we modify value or we
// QUESTION: why is call by need sequential, since it can apply beta reduction with more flexibility (more cases even if not value) (why only the first subexpression can be evaluated)
// QUESTION: are these equivalent in term of evaluated result?

SPARC Language

syntax: the structure of the program itself

semantics: what the program computes

operational semantics: how algorithms compute

cost semantics: how algorithms compute and what is the computational complexity

syntactic sugar: syntax that makes it easier to read or write code without adding any real power

In SPARC, every closed expression (one with no undefined, i.e. free, variables) evaluates to a value or runs forever.

SPARC Expressions


// TODO: finish at https://www.diderot.one/courses/136/books/578/chapter/8074
