In the C programming language, parallel programs on 32 cores typically achieve a 10x to 30x speedup. Sorting is highly parallelizable. In Parallel ML, algorithms on 72 cores typically achieve a 10x to 65x speedup. Python is roughly 100x slower than C, but ML is only about 2x slower than C, and ML outperforms Java, Go, and Haskell.
Designing a good parallel algorithm involves identifying independent tasks. Because work captures the total energy consumption, we often design a good sequential algorithm first.
Functional programming for parallelism is easier than programming in, say, CUDA, but performance might be slightly worse.
We use the MPL (MaPLe) compiler, CMU's own research compiler.
So in this class we do:
Proofs: about running time, often involving probability
Interfaces: abstract models of problems and solutions
Reductions: searching for the best algorithm by combining known algorithms
Specifications:
Algorithm Specification: describes what the algorithm should do, but not how it achieves it.
Cost Specification: specifies the cost an algorithm should achieve (in work and/or span).
Abstract Data Type (ADT) Specification: specifies what operations the data type supports, how they behave, and what they cost.
Problems:
Algorithm Problem: the challenge of implementing an algorithm that meets an algorithm specification within a certain cost
Data Structure Problem: the challenge of implementing a data structure that meets an ADT specification within a certain cost (a collection of algorithm problems)
Implementations:
Algorithm: a piece of code that solves an Algorithm Problem
Data Structure: a piece of code that solves a Data Structure Problem
To put it simply, an ADT is a logical description while a data structure is concrete. An ADT is the logical picture of the data and the operations that manipulate its component elements; a data structure is the actual representation of the data in the implementation, together with the algorithms that manipulate the data elements. The ADT lives at the logical level and the data structure at the implementation level.
Benefits of having a specification:
we don't need to care about the implementation, which may be complicated
we can swap in a better implementation without breaking things
it gives us a standard for comparing implementations
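As an illustration of this split in SML (my own example, not from the course notes): an ADT specification plays the role of a signature, while a data structure plays the role of a structure implementing it.

```sml
(* The ADT specification as a signature: which operations exist and their types. *)
signature STACK =
sig
  type 'a stack
  val empty : 'a stack
  val push  : 'a * 'a stack -> 'a stack
  val pop   : 'a stack -> ('a * 'a stack) option
end

(* One concrete data structure satisfying it: a list-backed stack where every
   operation is O(1) work. *)
structure ListStack : STACK =
struct
  type 'a stack = 'a list
  val empty = []
  fun push (x, s) = x :: s
  fun pop [] = NONE
    | pop (x :: s) = SOME (x, s)
end
```

A different structure could satisfy the same signature with different costs, which is exactly what a cost specification pins down.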
Nucleotide: the basic building block of nucleic acid polymers such as DNA and RNA; nucleotides bind together to form the double helix. Components include:
a nitrogenous base, one of Adenine, Cytosine, Guanine, or Thymine (Uracil in RNA)
a 5-carbon sugar molecule
one or more phosphate groups
We distinguish nucleotides based on nitrogenous base (A, C, G, T)
This problem is particularly interesting for sequencing the human genome, since genomes can only be read piece by piece (10 to 1000 base pairs at a time, compared to over three billion pairs in total) due to lab constraints.
Techniques of Reading Molecules:
Primer Walking: construct a special molecule called a primer and read the first ~1000 pairs; the most recently read portion is then used to construct a primer for the next part.
Fragments: break long sequences into fragments and use Primer Walking on each.
Shotgun Method: make copies of the sequence, break them into fragments that are read individually, and then try to assemble them back together.
Note that there is no easy way to know whether repeated fragments are actual repetitions in the sequence or a product of the shotgun method itself. The "double-barrel shotgun method" cuts DNA sections long enough to span the repeated region; by reading the two ends of a long section and knowing approximately how far apart the ends are, we get information about the repeats.
Substring: a contiguous piece of a superstring
Superstring: given a string s, a superstring of s is any string that contains s as a substring; we can construct one by adding characters to the left or right of s.
\Sigma^*: the set of all strings over the alphabet \Sigma (including the empty string)
\Sigma^+: the set of all strings over the alphabet \Sigma (excluding the empty string)
Shortest Superstring (SS) Problem: output the shortest string that includes all input fragments as substrings.
By Occam's razor, the shortest superstring is more likely to be the right answer, but there is not enough information to determine the right answer for certain. For DNA, \Sigma = \{a, c, g, t\}. Also, fragment readings may contain errors in real laboratory settings; errors can be addressed by scoring overlaps differently.
Example from GeeksforGeeks:
Input: arr[] = {"geeks", "quiz", "for"}
Output: geeksquizfor
Input: arr[] = {"catg", "ctaagt", "gcta", "ttca", "atgcatc"}
Output: gctaagttcatgcatc
Snippets: fragments that are not substrings of other fragments.
Only snippets are important: we can safely delete non-snippet fragments because they are already contained in other fragments.
Snippets cannot start at the same position or end at the same position in the original sequence (if two fragments started at the same position, one would be a prefix, hence a substring, of the other). The snippets, ordered by starting point (equivalently, by ending point), therefore form a strict total order.
From the above observation, we only need to try all possible orderings (permutations) of the snippets.
One brute-force solution (with O(n!) work) is to try every permutation of the snippets, join adjacent snippets using their maximum overlap, and return the shortest result.
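A minimal sequential sketch of this brute-force idea in Standard ML (the helper names are mine, and the input is assumed to already consist of distinct snippets):

```sml
(* Brute-force sketch: try every permutation, join adjacent snippets with
   their maximum overlap, and keep the shortest superstring found.
   The O(n!) permutations dominate the cost. *)

(* largest k such that the last k characters of s equal the first k of t *)
fun overlap (s, t) =
  let
    val n = Int.min (String.size s, String.size t)
    fun check k =
      String.extract (s, String.size s - k, NONE) = String.substring (t, 0, k)
    fun loop (k, best) =
      if k > n then best else loop (k + 1, if check k then k else best)
  in
    loop (1, 0)
  end

(* append t to s, dropping the overlapping prefix of t *)
fun join (s, t) = s ^ String.extract (t, overlap (s, t), NONE)

(* all permutations of a list of distinct elements *)
fun permutations [] = [[]]
  | permutations xs =
      List.concat
        (List.map
           (fn x => List.map (fn p => x :: p)
                             (permutations (List.filter (fn y => y <> x) xs)))
           xs)

(* shortest superstring over all permutations *)
fun bruteForceSS snippets =
  let
    fun glue [] = ""
      | glue (x :: rest) = List.foldl (fn (t, s) => join (s, t)) x rest
    val candidates = List.map glue (permutations snippets)
  in
    List.foldl (fn (c, best) =>
                  if String.size c < String.size best then c else best)
               (hd candidates) candidates
  end

(* e.g. bruteForceSS ["catt", "gagtat", "tagg", "tta", "gga"]
   returns a shortest superstring of these snippets (length 15) *)
```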
Checking the overlap between strings s and t can be achieved with the following algorithm, which is O(|s| \cdot |t|) in work and O(\lg |s| + \lg |t|) \subseteq O(\lg (|s| + |t|) + \lg (|s| + |t|)) = O(\lg (|s| + |t|)) in span if the character comparisons are combined with a tree sum.
s = "KANSAS"
t = "SASHIMI"
In this example, the maximum overlap is 3 ("SAS").
1. Check if the last character of s ("S") = the first character of t ("S"): true (cost = 1)
2. Check if the last 2 characters of s ("AS") = the first 2 characters of t ("SA"): false (cost = 2)
3. Check if the last 3 characters of s ("SAS") = the first 3 characters of t ("SAS"): true (cost = 3)
4. Check if the last 4 characters of s ("NSAS") = the first 4 characters of t ("SASH"): false (cost = 4)
5. Check if the last 5 characters of s ("ANSAS") = the first 5 characters of t ("SASHI"): false (cost = 5)
6. Check if the last 6 characters of s ("KANSAS") = the first 6 characters of t ("SASHIM"): false (cost = 6)
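A sketch of this check in Standard ML that tries every candidate overlap length and takes the maximum; here tabulate and reduce are sequential list stand-ins, but with a parallel sequence library (and parallel character comparisons) they would give the span bound stated above.

```sml
(* Sequential stand-ins; with a parallel library, tabulate forks all the
   checks at once and reduce combines the results as a tree. *)
fun tabulate f n = List.tabulate (n, f)
fun reduce f b xs = List.foldl f b xs

(* largest k such that the last k characters of s equal the first k of t *)
fun overlap (s, t) =
  let
    val n = Int.min (String.size s, String.size t)
    (* matches k = k if the length-k suffix/prefix agree, else 0 *)
    fun matches k =
      if String.extract (s, String.size s - k, NONE) = String.substring (t, 0, k)
      then k else 0
  in
    (* try all k = 1 .. n and take the maximum *)
    reduce Int.max 0 (tabulate (fn i => matches (i + 1)) n)
  end

val three = overlap ("KANSAS", "SASHIMI")   (* = 3, as in the worked example *)
```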
If we assume one string is a superstring of the other, then we can iterate from index i = 0 to |s| (assuming s is the shorter string) to get complexity O(\min(|s|, |t|))
But in fact, you can do overlap checking in O(n) work with Ukkonen's (suffix tree) algorithm for a constant-size alphabet, and in O(n \log n) work in the general case using trees.
Staging: we can also calculate the overlaps beforehand and store them in a dictionary for easier access.
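A small staging sketch (the names are mine, and the overlap function from the sketch above is assumed to be in scope): compute the overlap of every ordered pair of snippets once, then look results up instead of recomputing them.

```sml
(* Precompute overlap (s_i, s_j) for every ordered pair of snippets. *)
fun overlapTable snippets =
  let
    val snips = Vector.fromList snippets
    val n = Vector.length snips
  in
    Vector.tabulate (n, fn i =>
      Vector.tabulate (n, fn j =>
        if i = j then 0
        else overlap (Vector.sub (snips, i), Vector.sub (snips, j))))
  end

(* constant-time lookup after the one-time precomputation *)
fun lookupOverlap (table, i, j) = Vector.sub (Vector.sub (table, i), j)
```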
// TODO: cost analysis of staging: https://www.diderot.one/courses/136/books/578/chapter/8091#atom-589865
This problem is actually NP-hard. It can be reduced to the (asymmetric) Traveling Salesperson Problem (TSP) as follows: make a vertex for each snippet and give the directed edge from s_i to s_j weight overlap(s_i, s_j). A maximum-weight Hamiltonian path visits every snippet once while finding the most overlap possible, which corresponds to the shortest superstring.
Greedy Approximation: repeatedly join the pair of snippets with the largest overlap until a single string remains. The resulting solution is conjectured to be a 2-approximation.
// TODO: greedy algorithm for SS: https://www.diderot.one/courses/136/books/578/chapter/8091#atom-589886
Example for the greedy algorithm: say we have snippets {catt, gagtat, tagg, tta, gga}. We do the following:
1. join tagg and gga to obtain tagga (overlap is 2)
2. join catt and tta to obtain catta (overlap is 2)
3. join gagtat and tagga to obtain gagtatagga (overlap is 1)
4. join gagtatagga and catta to obtain gagtataggacatta (overlap is 0)
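A sequential sketch of this greedy strategy (my code; it assumes the overlap and join helpers from the brute-force sketch above are in scope):

```sml
(* Greedy sketch: repeatedly find the pair of distinct snippets with the
   largest overlap, join them, and recurse until one string remains.
   Assumes the snippets are distinct strings. *)
fun greedySS [] = ""
  | greedySS [s] = s
  | greedySS snippets =
      let
        (* all ordered pairs (s, t) with s <> t, scored by overlap *)
        val pairs =
          List.concat
            (List.map (fn s => List.map (fn t => (overlap (s, t), s, t))
                                        (List.filter (fn t => t <> s) snippets))
                      snippets)
        (* keep the pair with the maximum overlap *)
        val best = List.foldl (fn (p as (ov, _, _), q as (ov', _, _)) =>
                                  if ov > ov' then p else q)
                              (hd pairs) pairs
        val (_, s, t) = best
        val rest = List.filter (fn x => x <> s andalso x <> t) snippets
      in
        greedySS (join (s, t) :: rest)
      end

(* greedySS ["catt", "gagtat", "tagg", "tta", "gga"] produces
   "gagtataggacatta", matching the worked example (the join order may differ). *)
```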
// TODO: cost analysis of greedy: https://www.diderot.one/courses/136/books/578/chapter/8091#atom-589891
Nested Parallelism: fork-join parallelism, where a parent task suspends until all of its child tasks finish and join.
Nested parallel computations can be modeled by dependence graphs, which are Directed Acyclic Graphs (DAGs):
Work: number of nodes
Span: length of longest path
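As an illustration (not course code), a fork-join sum over a vector; par is a sequential stand-in for a parallel primitive such as Primitives.par:

```sml
(* par is a stand-in: with a real parallel runtime the two branches may run
   on different cores; here they simply run one after the other. *)
fun par (f, g) = (f (), g ())

(* Sum of v[lo .. hi-1] by divide and conquer.
   Work: O(n), roughly one DAG node per element.
   Span: O(lg n), since the range is halved along the longest path. *)
fun sumRange (v, lo, hi) =
  if hi - lo = 0 then 0
  else if hi - lo = 1 then Vector.sub (v, lo)
  else
    let
      val mid = lo + (hi - lo) div 2
      val (a, b) = par (fn () => sumRange (v, lo, mid),
                        fn () => sumRange (v, mid, hi))
    in
      a + b
    end

val fifteen = sumRange (Vector.fromList [1, 2, 3, 4, 5], 0, 5)
```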
Functional Algorithms: no side effects; good for safe parallelism and abstraction
Benign Effects: side effects that can't be observed by the caller, such as storing intermediate values.
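A tiny illustration of a benign effect (my own example, not from the notes): the function caches its most recent result in a local ref, but callers only ever see the returned values, so the effect cannot be observed from outside.

```sml
local
  (* cache of the most recent (argument, result) pair; invisible to callers *)
  val cache : (int * int) option ref = ref NONE
in
  fun cachedSquare n =
    case !cache of
      SOME (m, r) => if m = n then r
                     else let val r = n * n in cache := SOME (n, r); r end
    | NONE => let val r = n * n in cache := SOME (n, r); r end
end
```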
SPARC: toy language to describe algorithms and data structures.
Function vs. Algorithm/Mapping: an algorithm is a general idea of how to solve a problem, while a function is the actual code that implements an algorithm. A (programming) function is more than a mathematical function (a mapping): it also specifies the mechanism by which the output is generated from the input.
Heisenbug: the term was coined in the early 80s to refer to a type of bug that "disappears" when you try to pinpoint or study it and "appears" when you stop studying it. It is named after the famous Heisenberg uncertainty principle: the act of observing the bug changes its behavior.
Race conditions cannot occur in pure computation.
In functional programming, all data is persistent and inputs are never modified (new values are effectively copies of the originals). The extra memory use is handled automatically by garbage collection or by the compiler.
Granularity: the size of the smallest tasks that are executed without parallelism. If we do not control granularity, a parallel algorithm may perform worse than a sequential one due to overhead (e.g., Primitives.par is expensive). We often control granularity by setting a threshold on the input size.
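A granularity-control sketch (the names and the threshold value are assumptions, not course code): below the grain threshold we run a plain sequential loop instead of paying the overhead of forking.

```sml
fun par (f, g) = (f (), g ())          (* sequential stand-in, as above *)

val grain = 1000                        (* placeholder; tuned by experiment *)

(* sequential base case: a plain loop over v[lo .. hi-1] *)
fun sumSeq (v, lo, hi) =
  let fun loop (i, acc) = if i >= hi then acc
                          else loop (i + 1, acc + Vector.sub (v, i))
  in loop (lo, 0) end

(* fork only when the range is larger than the grain size *)
fun sumGran (v, lo, hi) =
  if hi - lo <= grain then sumSeq (v, lo, hi)
  else
    let
      val mid = lo + (hi - lo) div 2
      val (a, b) = par (fn () => sumGran (v, lo, mid),
                        fn () => sumGran (v, mid, hi))
    in
      a + b
    end
```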
Lambda Calculus: the first general-purpose "programming language"
can easily implement recursion, conditions, and datatypes
inherently parallel
Lambda calculus consists of expressions e, each taking one of the following three forms:
variable: such as x, y, z
lambda abstraction: (\lambda x . e) where x is a variable representing the function argument and e is an expression representing the body of the function (possibly containing x)
application: (e_1 e_2) where e_1 and e_2 are expressions; e_2 is the input to e_1.
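One concrete way to represent these three forms (my own representation, not from the course) is a small SML datatype:

```sml
(* lambda-calculus terms *)
datatype term =
    Var of string           (* variable:            x              *)
  | Lam of string * term    (* lambda abstraction:  (lambda x . e) *)
  | App of term * term      (* application:         (e1 e2)        *)

(* the identity function applied to y:  (lambda x . x) y *)
val example = App (Lam ("x", Var "x"), Var "y")
```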
Beta Reduction: if the left side of an application is a lambda abstraction, then beta reduction "applies the function" by making the following transformation: (\lambda x . e_1) e_2 \to e_1[e_2/x], that is, every occurrence of x in the body e_1 is replaced by e_2.
Computation is essentially beta reduction until there is nothing left to reduce.
Normal Form: an expression that has nothing left to reduce.
It is possible for an expression to never reduce to normal form, since lambda calculus can loop forever (as it must be able to, being Turing complete). For example, (\lambda x . x x)(\lambda x . x x) beta-reduces to itself indefinitely.
The order of evaluation matters. The two most prominent orders adopted by programming languages are called "call-by-value" and "call-by-need".
Call by Value (parallel): beta reduction is applied to (\lambda x . e_1) e_2 only if e_2 is a value (e.g., SML, Caml, OCaml).
Call by Need/Name (sequential): beta reduction is applied to (\lambda x . e_1) e_2 even if e_2 is not a value. If during beta reduction e_2 is copied into each occurrence of x in the body, the reduction order is called call-by-name; if e_2 is shared, it is called call-by-need. (This enables lazy evaluation, as in Haskell.)
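For example (an illustrative reduction, not from the notes): consider (\lambda x . x + x) (1 + 2). Call-by-value first reduces the argument to the value 3 and then substitutes, so 1 + 2 is evaluated once. Call-by-name substitutes the unevaluated argument, giving (1 + 2) + (1 + 2), which evaluates it twice; call-by-need substitutes a shared reference to 1 + 2, so it is still evaluated only once.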
Since neither reduction order reduces inside a lambda abstraction, neither reduces expressions all the way to normal form; instead they reduce to what is called weak head normal form.
Call by Value is parallel because e_1 and e_2 can be evaluated in parallel in an application (e_1 e_2).
// QUESTION: is call by need and call by name differ by whether we modify value or we
// QUESTION: why is call by need sequential, since it can apply beta reduction with more flexibility (more cases even if not value) (why only the first subexpression can be evaluated)
// QUESTION: are these equivalent in term of evaluated result?
syntax: the structure of the program itself
semantics: what the program computes
operational semantics: how algorithms compute
cost semantics: how algorithms compute and what the computational complexity is
syntactic sugar: syntax that makes it easier to read or write code without adding any real power
In SPARC, every closed expression (one with no undefined, i.e. free, variables) evaluates to a value or runs forever.
// TODO: finish at https://www.diderot.one/courses/136/books/578/chapter/8074