Review_001

Performance Testing

Performance:

$R^*$ : troughput of fast sequential baseline
$R_1$ : throughput on $1$ processor
$R_P$ : throughput on $P$ processors
$T^*$ : time of fastest sequential
$T_P$ : time of parallel on $P$ processors
Work-Efficiency: $R_1 / R^*$
Self-Speedup: $R_P / R_1$ , $T_1 / T_P$
Parallel throughput: $R_P$
Overhead: $T_1 / T^*$

Probability

$W(n) \in O(k \cdot f(n)) \geq 1 - \left(\frac{1}{n}\right)^k$

Expectation: $E[X] = \sum_{a \in \Omega} Pr\{a\} \cdot X(a)$

Linear Expectation: $X = X_0 + X_1 \implies E[X] = E[X_0] + E[X_1]$

Union Bound: $Pr\{A \cup B\} \leq Pr\{A\} + Pr\{B\}$

Markov Bound: for non-negative random variable $X \geq 0 \land a > 0 \implies Pr\{X \geq a\} \leq \frac{E[X]}{a}$

Conditional Probability: $Pr\{A \cap B\} = Pr\{A\} \cdot Pr\{B | A\} = Pr\{B\} \cdot Pr\{A | B\}$

Work and Span

We know that

$\begin{align*} T_p \geq S \land T_p \geq \lceil\frac{W}{P}\rceil \implies T_p \geq \max\left( S, \lceil\frac{W}{P}\rceil \right) \end{align*}$

$T_p \geq \lceil\frac{W}{P}\rceil$ because $\lceil\frac{W}{P}\rceil$ assumes no dependency.

$T_p \geq S$ because we have limited number of processors.

In addition, for greedy scheduler, we have

$T_p < \lceil\frac{W}{P}\rceil + S$

The above proof is provided below

and therefore

$\max \left(\lceil\frac{W}{P}\rceil, S\right) \leq T_p < \lceil\frac{W}{P}\rceil + S$

and we also know that parallel time $T_p$

$\begin{align*} T_p <& \frac{W}{P} + S\\ =& \frac{W}{P} + \frac{W}{\bar{P}} \tag{by definition}\\ =& \frac{W}{P} (1 + \frac{P}{\bar{P}})\\ \end{align*}$

Observe that when $\bar{P} >> P$ , the parallel time $\max \left(\frac{W}{P}, S\right) \leq T_P \leq \frac{W}{P}$ which means $S \to 0$ . Therefore $\bar{P}$ is a good measurement of parallism.

Speedup $S_p$ : fraction of sequential time over parallel time.

Perfect Speedup: $S_p = P$ (speed up equal to the number of processor)

$S_p = \frac{T_s}{T_p}$

Contraction

Adding and Merge to produce desired output

Recurrence

Generally, your formula look like this:

$W(n) = \alpha W\left(\frac{n}{\beta}\right) + O\left(f(n)\right)$

The parent layer has work: $f(n)$ . The children layer has work: $\alpha f(\frac{n}{\beta})$ . We compare them to find out whether it is root dominated or leaf dominated or balanced.

For leaf-dominated: there are $\alpha^{\log_\beta n}$ leaves, giving $O((\alpha^{\log_\beta n})\times f(1)) = O(n^{\log_\beta \alpha})$ cost of the tree.

For balanced: there are $\sum_{i = 1}^{\log_\beta n} \alpha^{i}$ nodes, giving $O((\sum_{i = 1}^{\log_\beta n} \alpha^{i}) \times f(1))$ cost of the tree.

Hacks:

parent layer's cost is just what's after $+$
children layer's cost is just subsitute what's in $W(\cdot)$ to what's after $+$ and times coefficient for $\alpha$
if there is no coefficient for $\alpha$ , and sequence length is decreasing, then it is root dominated
if there is $+ 1$ and there is coefficient for $\alpha$ , then it is leaf-dominated

If it involves $O(\sqrt{n})$ , to solve for the number of layers $k$ . We solve $1=n^{\left(\frac{1}{2}\right)^{k}}$ . We got $k = \log_{1/2}\left(\log_{n}1\right)$ . But this won't get us any further. You should set $n = 2^m$ , then define $S(m) = T(2^m)$ and write everything in terms of $S(\cdot)$

Here is an example: We want to solve $S(n) = 2 (\sqrt{n}) + 1$ . Write $S(2^m) = 2 S(2^(m/2)) + 1$ . Define $W(m) = S(2^m)$ , then we have equation $W(m) = W(m/2) + 1$ . We solve to get $W(m) \in O(log m)$ , therefore we know that $S(2^m) \in O(\log m)$ . Let $n = 2^m \implies m = \log n$ . Subsitute we get $S(n) \in O(\log \log n)$ .

The solution for $T(n) = T(\sqrt{n}) + 1$ is $O(\log \log n)$ (balanced).

Table of Content