# Review_001

## Performance Testing

Performance:

• $R^*$: throughput of the fastest sequential baseline

• $R_1$: throughput on $1$ processor

• $R_P$: throughput on $P$ processors

• $T^*$: time of the fastest sequential baseline

• $T_P$: time of the parallel algorithm on $P$ processors

• Work-Efficiency: $R_1 / R^*$

• Self-Speedup: $R_P / R_1$, $T_1 / T_P$

• Parallel throughput: $R_P$

• Overhead: $T_1 / T^*$
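
Putting these together, here is a minimal Python sketch of how the metrics combine, assuming a fixed workload so throughput can be taken as $1/\text{time}$; all numbers are hypothetical:

```python
# Hypothetical measurements (made-up numbers for illustration).
T_star = 10.0   # time of the fastest sequential baseline (seconds)
T_1    = 14.0   # time of the parallel algorithm on 1 processor
T_P    = 2.0    # time of the parallel algorithm on P processors
P      = 8

R_star = 1 / T_star   # throughput of the fastest sequential baseline
R_1    = 1 / T_1      # throughput on 1 processor
R_P    = 1 / T_P      # throughput on P processors

work_efficiency = R_1 / R_star   # < 1 means the parallel code pays overhead
self_speedup    = R_P / R_1      # equivalently T_1 / T_P
overhead        = T_1 / T_star   # how much slower on 1 processor than the baseline

print(f"work-efficiency = {work_efficiency:.2f}")   # 0.71
print(f"self-speedup    = {self_speedup:.2f}")      # 7.00
print(f"overhead        = {overhead:.2f}")          # 1.40
```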

## Probability

With High Probability (w.h.p.): $Pr\{W(n) \in O(k \cdot f(n))\} \geq 1 - \left(\frac{1}{n}\right)^k$

Expectation: $E[X] = \sum_{a \in \Omega} Pr\{a\} \cdot X(a)$

Linearity of Expectation: $X = X_0 + X_1 \implies E[X] = E[X_0] + E[X_1]$

Union Bound: $Pr\{A \cup B\} \leq Pr\{A\} + Pr\{B\}$

Markov Bound: for a non-negative random variable $X$ and $a > 0$, $Pr\{X \geq a\} \leq \frac{E[X]}{a}$
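
For example, for a fair six-sided die with $E[X] = 3.5$, the Markov bound gives

\begin{align*} Pr\{X \geq 5\} \leq \frac{E[X]}{5} = \frac{3.5}{5} = 0.7 \end{align*}

which is loose (the true value is $\frac{2}{6} = \frac{1}{3}$) but needs nothing beyond the expectation.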

Conditional Probability: $Pr\{A \cap B\} = Pr\{A\} \cdot Pr\{B | A\} = Pr\{B\} \cdot Pr\{A | B\}$
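
These bounds are easy to sanity-check numerically. A small Python sketch (the random variables and sample size are arbitrary choices, not from the notes):

```python
import random

random.seed(0)
N = 100_000
samples = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(N)]

# Linearity of expectation: E[X0 + X1] = E[X0] + E[X1]
e_x0 = sum(a for a, _ in samples) / N
e_x1 = sum(b for _, b in samples) / N
e_sum = sum(a + b for a, b in samples) / N
print(e_sum, e_x0 + e_x1)  # both close to 7.0

# Union bound: Pr{A or B} <= Pr{A} + Pr{B}  (A: first die is 6, B: second die is 6)
p_a = sum(a == 6 for a, _ in samples) / N
p_b = sum(b == 6 for _, b in samples) / N
p_union = sum(a == 6 or b == 6 for a, b in samples) / N
print(p_union, p_a + p_b)  # ~0.31 <= ~0.33

# Markov bound: Pr{X >= a} <= E[X] / a  (X: sum of the two dice, a = 10)
sums = [a + b for a, b in samples]
p_tail = sum(x >= 10 for x in sums) / N
print(p_tail, (sum(sums) / N) / 10)  # ~0.17 <= ~0.7
```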

## Work and Span

We know that

\begin{align*} T_p \geq S \land T_p \geq \lceil\frac{W}{P}\rceil \implies T_p \geq \max\left( S, \lceil\frac{W}{P}\rceil \right) \end{align*}

$T_p \geq \lceil\frac{W}{P}\rceil$ because $P$ processors can complete at most $P$ units of work per time step; this bound corresponds to the ideal case with no dependencies.

$T_p \geq S$ because the span is the longest chain of dependent operations, which must run one after another even with unlimited processors.

In addition, for a greedy scheduler, we have

T_p < \lceil\frac{W}{P}\rceil + S

The proof of this bound is given below.

Combining the lower bounds with the greedy-scheduler bound, we therefore have

\max \left(\lceil\frac{W}{P}\rceil, S\right) \leq T_p < \lceil\frac{W}{P}\rceil + S
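
For example, with $W = 10^6$, $S = 10^3$ and $P = 100$,

\begin{align*} \max\left(\frac{10^6}{100}, 10^3\right) = 10^4 \leq T_p < 10^4 + 10^3 \end{align*}

so a greedy scheduler is within $10\%$ of the best possible time on $100$ processors.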

Expanding the greedy bound using the (average) parallelism $\bar{P} = \frac{W}{S}$, the parallel time $T_p$ satisfies

\begin{align*} T_p &< \frac{W}{P} + S\\ &= \frac{W}{P} + \frac{W}{\bar{P}} \tag{since $\bar{P} = W/S$}\\ &= \frac{W}{P} \left(1 + \frac{P}{\bar{P}}\right) \end{align*}

Observe that when $\bar{P} \gg P$, the term $\frac{P}{\bar{P}}$ is negligible, so $\max \left(\frac{W}{P}, S\right) \leq T_P < \frac{W}{P}\left(1 + \frac{P}{\bar{P}}\right) \approx \frac{W}{P}$, i.e. the span contributes almost nothing and the speedup is nearly perfect. Therefore $\bar{P}$ is a good measure of parallelism.
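
Continuing the numeric example above, $\bar{P} = \frac{W}{S} = \frac{10^6}{10^3} = 1000 \gg P = 100$, so

\begin{align*} T_P < \frac{W}{P}\left(1 + \frac{P}{\bar{P}}\right) = 10^4 \left(1 + \frac{100}{1000}\right) = 1.1 \times 10^4 \end{align*}

which is consistent with the bracket above.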

Speedup $S_p$: the ratio of sequential time to parallel time.

S_p = \frac{T_s}{T_p}

• Perfect Speedup: $S_p = P$ (speedup equal to the number of processors)
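
A minimal Python sketch tying work, span, the greedy bounds, and speedup together (numbers are hypothetical, and the sequential time is assumed to equal the work):

```python
from math import ceil

W, S = 10**6, 10**3   # hypothetical work and span
P = 100               # number of processors
T_s = W               # assumption: the fastest sequential time equals the work

lower = max(ceil(W / P), S)   # no schedule can beat this
upper = W / P + S             # a greedy scheduler stays below this

print(f"{lower} <= T_P < {upper}")                           # 10000 <= T_P < 11000.0
print(f"parallelism = {W / S:.0f}")                          # 1000
print(f"speedup in [{T_s / upper:.1f}, {T_s / lower:.1f}]")  # [90.9, 100.0]
```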

## Recurrence

Generally, the recurrence looks like this:

W(n) = \alpha W\left(\frac{n}{\beta}\right) + O\left(f(n)\right)

The parent (root) layer has work $f(n)$; its children layer has total work $\alpha f\left(\frac{n}{\beta}\right)$. We compare the two to determine whether the recurrence is root-dominated, leaf-dominated, or balanced.

For leaf-dominated: there are $\alpha^{\log_\beta n} = n^{\log_\beta \alpha}$ leaves, giving $O(n^{\log_\beta \alpha} \cdot f(1)) = O(n^{\log_\beta \alpha})$ total cost for the tree.

For balanced: all $\log_\beta n$ levels contribute roughly the same cost, giving $O(f(n) \cdot \log_\beta n)$ total cost for the tree.
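
Three canonical examples, one per case (compare the root cost $f(n)$ with the children's total cost $\alpha f\left(\frac{n}{\beta}\right)$):

\begin{align*} W(n) &= W(n/2) + O(n) \implies W(n) \in O(n) && \text{(root dominated)}\\ W(n) &= 2W(n/2) + O(n) \implies W(n) \in O(n \log n) && \text{(balanced)}\\ W(n) &= 2W(n/2) + O(1) \implies W(n) \in O(n) && \text{(leaf dominated)} \end{align*}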

Hacks:

• parent layer's cost is just the term after the $+$

• children layer's cost: substitute the argument of $W(\cdot)$ into the term after the $+$, then multiply by the coefficient $\alpha$

• if there is no coefficient on $W(\cdot)$ (i.e. $\alpha = 1$) and the cost term decreases with the (shrinking) sequence length, then it is root-dominated

• if the cost term is a constant ($+\,1$) and there is a coefficient $\alpha > 1$, then it is leaf-dominated (see the classification sketch after this list)
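
When the cost term is a polynomial $f(n) = O(n^c)$, these hacks reduce to comparing $c$ with $\log_\beta \alpha$. A minimal sketch (the function name and interface are my own, not from the notes):

```python
import math

def classify(alpha, beta, c):
    """Classify W(n) = alpha * W(n / beta) + O(n**c) for polynomial cost terms."""
    critical = math.log(alpha, beta)  # log_beta(alpha): exponent of the number of leaves
    if c > critical:
        return f"root dominated: O(n^{c})"
    if c < critical:
        return f"leaf dominated: O(n^{critical:g})"
    return f"balanced: O(n^{c} log n)"

print(classify(1, 2, 1))  # root dominated: O(n^1)
print(classify(2, 2, 1))  # balanced: O(n^1 log n)
print(classify(2, 2, 0))  # leaf dominated: O(n^1)
```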

If the recurrence involves $\sqrt{n}$, solving directly for the number of layers $k$ means solving $n^{\left(\frac{1}{2}\right)^{k}} = \text{constant}$, which is awkward to work with. Instead, set $n = 2^m$, define $S(m) = T(2^m)$, and write everything in terms of $S(\cdot)$.

Here is an example: we want to solve $S(n) = S(\sqrt{n}) + 1$. Write $S(2^m) = S(2^{m/2}) + 1$. Define $W(m) = S(2^m)$; then $W(m) = W(m/2) + 1$. Solving gives $W(m) \in O(\log m)$, so $S(2^m) \in O(\log m)$. Letting $n = 2^m \implies m = \log n$ and substituting back, $S(n) \in O(\log \log n)$.

The solution for $T(n) = T(\sqrt{n}) + 1$ is $O(\log \log n)$ (balanced).
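
A quick numeric check of that solution, with the recursion written directly (the base case $n \leq 2$ and the use of an integer square root are my own choices):

```python
import math

def T(n):
    # T(n) = T(sqrt(n)) + 1, with T(n) = 1 once n <= 2
    if n <= 2:
        return 1
    return T(math.isqrt(n)) + 1  # integer square root, so very large n is fine

for m in (4, 16, 256, 4096):
    n = 2 ** m
    print(m, T(n), math.log2(m))  # T(n) tracks log2(log2 n) = log2(m) up to a constant
```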
