Lecture 012

Optimizations

Levels of optimization

Compiler: does not improve asymptotic efficiency; choosing a good algorithm is up to the programmer.

Optimization Blockers

Limitation:

Procedure Calls

Because a function may have side effects (for example, modifying global state), the compiler treats a function call as a black box and assumes the call must be made every time. As a result, a loop-invariant function call is not moved out of the loop.

Example: the compiler does not move the strlen() call out of the loop.

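A minimal sketch of the strlen() case, modeled on the classic lower-casing example (the names lower1 and lower2 are illustrative). In lower1 the loop condition calls strlen(s) on every iteration. Because the loop body writes to s, the compiler generally cannot prove the length stays the same, so an O(n) call runs n times and the function is O(n^2) overall. lower2 moves the call out of the loop by hand.

```c
#include <stddef.h>
#include <string.h>

/* Convert upper-case ASCII letters to lower case.
 * strlen(s) is re-evaluated on every iteration: O(n^2) total. */
void lower1(char *s) {
    for (size_t i = 0; i < strlen(s); i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
}

/* Same result, but strlen(s) is computed once before the loop
 * (manual code motion): O(n) total. */
void lower2(char *s) {
    size_t len = strlen(s);
    for (size_t i = 0; i < len; i++)
        if (s[i] >= 'A' && s[i] <= 'Z')
            s[i] -= ('A' - 'a');
}
```

Both functions produce the same string. Only the running time differs, and the gap grows quadratically with the string length.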

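Remedies: use inline functions so the compiler can see the function body and optimize across the call, or do the code motion yourself (e.g., call strlen() once before the loop).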
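A small sketch of the inlining remedy, using a hypothetical helper square(). It is declared static inline in the same file, so its body is visible and the compiler may substitute it into the loop. The compiler can then see that there are no side effects and optimize the loop as a whole. If the helper were defined in another translation unit, the compiler would see only a declaration and would have to emit a real call.

```c
#include <stddef.h>

/* Hypothetical helper: its body is visible, so the compiler may inline it. */
static inline long square(long x) {
    return x * x;
}

/* Sum of the squares of the first n elements of a. */
long sum_squares(const long *a, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += square(a[i]);  /* after inlining: sum += a[i] * a[i] */
    return sum;
}
```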

Memory Aliasing

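When two pointers may refer to the same location in memory (memory aliasing), the compiler must assume that a write through one pointer can change a value read through the other. This blocks optimizations such as keeping a value in a register, because the result of the code can depend on whether the inputs overlap in memory.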

Aliasing Problem

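A sketch of the aliasing problem, assuming a function that sums the rows of an n x n row-major matrix a into a vector b (the name sum_rows1 is illustrative). Because b might point into a, the compiler has to load and store b[i] on every inner iteration instead of keeping the running sum in a register.

```c
/* Sum each row of the n x n row-major matrix a into b[i]. */
void sum_rows1(double *a, double *b, long n) {
    for (long i = 0; i < n; i++) {
        b[i] = 0;
        for (long j = 0; j < n; j++)
            b[i] += a[i * n + j];  /* b[i] is read and written every iteration */
    }
}
```

Consider double A[9] = {0, 1, 2, 4, 8, 16, 32, 64, 128} and B = A + 3. After sum_rows1(A, B, 3), B holds {3, 22, 224}. The second row sums to 22 rather than 27 because B[1] is A[4], which is zeroed and then added to itself partway through the row. The compiler must preserve this behavior.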

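Remedies: accumulate into a local variable and write the result to memory once at the end. This tells the compiler it does not need to account for aliasing inside the loop.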
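A sketch of the local-accumulator remedy for a row-sum function (the name sum_rows2 is illustrative). The running sum lives in val, whose address is never taken, so it cannot alias a. The compiler can keep val in a register and store b[i] only once per row. When a and b overlap, this version gives a different answer from one that accumulates directly into b[i]. That difference is why the compiler cannot make this rewrite itself: the programmer has to decide that overlap does not matter.

```c
/* Sum each row of the n x n row-major matrix a into b[i],
 * accumulating in a local variable and storing once per row. */
void sum_rows2(double *a, double *b, long n) {
    for (long i = 0; i < n; i++) {
        double val = 0;
        for (long j = 0; j < n; j++)
            val += a[i * n + j];
        b[i] = val;
    }
}
```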

Instruction-Level Parallelism

Superscalar processor: can issue and execute multiple instructions in one cycle. Instructions are retrieved from a sequential instruction stream and are usually scheduled dynamically.

Hardware


Some instructions take more than one cycle to complete, but they can be pipelined, so a new instruction can start before the previous one finishes.

Pipelined Functional Units

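A worked sketch of pipelining, assuming a 3-stage, fully pipelined multiplier that can accept one new multiply per cycle (the function mult_eg is illustrative). The comments give the cycle-by-cycle schedule. p1 and p2 are independent, so they overlap in the pipeline. p3 needs both results and has to wait. Under these assumptions, the three multiplies finish in 7 cycles. Without pipelining, each would take 3 cycles, for 9 cycles in total.

```c
/* Three multiplies on a hypothetical 3-stage pipelined multiplier. */
long mult_eg(long a, long b, long c) {
    long p1 = a * b;    /* stages 1-3 in cycles 1-3 */
    long p2 = a * c;    /* stages 1-3 in cycles 2-4 (overlaps p1) */
    long p3 = p1 * p2;  /* needs p2, done at end of cycle 4: cycles 5-7 */
    return p3;
}
```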

Typical hardware capabilities:

Haswell CPU Example


Loop unrolling: process two elements per iteration (2x1 unrolling) to reduce loop overhead.
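A sketch of 2x1 unrolling, assuming the combining operation is the product of a double array (prod_base and prod_unroll2 are illustrative names). The unrolled loop does half as many index updates and loop tests. However, every multiply still depends on the previous value of acc, so the chain of dependent multiplies, and with it the latency bound, is unchanged.

```c
#include <stddef.h>

/* Baseline: one multiply per iteration. */
double prod_base(const double *d, size_t n) {
    double acc = 1.0;
    for (size_t i = 0; i < n; i++)
        acc = acc * d[i];
    return acc;
}

/* 2x1 unrolling: two elements per iteration, one accumulator. */
double prod_unroll2(const double *d, size_t n) {
    double acc = 1.0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2)  /* i + 1 < n avoids n - 1 underflow when n == 0 */
        acc = (acc * d[i]) * d[i + 1];
    for (; i < n; i++)              /* leftover element when n is odd */
        acc = acc * d[i];
    return acc;
}
```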

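Loop re-association: change how the operations are grouped so that combining the two new elements does not depend on the accumulator and can overlap with the previous iteration's work.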

re-association code

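A sketch of re-association on the same hypothetical array-product loop (prod_reassoc is an illustrative name). Writing acc * (d[i] * d[i + 1]) instead of (acc * d[i]) * d[i + 1] leaves only one multiply per iteration on the acc dependency chain. Floating-point multiplication is not associative, because rounding can differ, so a compiler will not make this change to double code unless it is told that it may (for example, with GCC's -ffast-math).

```c
#include <stddef.h>

/* 2x1 unrolling with re-association. d[i] * d[i + 1] does not depend on acc,
 * so it can be computed while the previous multiply into acc is in flight. */
double prod_reassoc(const double *d, size_t n) {
    double acc = 1.0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2)
        acc = acc * (d[i] * d[i + 1]);
    for (; i < n; i++)  /* leftover element when n is odd */
        acc = acc * d[i];
    return acc;
}
```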

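Separate Accumulators: keep two (or more) independent partial results, for example one for even-indexed and one for odd-indexed elements, and combine them at the end. The independent chains can proceed in parallel in a pipelined unit, somewhat like the split-then-combine structure of merge sort.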
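A sketch of 2x2 unrolling (two elements per iteration, two accumulators) on the hypothetical array-product loop (prod_2x2 is an illustrative name). acc0 collects even-indexed elements and acc1 collects odd-indexed ones. The two multiply chains are independent, so a pipelined multiplier can work on both at once. Like re-association, this changes the order of floating-point operations, so the compiler will not do it on its own for double data.

```c
#include <stddef.h>

/* 2x2 unrolling: two elements per iteration, two independent accumulators. */
double prod_2x2(const double *d, size_t n) {
    double acc0 = 1.0;  /* product of even-indexed elements */
    double acc1 = 1.0;  /* product of odd-indexed elements */
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        acc0 = acc0 * d[i];
        acc1 = acc1 * d[i + 1];
    }
    for (; i < n; i++)  /* leftover element when n is odd */
        acc0 = acc0 * d[i];
    return acc0 * acc1;  /* combine the partial products */
}
```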

Programming with AVX2 (YMM Registers, SIMD Operations)
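A hedged sketch of SIMD accumulation. It uses the GCC/Clang vector extension (vector_size), not AVX2 intrinsics, and prod_simd and vec4d are illustrative names. A 32-byte vector holds four doubles, and each vector multiply updates four independent partial products. When compiled with -mavx2, the compiler can map vec4d onto a YMM register. Without it, the compiler may split the vector across narrower registers. As with separate accumulators, the order of the floating-point operations differs from the sequential loop.

```c
#include <stddef.h>
#include <string.h>

/* 256-bit vector of four doubles (GCC/Clang extension). */
typedef double vec4d __attribute__((vector_size(32)));

double prod_simd(const double *d, size_t n) {
    vec4d acc = {1.0, 1.0, 1.0, 1.0};  /* four independent partial products */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        vec4d chunk;
        memcpy(&chunk, d + i, sizeof chunk);  /* load that does not assume alignment */
        acc = acc * chunk;                    /* one element-wise multiply of four lanes */
    }
    double result = acc[0] * acc[1] * acc[2] * acc[3];  /* combine the lanes */
    for (; i < n; i++)  /* leftover elements */
        result *= d[i];
    return result;
}
```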

Performance Chart


Conditionals

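Branch prediction: the processor guesses which way a branch will go and speculatively executes instructions from the predicted target. A simple default heuristic predicts backward branches (usually loop back-edges) as taken and forward branches (usually if statements) as not taken. On average, prediction is better than 95% accurate.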
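A sketch of avoiding an unpredictable branch (minmax1 and minmax2 are illustrative names). In minmax1, whether a[i] > b[i] is essentially random for random data, so mispredictions can dominate the running time. minmax2 computes the minimum and maximum with conditional expressions. The compiler may turn these into conditional moves with no branch, but it is not required to, so check the generated assembly.

```c
#include <stddef.h>

/* Branchy: swap so that a[i] <= b[i]; hard to predict on random data. */
void minmax1(long *a, long *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] > b[i]) {
            long t = a[i];
            a[i] = b[i];
            b[i] = t;
        }
    }
}

/* Branch-free style: same result, written as selections instead of a swap. */
void minmax2(long *a, long *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        long lo = a[i] < b[i] ? a[i] : b[i];
        long hi = a[i] < b[i] ? b[i] : a[i];
        a[i] = lo;
        b[i] = hi;
    }
}
```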

General Practice

Optimization

  1. Use the right data structures and algorithms (asymptotic complexity)
  2. Use good compiler flags (e.g., -O3)
  3. Watch out for optimization blockers:
    • procedure calls
    • memory references (aliasing)
  4. Optimize the innermost loops
  5. Exploit instruction-level parallelism
  6. Avoid unpredictable branches
  7. Write cache-friendly code
