Optimization

  1. Complex data structures
    1. Tables accessed in order are cache friendly
    2. But eventually lose to structures with better O() as N grows
      1. Trees - Jump around in memory, but O(lg N)
        • See binary heap for a pointerless (array-based) tree
      2. Hash - Random access, but O(1)
      3. Others...
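
A minimal sketch of the "pointerless" heap mentioned above, assuming a binary min-heap of ints. The tree lives in one contiguous array and parent/child links are computed from indices, so there are no child pointers to chase. The names MinHeap, parent and left are illustrative, not from the outline.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Implicit binary min-heap: node i has children 2i+1 and 2i+2.
struct MinHeap {
    std::vector<int> data;

    static std::size_t parent(std::size_t i) { return (i - 1) / 2; }
    static std::size_t left(std::size_t i) { return 2 * i + 1; }

    // O(lg N): append, then sift up toward the root.
    void push(int v) {
        data.push_back(v);
        std::size_t i = data.size() - 1;
        while (i > 0 && data[i] < data[parent(i)]) {
            std::swap(data[i], data[parent(i)]);
            i = parent(i);
        }
    }

    // O(lg N): remove the smallest element. Precondition: !data.empty().
    int pop() {
        int top = data.front();
        data.front() = data.back();
        data.pop_back();
        std::size_t i = 0;
        for (;;) {
            std::size_t l = left(i), r = l + 1, m = i;
            if (l < data.size() && data[l] < data[m]) m = l;
            if (r < data.size() && data[r] < data[m]) m = r;
            if (m == i) break;  // heap property restored
            std::swap(data[i], data[m]);
            i = m;
        }
        return top;
    }
};
```

Nodes near the root share cache lines, but deep levels still jump around, so this reduces rather than removes the locality problem.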
  2. Lazy evaluation
    1. Don't forget cost of dirty bit
    2. Memory cost: 1 bit, but up to 1 word after alignment padding
    3. Branch cost
      1. Misprediction 20ish cycles = 20-80 instructions at 1-4 instructions per cycle
      2. 4x4 matrix times 4-vector multiply
        1. 16 *, 12 + = 28 instructions by most naive method
        2. 4 *, 12 MAD (multiply-add) = 16 instructions if slightly smarter
        3. 4 *, 3 + = 7 instructions in SSE (4-wide)
        4. 1 *, 3 MAD = 4 instructions in SSE5
      3. Don't use branch to avoid work that's cheaper than branching
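
A sketch of the trade-off above, assuming a row-major 4x4 matrix applied to a 4-vector. LazyPoint and its members are hypothetical names for illustration. world() pays for a dirty flag and a branch to skip the multiply; world_eager() always does the 16 multiplies and 12 adds and never branches. Which one wins depends on how often the inputs change and how well the branch predicts, so measure rather than assume.

```cpp
#include <array>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<Vec4, 4>;  // row-major: m[row][col]

// Naive 4x4 matrix times 4-vector: 16 multiplies and 12 adds.
inline Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 out{};
    for (int r = 0; r < 4; ++r)
        out[r] = m[r][0] * v[0] + m[r][1] * v[1] + m[r][2] * v[2] + m[r][3] * v[3];
    return out;
}

// Hypothetical lazily evaluated point: recompute only when an input changed.
struct LazyPoint {
    Mat4 matrix{};
    Vec4 local{};
    Vec4 cached{};
    bool dirty = true;  // one bit of information, usually padded to more

    void set_matrix(const Mat4& m) { matrix = m; dirty = true; }
    void set_local(const Vec4& v) { local = v; dirty = true; }

    // Lazy: skips the multiply when clean, at the cost of a branch.
    const Vec4& world() {
        if (dirty) {
            cached = transform(matrix, local);
            dirty = false;
        }
        return cached;
    }

    // Eager alternative: no flag to maintain, no branch to mispredict.
    Vec4 world_eager() const { return transform(matrix, local); }
};
```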
  3. Virtual functions (inheritance with dynamic dispatch)
    1. Use:
      1. Pointer to object of base class
      2. Member functions based on concrete class
      3. Sphere, Polygon, Cone derived from Object class
      4. obj->render() calls Sphere::render(), Polygon::render() or Cone::render()
    2. How it's done
      1. Each class with virtual functions has a v-table of function pointers; each object holds a pointer to its class's v-table
      2. Derived classes get their own v-table with different function pointers in it
      3. obj->render() is a likely branch misprediction on the indirect call through the v-table
      4. obj->render() is also a likely instruction cache miss
      5. v-table locality isn't great: calls don't tend to use all of a v-table's cache block
    3. Alternatives
      1. Job list per derived type
      2. Explicit arrays per derived type
      3. Components instead of inheritance
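
A sketch contrasting the virtual call above with the "explicit arrays per derived type" alternative. Object, Sphere, Polygon and Cone follow the outline; Scene, the Plain* types, and the int return value of render() are assumptions made so the example is small and checkable.

```cpp
#include <memory>
#include <vector>

// Virtual version: one mixed list, dispatch through the v-table.
struct Object {
    virtual ~Object() = default;
    virtual int render() const = 0;  // returns a tag in place of real drawing
};
struct Sphere : Object { int render() const override { return 1; } };
struct Polygon : Object { int render() const override { return 2; } };
struct Cone : Object { int render() const override { return 3; } };

inline int render_all(const std::vector<std::unique_ptr<Object>>& objs) {
    int sum = 0;
    for (const auto& obj : objs) sum += obj->render();  // indirect call per object
    return sum;
}

// Alternative: one contiguous array per concrete type, no virtual calls.
struct PlainSphere { int render() const { return 1; } };
struct PlainPolygon { int render() const { return 2; } };
struct PlainCone { int render() const { return 3; } };

struct Scene {
    std::vector<PlainSphere> spheres;
    std::vector<PlainPolygon> polygons;
    std::vector<PlainCone> cones;

    // Each loop calls a single statically known function.
    int render_all() const {
        int sum = 0;
        for (const auto& s : spheres) sum += s.render();
        for (const auto& p : polygons) sum += p.render();
        for (const auto& c : cones) sum += c.render();
        return sum;
    }
};
```

The per-type loops keep the same code hot in the instruction cache and give the compiler direct calls it can inline, at the cost of losing a single ordered list of objects.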