I’ve been away from coding for a while, modeling actors and props. Now I’m in for another round of optimizations.

At this point, my stage includes actors (about 8-20 on-screen, 100 on stage), tiles (around 150) and props.

I am running somewhere between 5.7 and 8 fps. Here’s a quick breakdown of the game loop:

  • 75% on the rendering loop
    • 33% rendering the terrain (terrain geometry is basic)
      • 11% on glDrawElements
    • 22% rendering actors
      • 10% frame interpolation
      • 2% on glDrawElements
    • 10% rendering props
      • 8% on glDrawElements
  • 25% on running game logic

The terrain is very simple, but I draw each tile with its own call to glDrawElements. In terms of overall processor usage, I get the following:

  • 20% on glDrawElements
  • 20% on objc_msgSend (Objective-C message dispatch)
  • 5.6% for keyframe interpolation

I’ve also enabled a light source with smooth shading, so I ran a couple of tests to check how this affects the frame rate. Given the overheads everywhere else, the difference is negligible.

Optimizing Terrain Rendering

If I tried VBOs at this stage, I think the result might be disappointing. Instead, I want to cache large tiles that include all terrain and props. Not only does this mean calling glDrawElements far less often; it also removes the per-tile and per-prop overhead of drawing terrain and props separately. Caching large tiles will use RAM and isn’t completely straightforward (plus we still need to find time to build the ‘super-tiles’ before caching them), but I think it’s worth a try.
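
To make the idea concrete, here is a minimal C sketch of how tiles and props could be merged into one ‘super-tile’ mesh. The Mesh struct and the superTileAppend function are hypothetical names made up for this illustration, not the engine’s actual code; the sketch assumes plain xyz positions and 16-bit indices, and leaves normals and texture coordinates out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mesh layout (not the engine's actual types): xyz positions
   plus 16-bit indices, drawable with GL_UNSIGNED_SHORT. */
typedef struct {
    float          *positions;      /* 3 floats per vertex */
    unsigned short *indices;        /* 3 indices per triangle */
    size_t          vertexCount;
    size_t          indexCount;
    size_t          vertexCapacity; /* room allocated by the caller */
    size_t          indexCapacity;
} Mesh;

/* Appends a copy of `src`, translated by (dx, dy, dz), to `dst`.
   Returns 0 on success, -1 if `dst` lacks room or if 16-bit indices
   would overflow. */
int superTileAppend(Mesh *dst, const Mesh *src, float dx, float dy, float dz)
{
    assert(dst != NULL && src != NULL);

    if (dst->vertexCount + src->vertexCount > dst->vertexCapacity ||
        dst->indexCount + src->indexCount > dst->indexCapacity ||
        dst->vertexCount + src->vertexCount > 65536) {
        return -1;
    }

    /* Copy vertices, moving them to the tile's position in the super-tile. */
    float *out = dst->positions + 3 * dst->vertexCount;
    const float *in = src->positions;
    for (size_t v = 0; v < src->vertexCount; ++v) {
        *out++ = *in++ + dx;
        *out++ = *in++ + dy;
        *out++ = *in++ + dz;
    }

    /* Copy indices, shifted so they point at the vertices just appended. */
    unsigned short base = (unsigned short)dst->vertexCount;
    unsigned short *idx = dst->indices + dst->indexCount;
    for (size_t i = 0; i < src->indexCount; ++i) {
        idx[i] = (unsigned short)(src->indices[i] + base);
    }

    dst->vertexCount += src->vertexCount;
    dst->indexCount += src->indexCount;
    return 0;
}
```

Once every tile and prop in a block has been appended, the whole block can be drawn with a single glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indices) call instead of one call per tile.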

Test results

  • Cache 15×15 tiles: 8~10 FPS (vertices: 3817; faces: 4868)
  • Cache 25×15 tiles: 6.2~8.3 FPS (vertices: 11654; faces: 15054)

Now let’s see how this changes the distribution of processing overhead:

  • 58% on rendering (-17)
    • 9.5% rendering terrain and props (-33)
      • 9.5% on glDrawElements
    • 46% rendering actors (+24)
      • 20% frame interpolation (+10)
      • 5% on glDrawElements (+3)
  • 41% on running game logic (+16)

Frame interpolation

The next obvious step is to get actor rendering to run faster. If I disable frame interpolation altogether, I run at 11-12 fps. Of course, I can’t simply disable frame interpolation.

I can cache frames. Unfortunately, a simple caching approach will reach its limit quickly, causing any of the following:

  • More work to design and execute models and animations
  • Drastic limits on the number of actor types on stage at any time
  • Running out of memory

So instead of caching frames, can I make frame interpolation faster? Here’s a quick list of what I changed:

  • A FrameDescriptor class now stores pre-calculated weights and keyframe references – instead of looking up the right keyframes to interpolate from every time
  • Instead of allocating/deallocating memory for frame geometry, always reuse the same object/memory allocation as the target for the interpolation result
  • The core interpolation routine has been optimized by replacing external array accesses (something like foo.coordinates[index]) with pointer accesses
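
The three changes above can be sketched roughly as follows in C. This is only an illustration under assumptions: the FrameDescriptor here is a plain struct standing in for the class mentioned above, its fields and the interpolateFrame function are hypothetical names, and the interpolation is assumed to be a linear blend between two keyframes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the FrameDescriptor class: the keyframes and
   the blend weight are resolved once, ahead of time. */
typedef struct {
    const float *from;    /* coordinates of the keyframe before */
    const float *to;      /* coordinates of the keyframe after */
    float        weight;  /* 0 = `from`, 1 = `to` */
} FrameDescriptor;

/* Blends the two keyframes into `target`, a buffer the caller allocates
   once and reuses every frame. `floatCount` is 3 * the vertex count. */
void interpolateFrame(const FrameDescriptor *fd, float *target, size_t floatCount)
{
    assert(fd != NULL && target != NULL);

    /* Local pointers walk the arrays directly, so the loop body makes
       no method calls or indexed property lookups. */
    const float *a = fd->from;
    const float *b = fd->to;
    const float w = fd->weight;
    float *out = target;
    float *end = target + floatCount;

    while (out < end) {
        *out++ = *a + w * (*b - *a);
        ++a;
        ++b;
    }
}
```

The point of the design is that nothing is allocated or looked up inside the per-frame path: the descriptor is prepared in advance, and the same target buffer is overwritten on every call.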

Herding CPU time

The new profile is as follows:

  • 38% on rendering (- 20)
    • 15% rendering terrain and props (+5.5)
      • 9.5% on glDrawElements
    • 19% rendering actors (-27)
      • 10% frame interpolation (-10)
      • 7% on glDrawElements (+2)
  • 61% on running game logic (+20); at the moment, that’s evaluating decisions for 100… sheep.

This runs at 10~11 fps.

In the meantime, I’m still running logic for 100 NPCs. If I reduce that to 50, I get rates between 17 and 22 fps. But…

SuperTiles?

My first step was to optimize terrain rendering. That certainly goes much faster, in a way, but…

Adding tiles and props to create large chunks of terrain is one thing. In my original test, I could render pretty large chunks without generating a significant overhead, except that…

  • Getting a large enough terrain at a low viewing angle (45/30 degrees) just requires rendering so many tiles
  • Optimizing this requires… well… a different approach.

In other words, I found that my rendering slowed down to a crawl again after moving the SuperTiles idea from proof of concept to something more accurate.

Part of my problem was that I was using very small tiles, inherited from a 2D (!) version of my engine designed to support 32×32 pixel tiles.

I could make the tiles larger, but then other constraints (1 prop per tile) would start affecting artistic design adversely so…

…finally I dropped the grid, and I’m now using a quite different approach.

To be continued :)