This post is dry: it’s all about performance.
I’ve been working on two scenes. ‘Bridge View’ has more geometry and no visible NPCs; ‘Village View’ is the reverse, with less geometry and visible NPCs.
Whether NPCs are visible or not, they add processing overhead.
Read the Data
Processing times are given as percentages, relative either to the game loop root or to the parent procedure (nested entries, marked —-). For Bridge View, sub-routines are listed as percentages of either rendering or game logic. For Village View, all percentages are relative to the root of the game loop.
The profiler is accurate; my writing down and ordering of the numbers is not.
I don’t love optimization. Optimizing Objective C code often involves replacing it with C code, because Objective C method calls are dynamically bound (resolved at runtime through message dispatch), which costs more than a plain C function call. I work in rounds: if I feel my game is getting too slow, I get into a weekend optimization session.
This is the roughest session I’ve had so far. Too much processing goes into the game loop; put another way, a lot more processing should be event based. In previous sessions, I typically resolved issues by targeting just a couple of underperforming code sections. Not this time. Overheads are increasingly fragmented, and I won’t get into a redesign before… releasing a couple more games.
Aside from modeling interactions using events, which keeps game AI out of the game loop, I have no ‘optimize upfront’ advice. Events make design easier and better anyway.
I wouldn’t hope to complete a project if I started off worrying about 1% overheads.
The solutions I discarded for this round are [bracketed]. Choosing one solution over another is always a trade-off between coding time and performance gain. Staggering is cheap; simple code-level optimizations are cheap; redesign isn’t.
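Staggering, as I use the word, means spreading a periodic job over several frames: agents are split into groups and only one group is updated per frame. A minimal sketch in C, assuming agents live in a flat array; `update_agent` is a hypothetical hook standing in for the real per-agent work, not a function from my engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when agent `agent_index` is due on frame `frame`: agents are
   split into `period` groups and one group runs per frame. */
bool stagger_is_due(unsigned long frame, size_t agent_index, unsigned period)
{
    assert(period > 0);
    return agent_index % period == frame % period;
}

/* Runs only this frame's group. `update_agent` is a hypothetical hook
   for the real per-agent work (e.g. a reflex map pass); it may be NULL.
   Returns how many agents were updated. */
size_t stagger_update(unsigned long frame, size_t agent_count, unsigned period,
                      void (*update_agent)(size_t agent_index))
{
    assert(period > 0);
    size_t updated = 0;
    /* Jump straight to the agents that are due instead of testing all of them. */
    for (size_t i = (size_t)(frame % period); i < agent_count; i += period) {
        if (update_agent != NULL)
            update_agent(i);
        updated++;
    }
    return updated;
}
```

Every agent still runs once every `period` frames, so the cost per frame drops by roughly that factor; the price is that an agent may react up to `period - 1` frames late.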
Note 1: While the SDK doesn’t let third-party apps run in the background, iPhones and iPods do multitask: while your app is running, another app may be downloading in the background, or a text message may arrive. In most cases, I find that the main app uses 85 to 97% of processing time.
Note 2: I always run in debug mode; yes, precisely because it’s a little slower, which encourages my optimization efforts.
1. Bridge View: 8.5fps, 70 units, 32 611 faces
78.8% – Rendering
—- 52.4% Terrain Rendering (97.5% glDrawElements)
=> replaced real-time OpenGL lighting with just-in-time pre-calculated lighting.
+=> enabled back-face culling
[reduce fog distance - no effect on rendering time]
[use VBOs; VBOs are most likely worth a go, I just couldn't be bothered]
[reduce calls to glDrawElements (group trees?)]
—- 2.9% Actor rendering
—- 31.1% Camera management (more hit testing than is reasonable)
=> discarded steep-angle faces (unfortunately, that’s causing bugs too)
=> code optimizations (i) (C code, pointers, etc., bringing the processing share down to 6%)
21.1% – Game Logic
—- 42% AffectMap (How a PC/NPC is affected by others and the environment)
—- 34.3% Actions (Simple Game AI + Game events)
——– 22% ReflexMap (NPCs taking decisions – simple AI)
——– 11.8% Gate apply (hard to explain)
——– 19.8% BasicActorUpdates apply (Actors moving around and doing stuff)
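The terrain fix (pre-calculated lighting instead of real-time OpenGL lighting) can be sketched in C as baking one color per vertex. This is a minimal sketch under stated assumptions: a single directional light, unit-length normals, and a plain Lambert-plus-ambient model; I read ‘just in time’ as ‘baked when a terrain chunk is loaded or when the light changes’. `Vec3` and `bake_vertex_lighting` are illustrative names, not my engine’s.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Bakes per-vertex brightness for one directional light (Lambert diffuse
   plus ambient). `normals` and `light_dir` are assumed unit length;
   light_dir points from the surface toward the light. Output is RGBA
   bytes (4 per vertex), ready to use as a vertex color array. */
void bake_vertex_lighting(const Vec3 *normals, size_t vertex_count,
                          Vec3 light_dir, float ambient,
                          const unsigned char base_rgb[3],
                          unsigned char *out_rgba)
{
    assert(ambient >= 0.0f && ambient <= 1.0f);
    for (size_t i = 0; i < vertex_count; i++) {
        float ndotl = normals[i].x * light_dir.x
                    + normals[i].y * light_dir.y
                    + normals[i].z * light_dir.z;
        if (ndotl < 0.0f)
            ndotl = 0.0f;                              /* facing away: ambient only */
        float k = ambient + (1.0f - ambient) * ndotl;  /* in [ambient, 1] */
        for (int c = 0; c < 3; c++)
            out_rgba[4 * i + c] = (unsigned char)(base_rgb[c] * k + 0.5f);
        out_rgba[4 * i + 3] = 255;                     /* opaque */
    }
}
```

With OpenGL ES 1.x, the baked colors would be drawn with `glDisable(GL_LIGHTING)`, `glEnableClientState(GL_COLOR_ARRAY)` and `glColorPointer(4, GL_UNSIGNED_BYTE, 0, colors)`; back-face culling is `glEnable(GL_CULL_FACE)`. The per-frame cost of lighting then disappears, at the price of re-baking whenever the light moves.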
(i) Intersection test profile before optimizations
30.8% Loading triangle corners into a face object.
30.1% Running the intersection test.
25.4% Objective C method calls
I optimized this by adding a table to my 3D geometry. Instead of indexing vertices per face and following all the indirections, vertex coordinates are duplicated per face and dumped into a single memory block. So instead of doing nice maths with triangle and vector objects, I do so-so maths with pointers everywhere.
Just for camera management… not worth it. But it will be useful when I tackle the too-many-NPCs-roaming-around issue.
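A per-face table like the one described could look like this in C: nine floats per face (three corners, x/y/z), read straight through a pointer with no index lookups and no Objective C method calls. The post doesn’t say which intersection test the camera uses; as a stand-in, this sketch uses the standard Möller–Trumbore ray/triangle test and assumes the camera check casts a ray and wants the nearest hit. `FaceTable` and `nearest_hit` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Flattened face table: 9 floats per face (3 corners x xyz). Vertices
   are duplicated per face, so no index lookups are needed. */
typedef struct {
    const float *corners;   /* 9 * face_count floats */
    size_t face_count;
} FaceTable;

/* Möller–Trumbore ray/triangle test on raw pointers. `v` points at the
   face's 9 floats. Returns 1 and writes the ray parameter t on a hit. */
static int ray_hits_triangle(const float o[3], const float d[3],
                             const float *v, float *t_out)
{
    const float eps = 1e-7f;
    float e1[3] = { v[3] - v[0], v[4] - v[1], v[5] - v[2] };
    float e2[3] = { v[6] - v[0], v[7] - v[1], v[8] - v[2] };
    /* p = d x e2 */
    float p[3] = { d[1] * e2[2] - d[2] * e2[1],
                   d[2] * e2[0] - d[0] * e2[2],
                   d[0] * e2[1] - d[1] * e2[0] };
    float det = e1[0] * p[0] + e1[1] * p[1] + e1[2] * p[2];
    if (det > -eps && det < eps)
        return 0;                           /* ray parallel to the triangle */
    float inv = 1.0f / det;
    float s[3] = { o[0] - v[0], o[1] - v[1], o[2] - v[2] };
    float u = (s[0] * p[0] + s[1] * p[1] + s[2] * p[2]) * inv;
    if (u < 0.0f || u > 1.0f)
        return 0;
    /* q = s x e1 */
    float q[3] = { s[1] * e1[2] - s[2] * e1[1],
                   s[2] * e1[0] - s[0] * e1[2],
                   s[0] * e1[1] - s[1] * e1[0] };
    float w = (d[0] * q[0] + d[1] * q[1] + d[2] * q[2]) * inv; /* 2nd barycentric */
    if (w < 0.0f || u + w > 1.0f)
        return 0;
    float t = (e2[0] * q[0] + e2[1] * q[1] + e2[2] * q[2]) * inv;
    if (t < 0.0f)
        return 0;                           /* triangle behind the origin */
    *t_out = t;
    return 1;
}

/* Nearest hit closer than max_t. Returns the face index, or -1. */
long nearest_hit(const FaceTable *table, const float o[3], const float d[3],
                 float max_t, float *t_hit)
{
    assert(table != NULL && t_hit != NULL);
    long best = -1;
    float best_t = max_t;
    for (size_t f = 0; f < table->face_count; f++) {
        float t;
        if (ray_hits_triangle(o, d, table->corners + 9 * f, &t) && t < best_t) {
            best_t = t;
            best = (long)f;
        }
    }
    *t_hit = best_t;
    return best;
}
```

The memory cost is three times the vertex data for shared vertices; in exchange, the loop touches one contiguous block, which is what made the redirections and method calls disappear from the profile.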
2. Village View: 25 rats, 13fps, 17 000 faces
Rats… small, low poly, black-eyed mammals with red tails.
—- 26.7% Render Terrain (26.4% glDrawElements)
—- 11.9% Render Actors (10.3% glDrawElements)
Not worth optimizing for now; note, however, that few actors are moving. Collision tests will probably need further optimization later.
—- 8.4% Setup Camera (already optimized as above)
=> now doing this 3 times per second; load drops to about 1%
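Running camera setup 3 times per second instead of every frame is a time-based throttle. A minimal sketch in C, assuming the game loop knows each frame’s duration in seconds; `Throttle` and `throttle_tick` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Runs a job at a fixed rate regardless of frame rate by accumulating
   elapsed time. With interval = 1/3 s, the job runs about 3 times per
   second. */
typedef struct {
    double interval;   /* seconds between runs */
    double elapsed;    /* time accumulated since the last run */
} Throttle;

/* Call once per frame with the frame duration; returns true when the
   job should run on this frame. */
bool throttle_tick(Throttle *th, double dt)
{
    assert(th->interval > 0.0 && dt >= 0.0);
    th->elapsed += dt;
    if (th->elapsed < th->interval)
        return false;
    th->elapsed -= th->interval;
    /* After a long stall, drop the backlog instead of running repeatedly. */
    if (th->elapsed >= th->interval)
        th->elapsed = 0.0;
    return true;
}
```

In the game loop this would read `if (throttle_tick(&camera_throttle, dt)) setup_camera();`, where `setup_camera` is a hypothetical stand-in for the real routine. Between runs the camera keeps its last result, which is the accepted trade-off.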
26.2% Game Logic
6% ReflexMap (includes 3.8% testing potential targets for a given decision, and 1% determining which actors and props can be interacted with)
=> This still uses significant time even though it is staggered.
1) Some agents will never interact with a given ‘species’. Which agents these are can be updated at a comparatively very low rate (I use tags, e.g. ‘opponent’ or ‘magic’, to decide what to interact with; tags may change over time, they’re not static).
2) Similarly (and maybe more importantly), an action will only target specified agents. We shouldn’t need to iterate over all unrelated agents, whether in range or not.
Consider the following:
- fire an event whenever an actor is added
- submit the actor to all reflex maps
- each reflex map submits the actor to relevant target selectors
[- use actor IDs: each target selector can look up an array using the ID as index (this avoids checking the tag every time; tag events could detect tag changes)]
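The steps above can be sketched in C under a few assumptions: tags are mapped to bits of an unsigned mask when loaded (my engine uses named tags such as ‘opponent’), actor IDs are small integers usable as array indices, and every name here (`ReflexMap`, `TargetSelector`, `on_actor_added`) is a hypothetical stand-in. Each selector keeps a compact candidate list plus an ID-indexed membership table, so a decision only iterates over actors it could ever target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ACTORS 256
#define MAX_SELECTORS 8

/* Hypothetical tag bits, resolved once from tag names at load time. */
enum { TAG_OPPONENT = 1u << 0, TAG_MAGIC = 1u << 1, TAG_PREY = 1u << 2 };

typedef struct { unsigned id; unsigned tags; } Actor;

/* A target selector only keeps the actors it could ever target. */
typedef struct {
    unsigned wanted_tags;             /* actor must carry one of these */
    bool member[MAX_ACTORS];          /* lookup by actor id, no tag test */
    unsigned candidates[MAX_ACTORS];  /* compact list to iterate */
    size_t candidate_count;
} TargetSelector;

typedef struct {
    TargetSelector selectors[MAX_SELECTORS];
    size_t selector_count;
} ReflexMap;

static void selector_submit(TargetSelector *s, const Actor *a)
{
    assert(a->id < MAX_ACTORS);
    if ((a->tags & s->wanted_tags) == 0 || s->member[a->id])
        return;                       /* irrelevant or already registered */
    s->member[a->id] = true;
    s->candidates[s->candidate_count++] = a->id;
}

/* Event handler, fired once when an actor is added to the stage,
   instead of re-scanning every actor on every decision. */
void on_actor_added(ReflexMap *maps, size_t map_count, const Actor *a)
{
    for (size_t m = 0; m < map_count; m++)
        for (size_t s = 0; s < maps[m].selector_count; s++)
            selector_submit(&maps[m].selectors[s], a);
}
```

Since tags can change, a matching tag-change handler would need to re-submit or remove actors; that is what the bracketed tag-event idea would cover. The same registration would suit AffectMap and ActorAffectCondition.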
3% Gate apply (Game management; now staggered)
8.3% AffectMap apply
==> The strategy proposed for ReflexMap also applies to AffectMap and ActorAffectCondition. ActorAffectCondition tests all actors on the stage; we only want to test actors that might ever affect the target.
[==> No time for that, but this should really just be event based. Comparatively, event-based reactions (getting hit) are easier to model than event-based actions (launching a kick)]
3.4% Actor step (animation management)
1.6% ActivityChangeDispatcher actor:activityChangedFrom:to:
==> This triggers PlaySoundOnActivityChange. The problem here is that I’m generating a string before mapping the sound. The same thing can be done at virtually no cost, i.e. by mapping the sound directly from the activity name (maybe combined with the actor’s name). This code hasn’t been updated for a long time; it doesn’t even do anything anymore (!)
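One way to map sounds without building a string on each activity change is to resolve activity and actor names to small integer IDs once, at load time, and keep a table indexed by both. A minimal sketch in C; the enums, `NO_SOUND`, and the function names are hypothetical, not my engine’s actual identifiers.

```c
#include <assert.h>

/* Hypothetical ids: activity and actor-kind names are resolved to small
   integers once, at load time. */
enum Activity  { ACT_IDLE, ACT_WALK, ACT_ATTACK, ACT_COUNT };
enum ActorKind { KIND_RAT, KIND_GUARD, KIND_COUNT };

#define NO_SOUND (-1)

/* Sound id per (actor kind, activity); filled once when sounds load,
   instead of building a name string on every activity change. */
static int sound_table[KIND_COUNT][ACT_COUNT];

void sound_table_init(void)
{
    for (int k = 0; k < KIND_COUNT; k++)
        for (int a = 0; a < ACT_COUNT; a++)
            sound_table[k][a] = NO_SOUND;
}

void sound_table_set(enum ActorKind kind, enum Activity activity, int sound_id)
{
    assert(kind < KIND_COUNT && activity < ACT_COUNT);
    sound_table[kind][activity] = sound_id;
}

/* Called on an activity change: a plain array lookup, no string work. */
int sound_for_activity(enum ActorKind kind, enum Activity activity)
{
    assert(kind < KIND_COUNT && activity < ACT_COUNT);
    return sound_table[kind][activity];
}
```

Of course, since the code no longer does anything, deleting the call is cheaper still.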
1.3% Actor getCycleLength
==> While retaining the cycle length may be unsafe (the mapping with actions isn’t one to one), step checks the cycle length at every step. It should be cached at the beginning of each animation cycle (the way I do it, evaluating how long an animation lasts is a slightly complex process, involving string comparisons and other well-intended, suboptimal steps).
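Caching the cycle length at the start of each cycle, rather than on every step, might look like the following C sketch. `AnimState`, `evaluate_cycle_length` and its formula are hypothetical; the evaluation function only stands in for the real, string-comparing one, and counts its calls so the saving is visible.

```c
#include <assert.h>

/* Hypothetical animation state. The cycle length is cached when a cycle
   starts instead of being evaluated on every step. */
typedef struct {
    int activity;            /* current activity id */
    float cycle_length;      /* cached, in seconds */
    float cycle_time;        /* time into the current cycle */
    int cycle_length_evals;  /* how often the slow path ran (illustration) */
} AnimState;

/* Stand-in for the expensive evaluation (string comparisons etc.). */
static float evaluate_cycle_length(AnimState *a)
{
    a->cycle_length_evals++;
    return 0.5f + 0.25f * (float)a->activity;
}

void anim_start_cycle(AnimState *a, int activity)
{
    a->activity = activity;
    a->cycle_time = 0.0f;
    a->cycle_length = evaluate_cycle_length(a);
}

/* Per-frame step using the cached length. Returns 1 when a new cycle
   begins. The length is re-evaluated then, once per cycle, since the
   mapping with actions isn't one to one and it may have changed. */
int anim_step(AnimState *a, float dt)
{
    assert(a->cycle_length > 0.0f);
    a->cycle_time += dt;
    if (a->cycle_time < a->cycle_length)
        return 0;
    a->cycle_time -= a->cycle_length;
    a->cycle_length = evaluate_cycle_length(a);
    return 1;
}
```

At 13fps with half-second cycles, this turns one evaluation per frame into roughly one per six or seven frames.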
16% OverlayView tick
This is irrelevant: the ‘tick’ is actually used to… update a text box displaying the frame rate.
There are significant optimizations I haven’t tried yet:
- Disable out-of-range NPCs. This may seem straightforward: why process AIs that won’t interact? But I wrote my behavior engine so that NPCs can interact with each other, affecting the game even when the player isn’t around. When running a classical adventure/action game atop the engine, I should use this as a game-specific optimization.
- Use VBOs (vertex buffer objects). There’s a lot more I’d like to learn about OpenGL ES, and I have decided to try not to learn it until I have completed my first project. I also tend to keep this as a ‘magic card’. When optimizing, I choose scenes and configurations that are purposefully heavy, and sometimes pretend not to know about various things that could make them faster with little effort. It works for me, like practicing speech with stones in your mouth.
- Reuse interpolated frames. I’ve already optimized frame interpolation. This may be combined with caching strategies.
- Use optimized intersection tests (as applied to camera management) for collision detection and walking on uneven terrain.
- Stagger collision/intersection tests
For the last three, the profiling cases aren’t suitable: this time, I’ve only profiled static scenes with nothing moving. Anyway, before I forsake myself and dive into dynamic scenes teeming with half-life, I’d rather get the stills flowing in peace and harmony.
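For the first item on that list, disabling out-of-range NPCs, a game-specific sketch in C could flag NPCs by distance to the player, comparing squared distances to avoid a square root. `Npc` and `update_active_npcs` are hypothetical names, and a plain radius is an assumption; a real game might want a larger ‘sleep’ radius than ‘wake’ radius to stop NPCs flickering at the boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { float x, y, z; bool active; } Npc;

/* Marks NPCs farther than `radius` from the player as inactive so their
   AI can be skipped this round. Uses squared distances (no sqrt).
   Returns the number of active NPCs. */
size_t update_active_npcs(Npc *npcs, size_t count,
                          float px, float py, float pz, float radius)
{
    assert(radius >= 0.0f);
    float r2 = radius * radius;
    size_t active = 0;
    for (size_t i = 0; i < count; i++) {
        float dx = npcs[i].x - px, dy = npcs[i].y - py, dz = npcs[i].z - pz;
        npcs[i].active = dx * dx + dy * dy + dz * dz <= r2;
        if (npcs[i].active)
            active++;
    }
    return active;
}
```

The check itself is cheap, and could also be staggered or throttled, since NPCs rarely cross the radius within a single frame.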