Category: OpenGL-ES

We know how to use Instruments to fine-tune the performance of our games: we have a quick run (maybe one minute), check the heaviest stack traces, and do some surgery here and there, right?

How about cases when…

  • There’s just a couple of frames hanging every once in a while.
  • The game slows down for a short period of time

Here’s how I did it. I had a short span in my game where the frame rate dropped dramatically. While this typically lasts for less than 5 seconds, you may have noticed that the best games rarely slow down, if ever. Besides, what’s an isolated incident in a game may signal a problem in the game engine. Let’s get to work.

Here’s what I did:

  1. I create a new file in Instruments, choosing CPU sampler.
  2. I record a 2 to 5 minutes session, making sure I record the part of the game where the frame rate drops (!)
  3. After the run, spikes (higher spikes…) would show, matching the time when the frame drop occurred. I tried to get to this sample by sample, but I couldn’t extract any useful information.
  4. Now, checking the inspection range button group at the top of the instruments UI, I noticed that I could restrict the span considered for profiling.
  5. I then compared overheads within and without the frame drop time-span.

I took a couple of screenshots so I could compare easily. Here’s what we see:

  • In the first case, CPU usage is very low. The game spends about 80% of the time waiting for the frame callback (note we’re running on an iPhone4, and this is a universal app also available on iPod 2nd gen.)
  • In the second case, 85% of machine time is used up. There will be other things happening in the background etc… but more importantly, we can identify several activities that are otherwise simply insignificant (too fast for the sampler to pick up):
    • Drawing actors. This occupies 50% of the run loop
    • Evaluating decorations

Oh well. I happen to have 4 actors in there. Incidentally the model is the heaviest I’ve got. The actors aren’t showing on-screen when the frame drop occurs, they’re just nearby. But as a general rule, it’s much better to send a little more than there is to view than to send less(!).

The next obvious step was to go back to the game script and see what happens when the actors are removed. The frame rate recovered completely – all that was needed was optimizing the geometry for these actors.

Conclusion

This quick case study shows how CPU sampler can be used to identify overheads within a specific time-span. No traces, no manual profiling. It’s a very simple technique, and it can avoid heading in the wrong direction based on vague intuitions of where the overheads should lie.

In this case, the scene considered actually cumulated several candidates:

  • The scene is complex. In fact I had to break up the scenery into several components because the max vertex count (owing to number format in my files) is ~8000.
  • Procedurally generated decorations add to the rendering overhead for static elements.
  • But the actor model, duplicated 4 times, is also heavy.

When I started off, I was so convinced that the complex scene was responsible for my overhead that I was about to do the artwork all over again, even though scene rendering uses VBOs. Bothering with firing up the profiler and running a 15 minutes session total pointed me in the right direction.

Hairlock - dev pic

Originally developed under the codename ‘Hairlock’, Antistar 3D: Rising is a universal game (iPhone, iPad, iPod Touch) published on the 1st of August 2010. Artwork in the game is a mix of Blender art and procedurally generated elements.

A Key Decision…

It was decided very early that all the details in the game would be modeled – no textures! There are several reasons why Antistar was created this way:

1 – I have never been very fond of textured games. This isn’t because I think textures are bad, but because I usually find that textures are abused, becoming an easy substitute for shading and high quality visual content.

2 – I trained myself in blender around 2000-2002; I always focused on modeling.

3 – Supporting textures would have added a load on the engine, taking longer to develop, leaving much less time to produce the actual content, making it potentially harder to run at frame rate and increasing loading times.

4 – Although it is now available on iPhone 4 and iPad, the game was originally developed to run nicely on an iPod Touch. On a small screen, the need to crowd scenes with tiny flatland decorations isn’t obvious, even if some games (e.g. Samurai: Way of the Warrior) use textures very ingeniously.

Blender’s good

Edit mode rocks!

Back in 2000 when I discovered Blender, I was still looking for software that could help me make my models. I am not a professional artist, so I need to try again and again until I see what I have in mind in the 2D view. Blender’s edit mode, with one hand on the keyboard and another securely cuddling the mouse, lets you change things over and over again at the speed of thought.

While the game doesn’t use textures, Blender has a nifty ‘retopo’ mode allowing to project geometry onto a surface while editing. This was used to add effects that make some players think that the game supports textures, but more importantly, offer an OK substitute to textures in situations where they’re really needed.

Free and comprehensive

It just turns out that Blender is completely free, open source and comprehensive. I didn’t use many features, but these are key to creating a game:

1 – Support for bone animation, with a little of a learning curve, but the same ergonomics found in edit mode, allowing to quickly define and adjust areas of influence for bones, add constraints, etc…

2 – Python scripting. I looked around for an export plugin for ages before getting into it, only to realise that python scripting isn’t hard, and there are key advantages to writing your own file format for 3D. In Antistar, geometry is loaded on the fly, in a format that Open GL understands, as exported directly from Blender, in just a couple of clicks.

Hairlock - dev pic

Technical data

How much geometry in there?

As this was my first 3D engine, and my first experience on a mobile platform, I was fairly cautious when I started modeling, as I was worried I would run out of GPU time fairly quickly. Over time I realised that (a) rendering is pretty fast, even on an iPod Touch 2nd gen, and (b) once we do away with mapped images, we have a lot of polygons to play with (it’s also good to know that smaller polygons render much faster than large ones). Here are some examples:

- Humanoids and other creatures: 650 to 2500 vertices/quads(*)
- Small decorations: ~40 vertices/quads or less. These can get duplicated a few dozen times in the same scene.
- Terrain (e.g. the ‘incubator’ scene displayed at the top): up to 10,000 vertices/quads.

(*) In most cases, the number of vertices is roughly equal to the number of quads.

Rendering

For rendering, I used a mix of just-in-time calculated and real time lighting (for actors and props). Evaluating simple illumination on the fly while loading models has advantages, but I haven’t implemented a method to refresh lighting following the day/night cycle in the game; instead, fog is used to affect the rendering of the environment according to the day/night cycle.
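
A minimal sketch of what evaluating simple illumination while loading a model could look like. This is not the engine’s code: it assumes a single directional light, a plain Lambert (cosine) term, unit-length normals, and one material color per model. The function name `bake_vertex_lighting` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bake simple diffuse lighting into per-vertex RGBA
   colors once, while loading a model. Assumes unit-length normals and a
   unit-length vector pointing from the surface towards the light. */
void bake_vertex_lighting(const float *normals,     /* 3 floats per vertex */
                          const float base_rgba[4], /* material color */
                          const float to_light[3],
                          float ambient,            /* 0..1, minimum brightness */
                          size_t vertex_count,
                          float *out_rgba)          /* 4 floats per vertex */
{
    assert(normals && base_rgba && to_light && out_rgba);
    for (size_t i = 0; i < vertex_count; ++i) {
        const float *n = normals + 3 * i;
        /* Lambert term: cosine of the angle between normal and light. */
        float d = n[0] * to_light[0] + n[1] * to_light[1] + n[2] * to_light[2];
        if (d < 0.0f) d = 0.0f;               /* facing away: no diffuse light */
        float k = ambient + (1.0f - ambient) * d;
        out_rgba[4 * i + 0] = base_rgba[0] * k;
        out_rgba[4 * i + 1] = base_rgba[1] * k;
        out_rgba[4 * i + 2] = base_rgba[2] * k;
        out_rgba[4 * i + 3] = base_rgba[3];   /* alpha is left untouched */
    }
}
```

The resulting array can be handed to the GL as an ordinary color array, so no GL lighting has to run per frame for that geometry. The price is the one described above: the baked colors do not follow a moving light unless they are recomputed.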

I’m still considering whether to use LOD (level of detail) nodes (aka, lighter models when viewed from far away) or not. In the first version of the game, I used depth of field balancing to avoid dropping the frame rate too much. Depth of field balancing works quite well, although it seems a little difficult to make it perfect :) . In the meantime, artefacts aren’t usually disturbing and may even bring a little life to the game, with soft oscillations in depth of field and fog level.

Cheats ?

At times 3D can get a little intense. To create the dark forest near Klinnburg, I flattened the trees, only leaving 2D geometry. This allowed keeping patches of procedural grass that I kinda like, and adding walkable stones here and there. I don’t like doing that too much, because it always does show a little.

Future work

materials

In the first release, the exporter only processed RGBA colors for materials. The engine is a little more powerful though, as it can take parameters for shading, ambient… well, the basic kit.

I probably want to get better at python scripting, so I can explore ways of generating geometry using plugins, and better balance what’s generated on the fly versus what gets loaded.

Hairlock - dev pic

level editor

Currently, my export plugin handles objects and bone animations quite well, but it works more like a library export plugin, with any blender file containing a generous mix of actors, items and terrain. I want to take the time to turn this into a level editor. This would help streamlining the workflow, as for now the ‘level editing’ process consists in walking around the world and writing down coordinates (**)

forward raytracing?

And finally, I’ve been working on ‘rough and ready’ procedures for self illumination (!). This can be used part in real time, part by pre-calculating contributions. Initial results are promising and the results may ship with the game within a couple of months!

————-

Antistar 3D: Rising – is a realtime 3D adventure taking advantage of 3D capabilities on the iPod Touch, iPhone 3G/3GS, iPhone 4 and iPad. (app store link)

Playing with fire

Before moving any further, remember using ad-hoc methods to detect which device you’re running is playing with fire. Here is minimal advice to avoid getting into trouble:

  1. If you’re trying to determine how old/powerful a device might be to enable greedy features, your list of older devices needs to be as exhaustive as possible. You want to enable advanced features by default to ensure they will be available on later (hopefully faster!) hardware.
  2. Generally, remember that a function based on known, existing device names cannot return you the name of a future device. Whatever function operates based on device names requires you to specify a default for an unknown device.
  3. There is a rumor suggesting that Apple doesn’t want devs to detect which device they’re running on. Given (1,2) it’s not difficult to understand why.
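
Points 1 and 2 can be sketched as a lookup that lists only the older hardware and treats anything unknown as fast. The function and type names are hypothetical, and the identifier list is illustrative and certainly incomplete; a real list should be as exhaustive as possible.

```c
#include <stddef.h>
#include <string.h>

typedef enum { DEVICE_TIER_SLOW = 0, DEVICE_TIER_FAST = 1 } DeviceTier;

/* Illustrative, incomplete list of older hardware, written as model
   identifier strings. Only devices known to be slow belong here. */
static const char *const kSlowMachines[] = {
    "iPhone1,1", /* original iPhone */
    "iPhone1,2", /* iPhone 3G */
    "iPod1,1",   /* iPod touch, 1st generation */
    "iPod2,1",   /* iPod touch, 2nd generation */
};

/* Hypothetical helper: anything not on the slow list, including devices
   that did not exist when the list was written, gets the fast default. */
DeviceTier device_tier_for_machine(const char *machine)
{
    if (machine == NULL) return DEVICE_TIER_FAST;
    for (size_t i = 0; i < sizeof kSlowMachines / sizeof kSlowMachines[0]; ++i) {
        if (strcmp(machine, kSlowMachines[i]) == 0) return DEVICE_TIER_SLOW;
    }
    return DEVICE_TIER_FAST;
}
```

The design choice is the direction of the default: the list enumerates what to switch off, so a future device never silently loses a feature just because its name is missing.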

How to…

  • The recommended way to detect your device name/OS version is via UIDevice. Unfortunately this cannot tell the difference between an iPhone 3G and a 3GS. I don’t have a 3G to test, so I have to assume that the iPhone 3G is significantly slower than the 3GS until I can test.
  • You can use sysctlbyname as suggested in a couple of articles. This gives fine grain information. While the list of strings returned by the hw.machine parameter is already out of date on that post, the call can distinguish between various iPod and iPhone models.

Some people seem to think that sysctlbyname is not a documented function and shouldn’t be used. Interestingly, in a rejection letter posted here, Apple encourages a developer to use this function. This function is documented. But hey, hold on. That doesn’t imply we should fine-grain detect the device model, right? This function has tons of uses besides that.

How can I solve my problem?

What I want to do is determine how fast the device I am running on is, so I can set the antialiasing level for my game. On an iPod touch, I can’t run antialiasing (not with the 3D content I have, that is); it’s too slow. On a 3GS, it’s OK.

For now, I’m falling back on checking for OpenGL ES 2.0 support. It doesn’t directly check speed, but earlier devices only support OpenGL ES 1.1, so I’m guessing it should be OK.

If you’re lazy and you want a simple background image for your 3D, here’s a neat trick:

  1. Add a UIImageView to your .nib file (this will be your background image)
  2. Add your EAGLView as a child to the same parent as the  background image
  3. Clear the background before rendering:
    glClearColor(0.0f,0.0f,0.0f,0.0f);
  4. Run your app.
  5. Sit back and enjoy.

You can color the background by passing other values.

You can still enable fog. Fog colors polygons, not the background. The only problem is, far objects won’t blend in the background. They’ll stand out fogged like hair in the distant soup. Works well for starry skies, or just if the lower part of your sky is the same color as the fog.

I’ve been reading everywhere that VBOs are great. Since I’m having a little downtime not knowing how to carry my project forward, I thought I’d have a look into it. What lured me in is the promise of a dramatic performance increase.

Coding recipe

1. Call the code below, or similar, once (not in the rendering loop) for each array you wish to buffer

// this example illustrates how to load a vertex array buffer.
// only small variations for loading color arrays
// and element (indexed triangle) arrays.
// see the rest of the article for details.
// create a handle that you will use to refer to the buffer when talking to gl
GLuint handle;
glGenBuffers(1,&handle);
// let's load our buffer data...
// size of your buffer in bytes: here, 3 times the number of vertices,
// multiplied by your format size, for example sizeof(GLfloat).
int dataByteSize = ...
float* data = malloc(dataByteSize);
// need to populate your buffer with vertex data!
...
// this loads 'data' into graphic memory referred by 'handle'
glBindBuffer(GL_ARRAY_BUFFER,handle);
// Typically you want to use static draw. This implies you will use the buffer
// over and over, in other words you won't modify geometry afterwards.
glBufferData( GL_ARRAY_BUFFER, dataByteSize, data, GL_STATIC_DRAW);
// clear the binding after loading your array, otherwise you will get crashes
glBindBuffer(GL_ARRAY_BUFFER,0);


2. Put that in your rendering loop to pass the buffer instead of a regular pointer

// this example illustrates how to replace 'pass pointers' with
// 'pass array buffers'
// note: this example only makes sense if you can do basic gl rendering.
glBindBuffer(GL_ARRAY_BUFFER,handleToColorBuffer);
glColorPointer(4,GL_FLOAT,0,0 /*replaced the pointer reference by zero*/ );
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER,handleToPerFaceVertexIndices);
glDrawElements(... , 0 /*replaced the pointer reference by zero*/ );

What are VBOs?

Apparently VBO means vertex buffer object. GL-ES allows defining buffers into which we load our vertex (but also color, normal and face indexing) data. Color, normal and coordinate arrays are passed as ‘just array buffers’. Indexed faces are passed using element array buffers.

The promise is that by using buffer objects, we give to OpenGL an opportunity to move the data ’somewhere fast to access’. I guess that would be graphic memory, if anything, but who knows? Anyway that’s a hardware sensitive topic.

Creating Buffers

First you define a handle variable. Buffers are indexed, so we need handles.

GLuint handle;
glGenBuffers(1,&handle);

This code generates just one handle. That’s what the number ‘1′ is saying. We’re passing by reference so if we wanted, we could have allocated several handles within the same memory chunk. I’m not really sure why you’d ever want to do that but… (hey, I didn’t say it wasn’t marginally more efficient, right?)

Next you make the handle ‘current’, or in other words, ‘bind the buffer’. We tell gl that instead of addressing memory using pointers, we will address the buffer referred to by the handle, or, if loading data into the buffer, a buffer will be created for us and referred to by the handle.

glBindBuffer(GL_ARRAY_BUFFER,handle);

There are two ‘channels’ for addressing, one (GL_ARRAY_BUFFER) relates to functions like glColorPointer, glVertexPointer, glNormalPointer. The other refers mainly to glDrawElements (pass GL_ELEMENT_ARRAY_BUFFER).
In this case I’m planning on loading coordinates in the buffer, so I use GL_ARRAY_BUFFER. To load vertex indices (as used by glDrawElements), use GL_ELEMENT_ARRAY_BUFFER.

Next we pass a pointer to the memory block containing our data:

// data is pointing at our vertex coordinates. Something like float* data = malloc(…)
// dataSize is the size in bytes: 3 times the number of vertices, multiplied by your format size, e.g. sizeof(GLfloat).
glBufferData( GL_ARRAY_BUFFER, dataSize, data, GL_STATIC_DRAW);

GL_STATIC_DRAW is telling gl that we will never change the data pointed at, so it’s safe to copy the data somewhere else (it’s a hint. Check the doc for less useful hints, because if you assert that the data is dynamic, it will be much harder for gl to optimize and your buffer may be a little cosmetic).
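
A small sketch of the size arithmetic that goes into the second argument of glBufferData, written without GL headers so it stands alone. The helper name `array_byte_size` is hypothetical, and plain `float` stands in for GLfloat on the assumption that they are the same type on the target. One pitfall worth knowing: GL_FLOAT is an enum constant, so sizeof(GL_FLOAT) measures an int, not a float; sizeof(GLfloat) is the expression that says what is meant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed for a tightly packed array holding
   'count' items of 'components' values each. */
size_t array_byte_size(size_t count, size_t components, size_t component_size)
{
    assert(components > 0 && component_size > 0);
    return count * components * component_size;
}
```

For example, positions for n vertices would use array_byte_size(n, 3, sizeof(float)), RGBA colors array_byte_size(n, 4, sizeof(float)), and triangle indices stored as unsigned shorts array_byte_size(triangleCount, 3, sizeof(unsigned short)).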

Now that we’ve loaded our data into the buffer referred by ‘handle’, it may be a good idea to reset the buffer binding, otherwise we may crash later on (see caveats, below).

glBindBuffer(GL_ARRAY_BUFFER,0);

Rendering from buffers

It is easy to convert gl drawing code using memory pointers to code using buffers. All we need to do is bind the buffer we want to use, then replace whatever pointer we used to use by zero (1).

glBindBuffer(GL_ARRAY_BUFFER,handle);
glVertexPointer(3,GL_FLOAT,0,0 /*replaced the pointer reference by zero*/ );

So we first have to bind the buffer once again, then we call glVertexPointer as we used to, but instead of passing our memory pointer, we pass zero to indicate that we want to read from the first entry in the buffer.

And we do (a) exactly the same thing for color buffers and (b) quite the same thing for element buffers, viz.:

glBindBuffer(GL_ARRAY_BUFFER,handleToColorBuffer);
glColorPointer(4,GL_FLOAT,0,0 /*replaced the pointer reference by zero*/ );

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER,handleToPerFaceVertexIndices);
glDrawElements(… , 0 /*replaced the pointer reference by zero*/ );

Errors and caveats

  1. Passing GL_ARRAY_BUFFER instead of GL_ELEMENT_ARRAY_BUFFER (or vice versa).
  2. Not calling glBindBuffer(xxx,0) before running an Open-GL call that doesn’t use VBOs.
    Why is that a problem? Because glBindBuffer reinterprets the pointer you pass to glVertexPointer, glColorPointer and so forth…
  3. So if you pass a regular pointer to any of these functions afterwards, your pointer is an offset inside the current buffer, not a memory pointer. Bang.
  4. A typical caveat related to (2) may be when you generate buffers while rendering. Since I have a mix of artwork and procedural geometry, that’s exactly what I do, and I don’t want/cannot use VBOs everywhere. So I bound my buffer for loading them, then moved on to rendering something else, and had a nice crash.
    Either way failing to reset the bindings after assigning them, then calling glDrawElements, will consistently mess your app, and maybe the next 3D app you run afterwards
  5. Not releasing buffer memory. Following  the above examples, we’d call glDeleteBuffers(1,&handle) to release the buffer referred from ‘handle’. Whatever memory is used by buffers is, in my humble experience, quite limited. It would appear that, short of making it a science, a memmove error signals memory shortage when running an iPod touch.

Notwithstanding, I kind of tried three approaches to redirecting my calls to regular memory after using buffers:

  • Call glBindBuffer(xxx,0) whenever you want to draw from client memory (as in, you don’t want to use a buffer, just pass a regular pointer)
  • Call glBindBuffer(xxx,0) as the last thing you do after rendering buffered data.
  • Somehow avoid resetting the binding too often. For example, maybe you want to loop over elements drawn using regular pointers, then you call glBindBuffer(xxx,0) just once (twice if you are binding both the array and element array buffers). Maybe it increases performance – What’s for sure is that this approach is error prone, and may give you pain.

Performance

I guess I’m testing with the wrong hardware. On an iPod Touch 8GB bought at the end of last year, I detected no performance increase. Nothing at all, zero!
Now, don’t expect VBOs to be useless. It’s either me, or the device I’m using. VBOs are good for us.

Notes

(1) Why zero? Actually, instead of using the pointer as a memory pointer, glBindBuffer tells gl that the pointer is ‘x machine units’ starting from the beginning of the graphic memory addressed by the handle. Machine units are bytes, so a non-zero byte offset can be used to draw only a portion of the buffer you passed.
You may be tempted to pass null or nil (and find bad literature to inspire you). That’s kind of obfuscating the matter. You don’t pass zero to indicate no pointer, or a null pointer. Zero is really just an offset.
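
Since the value is an offset counted in bytes from the start of the bound buffer, starting somewhere other than the first vertex is a matter of arithmetic plus a cast. A minimal sketch with hypothetical helper names, assuming tightly packed data:

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of vertex 'first_vertex' in a tightly packed buffer. */
size_t vertex_byte_offset(size_t first_vertex, size_t components,
                          size_t component_size)
{
    return first_vertex * components * component_size;
}

/* The gl*Pointer functions take a 'const void *'. With a buffer bound,
   that argument is read as a byte offset, so the number is carried
   through a pointer-sized integer. */
const void *buffer_offset(size_t bytes)
{
    return (const void *)(uintptr_t)bytes;
}
```

With a vertex buffer bound, a call such as glVertexPointer(3, GL_FLOAT, 0, buffer_offset(vertex_byte_offset(10, 3, sizeof(GLfloat)))) would make index 0 refer to the eleventh vertex in the buffer.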

If I started 3D graphics programming targeting a desktop platform or XBox Indie, I would have felt hopelessly intimidated. Also, I would have gotten bogged down into textures and shaders.

Instead I started on the ITouch. By far one of the most powerful mobile platforms (not counting the PSP, likely no worse than the DS), it’s still a weeny tiny computer with little memory and processing power to boast. Here’s what I’m learning:

  1. Read/Ignore advice about what to use and what not to use. File it. It’ll be useful later when you really need it. Optimize a beautiful scene, not an empty screen.
  2. Live with polygons and normals and make the better of it. Textures are great to paint flags. You wanna make a 2D game or what?
  3. Keep game logic where it f*****g belongs. 1% to 10% of your processing time. Game logic includes NPC management, camera management, animation and simple physics.
  4. Hack from game engines (hey, I should really look into doing this!). Don’t use a game engine (unless you want to hand in low quality material right on time). Guys writing game engines are really, really, experienced and smart but… …their engines cater for too much to be either easy to learn or practically faster than anything custom built with minimum care.
  5. Write your own export routines for a start. Scripting 3D tools is easy and avoids importing and parsing stuff you don’t even want to know about.
  6. Don’t under-estimate GL transformations. Anything that moves/rotates, looks a different size, is nice and cool. Bone/Frame interpolation is nice, but GL won’t do it for you.
  7. Learn procedural methods. Anything you can evaluate at run-time without disrupting game play is something you can reuse at will, never need to load and hardly need an artist for.

What’s the conclusion? I wouldn’t code on a 10 years old PC. Too damn stupid. I wouldn’t test on a net-book – kind of lame. Starting on a mobile platform makes me feel happy, because mobile gaming has a bright, beautiful future. And the best part is, when I scale up, I’ll be fearless.

Happy coding :)

Hairlock - dev pic

What you’re looking at may well be the first official preview pic for Hairlock. Unless I change the name to Redlock… or whatever.

But I’m too tired to even draft a 5 line blurb, which is where my ambition lies. For the game itself, well… Fancy running this through the profiler since adding more trees and more growth is hitting me low.

  • 16.4/18.6% procedural grass. Awkward considering so little of it is showing in this shot. Well what’s awkward is I should really grid-chunk this stuff, and likely define LOD nodes for it.
  • 14/4% decorations. That’s the shorter plants and all the stones. Yea… each stone is rendered as a 3D object. Compared to grass, other decorations use pretty efficient bounding boxes. Maybe too efficient. I’d like to use fewer GL calls for these (grid-chunking again: different problem, same proposed solution)
  • 11.1/4% the terrain. All trees, trunks and the ground count as terrain; plus the axe at the front.
  • 10.4% rendering actors – count the little lady and the crow.
  • 7.7/10.6% objc_msgSend
  • 0.7/0.5% clouds in the sky (animated, not quite visible on this pic)
  • 0/18.3% eval geometry for decorations

In all cases I gave two figures: the first one for the scene presented here, the second for walking around, but mostly focusing on a very different kind of scene.

Is it so bad after all?

I wrote a few times that optimization isn’t the first thing to have in mind. I’ve been struggling a little to manage drawing everything I want (to be accurate, I have a radical impulse to add more grass). I also started this post assuming that I couldn’t balance around 15FPS. Wrong.

I was staring at my little cotton clouds crossing the screen, and I suddenly realized that the clouds were obscured by otherwise invisible background elements – because I have a linear fog set to start at 1/4th of the viewing distance. Point is, I have this reference scene with a couple of houses in the background, and what this little observation shows is that balancing didn’t require this stuff to disappear.
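
For reference, a sketch of how a linear fog factor behaves with distance. GL’s linear fog blends using (end - distance) / (end - start), clamped to 0..1, where 1 means the fragment keeps its own color and 0 means it is fully fog colored. The function name is hypothetical, and setting the fog end equal to the viewing distance is an assumption made for the example.

```c
#include <assert.h>

/* Linear fog factor: 1 = no fog, 0 = fully fogged. */
float linear_fog_factor(float distance, float fog_start, float fog_end)
{
    assert(fog_end > fog_start);
    float f = (fog_end - distance) / (fog_end - fog_start);
    if (f < 0.0f) f = 0.0f;
    if (f > 1.0f) f = 1.0f;
    return f;
}
```

With a viewing distance of 100 and fog starting at 25, an object at 62.5 is already half blended into the fog color, and anything at 100 or beyond is drawn entirely in fog color even though it is still being rendered.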

I don’t like large elements (5-10% of the screen) to pop in the blue; conclusion, I likely need to do a couple of things that have little to do with optimization:

  • Account for an object’s size when deciding whether to draw a far object or not. Grass vanishing would be fine (no, actually I really like the jagged effect on far edges, and I’d also like better-than-spikes grass blades in the foreground)
  • Altogether prevent larger-than-something objects from disappearing in another way than exiting the field of view. That makes plain sense. If it’s not within viewing distance, it can’t appear. If it’s already on the screen, it shouldn’t disappear. This might kill frame rate a little at times, though not so much, because keeping stuff visible will force less of the new stuff to enter the screen.
  • Worry less about using fog to ease in and out, and more about using fog to bring atmosphere.
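
The first two rules above could be sketched as follows. This is one possible reading, not the game’s code: `Prop`, `prop_update_visibility` and the `large_radius` threshold are hypothetical, and scaling the cutoff of small objects linearly with their radius is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    float radius;   /* rough size of the object */
    bool  visible;  /* state carried over from the previous frame */
} Prop;

/* Hypothetical rule set:
   - outside the field of view: never drawn;
   - large objects appear once within viewing distance, then stay
     until they leave the field of view;
   - small objects are cut at a distance proportional to their size. */
bool prop_update_visibility(Prop *p, float distance, bool in_frustum,
                            float view_distance, float large_radius)
{
    assert(p && view_distance > 0.0f && large_radius > 0.0f);
    if (!in_frustum) {
        p->visible = false;
    } else if (p->radius >= large_radius) {
        if (!p->visible && distance <= view_distance) p->visible = true;
        /* otherwise keep the previous state: no popping out */
    } else {
        float cutoff = view_distance * (p->radius / large_radius);
        p->visible = (distance <= cutoff);
    }
    return p->visible;
}
```

A house that has been shown keeps being drawn as the player backs away, while grass-sized props still drop out early.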

Notes to self

  • Still need to improve this grass and render it faster.
  • Should really try VBOs.

I’ve been wondering for a while how to make water look more like water in my game. If you’re thinking about transparency, well… you can get a simple transparency effect just by coloring underwater areas. If you really want/need to use the alpha channel instead, you probably have to sort your geometry back to front before rendering (see this link at opengl.org for an overview of what’s involved).

A simple principle can be used to render water using reflections – just render the scene all over again, mirrored against the XZ plane that defines your water…

glPushMatrix();
// 3. translate back after mirroring.
glTranslatef(0.0f, water.level, 0.0f);
// 2. Mirror the scene. I add a little bit of animation with sin(period)
// for effect (see below)
glScalef(1.0f, -0.8f+sin(period)*0.08f, 1.0f);
// 1. move to water level before reflecting the scene.
glTranslatef(0.0f, -water.level, 0.0f);
// … (re-render your scene, or just a small part)
glPopMatrix();

// that’s for animation, won’t make your water ripple, but without shaders…
// (no processing overhead)
period+=0.03f;
if(period>=M_PI)period=0.0f;

3,2,1 goes in reverse order, that’s just how the GL transform chain works.
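
To see what the three calls do to a single point, the same chain can be worked out by hand: move down by the water level, scale Y by a negative factor, move back up. A sketch of the resulting height, with a hypothetical function name:

```c
/* Height of a point after translate(-level), scale(y_scale), translate(+level).
   A negative y_scale mirrors the point about the water plane; a magnitude
   below 1 squashes the reflection a little. */
float reflect_height(float y, float water_level, float y_scale)
{
    return water_level + y_scale * (y - water_level);
}
```

Points lying exactly on the water plane stay where they are, which is why the reflection joins the scene along the waterline. Note that a negative scale also flips triangle winding, so if back-face culling is enabled the front-face setting may need to be swapped while drawing the reflection.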

The next question is then, how do we color the water. I’ve been looking for a simple way to apply a color filter via the GL, but couldn’t find anything. Here’s a few options to consider:

  • Colored GL lights. That’s probably the easiest. Reflect your light sources and change their RGB values. That’s also processing intensive. As if re-rendering significant chunks of your scene wasn’t enough…
  • Global ambient. You can set the global component of a GL light. That’s what I’m using right now. I feel suspicious, however… this doesn’t seem to ‘just change the color’. The rendering looks like something more (processing intensive) is happening.
  • Use vertex colors. Likely, that’s among the fastest and most flexible options. If your scene is static, you can just duplicate all your vertex color arrays and change each color component as you please. Sure, it’s a little memory intensive.
  • Use material color, or better, just glColor. If you’re OK to render everything using the same color (just for your reflections), you can just disable your color arrays and pass a single color to the GL. Now that’s fast, requires no lighting and generates no memory overheads.
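
The vertex color option can be sketched as a one-off copy of the color array with each RGB component scaled by a tint. The function name is hypothetical, and the layout is assumed to be 4 floats (RGBA) per vertex.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: build a tinted copy of an RGBA color array, meant
   to be done once for a static scene and reused for the reflection pass. */
void tint_colors(const float *src_rgba, size_t vertex_count,
                 const float tint_rgb[3], float *dst_rgba)
{
    assert(src_rgba && tint_rgb && dst_rgba);
    for (size_t i = 0; i < vertex_count; ++i) {
        for (int c = 0; c < 3; ++c) {
            dst_rgba[4 * i + c] = src_rgba[4 * i + c] * tint_rgb[c];
        }
        dst_rgba[4 * i + 3] = src_rgba[4 * i + 3]; /* alpha unchanged */
    }
}
```

A bluish tint such as {0.5, 0.75, 1.0} darkens red and green while leaving blue alone. The memory cost is one extra color array per mesh that shows up in the reflection.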

This post is dry, all about performance.

I’ve been working on two scenes. ‘Bridge view’ has no visible NPCs and more geometry. ‘Village View’ reverses the trend.
Whether NPCs are showing or not, they add an overhead to processing.

Read the Data

Processing times are in percentages, either relative to the game loop root or to the parent procedure (—-). For Bridge View I detail sub-routines as percentages of either rendering or game logic. For Village View all percentages are relative to the root of the game loop.

(The profiler is accurate. Me writing down the numbers and ordering them is not.)

Session overview

I don’t have love for optimization. Optimizing Objective C code may often involve replacing it by C code – something to do with dynamic binding. I work in rounds. If I feel my game is getting too slow, I get into a week-end optimization session.
This is the roughest session I’ve had so far. Too much processing goes into the game loop – put another way, a lot more processing should be event based. I’ve had several sessions before this one, typically resolved issues by targeting just a couple of under performing code sections. Not this time. Overheads are increasingly fragmented, and I won’t get into a redesign before… releasing a couple more games.

Aside from modeling interactions using events, which allows keeping game AI out of the game loop, I have no ‘optimize upfront’ advice. Events make design easier and better anyway.

I wouldn’t hope to complete a project if I started off worrying about 1% overheads.

The solutions I discarded for this round are [bracketed]. Using such or such solution is always a coding time versus performance gain issue. Staggering is cheap; simple, code level optimizations are cheap; re-design isn’t cheap.

Note1: While the SDK disallows running processes in the background, iPhones and iPods are multitasking – your app will be running, and another app may download in the background, or you may be receiving a text message. In most cases, I find that the main app is using 85 to 97% of processing time

Note2: I always run in debug mode – yes, because it’s a little slower, to encourage my optimization efforts.

1. Bridge View: 8.5fps, 70 units, 32 611 faces

78.8% – Rendering

—- 52.4% Terrain Rendering (97.5% glDrawElements)

=> replaced OpenGL lighting (realtime) by just in time pre-calculated lighting.

+=> enabled back-face culling

—> 13FPS

[reduce fog distance - no effect on rendering time]

[use VBOs; yes VBOs are most likely definitely worth a go. I just couldn't bother]

[reduce calls to glDrawElements (group trees?)]

—- 2.9% Actor rendering

—- 31.1% Camera management (More hit testing than reason)

=> discarded steep angle faces; that’s causing bugs too, unfortunately

=> code optimizations(i) (use C code, pointers etc… bringing processing share down to 6%)

21.1% – Game Logic

—- 42% AffectMap (How a PC/NPC is affected by others and the environment)

—- 34.3% Actions (Simple Game AI + Game events)

including…

——– 22% ReflexMap (NPCs taking decisions – simple AI)

——– 11.8% Gate apply (hard to explain)

——– 19.8% BasicActorUpdates apply (Actors moving around and doing stuff)

(i) Intersection test profile before optimizations

30.8% Loading triangle corners into a face object.

30.1% Running the intersection test.

25.4% Objective C method calls

I optimized this by adding a table to my 3D geometry. Instead of indexing vertices per face and running all the redirections, vertex coordinates are duplicated on a per-face basis and dumped in a memory block. So instead of doing nice maths with triangle and vector objects, I do so-so maths with pointers everywhere.
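
A sketch of what building such a table might look like, assuming triangles, 3 floats per vertex and unsigned short indices. The function name and layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expand indexed triangles into a flat block of
   9 floats per face (3 corners x XYZ), so an intersection test can walk
   the block with a single pointer instead of chasing indices. */
void flatten_triangles(const float *vertices,         /* 3 floats per vertex */
                       const unsigned short *indices, /* 3 indices per face */
                       size_t face_count,
                       float *out)                    /* 9 floats per face */
{
    assert(vertices && indices && out);
    for (size_t f = 0; f < face_count; ++f) {
        for (size_t corner = 0; corner < 3; ++corner) {
            const float *src = vertices + 3 * (size_t)indices[3 * f + corner];
            float *dst = out + 9 * f + 3 * corner;
            dst[0] = src[0];
            dst[1] = src[1];
            dst[2] = src[2];
        }
    }
}
```

The block costs 36 bytes per face on top of the indexed geometry, in exchange for reading each face as one contiguous run of memory.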

Just for camera management… not worth it. But that will be useful when I get into the too many NPCs roaming around issue.

2. Village view – 25 rats (13fps – 17000 faces)

Rats… small, low poly, black-eyed mammals with red tails.

41.2% Rendering

—- 26.7% Render Terrain (26.4% for GLDrawElements)

—- 11.9% Render Actors (10.3% glDrawElements)

Not worth optimizing for now; note, however, that few actors are moving. There are probably collision tests to optimize further later.

—- 8.4% Setup Camera (already optimized as above)

=> now doing this 3 times per second; load drops to about 1%
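Running an expensive step a few times per second instead of every frame can be sketched with a small time-based gate. The `Throttle` type and `throttle_should_run` function are hypothetical names, and `now` is assumed to be a monotonically increasing time in seconds supplied by the game loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical gate for work that should not run every frame. */
typedef struct {
    double interval;   /* seconds between runs, e.g. 1.0 / 3.0 */
    double next_due;   /* earliest time at which the work may run again */
} Throttle;

/* Returns 1 when the work is due, and schedules the next run. */
int throttle_should_run(Throttle *t, double now)
{
    assert(t != NULL);
    if (now < t->next_due) return 0;
    t->next_due = now + t->interval;
    return 1;
}
```

In the frame callback this would read something like `if (throttle_should_run(&camera_gate, now)) { /* camera hit testing */ }`, with the last computed camera result reused on the frames in between.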

26.2% Game Logic

6% ReflexMap (includes 3.8% to test against potential targets for a given decision; 1% to determine which actors and props can be interacted with)

=> This still uses significant time even though it is staggered.

1) there are agents that will never interact with a given ‘species’. This can be updated at a comparatively very low rate (I use tags, e.g. ‘opponent’ or ‘magic’ to decide what to interact with; tags may change over time, they’re not static).

2) similarly (and maybe more important) an action will only target specified agents. We shouldn’t need to iterate over all unrelated agents, whether in range or not.

Consider the following:

- fire an event whenever an actor is added

- submit the actor to all reflex maps

- each reflex map submits the actor to relevant target selectors

[- use actor IDs. Each target selector can look up an array using the ID as index (this avoids checking the tag every time; tag events could be used to detect this kind of change)]
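The event-based registration outlined above might look roughly like this. Everything here is a sketch under assumptions: tags are modelled as bit flags, actor IDs are small integers below `MAX_ACTORS`, and the type and function names are invented for the example rather than taken from the engine.

```c
#include <assert.h>

enum { MAX_ACTORS = 64, MAX_SELECTORS = 4 };           /* assumed caps */
enum { TAG_OPPONENT = 1u << 0, TAG_MAGIC = 1u << 1 };  /* illustrative tags */

typedef struct {
    unsigned wanted_tags;          /* tags this selector may target */
    int candidates[MAX_ACTORS];    /* ids of actors worth testing */
    int slot_of[MAX_ACTORS];       /* slot_of[id]: index in candidates, or -1 */
    int count;
} TargetSelector;

typedef struct {
    TargetSelector selectors[MAX_SELECTORS];
    int selector_count;
} ReflexMap;

void selector_init(TargetSelector *s, unsigned wanted_tags)
{
    s->wanted_tags = wanted_tags;
    s->count = 0;
    for (int i = 0; i < MAX_ACTORS; ++i) s->slot_of[i] = -1;
}

/* Handles both "actor added" and "tags changed" events for one selector. */
void selector_update(TargetSelector *s, int actor_id, unsigned tags)
{
    assert(actor_id >= 0 && actor_id < MAX_ACTORS);
    int relevant = (tags & s->wanted_tags) != 0;
    int slot = s->slot_of[actor_id];
    if (relevant && slot < 0) {
        /* newly relevant: append to the compact candidate list */
        s->slot_of[actor_id] = s->count;
        s->candidates[s->count++] = actor_id;
    } else if (!relevant && slot >= 0) {
        /* no longer relevant: move the last candidate into the freed slot */
        int last = s->candidates[--s->count];
        s->candidates[slot] = last;
        s->slot_of[last] = slot;
        s->slot_of[actor_id] = -1;
    }
}

/* The reflex map forwards the event to each of its target selectors. */
void reflex_map_actor_event(ReflexMap *m, int actor_id, unsigned tags)
{
    for (int i = 0; i < m->selector_count; ++i)
        selector_update(&m->selectors[i], actor_id, tags);
}
```

Each selector then iterates only `candidates[0..count-1]` when looking for targets, and `slot_of[id] >= 0` answers "can this actor ever be a target?" without comparing tags. The cost moves from every decision to the comparatively rare add and tag-change events.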

3% Gate apply (Game management; now staggered)

8.3% AffectMap apply

==> The proposed strategy for ReflexMap now also applies to AffectMap and ActorAffectCondition

ActorAffectCondition is testing all actors on the stage. We only want to test actors that might ever affect the target.

[==> No time for that, but this should really just be event based. Comparatively, event based reactions (getting hit) are easier to model than event based actions (Launching a kick)]

4.8% BasicActorUpdates

3.4% Actor step (animation management)

1.6% ActivityChangeDispatcher actor:activityChangedFrom:to:

==> This triggers PlaySoundOnActivityChange. The problem here is that I’m generating a string before mapping the sound. The same thing can be done at virtually no cost, i.e. by mapping the sound directly from the activity name (maybe combined with the actor’s name). This code hasn’t been updated for a long time; it doesn’t even do anything anymore (!)

1.3% Actor getCycleLength

==> While retaining cycle length may be unsafe (the mapping with actions isn’t one to one), step is checking cycle length at every step. This should be cached at the beginning of an animation cycle. (Evaluating how long an animation lasts is actually a slightly complex process the way I do it: it involves comparing strings and other well-intended, suboptimal steps.)
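A sketch of caching the cycle length once per animation cycle rather than once per step. The `Animator` and `Clip` types are hypothetical; the string-comparing lookup merely stands in for the "slightly complex process" mentioned above, and the `lookups` counter exists only to make the saving visible.

```c
#include <assert.h>
#include <string.h>

typedef struct { const char *name; float length; } Clip;  /* assumed clip table */

typedef struct {
    const Clip *clips;
    int clip_count;
    const char *activity;     /* current activity name */
    float time_in_cycle;
    float cached_length;      /* only trusted until the current cycle ends */
    int lookups;              /* number of slow lookups performed */
} Animator;

/* Stand-in for the expensive evaluation (string comparisons). */
static float slow_cycle_length(Animator *a)
{
    a->lookups++;
    for (int i = 0; i < a->clip_count; ++i)
        if (strcmp(a->clips[i].name, a->activity) == 0)
            return a->clips[i].length;
    return 0.0f;
}

/* Evaluate the length once, at the start of a cycle. */
void animator_begin_cycle(Animator *a)
{
    a->time_in_cycle = 0.0f;
    a->cached_length = slow_cycle_length(a);
}

void animator_set_activity(Animator *a, const char *activity)
{
    assert(activity != NULL);
    a->activity = activity;
    animator_begin_cycle(a);
}

/* Per-frame step: uses the cached length, no lookup. */
void animator_step(Animator *a, float dt)
{
    a->time_in_cycle += dt;
    if (a->cached_length > 0.0f && a->time_in_cycle >= a->cached_length) {
        float carry = a->time_in_cycle - a->cached_length;
        animator_begin_cycle(a);   /* re-evaluate: the mapping may have changed */
        a->time_in_cycle = carry;
    }
}
```

Because the length is re-evaluated whenever a cycle starts or the activity changes, the cached value is never kept longer than one cycle, which limits the risk of holding a stale length.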

16% OverlayView tick

This is irrelevant – the ‘tick’ is actually used to… update a text box displaying the frame rate.

Working notes

I have left out some significant optimizations so far:

  • Disable out of range NPCs. May seem straightforward – why process AIs that won’t interact? I wrote my behavior engine so that NPCs could interact, and this would affect the game even when the player isn’t around. Running a classical adventure/action game atop the engine, I should use this as a game specific optimization.
  • Use VBOs (vertex buffer objects). There’s a lot more I’d like to learn about OpenGL-ES, and I have decided to try not to learn it until I have completed my first project. I also tend to keep this as a ‘magic card’. When optimizing, I choose scenes and configurations that are purposefully loaded, and sometimes pretend not to know about various things that can make things faster with little effort – works for me, like talking with stones in your mouth.
  • Reuse interpolated frames. I’ve already optimized frame interpolation. This may be combined with caching strategies.
  • Use optimized intersection tests (as applied to camera management) for collision detection and walking on uneven terrain.
  • Stagger collision/intersection tests

For the last 3, the profiling cases aren’t suitable – I’ve only profiled static scenes with nothing moving this time. Anyway, before I forsake myself and get into dynamic scenes teeming with half-life, I’d rather have gotten stills flowing in peace and harmony.

Hairlock - dev pic

Here’s a quick glimpse of the forest covering about a quarter of Hairlock’s tiny world. You bet the large flowers that you see at the bottom are generated/cloned on the fly.

How much plant growth can I afford within frame rate? Probably not as much as I’d like, but little is better than nothing and at least the ground recipe is set. Here’s a quick summary:

  • I define growth per material. So if the terrain has grass, something can grow on the grass, and so forth.
  • Growth isn’t just random. It uses random sources, then randomizes the locations of the plants within each source (it could be anything. Bones? Bones grow in the desert, right?) and their density. This helps a little because plants aren’t just there to decorate. Pseudo-random distributions also help the player orient themselves a little, acting as landmarks in the landscape.
  • It’s not too hard to hit the roof on how many decorations can be added to a 3D scene. If you double the definition of a 3D model, you don’t double the number of pixels rendered for that model. On the other hand, if you double the number of flowers in a scene, you really do draw twice as many pixels.
  • Better news, there’s no practical limit to how many species of plants you might throw in your game. In this scene, I have only a couple of species (water lilies lost in the fog on the left) but that’s not a rendering limitation – from a design point of view, the trick is not to throw everything in at the same time: if we get our players to find stuff less often, we’re surprising them more often.
  • A side effect of having to pack geometry together (well yea… we can’t call glDrawElements once per flower, right?) is that it’s probably OK to vary the size, shape and color of each element. Haven’t tried yet.

Putting this together has been a little troublesome. As I’ve just mentioned, we can’t just draw each element on the fly, which makes things a little harder:

  • For a given terrain (a piece of the 3D world), sources for growth are evaluated the first time the terrain is rendered. Each source will contain many instances of the same decoration, but we don’t evaluate anything at this point.
  • Sources pass a test to make sure they’re in-bound. The first time a source passes the test, memory is allocated for all flowers (or whatever) within that source. Then the terrain is intersected to find the insertion point of each element.
  • For each source, geometry for each element has to be duplicated (remember, we’re trying to call glDrawElements less often). This means the allocation for each source is… sizable. Surely we don’t want to evaluate each rendered source every time. So we cache the sources. So we quickly arrive at a point where the memory is filled with, wow, bouquets of low poly models (here, roughly 30 triangles each).
  • In a first approach, I release geometry for sources that haven’t been drawn for a while.
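The packing and release steps above can be sketched as follows. This is not the decorator module itself: `Batch`, `Source`, `batch_build` and `sources_release_idle` are names invented for the example, instances are only translated (no per-element scale or rotation), and 16-bit indices are assumed, which caps one batch at 65536 vertices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    float *vertices;            /* x,y,z triples for all instances */
    unsigned short *indices;    /* triangle indices into vertices */
    size_t vertex_count, index_count;
} Batch;

typedef struct {
    Batch batch;
    unsigned last_drawn_frame;  /* updated whenever the source is rendered */
} Source;

void batch_release(Batch *b)
{
    free(b->vertices);
    free(b->indices);
    b->vertices = NULL;
    b->indices = NULL;
    b->vertex_count = b->index_count = 0;
}

/* Duplicate one small model at each position (x,y,z triples) so the whole
   source can be drawn with a single call. Returns 1 on success. */
int batch_build(Batch *b,
                const float *model_vertices, size_t model_vertex_count,
                const unsigned short *model_indices, size_t model_index_count,
                const float *positions, size_t instance_count)
{
    assert(model_vertex_count > 0 && model_index_count > 0 && instance_count > 0);
    size_t total_vertices = model_vertex_count * instance_count;
    if (total_vertices > 65536) return 0;   /* 16-bit indices cannot reach further */
    b->vertices = malloc(total_vertices * 3 * sizeof *b->vertices);
    b->indices = malloc(model_index_count * instance_count * sizeof *b->indices);
    if (b->vertices == NULL || b->indices == NULL) {
        batch_release(b);
        return 0;
    }
    float *vout = b->vertices;
    unsigned short *iout = b->indices;
    for (size_t i = 0; i < instance_count; ++i) {
        const float *p = positions + 3 * i;
        unsigned short base = (unsigned short)(i * model_vertex_count);
        for (size_t v = 0; v < model_vertex_count; ++v) {
            *vout++ = model_vertices[3 * v + 0] + p[0];
            *vout++ = model_vertices[3 * v + 1] + p[1];
            *vout++ = model_vertices[3 * v + 2] + p[2];
        }
        for (size_t k = 0; k < model_index_count; ++k)
            *iout++ = (unsigned short)(base + model_indices[k]);
    }
    b->vertex_count = total_vertices;
    b->index_count = model_index_count * instance_count;
    return 1;
}

/* Free geometry for sources not drawn within the last max_idle frames. */
size_t sources_release_idle(Source *sources, size_t count,
                            unsigned frame, unsigned max_idle)
{
    size_t released = 0;
    for (size_t i = 0; i < count; ++i) {
        Batch *b = &sources[i].batch;
        if (b->vertices != NULL && frame - sources[i].last_drawn_frame > max_idle) {
            batch_release(b);
            released++;
        }
    }
    return released;
}
```

With client-side arrays in OpenGL ES 1.1, a built batch would be submitted with `glVertexPointer(3, GL_FLOAT, 0, b->vertices)` followed by one `glDrawElements(GL_TRIANGLES, (GLsizei)b->index_count, GL_UNSIGNED_SHORT, b->indices)`. A released source is simply rebuilt the next time it passes the in-bound test.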

The trees in the pic are also randomized. But these are large enough (and the geometry is much heavier) so I just use gl operations to transform and render each tree.

By the way these trees have been around for at least a month, time to change them for something better I guess :) The decorator module wasn’t really hard to write – just a little more than an evening’s job.