Just collected a few useful resources. I was mainly looking for explanations and sample code about matrix calculus:
We know how to use Instruments to fine-tune the performance of our games: we do a quick run (maybe one minute), check the heaviest stack traces, and do some surgery here and there, right.
How about cases when…
Here’s how I did it. I had a short span in my game where the frame rate dropped dramatically. While this typically lasts for less than 5 seconds, you may have noticed that the best games rarely slow down at all, if ever. Besides, what’s an isolated incident in a game may signal a problem in the game engine. Let’s get to work.
Here’s what I did:
I took a couple of screenshots so I could compare easily. Here’s what we see:
Oh well. I happen to have 4 actors in there, and incidentally their model is the heaviest I’ve got. The actors aren’t showing on-screen when the frame drop occurs – they’re just nearby. But as a general rule, it’s much better to send a little more than is in view than a little less(!).
The next obvious step was to go back to the game script and see what happens when the actors are removed. The frame rate recovered completely – all that was needed was optimizing the geometry for these actors.
This quick case study shows how the CPU sampler can be used to identify overheads within a specific time span. No traces, no manual profiling. It’s a very simple technique, and it avoids heading in the wrong direction based on vague intuitions about where the overheads should lie.
In this case, the scene actually accumulated several candidates:
When I started off, I was so convinced that the complex scene was responsible for my overhead that I was about to redo the artwork entirely, even though scene rendering uses VBOs. Bothering to fire up the profiler and run a 15-minute session in total pointed me in the right direction.
Originally developed under the codename ‘Hairlock’, Antistar 3D: Rising is a universal game (iPhone, iPad, iPod Touch) published on the 1st of August 2010. Artwork in the game is a mix of Blender art and procedurally generated elements.
A Key Decision…
It was decided very early that all the details in the game would be modeled – no textures! There are several reasons why Antistar was created this way:
1 – I have never been very fond of textured games. This isn’t because I think textures are bad, but because usually, I find that textures are abused, becoming an easy substitute for shading and high-quality visual content.
2 – I trained myself in Blender around 2000-2002; I always focused on modeling.
3 – Supporting textures would have added load on the engine, taking longer to develop, leaving much less time to produce actual content, making it potentially harder to hold the frame rate, and increasing loading times.
4 – Although it is now available on iPhone 4 and iPad, the game was originally developed to run nicely on an iPod Touch. On a small screen, the need to crowd scenes with tiny flatland decorations isn’t obvious – even if some games (e.g. Samurai: Way of the Warrior) use textures very ingeniously.
Edit mode rocks!
Back in 2000 when I discovered Blender, I was still looking for software that could help me make my models. I am not a professional artist, so I need to try again and again until I see what I have in mind in the 2D view. Blender’s edit mode, with one hand on the keyboard and another securely cuddling the mouse, lets you change things over and over again at the speed of thought.
While the game doesn’t use textures, Blender has a nifty ‘retopo’ mode that lets you project geometry onto a surface while editing. This was used to add effects that make some players think the game supports textures and, more importantly, to offer an OK substitute for textures in situations where they’re really needed.
Free and comprehensive
It just turns out that Blender is completely free, open source and comprehensive. I didn’t use many features, but these are key to creating a game:
1 – Support for bone animation, with a bit of a learning curve but the same ergonomics found in edit mode, letting you quickly define and adjust areas of influence for bones, add constraints, etc…
2 – Python scripting. I looked around for an export plugin for ages before getting into it, only to realise that Python scripting isn’t hard, and there are key advantages to writing your own 3D file format. In Antistar, geometry is loaded on the fly, in a format that OpenGL understands, exported directly from Blender in just a couple of clicks.
How much geometry in there?
As this was my first 3D engine, and my first experience on a mobile platform, I was fairly cautious when I started modeling, as I was worried I would run out of GPU time fairly quickly. Over time I realised that (a) rendering is pretty fast, even on an iPod Touch 2nd gen, and (b) once we do away with mapped images, we have a lot of polygons to play with (it’s also good to know that smaller polygons render much faster than large ones). Here are some examples:
- Humanoids and other creatures: 650 to 2500 vertices/quads(*)
- Small decorations: ~40 vertices/quads or less. These can get duplicated a few dozen times in the same scene.
- Terrain (e.g. the ‘incubator’ scene displayed at the top): up to 10,000 vertices/quads.
(*) In most cases, the number of vertices is roughly equal to the number of quads.
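The rough equality between vertex and quad counts falls out of Euler’s formula for closed quad-only meshes. A quick sketch of the arithmetic (the function name is mine, for illustration only):

```c
/* For a closed mesh made only of quads, every face has 4 edges and
 * every edge is shared by exactly 2 faces, so E = 4F/2 = 2F.
 * Euler's formula V - E + F = 2 then gives V = F + 2: vertex count
 * and quad count match up to a constant. */
int quad_mesh_vertex_count(int faces) {
    int edges = 2 * faces;     /* E = 2F for a closed quad mesh */
    return 2 + edges - faces;  /* V = 2 + E - F */
}
```

A cube checks out: 6 quads, 8 vertices – and at 2500 quads the difference is negligible.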
For rendering, I used a mix of just-in-time calculated and real time lighting (for actors and props). Evaluating simple illumination on the fly while loading models has advantages, but I haven’t implemented a method to refresh lighting following the day/night cycle in the game; instead, fog is used to affect the rendering of the environment according to the day/night cycle.
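A minimal sketch of what evaluating simple illumination at load time could look like: a per-vertex Lambert term computed once and stored, so GL lighting can stay off at render time. The function names and flat array layout are assumptions, not the Antistar engine’s code:

```c
/* Compute a diffuse (Lambert) term once per vertex while loading a
 * model, instead of paying for GL lighting every frame. */

/* both vectors assumed normalized */
float lambert(const float normal[3], const float light_dir[3]) {
    float d = normal[0] * light_dir[0]
            + normal[1] * light_dir[1]
            + normal[2] * light_dir[2];
    return d > 0.0f ? d : 0.0f;   /* back-facing clamps to zero */
}

/* normals: 3 floats per vertex; intensity: 1 float per vertex */
void bake_vertex_lighting(const float *normals, float *intensity,
                          int vertex_count, const float light_dir[3]) {
    for (int i = 0; i < vertex_count; i++)
        intensity[i] = lambert(&normals[3 * i], light_dir);
}
```

The baked intensities can then be folded into the color array before it is uploaded, which is also why refreshing them for a day/night cycle would require a reload or a recompute pass.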
I’m still considering whether to use LOD (level of detail) nodes – i.e., lighter models when viewed from far away. In the first version of the game, I used depth-of-field balancing to avoid dropping the frame rate too much. Depth-of-field balancing works quite well, although it seems a little difficult to make it perfect :). In the meantime, artefacts aren’t usually disturbing and may even bring a little life to the game, with soft oscillations in depth of field and fog level.
At times 3D can get a little intense. To create the dark forest near Klinnburg, I flattened the trees, leaving only 2D geometry. This allowed keeping patches of procedural grass that I kinda like, and adding walkable stones here and there. I don’t like doing that too much, because it always shows a little.
In the first release, the exporter only processed RGBA colors for materials. The engine is a little more powerful though, as it can take parameters for shading, ambient… well, the basic kit.
I probably want to get better at Python scripting, so I can explore ways of generating geometry using plugins, and better balance what’s generated on the fly versus what gets loaded.
Currently, my export plugin handles objects and bone animations quite well, but it works more like a library export plugin, with any Blender file containing a generous mix of actors, items and terrain. I want to take the time to turn this into a level editor. This would help streamline the workflow, as for now the ‘level editing’ process consists of walking around the world and writing down coordinates (**)
And finally, I’ve been working on ‘rough and ready’ procedures for self-illumination (!). This can be done partly in real time, partly by pre-calculating contributions. Initial results are promising, and they may ship with the game within a couple of months!
Antistar 3D: Rising is a real-time 3D adventure taking advantage of the 3D capabilities of the iPod Touch, iPhone 3G/3GS, iPhone 4 and iPad. (app store link)
Playing with fire
Before moving any further, remember that using ad-hoc methods to detect which device you’re running on is playing with fire. Here is minimal advice to avoid getting into trouble:
Some people seem to think that sysctlbyname is not a documented function and shouldn’t be used. Interestingly, in a rejection letter posted here, Apple encourages a developer to use this function. It is documented. But hold on – that doesn’t mean we should fine-grain detect the device model; the function has plenty of uses besides that.
How can I solve my problem?
What I want to do is determine how fast the device I am running on is, so I can set the antialiasing level for my game. On an iPod Touch, I can’t run antialiasing (not with the 3D content I have, that is) – it’s too slow. On a 3GS, it’s OK.
For now, I’m falling back on checking for OpenGL ES 2.0 support. It doesn’t directly measure speed, but the earlier, slower devices only support OpenGL ES 1.1, so I’m guessing it should be OK.
If you’re lazy and you want a simple background image for your 3D, here’s a neat trick:
You can color the background by passing other values.
You can still enable fog. Fog colors polygons, not the background. The only problem is that far objects won’t blend into the background; they’ll stand out, fogged, like a hair in the soup. This works well for starry skies, or if the lower part of your sky is the same color as the fog.
VBOs* are great. What will lure you in is the promise of a dramatic increase in rendering performance.
*(Vertex buffer objects. Not to be confused with VAOs – vertex array objects – which are a separate concept.)
1. Setup code (not in your rendering loop)
// This example illustrates how to load a vertex array buffer.
// Only small variations are needed for loading color arrays
// and element (indexed triangle) arrays.
// See the rest of the article for details.

// Create a handle that you will use to refer to the buffer when talking to GL.
GLuint handle;
glGenBuffers(1, &handle);

// Let's load our buffer data...
// Size of your buffer: here, 3 times the number of vertices,
// multiplied by your format size, for example sizeof(GLfloat).
int dataByteSize = ...;
float* data = malloc(dataByteSize);
// Need to populate your buffer with vertex data!
...
// This loads 'data' into the graphic memory referred to by 'handle'.
glBindBuffer(GL_ARRAY_BUFFER, handle);
// Typically you want GL_STATIC_DRAW. This implies you will use the buffer
// over and over; in other words, you won't modify the geometry afterwards.
glBufferData(GL_ARRAY_BUFFER, dataByteSize, data, GL_STATIC_DRAW);
// Clear the binding after loading your array, otherwise you will get crashes.
glBindBuffer(GL_ARRAY_BUFFER, 0);
2. In the rendering loop
To draw from the buffers, use something like this:
glBindBuffer(GL_ARRAY_BUFFER, handleToCoordBuffer);
glVertexPointer(4, GL_FLOAT, 0, 0);
glBindBuffer(GL_ARRAY_BUFFER, 0); // reset
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, handleToPerFaceVertexIndices);
glDrawElements(... , 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0); // reset
What are VBOs?
OpenGL ES 1.x allows defining buffers into which we load our vertex (but also color, normal and face-indexing) data. Color, normal and coordinate arrays are passed as plain array buffers; indexed faces are passed using element array buffers.
When possible (depending on the underlying hardware), buffer objects are stored in graphic memory, where the GPU can access them real fast.
First you define a handle variable. Buffers are indexed, so we need handles.
This code generates just one handle – that’s what the ‘1’ says. We’re passing by reference, so if we wanted, we could allocate several handles in one call. I’m not really sure why you’d ever want to do that but… (hey, I didn’t say it wasn’t marginally more efficient, right?)
Next you make the handle ‘current’, or in other words, ‘bind the buffer’. We tell GL that instead of addressing memory through pointers, we will address the buffer referred to by the handle; or, when loading data, that a buffer should be created for us and referred to by the handle.
There are two ‘channels’ for addressing, one (GL_ARRAY_BUFFER) relates to functions like glColorPointer, glVertexPointer, glNormalPointer. The other refers mainly to glDrawElements (pass GL_ELEMENT_ARRAY_BUFFER).
In this case I’m planning on loading coordinates in the buffer, so I use GL_ARRAY_BUFFER. To load vertex indices (as used by glDrawElements), use GL_ELEMENT_ARRAY_BUFFER.
Next we pass a pointer to the memory block containing our data:
// data is pointing at our vertex coordinates. Something like float* data = malloc(…)
// dataSize is 3 times the number of vertices, multiplied by your format size, e.g. sizeof(GLfloat).
glBufferData( GL_ARRAY_BUFFER, dataSize, data, GL_STATIC_DRAW );
GL_STATIC_DRAW is a hint telling GL that we will never change the data, so it’s safe to copy it somewhere fast. Check the docs for the other hints – they’re less useful, because if you assert that the data is dynamic, it’s much harder for GL to optimize, and your buffer may end up being little more than cosmetic.
Now that we’ve loaded our data into the buffer referred to by ‘handle’, it’s a good idea to reset the buffer binding, otherwise we may crash later on (see caveats, below).
Rendering from buffers
It is easy to convert GL drawing code that uses memory pointers to code that uses buffers. All we need to do is bind the buffer we want to use, then replace whatever pointer we used to pass with zero (1).
glVertexPointer(3,GL_FLOAT,0,0 /*replaced the pointer reference by zero*/ );
So we first have to bind the buffer once again, then we call glVertexPointer as we used to, but instead of passing our memory pointer, we pass zero to indicate that we want to read from the first entry in the buffer.
And we do (a) exactly the same thing for color buffers and (b) quite the same thing for element buffers, viz.:
glColorPointer(4, GL_FLOAT, 0, 0 /*replaced the pointer reference by zero*/ );
glDrawElements(… , 0 /*replaced the pointer reference by zero*/ );
Errors and caveats
(1) Why zero? When a buffer is bound, GL no longer treats the pointer argument as a memory pointer: it reads it as an offset, in basic machine units (bytes), from the beginning of the buffer referred to by the handle. That means you can pass a non-zero byte offset to draw only a portion of the buffer you loaded.
You may be tempted to pass null or nil (and find bad literature to inspire you). That obfuscates the matter: you don’t pass zero to indicate ‘no pointer’, or a null pointer. Zero is really just an offset.
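To make the offset interpretation concrete, here is a hypothetical helper computing the byte offset for a tightly packed float attribute; the glVertexPointer call that would consume it is left as a comment, since it needs a live GL context:

```c
#include <stddef.h>

/* The GL spec's "basic machine units" are bytes. When a buffer is
 * bound, the pointer argument of glVertexPointer and friends is read
 * as a byte offset from the start of the buffer. For a tightly packed
 * float attribute starting at vertex 'first_vertex': */
size_t vbo_byte_offset(size_t first_vertex, size_t components) {
    return first_vertex * components * sizeof(float);
}

/* usage against a bound GL_ARRAY_BUFFER (needs a live GL context):
 *   glVertexPointer(3, GL_FLOAT, 0,
 *                   (const GLvoid *)vbo_byte_offset(100, 3));
 */
```

Passing 0 is just the special case ‘start at the first vertex’.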
If I had started 3D graphics programming targeting a desktop platform or XBox Indie, I would have felt hopelessly intimidated. Also, I would have gotten bogged down in textures and shaders.
Instead I started on the iPod Touch. By far one of the most powerful mobile platforms (not counting the PSP, and likely no worse than the DS), it’s still a teeny tiny computer with little memory and processing power to boast. Here’s what I’m learning:
What’s the conclusion? I wouldn’t code on a 10-year-old PC. Too damn stupid. I wouldn’t test on a netbook – kind of lame. Starting on a mobile platform makes me feel happy, because mobile gaming has a bright, beautiful future. And the best part is, when I scale up, I’ll be fearless.
Happy coding :)
What you’re looking at may well be the first official preview pic for Hairlock. Unless I change the name to Redlock… or whatever.
But I’m too tired to even draft a 5-line blurb, which is where my ambition lies. As for the game itself, well… fancy running it through the profiler, since adding more trees and more growth is hitting me low.
In all cases I gave two figures – the first for the scene presented here, the second for walking around while mostly focusing on a very different kind of scene.
Is it so bad after all?
I wrote a few times that optimization isn’t the first thing to have in mind. I’ve been struggling a little to draw everything I want (to be accurate, I have a radical impulse to add more grass). I also started this post assuming that I couldn’t balance around 15 FPS. Wrong.
I was staring at my little cotton clouds crossing the screen when I suddenly realized they were being obscured by otherwise invisible background elements – because I have linear fog set to start at 1/4 of the viewing distance. The point is, I have this reference scene with a couple of houses in the background, and this little observation shows that balancing didn’t require that stuff to disappear.
I don’t like large elements (5-10% of the screen) popping out of the blue; conclusion: I likely need to do a couple of things that have little to do with optimization:
Notes to self
I’ve been wondering for a while how to make water look more like water in my game. If you’re thinking about transparency, well… you can get a simple transparency effect just by coloring underwater areas. If you really want/need to use the alpha channel instead, you probably have to sort your geometry back to front before rendering (see this link at opengl.org for an overview of what’s involved).
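If you do go down the alpha route, the back-to-front ordering can be sketched like this, sorting triangles by a precomputed eye-space centroid depth (the Triangle struct and all names are mine, not from any engine):

```c
#include <stdlib.h>

/* Sketch of the back-to-front sort alpha blending requires:
 * order triangles farthest-first before drawing them. */
typedef struct {
    float depth;   /* eye-space centroid distance, larger = farther */
} Triangle;

static int farther_first(const void *a, const void *b) {
    float da = ((const Triangle *)a)->depth;
    float db = ((const Triangle *)b)->depth;
    return (da < db) - (da > db);   /* descending order */
}

void sort_back_to_front(Triangle *tris, size_t count) {
    qsort(tris, count, sizeof(Triangle), farther_first);
}
```

In practice you’d recompute the depths (and re-sort) whenever the camera moves, which is exactly why simple colored ‘transparency’ is so much cheaper.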
A simple principle can be used to render water using reflections – just render the scene all over again, mirrored against the XZ plane that defines your water…
// 3. translate back after mirroring.
glTranslatef(0.0f, water.level, 0.0f);
// 2. Mirror the scene. I add a little bit of animation with sin(period)
// for effect (see below)
glScalef(1.0f, -0.8f+sin(period)*0.08f, 1.0f);
// 1. move to water level before reflecting the scene.
glTranslatef(0.0f, -water.level, 0.0f);
// … (re-render your scene, or just a small part)
// that’s for animation, won’t make your water ripple, but without shaders…
// (no processing overhead)
3, 2, 1 go in reverse order – that’s just how the GL transform chain works.
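Reading the calls bottom-up, the chain collapses to a single affine map on Y, which is easy to verify in isolation (the function name is mine, for illustration):

```c
/* Composing translate(+w) * scale(1, s, 1) * translate(-w), in the
 * order the GL transform chain applies them to a vertex, gives:
 *   y' = w + s * (y - w)
 * where w is the water level and s the (animated) Y scale.
 * With s = -1 this is an exact mirror about the plane y = w. */
float reflected_y(float y, float water_level, float scale_y) {
    return water_level + scale_y * (y - water_level);
}
```

Points on the water plane stay put, and with s = -1 a point 2 units above the water lands exactly 2 units below it; the sin(period) wobble just makes s oscillate around -0.8.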
The next question, then, is how to color the water. I’ve been looking for a simple way to apply a color filter via GL, but couldn’t find anything. Here are a few options to consider:
This post is dry, all about performance.
I’ve been working on two scenes. ‘Bridge view’ has no visible NPCs and more geometry. ‘Village View’ reverses the trend.
Whether NPCs are showing or not, they add an overhead to processing.
Read the Data
Processing times are percentages, relative either to the game-loop root or to the parent procedure (—-). For Bridge View I detail sub-routines as percentages of either rendering or game logic. For Village View all percentages are relative to the root of the game loop.
The profiler is accurate. Me writing down the numbers and ordering them is not.
I have no love for optimization. Optimizing Objective-C code often involves replacing it with C code – something to do with dynamic binding. I work in rounds: if I feel my game is getting too slow, I go into a weekend optimization session.
This is the roughest session I’ve had so far. Too much processing goes into the game loop – put another way, a lot more processing should be event-based. In previous sessions I typically resolved issues by targeting just a couple of underperforming code sections. Not this time. Overheads are increasingly fragmented, and I won’t get into a redesign before… releasing a couple more games.
Aside from modeling interactions using events, which allows keeping game AI out of the game loop, I have no ‘optimize upfront’ advice. Events make design easier and better anyway.
I wouldn’t hope to complete a project if I started off worrying about 1% overheads.
The solutions I discarded for this round are [bracketed]. Using such or such solution is always a coding time versus performance gain issue. Staggering is cheap; simple, code level optimizations are cheap; re-design isn’t cheap.
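‘Staggering’, the cheap option here, just means running an expensive update every Nth frame instead of every frame. A minimal sketch (the frame counter, period and names are assumptions, not engine code):

```c
/* Run a task only when its slot comes up. 'phase' spreads different
 * tasks across different frames so they don't all fire on one tick. */
int staggered_due(unsigned frame, unsigned period, unsigned phase) {
    return frame % period == phase;
}
```

At 60 FPS, a period of 20 yields 3 updates per second – the kind of rate used for camera setup in the figures that follow.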
Note 1: While the SDK disallows running your own processes in the background, iPhones and iPods do multitask – while your app is running, another app may be downloading in the background, or you may receive a text message. In most cases, I find the main app uses 85 to 97% of processing time.
Note 2: I always run in debug mode – yes, precisely because it’s a little slower, to encourage my optimization efforts.
78.8% – Rendering
—- 52.4% Terrain Rendering (97.5% glDrawElements)
=> replaced OpenGL lighting (real-time) with just-in-time pre-calculated lighting.
+=> enabled back-face culling
[reduce fog distance - no effect on rendering time]
[use VBOs; yes VBOs are most likely definitely worth a go. I just couldn't bother]
[reduce calls to glDrawElements (group trees?)]
—- 2.9% Actor rendering
—- 31.1% Camera management (More hit testing than reason)
=> discarded steep-angle faces; that’s causing bugs too, unfortunately
=> code optimizations(i) (use C code, pointers, etc., bringing the processing share down to 6%)
21.1% – Game Logic
—- 42% AffectMap (How a PC/NPC is affected by others and the environment)
—- 34.3% Actions (Simple Game AI + Game events)
——– 22% ReflexMap (NPCs taking decisions – simple AI)
——– 11.8% Gate apply (hard to explain)
——– 19.8% BasicActorUpdates apply (Actors moving around and doing stuff)
(i) Intersection test profile before optimizations
30.8% Loading triangle corners into a face object.
30.1% Running the intersection test.
25.4% Objective C method calls
I optimized this by adding a table to my 3D geometry. Instead of indexing vertices per face and running all the redirections, vertex coordinates are duplicated on a per-face basis and dumped in a memory block. So instead of doing nice maths with triangle and vector objects, I do so-so maths with pointers everywhere.
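The flattening step could look roughly like this – nine floats per triangle, no per-corner indirection left at test time (the names and layouts are mine, not the engine’s):

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate each triangle corner's coordinates into one flat block:
 * face f occupies flat[f*9 .. f*9+8], so intersection tests walk a
 * contiguous array instead of chasing index redirections. */
float *flatten_faces(const float *vertices,         /* xyz per vertex */
                     const unsigned short *indices, /* 3 per face     */
                     size_t face_count) {
    float *flat = malloc(face_count * 9 * sizeof(float));
    if (!flat) return NULL;
    for (size_t f = 0; f < face_count; f++)
        for (int c = 0; c < 3; c++)                 /* 3 corners */
            memcpy(&flat[(f * 3 + c) * 3],
                   &vertices[indices[f * 3 + c] * 3],
                   3 * sizeof(float));
    return flat;
}
```

The memory cost is the duplication: a vertex shared by six faces is stored six times, which is exactly the trade being made.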
Just for camera management… not worth it. But it will be useful when I get to the too-many-NPCs-roaming-around issue.
Rats… small, low poly, black-eyed mammals with red tails.
—- 26.7% Render Terrain (26.4% for glDrawElements)
—- 11.9% Render Actors (10.3% glDrawElements)
Not worth optimizing for now; however, note that few actors are moving. Probably collision tests to optimize further later.
—- 8.4% Setup Camera (already optimized as above)
=> now doing this 3 times per second; load drops to about 1%
26.2% Game Logic
6% ReflexMap (includes 3.8% to test against potential targets for a given decision; 1% to determine which actors and props can be interacted with)
=> This still uses significant time even though it is staggered.
1) Some agents will never interact with a given ‘species’. This can be updated at a comparatively very low rate (I use tags, e.g. ‘opponent’ or ‘magic’, to decide what to interact with; tags may change over time, they’re not static).
2) Similarly (and maybe more importantly), an action will only target specified agents. We shouldn’t need to iterate over all unrelated agents, whether in range or not.
Consider the following:
- fire an event whenever an actor is added
- submit the actor to all reflex maps
- each reflex map submits the actor to relevant target selectors
[- use actor IDs. each target selector can lookup an array using the id as index (this avoids
checking the tag every time - could use tag events to detect this kind of change]
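The bracketed ID-lookup idea can be sketched as follows; MAX_ACTORS and every name here are assumptions, not engine code. Tag checks happen only when a tag-change event fires, and the per-frame query becomes a plain array read:

```c
/* Actors get small integer IDs; each target selector keeps a flag
 * array indexed by ID, refreshed from tag-change events instead of
 * comparing tags every frame. */
#define MAX_ACTORS 256

typedef struct {
    unsigned char relevant[MAX_ACTORS];  /* 1 if this selector cares */
} TargetSelector;

/* called from the tag-change / actor-added event, not the game loop */
void selector_on_tag_event(TargetSelector *s, int actor_id, int matches) {
    s->relevant[actor_id] = (unsigned char)matches;
}

/* per-frame query: O(1) array read, no tag comparison */
int selector_is_relevant(const TargetSelector *s, int actor_id) {
    return s->relevant[actor_id];
}
```

The lookup stays correct as tags change, because the same event that changes the tag updates the flag.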
3% Gate apply (Game management; now staggered)
8.3% AffectMap apply
==> The proposed strategy for ReflexMap now also applies to AffectMap and ActorAffectCondition
ActorAffectCondition is testing all actors on the stage. We only want to test actors that might ever
affect the target.
[==> No time for that, but this should really just be event based. Comparatively, event based reactions (getting hit) are easier to model than event based actions (Launching a kick)]
3.4% Actor step (animation management)
1.6% ActivityChangeDispatcher actor:activityChangedFrom:to:
==> This triggers PlaySoundOnActivityChange. The problem here is that I’m generating a string before mapping the sound. The same thing can be done at virtually no cost, i.e. map the sound directly from the activity name (maybe combined with the actor’s name). This code hasn’t been updated for a long time – it doesn’t even do anything anymore (!)
1.3% Actor getCycleLength
==> While caching the cycle length may be unsafe (the mapping with actions isn’t one-to-one), step is checking the cycle length at every step. This should be cached at the beginning of an animation cycle (evaluating how long an animation lasts is actually a slightly complex process the way I do it – it involves comparing strings and other well-intended, suboptimal steps).
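The proposed fix is plain memoization: evaluate the length once when a cycle starts, and have step read the cached value. A sketch with hypothetical names (the slow evaluator below stands in for the string-comparing one):

```c
/* Cache the cycle length at cycle start; step() reads the cache. */
typedef struct {
    float cached_cycle_length;
} ActorAnim;

/* stand-in for the slow, string-comparing evaluation */
float evaluate_cycle_length_slow(int animation_id) {
    return 0.5f * (float)animation_id;   /* hypothetical formula */
}

void anim_begin_cycle(ActorAnim *a, int animation_id) {
    a->cached_cycle_length = evaluate_cycle_length_slow(animation_id);
}

float anim_cycle_length(const ActorAnim *a) {
    return a->cached_cycle_length;       /* no per-step re-evaluation */
}
```

The not-one-to-one mapping concern is handled by re-running the slow evaluation at every cycle boundary, not once per actor.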
16% OverlayView tick
This is irrelevant – the ‘tick’ is actually used to… update a text box displaying the frame rate.
There are still significant optimizations I’ve left out:
For the last 3, the profiling cases aren’t suitable – I’ve only profiled static scenes with nothing moving this time. Anyway, before I get ahead of myself and dive into teeming, dynamic scenes, I’d rather get still scenes flowing in peace and harmony.