Tag: optimization

Xcode notes

  • Code generation settings are under project > Build Settings > Code generation
  • The ‘main switch’ is under LLVM GCC 4.2 > Optimization level.
  • You can pass additional flags under LLVM GCC 4.2 > Language > Other C flags

LLVM or GCC?

In the best case scenario you can pick between 3 compilers. I don’t have benchmarks; the overall impression from reading around is that GCC may generate faster apps.

Incidentally, my system stalled while building a C++ library with -O2/-O3/-Os turned on (LLVM GCC, LLVM 3.0). Adding -fno-inline-functions to Other C flags ‘fixed’ the issue, but that’s not quite what I wanted to do. Switching to GCC 4.2 solved the problem (the target library wraps basic maths with inlines; enforcing inlining yielded a 10-15% gain).

Read here?

Thumbs up?

On ARMv6 (iPhone 3G and earlier), turning Thumb off used to yield a dramatic performance increase on floating point calculations. Well, not anymore. Put a better way, there aren’t many ARMv6 devices around anymore (~3.5%?).

I tested an FP intensive task and observed a marginal improvement (maybe less than 3%) on an iPad 2.

-O0/-O1/-O2/-O3/-Os

I was curious to see whether -Os (fastest, smallest) would perform significantly worse than -O3. It doesn’t; well, not in my case. I observe about a 4% speed increase between -O0 (no optimizations) and any of -O2/-O3/-Os.

I tested this on a graphic application running OpenGL ES 1.x. I measure the performance increase by looking at % usage per frame for non-GPU tasks (in my case, the CPU/GPU balance is fairly even).

Measuring performance

Running with GDB or Instruments turned on will slow down your app; a crude way to measure release-grade performance may be to stick a frame rate meter right on top of your UI (now that is common practice, right?) and check with an unplugged device.
This won’t tell you where your overheads are going. For this, I still find Instruments an invaluable help.
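
A crude frame rate meter of the kind mentioned above could look like the following sketch. The names (`FpsMeter`, `fps_meter_init`, `fps_meter_tick`) are made up for illustration, and the time is passed in by the caller, so this assumes you already have some monotonic clock returning seconds. It averages over a window rather than reporting per-frame values, which would jump around too much to read.

```c
/* Averaged frame-rate meter (illustrative sketch). Time is passed in
   explicitly so the meter does not depend on any particular clock API. */
typedef struct {
    double windowStart; /* time at which the current averaging window began */
    int    frames;      /* frames counted since windowStart */
    double fps;         /* last completed average, 0 until the first window ends */
} FpsMeter;

static void fps_meter_init(FpsMeter *m, double now)
{
    m->windowStart = now;
    m->frames = 0;
    m->fps = 0.0;
}

/* Call once per rendered frame; `now` and `window` are in seconds.
   Returns the most recent average, refreshed once per window. */
static double fps_meter_tick(FpsMeter *m, double now, double window)
{
    double elapsed = now - m->windowStart;
    m->frames++;
    if (elapsed >= window) {
        m->fps = m->frames / elapsed;
        m->frames = 0;
        m->windowStart = now;
    }
    return m->fps;
}
```

Displaying the returned value in a label is left out; as noted further down in these posts, updating an on-screen text box has a cost of its own, so a window of a second or more is probably wise.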

I’m looking at averaging performance over one or several game sessions to get useful figures. Formal test cases can help solve specific performance issues, keeping in mind that optimization issues and level design strongly interact.

Stuff that looks interesting

Conclusion

With a background in Java and other virtual machine languages, I find that playing with compile time settings is, overall, almost a waste of time. Sadly (cf. the thumbs episode) it isn’t always quite so.

Wasting a day on belittling tweaks and flags from time to time isn’t altogether bad.

We know how to use Instruments to fine-tune the performance of our games: we have a quick run (maybe one minute), check the heaviest stack traces, and do some surgery here and there, right?

How about cases when…

  • There’s just a couple of frames hanging every once in a while.
  • The game slows down for a short period of time

Here’s how I did it. I had a short span in my game where the frame rate dropped dramatically. While this typically lasted for less than 5 seconds, you may have noticed that the best games rarely slow down, if ever. Besides, what looks like an isolated incident in a game may signal a problem in the game engine. Let’s get to work.

Here’s what I did:

  1. I created a new file in Instruments, choosing CPU Sampler.
  2. I recorded a 2 to 5 minute session, making sure I recorded the part of the game where the frame rate drops (!)
  3. After the run, spikes (higher spikes…) showed up, matching the time when the frame drop occurred. I tried to get at this sample by sample, but I couldn’t extract any useful information.
  4. Then, checking the inspection range button group at the top of the Instruments UI, I noticed that I could restrict the time span considered for profiling.
  5. I then compared overheads inside and outside the frame drop time span.

I took a couple of screenshots so I could compare easily. Here’s what we see:

  • In the first case, CPU usage is very low. The game spends about 80% of the time waiting for the frame callback (note we’re running on an iPhone 4, and this is a universal app also available on iPod 2nd gen.)
  • In the second case, 85% of machine time is used up. There will be other things happening in the background etc… but more importantly, we can identify several activities that are otherwise simply insignificant (too fast for the sampler to pick up):
    • Drawing actors. This occupies 50% of the run loop
    • Evaluating decorations

Oh well. I happen to have 4 actors in there. Incidentally, the model is the heaviest I’ve got. The actors aren’t showing on-screen when the frame drop occurs, they’re just nearby. But as a general rule, it’s much better to send a little more than there is to view than to send less(!).

The next obvious step was to go back to the game script and see what happens when the actors are removed. The frame rate recovered completely – all that was needed was optimizing the geometry for these actors.

Conclusion

This quick case study shows how CPU Sampler can be used to identify overheads within a specific time span. No traces, no manual profiling. It’s a very simple technique, and it can save you from heading in the wrong direction based on vague intuitions of where the overheads should lie.

In this case, the scene considered actually combined several candidates:

  • The scene is complex. In fact I had to break up the scenery into several components because the max vertex count (owing to number format in my files) is ~8000.
  • Procedurally generated decorations add to the rendering overhead for static elements.
  • But the actor model, duplicated 4 times, is also heavy.

When I started off, I was so convinced that the complex scene was responsible for my overhead that I was about to do the artwork all over again, even though scene rendering uses VBOs. Taking the trouble to fire up the profiler and run sessions totalling 15 minutes pointed me in the right direction.

VBOs* are great. What will lure you in is the promise of a dramatic increase in rendering performance.

*(Vertex buffer objects; not to be confused with ‘VAOs’, vertex array objects, which are a separate feature that stores vertex array state rather than vertex data)

Coding recipe

1. Setup code (not in your rendering loop)

// This example illustrates how to load a vertex array buffer.
// Loading color arrays and element (indexed triangle) arrays
// needs only small variations; see the rest of the article for details.
// Create a handle that you will use to refer to the buffer when talking to gl.
GLuint handle;
glGenBuffers(1,&handle);
// Let's load our buffer data...
// Size of your buffer in bytes: here, 3 times the number of vertices,
// multiplied by your format size, for example sizeof(GLfloat).
GLsizeiptr dataByteSize = ...;
GLfloat* data = malloc(dataByteSize);
// need to populate your buffer with vertex data!
...
// this loads 'data' into graphic memory referred to by 'handle'
glBindBuffer(GL_ARRAY_BUFFER,handle);
// Typically you want to use static draw. This implies you will use the buffer
// over and over, in other words you won't modify geometry afterwards.
glBufferData( GL_ARRAY_BUFFER, dataByteSize, data, GL_STATIC_DRAW);
// clear the binding after loading your array, otherwise you will get crashes
glBindBuffer(GL_ARRAY_BUFFER,0);


2. In the rendering loop

To draw from the buffers, use something like this:

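// bind the coordinate buffer; the last argument of glVertexPointer
// is now an offset into that buffer (3 floats per vertex)
glBindBuffer(GL_ARRAY_BUFFER,handleToCoordBuffer);
glVertexPointer(3,GL_FLOAT,0,0);
glBindBuffer(GL_ARRAY_BUFFER,0); // reset
// bind the index buffer and draw; the last argument is again an offset
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER,handleToPerFaceVertexIndices);
glDrawElements(... , 0);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER,0); // reset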

What are VBOs?

OpenGL ES 1.x allows defining buffers into which we load our vertex data (but also color, normal and face indexing data). Color, normal and coordinate arrays are passed as ‘just array buffers’. Indexed faces are passed using element array buffers.

When possible (depending on the underlying hardware), buffer objects are stored in graphic memory, where the GPU can access them real fast.

Creating Buffers

First you define a handle variable. Buffers are indexed, so we need handles.

GLuint handle;
glGenBuffers(1,&handle);

This code generates just one handle. That’s what the number ‘1’ is saying. We’re passing a pointer, so if we wanted, we could generate several handles into the same array. I’m not really sure why you’d ever want to do that but… (hey, I didn’t say it wasn’t marginally more efficient, right?)

Next you make the handle ‘current’, or in other words, ‘bind the buffer’. We tell gl that instead of addressing memory using pointers, we will address the buffer referred to by the handle, or, if loading data into the buffer, a buffer will be created for us and referred to by the handle.

glBindBuffer(GL_ARRAY_BUFFER,handle);

There are two ‘channels’ for addressing, one (GL_ARRAY_BUFFER) relates to functions like glColorPointer, glVertexPointer, glNormalPointer. The other refers mainly to glDrawElements (pass GL_ELEMENT_ARRAY_BUFFER).
In this case I’m planning on loading coordinates in the buffer, so I use GL_ARRAY_BUFFER. To load vertex indices (as used by glDrawElements), use GL_ELEMENT_ARRAY_BUFFER.

Next we pass a pointer to the memory block containing our data:

// data is pointing at our vertex coordinates. Something like GLfloat* data = malloc(…)
// dataSize is 3 times the number of vertices, multiplied by your format size, e.g. sizeof(GLfloat).
glBufferData( GL_ARRAY_BUFFER, dataSize, data, GL_STATIC_DRAW);

GL_STATIC_DRAW tells gl that we will not change the buffer contents after loading them, so gl is free to keep the data wherever drawing from it is fastest (it’s only a hint. Check the doc for the other, less useful hints, because if you assert that the data is dynamic, it will be much harder for gl to optimize and your buffer may be a little cosmetic).

Now that we’ve loaded our data into the buffer referred by ‘handle’, it may be a good idea to reset the buffer binding, otherwise we may crash later on (see caveats, below).

glBindBuffer(GL_ARRAY_BUFFER,0);

Rendering from buffers

It is easy to convert gl drawing code using memory pointers to code using buffers. All we need to do is bind the buffer we want to use, then replace whatever pointer we used to use by zero (1).

glBindBuffer(GL_ARRAY_BUFFER,handle);
glVertexPointer(3,GL_FLOAT,0,0 /*replaced the pointer reference by zero*/ );

So we first have to bind the buffer once again, then we call glVertexPointer as we used to, but instead of passing our memory pointer, we pass zero to indicate that we want to read from the first entry in the buffer.

And we do (a) exactly the same thing for color buffers and (b) quite the same thing for element buffers, viz.:

glBindBuffer(GL_ARRAY_BUFFER,handleToColorBuffer);
glColorPointer(4,GL_FLOAT,0,0 /*replaced the pointer reference by zero*/ );

glBindBuffer(GL_ELEMENT_ARRAY_BUFFER,handleToPerFaceVertexIndices);
glDrawElements(… , 0 /*replaced the pointer reference by zero*/ );

Errors and caveats

  1. Passing GL_ARRAY_BUFFER instead of GL_ELEMENT_ARRAY_BUFFER (or vice versa).
  2. Not calling glBindBuffer(xxx,0) before running an OpenGL call that doesn’t use VBOs.
    Why is that a problem? Because while a buffer is bound, the pointer you pass to glVertexPointer, glColorPointer (…) is interpreted as an offset relative to the buffer, instead of a general memory pointer. So if you pass a regular pointer to any of these functions afterwards, your pointer is an offset inside the current buffer, not a memory pointer. Bang.
  3. A typical caveat related to (2) may be when you generate buffers while rendering. Since I have a mix of artwork and procedural geometry, that’s exactly what I do, and I don’t want to (and cannot) use VBOs everywhere. So I bound my buffers for loading them, then moved on to rendering something else, and had a nice crash.
    Either way, failing to reset the bindings after assigning them, then calling glDrawElements, will consistently mess up your app, and maybe the next 3D app you run afterwards.
  4. Not releasing buffer memory. Following the above examples, we’d call glDeleteBuffers(1,&handle) to release the buffer referred to by ‘handle’. Whatever memory is used by buffers is, in my humble experience, quite limited. Without claiming to have made a science of it, it would appear that a memmove error signals a memory shortage when running on an iPod touch.

Performance

  • In the simulator, no performance increase
  • On 2nd gen devices, up to iPod touch 2nd gen 8 GB, no performance increase
  • Dramatic performance increase on 3rd gen+ devices.

Notes

(1) Why zero? Actually, once a buffer is bound, gl no longer treats the pointer as a memory pointer: it reads it as an offset of ‘x machine units’ from the beginning of the graphic memory addressed by the handle. Machine units are bytes, so each float in an array of GL_FLOAT takes up 4 of them; a non-zero offset lets you draw only a portion of the buffer you passed.
You may be tempted to pass null or nil (and find bad literature to inspire you). That obfuscates the matter. You don’t pass zero to indicate no pointer, or a null pointer. Zero is really just an offset.
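
Since the last argument is an offset counted in bytes from the start of the bound buffer, both the buffer size and a starting offset can be computed with plain arithmetic. The sketch below uses made-up helper names (`vertex_data_byte_size`, `vertex_byte_offset`) and assumes tightly packed float data (stride 0), with GLfloat being the same size as a C float.

```c
#include <stddef.h>

/* Size in bytes of a tightly packed float array holding
   `componentsPerVertex` values for each of `vertexCount` vertices.
   This is the kind of value glBufferData expects as its size. */
static size_t vertex_data_byte_size(size_t vertexCount, size_t componentsPerVertex)
{
    return vertexCount * componentsPerVertex * sizeof(float);
}

/* Byte offset of vertex `firstVertex` in the same layout, i.e. the size
   of everything stored before it. With a buffer bound, it could be
   passed (cast to a pointer) as the last argument of glVertexPointer:
     glVertexPointer(3, GL_FLOAT, 0, (const GLvoid *)vertex_byte_offset(100, 3));
   so that index 0 then refers to the 101st vertex of the buffer. */
static size_t vertex_byte_offset(size_t firstVertex, size_t componentsPerVertex)
{
    return vertex_data_byte_size(firstVertex, componentsPerVertex);
}
```

An offset of zero is just the special case `vertex_byte_offset(0, 3)`: start reading from the first entry.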

Hairlock - dev pic

What you’re looking at may well be the first official preview pic for Hairlock. Unless I change the name to Redlock… or whatever.

But I’m too tired to even draft a 5 line blurb, which is where my ambition lies. For the game itself, well… Fancy running this through the profiler since adding more trees and more growth is hitting me low.

  • 16.4/18.6% procedural grass. Awkward considering so little of it is showing in this shot. Well what’s awkward is I should really grid-chunk this stuff, and likely define LOD nodes for it.
  • 14/4% decorations. That’s the shorter plants and all the stones. Yeah… each stone is rendered as a 3D object. Compared to grass, other decorations use pretty efficient bounding boxes. Maybe too efficient. I’d like to use fewer GL calls for these (grid-chunking again: different problem, same proposed solution)
  • 11.1/4% the terrain. All trees, trunks and the ground count as terrain; plus the axe at the front.
  • 10.4% rendering actors – count the little lady and the crow.
  • 7.7/10.6% objc_msgSend
  • 0.7/0.5% clouds in the sky (animated, not quite visible on this pic)
  • 0/18.3% eval geometry for decorations

In all cases I gave two figures: the first one for the scene presented here, the second for walking around, but mostly focusing on a very different kind of scene.

Is it so bad after all?

I wrote a few times that optimization isn’t the first thing to have in mind. I’ve been struggling a little to manage drawing everything I want (to be accurate, I have a radical impulse to add more grass). I also started this post assuming that I couldn’t balance around 15FPS. Wrong.

I was staring at my little cotton clouds crossing the screen, and I suddenly realized that the clouds were obscured by otherwise invisible background elements – because I have a linear fog set to start at 1/4th of the viewing distance. Point is, I have this reference scene with a couple of houses in the background, and what this little observation shows is that balancing didn’t require this stuff to disappear.

I don’t like large elements (5-10% of the screen) to pop in the blue; conclusion, I likely need to do a couple of things that have little to do with optimization:

  • Account for an object’s size when deciding whether to draw a far object or not. Grass vanishing would be fine (no, actually I really like the jagged effect on far edges, and I’d also like better-than-spikes grass blades in the foreground)
  • Altogether prevent larger-than-something objects from disappearing in any other way than exiting the field of view. That makes plain sense. If it’s not within viewing distance, it can’t appear. If it’s already on the screen, it shouldn’t disappear. This might hurt the frame rate a little at times, though not so much, because keeping stuff visible will force less of the new stuff to enter the screen.
  • Worry less about using fog to ease in and out, and more about using fog to bring atmosphere.
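
The first two rules above could be sketched as a single visibility test. Everything here is hypothetical: the `SceneObject` fields, the `should_draw` name and the three thresholds are illustrative, not taken from the game. Apparent size is approximated as radius divided by distance, which is only a rough stand-in for projected screen size.

```c
#include <stdbool.h>

/* Hypothetical per-object data; field names are illustrative only. */
typedef struct {
    float distance;   /* distance from the camera, in world units */
    float radius;     /* bounding-sphere radius, in world units */
    bool  inView;     /* inside the field of view this frame */
    bool  wasDrawn;   /* drawn during the previous frame */
} SceneObject;

static bool should_draw(const SceneObject *o, float viewDistance,
                        float largeRadius, float minApparentSize)
{
    if (!o->inView)
        return false;              /* exiting the field of view always hides */
    if (o->radius >= largeRadius && o->wasDrawn)
        return true;               /* large and already on screen: never pop out */
    if (o->distance > viewDistance)
        return false;              /* beyond viewing distance: cannot appear */
    if (o->distance <= 0.0f)
        return true;               /* at the camera: avoid dividing by zero */
    /* small, far objects (grass, pebbles) may vanish early */
    return o->radius / o->distance >= minApparentSize;
}
```

The caller would store the result back into `wasDrawn` after each frame, which is what lets a house stay visible once it has entered the screen.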

Notes to self

  • Still need to improve this grass and render it faster.
  • Should really try VBOs.

This post is dry, all about performance.

I’ve been working on two scenes. ‘Bridge view’ has no visible NPCs and more geometry. ‘Village View’ reverses the trend.
Whether NPCs are showing or not, they add an overhead to processing.

Read the Data

Processing times are in percentages, either relative to the game loop root or to the parent procedure (—-). For Bridge View I detail sub-routines as percentages of either rendering or game logic. For Village View all percentages are relative to the root of the game loop.

(The profiler is accurate. Me writing down the numbers and ordering them is not.)

Session overview

I don’t have love for optimization. Optimizing Objective-C code may often involve replacing it with C code (something to do with dynamic binding). I work in rounds. If I feel my game is getting too slow, I get into a weekend optimization session.
This is the roughest session I’ve had so far. Too much processing goes into the game loop; put another way, a lot more processing should be event based. I’ve had several sessions before this one, and typically resolved issues by targeting just a couple of underperforming code sections. Not this time. Overheads are increasingly fragmented, and I won’t get into a redesign before… releasing a couple more games.

Aside from modeling interactions using events, which allows keeping game AI out of the game loop, I have no ‘optimize upfront’ advice. Events make design easier and better anyway.

I wouldn’t hope to complete a project if I started off worrying about 1% overheads.

The solutions I discarded for this round are [bracketed]. Using such or such solution is always a coding time versus performance gain issue. Staggering is cheap; simple, code level optimizations are cheap; re-design isn’t cheap.

Note1: While the SDK disallows running processes in the background, iPhones and iPods are multitasking – your app will be running, and another app may download in the background, or you may be receiving a text message. In most cases, I find that the main app is using 85 to 97% of processing time

Note2: I always run in debug mode – yes, because it’s a little slower, to encourage my optimization efforts.

1. Bridge View: 8.5fps, 70 units, 32 611 faces

78.8% – Rendering

—- 52.4% Terrain Rendering (97.5% glDrawElements)

=> replaced OpenGL lighting (realtime) by just in time pre-calculated lighting.

+=> enabled back-face culling

—> 13FPS

[reduce fog distance - no effect on rendering time]

[use VBOs; yes VBOs are most likely definitely worth a go. I just couldn't bother]

[reduce calls to glDrawElements (group trees?)]

—- 2.9% Actor rendering

—- 31.1% Camera management (More hit testing than reason)

[=> discard steep angle faces; that’s causing bugs too, unfortunately]

=> code optimizations(i) (use C code, pointers etc… bringing processing share down to 6%)

21.1% – Game Logic

—- 42% AffectMap (How a PC/NPC is affected by others and the environment)

—- 34.3% Actions (Simple Game AI + Game events)

including…

——– 22% ReflexMap (NPCs taking decisions – simple AI)

——– 11.8% Gate apply (hard to explain)

——– 19.8% BasicActorUpdates apply (Actors moving around and doing stuff)

(i) Intersection test profile before optimizations

30.8% Loading triangle corners into a face object.

30.1% Running the intersection test.

25.4% Objective C method calls

I optimized this by adding a table to my 3D geometry. Instead of indexing vertices per face and following all the indirections, vertex coordinates are duplicated on a per-face basis and dumped in a memory block. So instead of doing nice maths with triangle and vector objects, I do so-so maths with pointers everywhere.
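
A minimal sketch of such a per-face table, assuming triangles indexed with unsigned shorts and 3 floats per vertex. The function name `flatten_faces` and the exact layout (9 consecutive floats per face) are assumptions for illustration, not the layout actually used in the game.

```c
#include <stddef.h>
#include <stdlib.h>

/* Duplicates vertex coordinates per face: face f occupies 9 consecutive
   floats (3 corners x 3 coordinates) starting at out[9 * f].
   Returns NULL if allocation fails; the caller frees the block. */
static float *flatten_faces(const float *vertices,
                            const unsigned short *indices,
                            size_t faceCount)
{
    float *out = malloc(faceCount * 9 * sizeof(float));
    if (out == NULL)
        return NULL;
    for (size_t f = 0; f < faceCount; ++f) {
        for (size_t corner = 0; corner < 3; ++corner) {
            const float *v = vertices + 3 * (size_t)indices[3 * f + corner];
            float *dst = out + 9 * f + 3 * corner;
            dst[0] = v[0];
            dst[1] = v[1];
            dst[2] = v[2];
        }
    }
    return out;
}
```

An intersection test can then walk the block linearly, reading `out + 9 * f` for each face, with no index lookups and no method calls in the inner loop. The price is memory: shared vertices are stored once per face that uses them.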

Just for camera management… not worth it. But that will be useful when I get into the too many NPCs roaming around issue.

2. Village view – 25 rats (13fps – 17000 faces)

Rats… small, low poly, black-eyed mammals with red tails.

41.2% Rendering

—- 26.7% Render Terrain (26.4% for GLDrawElements)

—- 11.9% Render Actors (10.3% glDrawElements)

Not worth optimizing for now; note, however, that few actors are moving. There are probably collision tests to optimize further later.

—- 8.4% Setup Camera (already optimized as above)

=> now doing this 3 times per second; load drops to about 1%

26.2% Game Logic

6% ReflexMap (includes 3.8% to test against potential targets for a given decision; 1% to determine which actors and props can be interacted with)

=> This still uses significant time even though it is staggered.

1) there are agents that will never interact with a given ‘specie’. This can be updated at a comparatively very low rate (I use tags, e.g. ‘opponent’ or ‘magic’ to decide what to interact with; tags may change over time, they’re not static).

2) similarly (and maybe more important) an action will only target specified agents.  We shouldn’t need to iterate all unrelated agents whether in range or not.

Consider the following:

- fire an event whenever an actor is added

- submit the actor to all reflex maps

- each reflex map submits the actor to relevant target selectors

[- use actor IDs. Each target selector can look up an array using the id as index; this avoids checking the tag every time. Tag events could be used to detect this kind of change]
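
The event-driven idea above, including the bracketed id lookup, might be sketched like this. All names (`TargetSelector`, `selector_on_actor_event`, the bit-mask representation of tags, `MAX_ACTORS`) are hypothetical; the engine itself uses tags such as ‘opponent’ or ‘magic’, and how they are stored is not shown in this post.

```c
#include <stdbool.h>
#include <string.h>

enum { MAX_ACTORS = 256 };          /* assumed upper bound on actor ids */
typedef unsigned int TagMask;       /* assumed encoding: one bit per tag */

typedef struct {
    TagMask wanted;                 /* tags this selector reacts to */
    bool    relevant[MAX_ACTORS];   /* indexed by actor id */
} TargetSelector;

static void selector_init(TargetSelector *s, TagMask wanted)
{
    memset(s, 0, sizeof *s);
    s->wanted = wanted;
}

/* Call when an actor is added, and again whenever its tags change. */
static void selector_on_actor_event(TargetSelector *s, int actorId, TagMask tags)
{
    if (actorId >= 0 && actorId < MAX_ACTORS)
        s->relevant[actorId] = (tags & s->wanted) != 0;
}

/* Constant-time check used inside the game loop instead of comparing tags. */
static bool selector_is_target(const TargetSelector *s, int actorId)
{
    return actorId >= 0 && actorId < MAX_ACTORS && s->relevant[actorId];
}
```

The tag comparison moves out of the per-frame path and into the (rare) add and tag-change events; the loop itself only reads one array entry per actor.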

3% Gate apply (Game management; now staggered)

8.3% AffectMap apply

==> The proposed strategy for ReflexMap now also applies to AffectMap and ActorAffectCondition

ActorAffectCondition is testing all actors on the stage. We only want to test actors that might ever affect the target.

[==> No time for that, but this should really just be event based. Comparatively, event based reactions (getting hit) are easier to model than event based actions (Launching a kick)]

4.8% BasicActorUpdates

3.4% Actor step (animation management)

1.6% ActivityChangeDispatcher actor:activityChangedFrom:to:

==> This triggers PlaySoundOnActivityChange. The problem here is that I’m generating a string before mapping the sound. The same thing can be done at virtually no cost, i.e. by mapping the sound directly from the activity name (maybe combined with the actor’s name). This code hasn’t been updated for a long time; it doesn’t even do anything anymore (!)

1.3% Actor getCycleLength

==> While retaining cycle length may be unsafe (the mapping with actions isn’t one to one), step is checking cycle length at every step. This should be cached at the beginning of an animation cycle (evaluating how long an animation lasts is actually a slightly complex process the way I do it. It involves comparing strings and other well intended, suboptimal steps).
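
Caching the cycle length when a cycle begins could look roughly like the sketch below. `evaluate_cycle_length` is a stand-in for the real, expensive evaluation (here it just counts its calls and returns a made-up length), and `AnimationState`, `animation_begin_cycle` and `animation_step` are illustrative names.

```c
/* Stand-in for the expensive evaluation; counts how often it runs. */
static int g_evaluations = 0;

static float evaluate_cycle_length(int animationId)
{
    ++g_evaluations;
    return 0.25f * (float)(animationId + 1);  /* made-up lengths */
}

typedef struct {
    int   animationId;
    float cycleLength;  /* cached when the cycle begins */
    float elapsed;      /* time spent in the current cycle */
} AnimationState;

static void animation_begin_cycle(AnimationState *a, int animationId)
{
    a->animationId = animationId;
    a->cycleLength = evaluate_cycle_length(animationId);
    a->elapsed = 0.0f;
}

/* Advances the animation using the cached length; the length is only
   re-evaluated when a new cycle starts. */
static void animation_step(AnimationState *a, float dt)
{
    a->elapsed += dt;
    if (a->elapsed >= a->cycleLength)
        animation_begin_cycle(a, a->animationId);
}
```

Because the cache is refreshed at every cycle start, a change in the action-to-animation mapping is picked up after at most one cycle, which limits the ‘unsafe’ window mentioned above.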

16% OverlayView tick

This is irrelevant: the ‘tick’ is actually used to… update a text box displaying the frame rate.

Working notes

I have left out some significant optimizations so far:

  • Disable out of range NPCs. May seem straightforward – why process AIs that won’t interact? I wrote my behavior engine so that NPCs could interact, and this would affect the game even when the player isn’t around. Running a classical adventure/action game atop the engine, I should use this as a game specific optimization.
  • Use VBOs (vertex buffer objects). There’s a lot more I’d like to learn about OpenGL-ES, and I have decided to try not to learn it until I have completed my first project. I also tend to keep this as a ‘magic card’. When optimizing, I choose scenes and configurations that are purposefully loaded, and sometimes pretend not to know about various things that can make things faster with little effort – works for me, like talking with stones in your mouth.
  • Reuse interpolated frames. I’ve already optimized frame interpolation. This may be combined with caching strategies.
  • Use optimized intersection tests (as applied to camera management) for collision detection and walking on uneven terrain.
  • Stagger collision/intersection tests

For the last 3, the profiling cases aren’t suitable: I’ve only profiled static scenes with nothing moving this time. Anyway, before I forsake myself and get into dynamic scenes teeming with half-life, I’d rather have stills flowing in peace and harmony.

Note: this article isn’t a benchmark. This article investigates GL-ES rendering performance in a running game prototype (with game logic running, not just GL rendering).

Today I’ve tested rendering performance with a simple character – ‘hit-man’ (230 vertices, 206 faces):

  • idle animation
  • walk cycle
  • hit animation

I use indexed faces (glDrawElements) without VBOs (glVertexPointer with plain memory pointers). I’m not sure about using VBOs for actors because I interpolate animation frames on the fly; if I want to use VBOs, I’ll have to worry about how many mesh duplicates I can buffer, when to release them and so forth…

I animated 100 actors this way, and it wasn’t fast. Not hopelessly crawling like when I added game logic for the same 100 actors, just slow. So I did the obvious and added a boundary check to render only actors visible on-screen. That brought back an acceptable frame rate right away, without the need for further optimizations. The frame rate is still about OK with 20 actors on-screen.

This came as a surprise since my terrain rendering (no VBOs either) generates the highest overhead in my iPhone app. Here’s a quick breakdown of processor usage (rendering related only):

  • 40% – glDrawElements for terrain (300 tiles, about 18400 faces)
  • 20% – Frame interpolation
  • 3.5% – glDrawElements for actors (20-25 on-screen, about 4000-5000 faces)
  • (drawing props and doing other things)

Admittedly, my tiles are heavy. But frankly, it doesn’t add up: getting 4 times more actors on-screen would just bring the drawing cost up to about 15% (1) (never mind frame interpolation), so still much less than tile rendering on the side of GL performance.

An experiment

I had a feeling that drawing large triangles took much longer than drawing small ones – so I iterated with basic square tiles (2 faces, 4 vertices) – here’s the result:

  • 9% – glDrawElements for terrain (300 tiles, 600 faces)
  • 34% – Frame interpolation
  • 7.6% – glDrawElements for actors (20-25 on-screen, about 4000-5000 faces)
  • (drawing props and doing other things)

In this configuration the game runs all nice and smooth – it’s not just that the tiles had many faces – faces for a single tile covered each other as well.

Conclusion

Rendering more pixels slows down rendering more than rendering more faces – With an isometric view, simple tiles and detailed characters can be a good combination.

(1) The way I evaluate this is so-so; multiplying percentages this way isn’t proper maths.

So far I’ve had only a couple of actors on my game board. Granted, I might not need more than 6 to 12 actors on-screen at the same time to bring some excitement, but there are additional factors to consider:

  • It is better to be able to process interactions for a large number of characters, on and off-screen.
  • It may be simpler to be able to add all characters for a game level right away, rather than having to generate NPCs on the fly.

I’m running a mere 100 actors now and the game is already crawling on our iTouch.

For now the rendering overhead is very low (all additional actors are just rectangles). With the help of Instruments, I reduced several overheads related to game logic; I didn’t get a smoothly running game again until I had cleared all of them.

  • Actors need to find out about nearby actors while idle. This allows an actor to determine what to do next using a ReflexMap class.
    • Added bound checks to reduce the overhead of finding nearby actors.
    • Removed unnecessary messaging involving type casting and type inference.
    • Removed all code associated with actors picking objects. This code may need to be optimised later but for now it is simply unused.
  • Disable debug related rendering. Just this uses 18% of the processing time (drawing strings over the 3D view using Quartz).
  • After applying the above, the game was still running pretty slow. So I randomly avoid deciding a new action for idle actors, skipping 7 frames out of 8. At this rate, actors still respond reasonably fast.
  • Remove all NSLog calls from the run loop. Don’t even think about profiling or optimizing while generating console output.
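
Two of the steps above, the bound check and skipping 7 frames out of 8, can be sketched in a few lines of C. Note the swap: the game skips frames at random, whereas this sketch uses a deterministic variant that offsets each actor by its id, so that decisions spread evenly over frames. The names (`should_decide`, `within_range`, `DECISION_INTERVAL`) are made up for illustration.

```c
#include <stdbool.h>

enum { DECISION_INTERVAL = 8 };  /* decide on 1 frame out of 8 */

/* True on the one frame out of DECISION_INTERVAL where this actor may
   pick a new action. Offsetting by the actor id avoids having every
   idle actor decide on the same frame. */
static bool should_decide(unsigned frame, unsigned actorId)
{
    return (frame + actorId) % DECISION_INTERVAL == 0;
}

/* Cheap square bound check, run before any detailed test on a pair of
   actors: rejects anything further than `range` along either axis. */
static bool within_range(float ax, float ay, float bx, float by, float range)
{
    float dx = ax - bx;
    float dy = ay - by;
    return dx >= -range && dx <= range && dy >= -range && dy <= range;
}
```

In the run loop this would read something like: skip the actor unless `should_decide(frame, id)`, then only consider neighbours for which `within_range(...)` holds.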

After this, the game is running fairly smoothly, although not at a great frame rate (16fps).

I further optimized AffectMap. This complements ReflexMap – affect map determines what happens to an actor (e.g. getting hit) versus what they choose to do. I used bounds checking with a shorter range (effects are always direct). Running AffectMap is still twice as expensive as ReflexMap, because I can’t skip any frames.

Now I can push to 24fps and things look reasonably smooth. A hundred actors on stage, all live and interacting even when off-screen, doesn’t seem too bad.

Having a look at the current snapshot, it’s not difficult to guess that more effort will need to be dedicated to optimizing: this app is still drawing next to nothing.

How big can a game board be? I have currently two limiting factors:

  • Memory. I crash my iTouch with a Board bigger than 750×750
  • Loading/Generation Time. Currently, with a 750×750 Board, this takes seconds.

Surely we can make a much larger board. It all depends how we store the data. For now I use a flat 2 dimensional array, with each cell including the following information:

  • terrain (NSString*)
  • prop (NSString*)
  • solid state (BOOL, prevents player & NPCs from occupying a cell)
  • height (float, allows offsetting tiles and props vertically)

Since each cell is also an object, I use 4*3+1+4 = 17 bytes per cell (assuming an atomic ‘BOOL’ actually occupies a single byte and counting the cell object reference). I could:

  • encode terrain and prop as byte values (2)
  • encode height as a short (2)
  • use C arrays (I’ll probably need to do this anyway to reduce the time spent allocating memory for the board), saving the cost of a cell object reference
  • Do away with the solid state (that could be implicit to terrain)

The resulting allocation would be 4 bytes per cell, about four times less than the current allocation. Let’s not sweat it; that’s just a board twice as wide.
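
As a sketch, the 4-byte cell could be laid out as below. The struct and function names are illustrative, and the fixed-point scale for the height (64 steps per unit) is an arbitrary assumption; the post only says the height would be stored as a short.

```c
#include <stddef.h>
#include <stdint.h>

enum { HEIGHT_STEPS_PER_UNIT = 64 };  /* assumed fixed-point scale */

typedef struct {
    uint8_t terrain;  /* index into a table of terrain names */
    uint8_t prop;     /* index into a table of prop names, e.g. 0 for none */
    int16_t height;   /* vertical offset in 1/64ths of a unit */
} PackedCell;

/* Converts a float height to the packed form (truncates toward zero). */
static int16_t pack_height(float height)
{
    return (int16_t)(height * HEIGHT_STEPS_PER_UNIT);
}

static float unpack_height(int16_t packed)
{
    return (float)packed / HEIGHT_STEPS_PER_UNIT;
}

/* Memory needed for a square board stored as a flat C array of cells. */
static size_t board_bytes(size_t side)
{
    return side * side * sizeof(PackedCell);
}
```

With this layout a 750×750 board comes to 2,250,000 bytes, against 750×750×17 = 9,562,500 bytes with the current per-cell figure. The chosen scale limits heights to roughly ±512 units, and the two byte indices limit a board to 256 terrain types and 256 prop types.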

I consider that a ‘board size’ factor between 5 and 100 will potentially impact design opportunities – in other words, the board has to be vastly wider to change the way I think about my design.

RLE compressing terrains and storing props as a list (versus allocating space for a single prop within each cell) may be much more effective in some cases. It’s worth noting that both methods somehow imply reduced variety in the game board. If larger means emptier, is it really worth it?
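
For reference, run-length encoding a row of terrain ids takes only a few lines. This is a generic sketch, not code from the game: the `Run` struct, the `rle_encode` name and the cap of 255 cells per run are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t terrain;  /* terrain id shared by the whole run */
    uint8_t count;    /* number of consecutive cells, 1..255 */
} Run;

/* Encodes `n` terrain ids into `runs`. Returns the number of runs
   written, or 0 if `maxRuns` is too small (or if n is 0). */
static size_t rle_encode(const uint8_t *cells, size_t n,
                         Run *runs, size_t maxRuns)
{
    size_t used = 0;
    size_t i = 0;
    while (i < n) {
        uint8_t value = cells[i];
        size_t len = 1;
        while (i + len < n && cells[i + len] == value && len < 255)
            ++len;
        if (used == maxRuns)
            return 0;
        runs[used].terrain = value;
        runs[used].count = (uint8_t)len;
        ++used;
        i += len;
    }
    return used;
}
```

Each run costs 2 bytes, so a row only shrinks when runs average more than 2 cells; a board where terrain changes on every cell would actually grow. That is the ‘reduced variety’ trade-off in concrete terms.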

Let’s keep in mind that, to create a massive game board, we need either to slice it (we can’t hold a complete board in memory) or… approach the problem in a quite different way.

What’s in it for a game?

The speed of my player character was originally 5 pix / frame. I’m not sure about the frame rate – just OK actually. At 5 pix per frame with 32×16 tiles, it takes 15 seconds to cross a 25×25 board. That’s a really small board. I’d like a board big enough for exploring. Granted a game isn’t all about hanging around and visiting places.

A 500×500 board is 20 times wider, so it takes a round 20×15 = 300 seconds to cross it. Just about 5 minutes. But I feel 5 pix/frame is too stately, so make it 8 pix per frame. Now we can cross the board in about 3 minutes.

OK then, going ‘around the world’ is one thing. How long would it take to explore all of the board’s corners? 9 minutes? I don’t think so. If we do it ‘screen by screen’, with 30 tiles on the height of a single screen, we have 50/3 ~ 15 rows to check, making about 15×3 = 45 minutes.
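
The estimates above are simple proportional scaling from the one measured figure (15 seconds to cross 25 tiles at 5 pix/frame). A small sketch, with made-up function names and assuming the frame rate stays the same between the reference run and the estimate:

```c
/* Scales a measured crossing time to another board width and speed.
   Assumes the same frame rate and tile size as the reference run. */
static double crossing_seconds(double refSeconds, double refTiles,
                               double refSpeed, double tiles, double speed)
{
    return refSeconds * (tiles / refTiles) * (refSpeed / speed);
}

/* Time to sweep the board row by row, one screen height at a time. */
static double sweep_minutes(double rows, double secondsPerCrossing)
{
    return rows * secondsPerCrossing / 60.0;
}
```

With the figures from the text, `crossing_seconds(15, 25, 5, 500, 5)` gives 300 seconds and `crossing_seconds(15, 25, 5, 500, 8)` gives 187.5 seconds, i.e. the ‘about 3 minutes’ above; `sweep_minutes(15, 180)` reproduces the rounded 45 minute estimate.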

Ideally, I’d like to have a much larger board. But for now, I’ll worry more about how I populate the board, and less about how big it can be.