Archive

Tag: Memory

Don’t retain anything unless you must

Regarding reference counting, this is the idea I’d like a developer to at least consider. For several reasons.

  • A dangling pointer / weak reference needn’t be evil. During development, hitting a dangling pointer is better than preventing an object from deallocating when it should. An object that exceeds its intended lifetime can behave in undesirable, unpredictable ways. All it takes to detect an invalid object is turning NSZombieEnabled on.
  • Everything retained needs to be released. Retain less = less work.
  • If the sums don’t add up (over-released/over-retained object), it’s easier to work out what’s wrong when the number of objects involved is small (e.g. 1, 2 or 3).
  • Potentially any object that retains a target will increase the target’s lifetime; this a priori translates into increased memory usage.
  • Whenever we retain an object we risk creating a cyclic reference. Of course ‘we know what we’re doing’ (and cyclic dependencies can be removed), but isn’t it just easier to avoid getting too many of these?

Unwanted objects that remain inside the runtime after they should have deallocated will harm you. The more dynamic your system is (e.g. game, simulation), the more likely these objects are to generate functional bugs that are hard to figure out.

I read a little about Automatic Reference Counting (ARC), which we are getting in iOS 5. If I understand correctly (reading here), the central idea of this article will translate to ‘don’t abuse strong references’. Now that I more or less get it I look forward to using ARC, but I guess I’ll be waiting for another 6 months or so, being a happy laggard.

A quick introduction

Reference counting approaches memory management indirectly, using concurrent ownership:

  • Take ownership of an object by retaining it
  • Relinquish ownership by releasing the object.
  • When all owners of an object have released it, the object is deallocated.

Indirect: we don’t explicitly deallocate the object.
Concurrent: several objects can simultaneously retain the same target.
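The three rules can be sketched in plain C. This is only an illustration of the counting scheme, not how Cocoa implements it; `Object`, `object_create`, `object_retain`, `object_release` and the `freed_flag` field are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object (illustration only). */
typedef struct {
    int  retain_count;  /* number of current owners */
    int *freed_flag;    /* optional: set to 1 on deallocation, for demonstration */
} Object;

/* Creating an object makes the caller its first owner (count = 1). */
Object *object_create(int *freed_flag) {
    Object *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->retain_count = 1;
    o->freed_flag = freed_flag;
    if (freed_flag) *freed_flag = 0;
    return o;
}

/* Take ownership. */
Object *object_retain(Object *o) {
    o->retain_count++;
    return o;
}

/* Relinquish ownership; the last release deallocates the object. */
void object_release(Object *o) {
    assert(o->retain_count > 0);
    if (--o->retain_count == 0) {
        if (o->freed_flag) *o->freed_flag = 1;
        free(o);
    }
}
```

With two owners, the first release leaves the object alive and the second one deallocates it; nobody calls `free` directly.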

The basic rules are covered in many places, like here and here and from the horse’s mouth, here.

Reference counting is efficient, error prone and occasionally awkward.

More efficient than garbage collection: objects get deallocated as soon as their reference count reaches zero, whereas GC is heuristic and may cause your program to slow down unexpectedly while it’s doing its thing.

Error prone – programmers need to pair release/retain statements (either directly or indirectly). Mismatched statements cause a program to leak or crash. Additionally reference counting is hackable; it is easy to traffic the sums (either accidentally or by design) and obtain a valid program that handles memory correctly, while violating reference counting rules.

Awkward, notably when we know beforehand that we would like to deallocate a well defined subset of the runtime graph. A typical example is when you start a kind of ‘session’, allocate any number of objects in the course of the session and wish to deallocate all the objects at the end of the session. In such cases opting out may be somewhat short sighted yet remains attractive.

A design idea

There is a principle which I find rather productive: instead of thinking about whether an object should retain X or not, consider X then try to think about an object ‘up the runtime graph’ that should own X. Often there is an object Y such that, if Y deallocates, X should also deallocate. It could be the parent of X or maybe another object up the chain.

Now, if there is only one such object Y, then you don’t need to retain X anywhere else. You can even assert the retain count to ensure that X is unambiguously managed by Y.
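A rough sketch of the ‘assert the retain count’ idea, again in plain C rather than Objective-C. `Item` plays the role of X and `Owner` the role of Y; both types and all function names are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int retain_count; } Item;   /* plays the role of X */
typedef struct { Item *item; } Owner;        /* plays the role of Y */

Owner *owner_create(void) {
    Owner *y = malloc(sizeof *y);
    if (y == NULL) return NULL;
    y->item = malloc(sizeof *y->item);
    if (y->item == NULL) { free(y); return NULL; }
    y->item->retain_count = 1;   /* Y is the only owner of X */
    return y;
}

/* Anyone else may look at X without retaining it (a weak reference). */
Item *owner_borrow_item(const Owner *y) { return y->item; }

/* Returns 1 while Y is still the sole owner of X. */
int owner_is_sole_owner(const Owner *y) {
    return y->item->retain_count == 1;
}

/* When Y goes away, X goes with it; the assert catches stray retains. */
void owner_destroy(Owner *y) {
    assert(owner_is_sole_owner(y));
    free(y->item);
    free(y);
}
```

If some other object had retained X along the way, the assertion in `owner_destroy` would fire during development instead of X quietly outliving Y.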

Anti-patterns?

There are little recipes around (e.g. here and here) that you can use to ‘ease the pain of memory management’. From the point of view of ownership these recipes work the same way GC does: an easy way out of memory management issues, a hard way into functional bugs, with all the enticing prospects of a muddle-through approach.

One point these approaches have in common is ‘if in doubt, retain’. I’m OK with that as long as I know (beyond reasonable doubt) that keeping the target alive won’t generate unwanted behavior. If the target is an observer that receives and processes notifications… …then if in doubt, don’t retain. A clean, happy crash will provide the decision point where you can say:

  • ‘Yea, this object should still be alive at this point’ (in which case maybe something else should have retained it) or…
  • ‘No, this object is dead and well dead, we should have sent a death note’

Additionally these approaches look incomplete. You need to use class extensions if you want to declare everything as a property without exposing all your ivars.

Weak references

An unretained field is a ‘weak reference’. At least in a first approach, the use of weak references is encouraged in a number of situations:

  • Backward references from a child to a parent
  • Listener sets. See a straightforward application here.
  • Same type objects cross-referencing each other.
  • Any situation where you feel unsure whether to claim ownership (retain) or not.
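The first case (a child pointing back at its parent) can be sketched in C as follows. The parent retains the child; the child only stores a plain, unretained pointer to the parent, so no cycle is formed. `Node`, `live_nodes` and the functions are hypothetical, and the sketch assumes a node gets at most one child.

```c
#include <assert.h>
#include <stdlib.h>

int live_nodes = 0;  /* nodes currently allocated, tracked for demonstration */

typedef struct Node {
    int retain_count;
    struct Node *child;   /* strong: the parent retains its child */
    struct Node *parent;  /* weak: plain pointer, never retained */
} Node;

Node *node_create(void) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) return NULL;
    n->retain_count = 1;
    live_nodes++;
    return n;
}

/* Assumes `parent` has no child yet. */
void node_set_child(Node *parent, Node *child) {
    child->retain_count++;    /* the parent takes ownership */
    child->parent = parent;   /* the back reference is NOT retained */
    parent->child = child;
}

void node_release(Node *n) {
    if (--n->retain_count > 0) return;
    if (n->child != NULL) {
        n->child->parent = NULL;  /* clear the weak reference before letting go */
        node_release(n->child);
    }
    free(n);
    live_nodes--;
}
```

Had the child retained its parent as well, releasing the parent from the outside would never bring either count to zero, and both nodes would stay alive.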

Implementation details

Maybe for historical reasons there are two approaches to enforcing reference counting in Objective-C:

  • The non-intrusive approach revolves around tagging using properties and indirect access using self.x = ... Although this approach looks theoretically better and safer, there are practical details of how it is done in Objective-C which I often find off-putting. For one, I like to not declare properties until I want to expose public state, and I’m not used to class extensions, so I find myself unwilling to add class extensions to my .m files.
  • An older approach revolves around the [release] and [retain] statements. The advantage (easier reviewing/debugging) and inconvenience (intrusive approach leading to somewhat cluttered code) are the explicit way in which things are done. This leads to a weird situation because it makes it more likely that bugs are introduced while making the same bugs easier to fix.

Note about [autorelease]

[autorelease] is very convenient and helps avoid errors in many situations. Sadly enough, when an error does occur and [autorelease] is the lucky guy that causes a target to deallocate, we get very little debugging information, because [autorelease] doesn’t take effect until the enclosing autorelease pool is drained (typically at the end of the current run loop iteration), long after we have left the frame that called it.

So I try to limit its usage to where it’s unavoidable.

What is covered elsewhere (or should be)

  • Cocoa collections (NSArray, NSSet, NSDictionary) retain all their elements. This can be a hindrance in some cases, but you can configure the underlying, toll-free bridged counterparts, as demonstrated here.
  • There are various approaches to notifying stakeholders when an object gets deallocated. I will try to write a quick article about an approach I find useful when implementing observer schemes.

On a mobile platform, memory must be taken seriously. Unlike a desktop PC, where we have 2, 4 (8?) GB to live on, the general feeling about the earlier iPhone and iPod touch models is that we have around 20MB to play with. Tiny sandbox.

The bottom line

If you start with one of the iPhone OS templates/examples, you likely have a view controller with an empty method like this:

- (void)didReceiveMemoryWarning {}

The bottom line is that it seems safer to arrange that your app does not trigger a memory warning under normal usage conditions. I haven’t really seen this written anywhere; it is just common sense. Normal conditions would mean:

  • You haven’t just rebooted the device. If you’ve just rebooted, you are in ‘perfect conditions’. When I test my app at the moment, it triggers the memory warning 12 times or so. If I reboot and perform the same test, the warning never gets called.
  • You are not testing right after another app has crashed. Maybe this makes no difference, but it feels like abnormal conditions.
  • Your test involves normal gameplay, and lasts for the maximum expected duration for a play session.
  • You are testing on the tightest configuration possible (e.g. iPod touch 2nd gen, 8GB, running iOS 4).

You can ‘recover’ from a memory warning – maybe you just need to free enough memory, and since my app did not crash after receiving 12 memory warnings, you might even think it’s OK. Now, do you really want to be running on the edge of disaster? I don’t think so.

What Instruments say

When I get a memory warning, Instruments displays memory as follows:

  • 30 MB (process allocation) in Activity Monitor
  • 20 MB (live bytes) used by my process in ObjectAlloc

These values are consistent with the rumored 20MB, and also consistent with repeated complaints that different instruments provide different values. ObjectAlloc measures live bytes: memory used by an app. Activity Monitor measures (a variable amount of) memory allocated to a process. I tend to give more consideration to what ObjectAlloc says, because this is easier to understand, easier to predict and easier to control. There are additional conditions that will cause a process to use more memory:

  • If we keep allocating / deallocating blocks of various sizes, we fragment memory and promote semi-transient waste.
  • It may be nearly impossible not to leak any memory over time (especially if we need a system library that leaks memory). Leaked memory is lost forever, and won’t show in ObjectAlloc.

So what do we need to do?

I am targeting 15MB max as peak allocation in ObjectAlloc. I think it should be possible to play my game end to end without exceeding this value and use this as ‘maximum expected play session duration’. I have fixed all known and known to be fixable memory leaks in my product; I want to have a go and try to find out if I can work around leaks in system libraries. I don’t expect to be able to fix all of them.

I have implemented didReceiveMemoryWarning, so I have a mechanism to release unused resources. Once we’ve done that, we want to release resources more often, so we stay under the maximum we choose (in my case 15MB). If we’re really tight, we might want to arrange for assets to be lighter (e.g. use models with a lower poly count if we’re talking 3D apps).

A quick case study

My game is a 3D game. The bundle is about 17MB worth of data, including 1MB worth of code. I’m not using textures; I’m loading models on the fly, so I have no loading times.

I have two problems:

  • Except for procedurally generated geometry, my allocations never expire unless a warning triggers the release mechanism.
  • Most resources reload almost immediately. Why? Because I test bounds for visibility (and other things). However, bounds are not stored in separate files. In fact, bounds are evaluated when an object loads. While the processing overhead is negligible, bounding boxes are released along with geometry.

If I were to do it all over again, I might just store bounds separately. This might help defer loading an object, so my ‘loading time’ for a stage would drop from about 1 second to nearly nothing – or maybe not – at the moment it doesn’t even seem that asset loading is the main cost in scene setup.

As it stands, I am not doing it all over again; instead I want to retain bounds when I release assets, and release assets in a timely manner. While I’m doing this I also want to never release geometry that needs to be displayed in the next frame. My resource release mechanism doesn’t know about that.

How to know when a resource is ready to be released?

In my rendering loop, there is a procedure that checks whether objects are likely to enter the viewing range. This is a quick and dirty way to skim the complete list of objects in the 3D scene. This procedure gets called less often than a regular visibility check – so the renderer needn’t check all the scene all the time (there are significantly better ways to code this).

So my first idea is to use a skip count. If an object hasn’t entered the near range for a while, we’re OK to release. I wouldn’t set the skip count too low – no need to release resources too often, as whenever we reallocate them, we actually incur memory (fragmentation) and performance (loading) overheads.

It looks good enough, nothing super-clever but OK. But… there is a catch.

Surely geometry is shared – in my implementation, Element is just a token representing an instance of an Object3D – shared geometry reused over and over. That’s where reference counting comes into play, right? Whenever an Element instance determines that it doesn’t need its geometry, it releases its instance of Object3D, and when the reference count reaches zero, Object3D gets deallocated, correct?

Not quite. Geometry is also mapped from file paths, so when a new element is created, we get the model from the map. Needless to say, the map is also retaining each object once. As it is, all elements may release their instance of a geometry object; the map won’t know about it and we still haven’t got the geometry to deallocate.

There are ways around this problem, but we want to do things without wasting either time or too much memory, or changing the existing structures. Here’s the plan:

  1. Each geometry node will have a used flag. Not a counter, just a boolean.
  2. Initially the used flag is clear (false).
  3. Whenever we update the display list, every object that’s nearby sets the used flag.
  4. From time to time (every 1 to 5 minutes, say), we iterate the list of values in the map, then…
  5. We release all models for which the used flag is clear – these models haven’t been used in a while, likely we won’t need them soon.
  6. We clear the flag for all remaining models.
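The plan above could look roughly like this in C. It is a sketch under stated assumptions, not the game’s actual code: a small fixed-size array stands in for the path-to-model map, and `Model`, `ModelCache`, `cache_add`, `model_mark_used` and `cache_sweep` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_MODELS 8

typedef struct {
    void *geometry;  /* NULL once the geometry has been released */
    bool  used;      /* step 1: a boolean, not a counter */
} Model;

typedef struct {
    Model models[MAX_MODELS];  /* stands in for the path-to-model map */
    int   count;
} ModelCache;

/* Load a model; its used flag starts out clear (step 2). */
Model *cache_add(ModelCache *cache, size_t bytes) {
    if (cache->count == MAX_MODELS) return NULL;
    Model *m = &cache->models[cache->count];
    m->geometry = malloc(bytes);
    if (m->geometry == NULL) return NULL;
    m->used = false;
    cache->count++;
    return m;
}

/* Step 3: called for nearby objects while updating the display list. */
void model_mark_used(Model *m) { m->used = true; }

/* Steps 4-6: release geometry whose flag is clear, then clear the flag
   on the survivors. Returns the number of models released. */
int cache_sweep(ModelCache *cache) {
    int released = 0;
    for (int i = 0; i < cache->count; i++) {
        Model *m = &cache->models[i];
        if (m->geometry == NULL) continue;  /* already unloaded */
        if (!m->used) {
            free(m->geometry);
            m->geometry = NULL;
            released++;
        }
        m->used = false;
    }
    return released;
}
```

One consequence of using a plain boolean: a model loaded shortly before a sweep and not yet marked is released straight away, so the sweep interval needs to be long compared with the display list update rate.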

I can generalize this a little bit by checking accesses from Element3D.

    Implementation?

    I tried to implement the above quickly. Maybe a little too quickly. Ghosting bounds (retaining the bounds after geometry has been unloaded) went fine. Trying to automate memory release (based on the used flag) looked good in theory; in practice, it didn’t really work this time.

    So I did the simple thing. Force release (using the existing mechanism for handling memory warnings) manually when reaching a new stage in the game or performing a key action. The results are displayed below.

    Test Results

    I tested on an iPhone 3GS, mainly to save time. I use more memory on the 3GS because the rendering surface is bigger than the display (so I can antialias). The test glitched a little because I’d introduced a bug that failed the release mechanism between stage 3 and 4 in my game.

    • Startup: 0.6MB
    • Stage 1: 3.8-4.3MB
    • Force release as much as possible: 1.76MB
    • Stage 2: 4.67-10.06MB (until Klinnburg village); 5.5MB after picking the bread; climbs to 12-13MB
    • Stage 3: 13-20.6MB (but fails to show chapter 3 notice and release memory when starting)
    • After releasing resources: 5.0MB
    • Stage 4: 12.18-16MB

    Conclusion

    The results of the test suggest that the resource release mechanism is somewhat inefficient. It looks like, if I had just 6 stages like the ones above, I’d hit the 20MB limit.

    Meanwhile, I don’t expect to run into problems for this release. I don’t think my app leaked about 20MB in 40 minutes. I’ve done several passes to eliminate all leaks in the app code. One of the latest steps before submitting the app will be checking for leaks again, and looking into leaks generated by system libraries.

    Not everything that shows in a 3D view will earn you credit, especially when you get an amorphous, spiky, translucent mass of pixels.

    If you do, you may be having a ‘memory issue’. Let’s remind ourselves quickly how this is most likely to occur:

    1. Allocate memory with malloc() and call free(), but forget that another object is still using this memory block.
    2. Access memory beyond the bounds of your allocation.
    3. Attempt using an uninitialized section of a memory block.

    I feel tempted to make a kind of buffer class, so I can manage malloc-allocated blocks. Or maybe a few buffer classes. Here’s why:

    • Reference counting needn’t be great, but it’s better than nothing. If we wrap our memory blocks and don’t leak the underlying pointers, the worst we can get is a zombie buffer, still much better than a wild card.
    • Unless we’re getting fanatical about performance (that will happen, eventually), we can afford wrapping allocations using objects.
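A possible shape for such a buffer wrapper, sketched in C rather than as an Objective-C class. `FloatBuffer` and its functions are hypothetical; the point is that the raw pointer stays inside the wrapper, the block starts zeroed, and every access is bounds checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    int    retain_count;
    size_t count;   /* number of floats in the block */
    float *data;    /* kept private: callers go through the accessors */
} FloatBuffer;

FloatBuffer *buffer_create(size_t count) {
    if (count == 0) return NULL;
    FloatBuffer *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->data = calloc(count, sizeof *b->data);  /* zeroed, so never uninitialized */
    if (b->data == NULL) { free(b); return NULL; }
    b->retain_count = 1;
    b->count = count;
    return b;
}

FloatBuffer *buffer_retain(FloatBuffer *b) {
    b->retain_count++;
    return b;
}

/* The block is freed only when the last owner lets go. */
void buffer_release(FloatBuffer *b) {
    if (--b->retain_count == 0) {
        free(b->data);
        free(b);
    }
}

/* Bounds-checked accessors: return 0 instead of touching memory out of range. */
int buffer_set(FloatBuffer *b, size_t i, float value) {
    if (i >= b->count) return 0;
    b->data[i] = value;
    return 1;
}

int buffer_get(const FloatBuffer *b, size_t i, float *out) {
    if (i >= b->count) return 0;
    *out = b->data[i];
    return 1;
}
```

This addresses the three failure modes listed earlier one by one: the count guards against freeing a block another object still uses, the accessors refuse out-of-range indices, and `calloc` removes the uninitialized tail. The accessor calls are the performance cost mentioned above.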

    The bug I fixed today had been lying around for nearly three months. It was intermittent, didn’t cause crashes, and sometimes got so quiet that I thought I would eventually forget about it.

    So I did a minor improvement to my vertex color interpolator and it came back, with the familiar spiky blobs.

    Cleaning the mess.

    I don’t use alpha just yet, but I have to use four components per color. So I could figure what rendered incorrectly by testing alpha. Abusively, this meant iterating meshes before rendering them (!)

    Every mesh had about 15 to 25% garbage at the end of the matching memory block, so I could tell that either (a) I rendered more than I had allocated or (b) I had initialized less than I intended to use.

    Just looking at the output wouldn’t tell me that my models were incomplete. Because the models in this case are… …procedural grass. How obvious can it be that we’re missing every other blade?

    This falls into category 3. Getting increments wrong isn’t rare – so every time I wanted 7 blades of grass, I generated only 6.
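To illustrate the kind of check described above, here is a small C sketch. It does not reproduce the actual grass generator: `grass_fill_buggy` is an invented off-by-one loop, and the sentinel alpha value is an assumption (it only works while real alpha values stay within 0 to 1).

```c
#include <assert.h>
#include <stddef.h>

#define SENTINEL_ALPHA (-1.0f)  /* hypothetical marker; real alpha is in [0, 1] */

typedef struct { float r, g, b, a; } Color;

/* Mark every color as "not written yet" before the generator runs. */
void colors_poison(Color *colors, size_t count) {
    for (size_t i = 0; i < count; i++) colors[i].a = SENTINEL_ALPHA;
}

/* Count the entries the generator never wrote. */
size_t colors_count_unwritten(const Color *colors, size_t count) {
    size_t n = 0;
    for (size_t i = 0; i < count; i++) {
        if (colors[i].a == SENTINEL_ALPHA) n++;
    }
    return n;
}

/* Deliberately buggy: the loop bound is off by one,
   so the last blade is never written. */
void grass_fill_buggy(Color *colors, size_t blades) {
    for (size_t i = 0; i + 1 < blades; i++) {
        colors[i] = (Color){0.1f, 0.8f, 0.1f, 1.0f};
    }
}

/* Corrected loop bound: every blade is written. */
void grass_fill(Color *colors, size_t blades) {
    for (size_t i = 0; i < blades; i++) {
        colors[i] = (Color){0.1f, 0.8f, 0.1f, 1.0f};
    }
}
```

Asking for 7 blades with the buggy version leaves exactly one entry carrying the sentinel, which a test can catch long before anything is rendered.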

    Lesson learned?

    Maybe… …maybe not. Here’s what I think:

    • Feels like a typical case where writing a test and fully validating the output would help.
    • Part of the danger when manipulating sensitive code is that it can also fail idiotically. malloc() makes me think about ways I can crash my game, leak memory, etc… but finally I just somehow missed out on filling up a memory block till it’s all done and done.
    • Next time I see the tell-tale signs of mis-managing memory, I really shouldn’t wait. Not initializing memory can also cause a program to crash. Plus I’ve been dragging along wondering whether I would get a repro on this. And yea. Sure, I finally did. It wasn’t even on my list anymore.

    Status update

    I’m getting a new app icon on Monday. Will try to submit sometime next week. Dev stack is clean today. I calibrated/cleaned up sound today. Cleaned up the artwork stack, then regenerated small chores. A lot of instrumentation on the roadmap for today (as in, Sunday).

    Getting there. Sigh.

    Because I’ve been spending a lot of time working with ‘memory-safe’ platforms, memory management in practice is a little new to me. Today I’ll be analyzing my current project and looking at ways to fix memory related issues.

    Getting help from Instruments?

    The latest version of Instruments (2.1 at the time of this writing) provides 3 tools for monitoring memory:

    • Object allocations
    • Leaks
    • Zombies (only in the simulator)

    I fire up my game using object allocations.

    Here are the first things I notice:

    • There is an object alloc graph. On entering the game menu this grows to 1.39MB then gets stable. That’s a good sign.
    • Upon entering the game usage grows to 5.55MB. But then…
    • Memory usage keeps growing at around 0.5MB/minute afterwards. That doesn’t look too good.

    The instrument is monitoring allocations by type, including:

    • object allocations per class
    • malloc calls, classified per byte size (malloc 8 bytes, malloc 24.00KB etc…)

    In the main view, the tool lets us plot either all, or just allocations for a given class. Neat. Pressing a little blue arrow on the right of any allocation lets us identify matching method calls along the instrument’s timeline. For example:

    1. I select ‘Vector’ because the number of vectors allocated is growing over time
    2. I notice that Vector allocations are growing linearly
    3. Scrubbing the timeline, I find a couple of calls responsible for eating up memory, in my case:
      • [PhotographyManager apply]
      • [TerrainUtilities move:toThisWalkableSurface:withMaxClimb]

    This is something I want to fix:

    • Memory allocated for vectors is never released.
    • Allocating memory from the game loop just isn’t good. Allocating and releasing takes processing time.

    If a function needs a ‘vector’ to do some calculations inside the game loop, it’s probably a good idea to allocate a vector ‘once and for all’ so we can use it again and again.
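A minimal C sketch of the ‘once and for all’ idea. `Vector`, `scratch_vector`, `distance_squared` and the `vector_allocations` counter are hypothetical names used only for this example; the original code is Objective-C and is not shown here.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { float x, y, z; } Vector;

int vector_allocations = 0;  /* successful allocations, for demonstration */

/* Returns the shared scratch vector, allocating it on first use only. */
Vector *scratch_vector(void) {
    static Vector *scratch = NULL;
    if (scratch == NULL) {
        scratch = calloc(1, sizeof *scratch);
        if (scratch != NULL) vector_allocations++;
    }
    return scratch;
}

/* Hypothetical per-frame calculation that needs a temporary vector.
   Returns -1 if the scratch vector could not be allocated. */
float distance_squared(const Vector *a, const Vector *b) {
    Vector *d = scratch_vector();
    if (d == NULL) return -1.0f;
    d->x = a->x - b->x;
    d->y = a->y - b->y;
    d->z = a->z - b->z;
    return d->x * d->x + d->y * d->y + d->z * d->z;
}
```

Calling `distance_squared` on every frame allocates exactly once, however long the loop runs. The trade-off is that a shared scratch vector is neither reentrant nor thread safe, so it only suits code that finishes with the temporary before anyone else asks for it.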

    It’s not just when an app is idle… + digging up other unwanted allocations

    I iterated the test while getting my avatar to walk around. That uncovered a few more issues. Some allocation events aren’t just happening that often (for one thing, game logic is usually staggered; in my case, it most definitely is) so they won’t just ‘shine through’ and I quickly learned another way to discover unwanted allocation events:

    1. Have the app do whatever is suspected to cause unwanted allocations.
    2. Pause the app
    3. Click the gray arrow near ‘all allocations’
    4. Scrub the timeline and check the allocation history
    5. Try toggling between ‘All objects created’ and ‘Created & Still living’

    If there’s really nothing allocated, then the history view doesn’t change while scrubbing, it just sticks on whatever was the latest allocation.

    I use (5) to check what my app does versus the system. I really find it easier to not allocate anything inside the game loop, rather than allocating and deallocating stuff. This way, looking at ‘All objects created’ should only show system level allocations while scrubbing the history through the game loop. It’s easier for me to allocate a few buffers and never release them, versus allocating/deallocating stuff all the time.

    What’s the benefit?

    To make things completely obvious, let’s restate why doing this is not only useful, but altogether necessary:

    • If an app keeps allocating memory, it’s heading towards running out of it, and crashing.
    • If the game loop never allocates memory, the object allocation monitor will display a flat curve.
    • If the curve isn’t flat, or scrubbing through displays allocations other than system level, we have a problem.

    Why use the profiler?

    Another way we can investigate memory issues is… …looking at the code. While writing my code, I tried to avoid allocating stuff from the game loop. I did so in part to avoid performance overheads, in part because I didn’t want to risk having too many memory issues to resolve.

    Using a profiler to check for memory issues saved me time. I caught around 12 issues in 3 hours. Whatever I fixed is, likely, what really needed to be fixed. That doesn’t mean all my code is memory safe. There is still a way to go.

    But then, the unsafe code isn’t inside the game loop. So far so good.