Cocoa for Scientists (Part IV): Good References
Author: Drew McCormack
Website: http://www.macanics.net
In the last tutorial, you learned how objects get created and destroyed. In this one, we are going to look at memory management in Cocoa, which governs how long objects stay alive. Cocoa uses a system of memory management called reference counting, which is simple enough, but does take some practice. For those who find such manual schemes primitive, you'll be happy to know that Objective-C 2.0, which will be ushered in by Leopard, includes garbage collection, which will eventually make much of what we learn today redundant. In the meantime, though, you will need to understand the reference counting approach, because nearly all existing Cocoa code uses it, even if your own code does not.
Counting References
Reference counting is conceptually simple: whenever an object needs to make use of another object, it increases a counter in that object by one, and whenever it no longer needs it, it decreases the counter by one. The counter is called the retain count in Cocoa, and it is managed by the NSObject class, meaning virtually all objects in a Cocoa program have a retain count. If an object's retain count drops to zero, meaning no other objects need it anymore, the object is deallocated: its dealloc method is invoked and its memory is freed.
There are several ways to end up with an object whose retain count you are responsible for. The first is simply to call the alloc method: whenever an object is allocated, it automatically has a retain count of one. If you do nothing more with that object, it will remain in existence until the program exits. Other methods that create objects, such as new, copy and mutableCopy, also leave the new object with a retain count of one.
Often, some part of your program will want to make use of an object that it did not create itself. In such cases, you want to ensure that the object in question does not disappear when it is still needed. To achieve this, the object's retain method can be called, which increases the retain count by one. This should ensure that the object is not deallocated until it is no longer needed.
When an object is no longer needed, you call either the release method or the autorelease method. Both methods decrease the retain count by one, but they differ in when the change takes effect: release decreases the retain count immediately, and if it reaches zero, the object is deallocated on the spot.
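To watch a retain count reach zero, the sketch below uses a small hypothetical class, Tracker, whose dealloc just bumps a counter before calling [super dealloc]. It assumes manual reference counting (compile without ARC, as with the gcc command used later in this article); Tracker, gTrackerDeallocs and RetainReleaseDemo are names made up for the example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical bookkeeping: how many Tracker objects have been deallocated.
static int gTrackerDeallocs = 0;

// Hypothetical class used only to observe when dealloc runs.
@interface Tracker : NSObject
@end

@implementation Tracker
- (void)dealloc {
    gTrackerDeallocs++;   // the retain count has just reached zero
    [super dealloc];      // under manual reference counting, always call super
}
@end

// Hypothetical helper: runs one alloc/retain/release cycle and reports how
// many Tracker objects had been deallocated after each release call.
void RetainReleaseDemo(int *afterFirstRelease, int *afterSecondRelease) {
    gTrackerDeallocs = 0;
    Tracker *t = [[Tracker alloc] init];  // retain count 1
    [t retain];                           // retain count 2
    [t release];                          // retain count 1: still alive
    *afterFirstRelease = gTrackerDeallocs;
    [t release];                          // retain count 0: dealloc runs now
    *afterSecondRelease = gTrackerDeallocs;
}
```

Calling RetainReleaseDemo(&a, &b) should leave a at 0 and b at 1: the first release only lowers the count, while the second deallocates the object immediately.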
autorelease decreases the retain count at some later time. For now, we won't concern ourselves with exactly when, other than to say the object will stay around long enough for any immediately executing code to complete. When you use autorelease, you are saying you are finished with the object, but maybe some other part of the program would like to make use of it for a bit longer. An example of this is when you need to return an object from a method, and you don't need the object anymore. If you call release, it will disappear before you can return it; if you call autorelease, you have time to return the object, and the calling code has a chance to retain it if it wants to keep it around.
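Returning a freshly created object is the classic use of autorelease. In the hypothetical sketch below, MakeReading gives up ownership with autorelease rather than release, so the object survives the return; the caller keeps one result by retaining it and lets another go when the pool is released. Manual reference counting is assumed, and Reading, MakeReading and AutoreleaseDemo are invented for the example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical bookkeeping: how many Reading objects have been deallocated.
static int gReadingDeallocs = 0;

// Hypothetical class standing in for any object a method might return.
@interface Reading : NSObject
@end

@implementation Reading
- (void)dealloc {
    gReadingDeallocs++;
    [super dealloc];
}
@end

// Creates a Reading and returns it without keeping it. Calling release here
// would destroy the object before the caller saw it; autorelease defers the
// release until the current autorelease pool is released.
Reading *MakeReading(void) {
    Reading *r = [[Reading alloc] init];  // retain count 1, owned by us
    return [r autorelease];               // hand it back, release later
}

// Hypothetical helper: records the dealloc count at three moments.
void AutoreleaseDemo(int counts[3]) {
    gReadingDeallocs = 0;
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    MakeReading();                            // a result the caller does not keep
    Reading *kept = [MakeReading() retain];   // the caller wants this one later
    counts[0] = gReadingDeallocs;  // 0: both objects still exist
    [pool release];                // each object receives its deferred release
    counts[1] = gReadingDeallocs;  // 1: only the unkept object reached zero
    [kept release];                // balances the caller's retain
    counts[2] = gReadingDeallocs;  // 2: the kept object is gone too
}
```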
An Example
Here is a simple example, to clarify the discussion above:
#import <Foundation/Foundation.h>
int main() {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSMutableData *data1 = [[NSMutableData alloc] init];
    // retainCount returns an NSUInteger, so it is cast to match the %lu format
    NSLog(@"Retain count of data1 is %lu", (unsigned long)[data1 retainCount]); // Here, data1 gets a retain count of 1
    [data1 retain];
    NSLog(@"Retain count of data1 is %lu", (unsigned long)[data1 retainCount]); // data1 now has a retain count of 2
    NSData *data2 = [data1 copy]; // copying mutable data gives a new, immutable NSData
    NSLog(@"Retain count of data1 is %lu", (unsigned long)[data1 retainCount]); // data1 remains on 2
    NSLog(@"Retain count of data2 is %lu", (unsigned long)[data2 retainCount]); // data2 gets retain count of 1
    [data2 release]; // data2 retain count goes to 0
                     // dealloc is invoked
    [data1 release];
    NSLog(@"Retain count of data1 is %lu", (unsigned long)[data1 retainCount]); // data1 retain count goes to 1
    [data1 autorelease];
    NSLog(@"Retain count of data1 is %lu", (unsigned long)[data1 retainCount]); // data1 retain count remains on 1
                                                                               // but added to autorelease pool
    [pool release]; // data1 retain count goes to 0
                    // dealloc is invoked
    return 0;
}
To compile this, copy it into a text file called retaintest.m, and enter this command:
gcc -ObjC retaintest.m -framework Foundation
Run it by entering:
./a.out
You should get output something like this:
2006-12-04 09:43:33.119 a.out[23394] Retain count of data1 is 1
2006-12-04 09:43:33.119 a.out[23394] Retain count of data1 is 2
2006-12-04 09:43:33.119 a.out[23394] Retain count of data1 is 2
2006-12-04 09:43:33.119 a.out[23394] Retain count of data2 is 1
2006-12-04 09:43:33.120 a.out[23394] Retain count of data1 is 1
2006-12-04 09:43:33.120 a.out[23394] Retain count of data1 is 1
This example basically runs through the various memory management methods, demonstrating their effects. The class used for this is NSMutableData, but this choice was for the most part arbitrary: the same rules apply to other Cocoa classes. Hopefully you can see the rules discussed above in action in this example. Most of the example is straightforward, but the part where copy is invoked may have you scratching your head. It is important to realize that copy does not change the retain count of the object being copied; instead, it creates a new object (a copy of the original), and this new object gets a retain count of one. (Strictly speaking, copying an NSMutableData gives you an immutable NSData; use mutableCopy if you need a mutable copy.)
The other aspect of this example that may perplex you is the NSAutoreleasePool. This class takes care of releasing objects on which the autorelease method has been invoked. In particular, when the autorelease pool is released, it sends release to each object that was autoreleased while the pool was in place, which may result in those objects being deallocated, as is the case here for data1.
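The pool keeps one entry per autorelease call, not one per object. The hypothetical sketch below (manual reference counting assumed) retains an object once more than alloc did, autoreleases it twice, and checks that releasing the pool is enough to deallocate it. Sample and PoolDemo are names made up for the example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical bookkeeping: how many Sample objects have been deallocated.
static int gSampleDeallocs = 0;

// Hypothetical class used only to observe when dealloc runs.
@interface Sample : NSObject
@end

@implementation Sample
- (void)dealloc {
    gSampleDeallocs++;
    [super dealloc];
}
@end

// Hypothetical helper: reports the retain count just before the pool is
// released, and how many Samples were deallocated once it had been released.
void PoolDemo(NSUInteger *countBeforePool, int *deallocsAfterPool) {
    gSampleDeallocs = 0;
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    Sample *s = [[Sample alloc] init];  // retain count 1
    [s retain];                          // retain count 2
    [s autorelease];                     // still 2; one release pending
    [s autorelease];                     // still 2; two releases pending
    *countBeforePool = [s retainCount];  // 2
    [pool release];                      // sends release twice: 2 -> 1 -> 0
    *deallocsAfterPool = gSampleDeallocs;
}
```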
Reference Counting Rule of Thumb
With so many methods involved in Cocoa's reference counting system, you may be thinking it is awfully complex. It isn't really, if you stick to a simple convention: Whenever you call one of the retain count increasing methods, like alloc, retain, or copy, you should balance that with a call to a retain count decreasing method like release or autorelease. If you remember this simple rule, you shouldn't run into too much trouble.
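One way to check that every alloc, copy and retain has been balanced is to count live instances. The sketch below does that for a hypothetical Particle class, which adopts NSCopying so that copy works; if the pairing rule is followed, no Particle objects should remain once BalancedUsage returns. Manual reference counting is assumed, and all names are invented for the example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical bookkeeping: number of Particle objects currently alive.
static int gLiveParticles = 0;

// Hypothetical class that counts its own live instances.
@interface Particle : NSObject <NSCopying>
@end

@implementation Particle
- (id)init {
    if ((self = [super init])) {
        gLiveParticles++;
    }
    return self;
}

// NSObject's copy calls this; the new object comes back with a retain count of one.
- (id)copyWithZone:(NSZone *)zone {
    return [[[self class] allocWithZone:zone] init];
}

- (void)dealloc {
    gLiveParticles--;
    [super dealloc];
}
@end

// Hypothetical helper: every alloc, copy and retain is paired with a
// release or autorelease, so the live count should return to zero.
int BalancedUsage(void) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    Particle *a = [[Particle alloc] init];  // +1 on a (alloc)
    Particle *b = [a copy];                 // +1 on the new object b (copy)
    [a retain];                             // +1 on a (retain)
    [a release];                            // balances the retain
    [b autorelease];                        // balances the copy, later
    [a release];                            // balances the alloc: a is freed
    [pool release];                         // performs b's deferred release
    return gLiveParticles;                  // 0 when everything is balanced
}
```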
Accessor Methods
Cocoa has a few conventions to help you stick to this rule of thumb. In particular, Cocoa uses accessor methods to control access to objects used by a class. Using these methods greatly reduces the risk of memory management bugs.
Accessor methods tend to come in pairs, with a getter and a setter. The getter is used to retrieve an object, and a setter sets the object. Here is a simple class interface to illustrate:
#import <Cocoa/Cocoa.h>
@interface Matrix : NSObject {
    NSData *data;
}
-(id)initWithData:(NSData *)data;
-(void)setData:(NSData *)data;
-(NSData *)data;
@end
This could form the basis of a matrix class in a linear algebra framework. The methods setData: and data are the setter and getter, respectively, of the instance variable data. An implementation of this class might look like this:
@implementation Matrix
-(id)initWithData:(NSData *)newData {
    if ( (self = [super init]) ) { // extra parentheses mark the assignment as intentional
        [self setData:newData];
    }
    return self;
}
-(void)dealloc {
    [self setData:nil];
    [super dealloc];
}
-(void)setData:(NSData *)newData {
    if ( newData != data ) {
        [data release];
        data = [newData retain];
    }
}
-(NSData *)data {
    return data;
}
@end
There are actually many different ways to write accessor methods, and the example above represents just one such approach. The getter in this case is very simple, just returning the instance variable. The setter is more complex: it first checks whether the new data is the same object as the old data. If it is, the setter does nothing; if it is not, the old data is released and the new data is retained. The check matters: without it, passing in the object the Matrix already holds could release that object, possibly deallocating it, before it is retained again.
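Another common way to write the setter retains the new object before releasing the old one, which stays correct even when the same object is passed in again, so no equality test is needed. The sketch below uses a hypothetical Series class shaped like Matrix; the hypothetical helper SeriesSetterDemo records the data's retain count after the first set, after setting the same object again, and after the Series is released. Manual reference counting is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class with the same kind of data accessors as Matrix.
@interface Series : NSObject {
    NSData *data;
}
- (void)setData:(NSData *)newData;
- (NSData *)data;
@end

@implementation Series
// Retain first, then release: if newData == data, the retain keeps the
// object alive, so the release cannot deallocate it by mistake.
- (void)setData:(NSData *)newData {
    [newData retain];
    [data release];
    data = newData;
}

- (NSData *)data {
    return data;
}

- (void)dealloc {
    [data release];  // give up the Series' ownership of its data
    [super dealloc];
}
@end

// Hypothetical helper: retain counts of the data at three points.
void SeriesSetterDemo(NSUInteger counts[3]) {
    NSMutableData *d = [[NSMutableData alloc] initWithLength:8];  // count 1
    Series *s = [[Series alloc] init];
    [s setData:d];
    counts[0] = [d retainCount];  // 2: owned by us and by s
    [s setData:d];                // the same object again
    counts[1] = [d retainCount];  // still 2
    [s release];                  // dealloc releases the data
    counts[2] = [d retainCount];  // 1: only our reference remains
    [d release];                  // balances our alloc
}
```

The trade-off is an extra retain/release pair when the value does not change, in exchange for a setter without a branch.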
Note also that the initializer calls the setData: accessor, rather than just setting the data instance variable directly. This is good practice, because it means that nearly all the retaining/releasing takes place in one part of the class: the accessors.
There is one other method that needs to include reference counting method invocations: the dealloc method. When a Matrix object gets deallocated, it needs to release its data, otherwise there will be a memory leak. dealloc can do this either by invoking release directly on the data instance variable, or by calling the setter with an argument of nil, as shown here. (In Objective-C, a message sent to nil is simply ignored; here, the setter ends up sending retain to nil, which just returns nil. This saves you from testing each time whether a variable is nil, as you must in languages like C and Fortran.)
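To see where the retain in setData: gets balanced, the sketch below repeats the Matrix class from above (condensed so it compiles on its own, importing Foundation instead of Cocoa) and records the retain count of its data at three points. MatrixLifecycle is a hypothetical helper, and manual reference counting is assumed.

```objc
#import <Foundation/Foundation.h>

// Condensed copy of the Matrix class above.
@interface Matrix : NSObject {
    NSData *data;
}
-(id)initWithData:(NSData *)data;
-(void)setData:(NSData *)data;
-(NSData *)data;
@end

@implementation Matrix
-(id)initWithData:(NSData *)newData {
    if ((self = [super init])) {
        [self setData:newData];
    }
    return self;
}

-(void)dealloc {
    [self setData:nil];  // releases the data; [nil retain] does nothing
    [super dealloc];
}

-(void)setData:(NSData *)newData {
    if (newData != data) {
        [data release];
        data = [newData retain];
    }
}

-(NSData *)data {
    return data;
}
@end

// Hypothetical helper: retain count of the data before, during and after
// the Matrix's lifetime.
void MatrixLifecycle(NSUInteger counts[3]) {
    NSMutableData *bytes = [[NSMutableData alloc] initWithLength:4 * sizeof(double)];
    counts[0] = [bytes retainCount];                  // 1: only we own it
    Matrix *m = [[Matrix alloc] initWithData:bytes];  // setData: retains it
    counts[1] = [bytes retainCount];                  // 2
    [m release];                                      // dealloc -> setData:nil
    counts[2] = [bytes retainCount];                  // 1 again
    [bytes release];                                  // balances our alloc
}
```

In other words, the retain in setData: is balanced either by the release in the next setData: call or, at the end, by the one triggered from dealloc.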
The naming scheme used for the accessors here is a Cocoa convention, and an important one. The setter's name is set followed by the instance variable name with its first letter capitalized (setData:). The getter has the same name as the instance variable (data). This is not at all arbitrary: Cocoa assumes you will use this convention, and many parts of the frameworks actually rely on it. If you choose a different approach, your code will simply not be able to take advantage of much of the Cocoa frameworks. Lesson: stick to the convention.
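Key-value coding is one of the parts of Cocoa that relies on this naming. In the sketch below, a hypothetical Grid class follows the convention, and the code refers to its data only through the string @"data": setValue:forKey: finds setData:, and valueForKey: finds data. A counter in the setter confirms that KVC went through the accessor. Manual reference counting is assumed; Grid, gSetterCalls and KeyValueCodingDemo are invented for the example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical bookkeeping: how many times -setData: has run.
static int gSetterCalls = 0;

// Hypothetical class following the set<Name>:/<name> convention.
@interface Grid : NSObject {
    NSData *data;
}
- (void)setData:(NSData *)newData;
- (NSData *)data;
@end

@implementation Grid
- (void)setData:(NSData *)newData {
    gSetterCalls++;
    if (newData != data) {
        [data release];
        data = [newData retain];
    }
}

- (NSData *)data {
    return data;
}

- (void)dealloc {
    [self setData:nil];
    [super dealloc];
}
@end

// Hypothetical helper: returns YES if KVC stored and returned the object
// through the accessors.
BOOL KeyValueCodingDemo(void) {
    gSetterCalls = 0;
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    Grid *g = [[Grid alloc] init];
    NSData *d = [NSData dataWithBytes:"abc" length:3];  // autoreleased
    [g setValue:d forKey:@"data"];                       // calls -setData:
    BOOL ok = ([g valueForKey:@"data"] == d)             // calls -data
              && (gSetterCalls == 1);
    [g release];
    [pool release];
    return ok;
}
```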
You are Autoreleased
Memory management in Objective-C may seem a bit primitive if you are used to languages like Java and Python, but it is considerably more advanced than in languages like C and Fortran, and it does have its advantages. For example, Cocoa programs don't suffer the same performance problems that have plagued Java in the past.
As I indicated at the beginning, things are about to change, and Leopard will usher in real garbage collection. Apple seem to have done a good job of it too, and all indications are that performance will not suffer. Nonetheless, if you plan to do any Cocoa development in the coming years, you do need to understand how the reference counting scheme works, because it will be with us for some time yet. Luckily, it isn't really that difficult, as long as you stick to the rules and conventions established above.
Next time we will move out of memory management, and look at some more advanced aspects of Object-Oriented Programming (OOP), like inheritance and polymorphism, before moving into Xcode and the world of GUIs. Stay tuned.




Comments
Accessor Naming
Nice article Drew. I had to figure all this out the hard way a year or so ago so can't wait for this to move on to the more advanced stuff!
I have a question regarding the naming of accessor methods. I guess your hint to 'stick to the convention' is due to this thing called 'key-value coding' (something that I have never used but would be interested in knowing more about).
However, I often have C arrays (or pointers to double) as my instance variables. E.g.
@interface Vector : NSObject {
double *data;
}
In this case you actually want to set and get elements of the array. E.g. my setter and getter methods are:
-setDataAtIndex:to:
-dataAtIndex:
How does this fit in with key-value coding? Or does it not matter because the instance variables are not objects?
Cheers, keep up the good work.
Key Value Coding
Key value coding is getting a bit ahead of ourselves, but since you asked: You are right that the convention is important because it forms the basis of key value coding. To answer your question, you can implement indexed accessor methods. You can read more about these in the Apple docs:
file:///Developer/ADC%20Reference%20Library/documentation/Cocoa/Conceptual/KeyValueCoding/index.html
In your case, I think I would take a different approach. These KVC methods are more for managing relationships between objects. It seems you have a buffer of data, and you probably don't want to take the performance hit of using KVC for this.
What I would do is use NSData or NSMutableData instead of a raw pointer. You then access the raw data in the object using the bytes or mutableBytes methods. The advantage of using a class is that you get memory management and other Cocoa stuff for free.
You could then add accessor methods for the NSData object. I would just use straight setData: and data methods, rather than the indexed methods.
Anyway, we'll get to KVC sooner or later.
Drew
---------------------------
Drew McCormack
http://www.macanics.net
http://www.macresearch.org
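Following the suggestion above, here is a sketch of a Vector that keeps its doubles in an NSMutableData with plain setData:/data accessors, while the raw buffer is still available as a C array through mutableBytes. The initWithElements: initializer, the count method and the VectorDemo helper are hypothetical names chosen for the example, and manual reference counting is assumed.

```objc
#import <Foundation/Foundation.h>

// Hypothetical vector class backed by NSMutableData instead of malloc.
@interface Vector : NSObject {
    NSMutableData *data;   // owns the buffer of doubles
}
- (id)initWithElements:(NSUInteger)count;
- (void)setData:(NSMutableData *)newData;
- (NSMutableData *)data;
- (NSUInteger)count;
@end

@implementation Vector
- (id)initWithElements:(NSUInteger)count {
    if ((self = [super init])) {
        // dataWithLength: zero-fills the buffer; the result is autoreleased,
        // and setData: retains it.
        [self setData:[NSMutableData dataWithLength:count * sizeof(double)]];
    }
    return self;
}

- (void)setData:(NSMutableData *)newData {
    if (newData != data) {
        [data release];
        data = [newData retain];
    }
}

- (NSMutableData *)data {
    return data;
}

- (NSUInteger)count {
    return [data length] / sizeof(double);
}

- (void)dealloc {
    [self setData:nil];
    [super dealloc];
}
@end

// Hypothetical helper: writes through the raw buffer and reads back.
void VectorDemo(NSUInteger *count, double *third) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    Vector *vec = [[Vector alloc] initWithElements:5];
    double *values = [[vec data] mutableBytes];  // plain C access to the doubles
    values[2] = 1.0;
    *count = [vec count];
    *third = ((const double *)[[vec data] bytes])[2];
    [vec release];   // dealloc releases the Vector's hold on the buffer
    [pool release];  // performs the buffer's deferred release, freeing it
}
```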
NSData accessors
When performance is not an issue (e.g. changes triggered by user editing), having accessors with index can be nice, though. The code can be more readable than getting the NSData buffer for every place you need to change just one value (and having to cast the type). Also, you get type checking by the compiler, and at runtime, you can check bounds before modifying a value.
I agree that it would probably be overkill to try to make something KVC-compliant (in particular, for a mutable indexed property you are supposed to implement methods like insertObject:in<Key>AtIndex: and removeObjectFrom<Key>AtIndex:, I believe). To be KVC-compliant, you may basically have to switch to an NSArray completely, and maybe back to a buffer for tasks where performance is critical.
NSData
I can't comment on the KVC ideas you've raised with respect to accessing array elements via the -bytes method, as I have no experience with the KVC way of doing things. However, I seem to have overlooked NSData and NSMutableData, so I will read the docs for those.
At the moment I'm malloc-ing the required memory when instantiating my objects.
The pure object-oriented folk among us may be shocked at my current method of accessing array elements (see below), as I'm using the C side of ObjC for the computation and the Obj side of ObjC as a convenient way of constructing my model.
Example:
//create a 'Vector' object which here is simply 5 segments of sizeof(double) malloc-ed in memory
Vector *vec = [[Vector alloc] initWithElements:5];
//get a pointer to the first double element of the 'Vector' by using dataPtr method
double *vec_data = [vec dataPtr];
//you can then access the elements in vec simply as a C array.
vec_data[2] = 1.0;
double thirdElement = vec_data[2];
For me this was the only way I could proceed, due to the complexity of the equations and algorithms I was trying to implement in code. The Objective-C syntax, with its emphasis on brackets, makes scientific code hard to read and write. However, in other respects ObjC is the perfect scientific language because it IS just C; you can do the above trick!
Do you have any suggestions on this? Can it be improved?
NSData
I think what you are doing is fine, except I would use NSData or NSMutableData instead of a raw buffer. It is a bit safer in terms of memory management, and you can do other stuff, like adding it to a property list, or writing to file very easily. You end up using the data in much the same way.
NSMutableData *data = [NSMutableData dataWithLength:sizeof(double)*1000]; // room for 1000 doubles, zero-filled
double *buf = [data mutableBytes];
buf[10] = 5.0;
double thirdElement = buf[2];
[data writeToFile:@"somefile" atomically:YES];
etc.
Drew
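Reading the file back is just as short. The sketch below, a hypothetical helper assuming manual reference counting, writes 1000 doubles to a temporary file, loads them with dataWithContentsOfFile: and returns element 10; the file name roundtrip.dat is an arbitrary choice. Keep in mind that the bytes are written in the machine's native format, so a file written on a PowerPC Mac and read on an Intel Mac (or vice versa) would need byte swapping.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: write a buffer of doubles, read it back, return
// element 10 (or -1.0 if writing or reading failed).
double RoundTripDemo(void) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSMutableData *data = [NSMutableData dataWithLength:sizeof(double) * 1000];
    double *buf = [data mutableBytes];
    buf[10] = 5.0;

    NSString *path = [NSTemporaryDirectory()
                      stringByAppendingPathComponent:@"roundtrip.dat"];
    double result = -1.0;
    if ([data writeToFile:path atomically:YES]) {
        NSData *loaded = [NSData dataWithContentsOfFile:path];  // autoreleased
        if ([loaded length] == [data length]) {
            result = ((const double *)[loaded bytes])[10];
        }
        [[NSFileManager defaultManager] removeItemAtPath:path error:NULL];
    }
    [pool release];
    return result;
}
```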
Agreed
Hi Charles,
I agree. The best approach is probably to use an NSData/NSMutableData to store the buffer. If you need to represent it in the user interface, in a table view or something, you could just add the appropriate indexed KVC methods that extract the requested element from the data and do any type conversion necessary. With this approach, you could have the best of both worlds.
Drew
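A sketch of that combination, under manual reference counting: the hypothetical Samples class below stores doubles in an NSMutableData and implements countOfElements and objectInElementsAtIndex:, the indexed accessor pair KVC looks for. valueForKey:@"elements" then returns a proxy that responds to NSArray methods, with each element converted to an NSNumber on demand. The key name elements and the IndexedKVCDemo helper are assumptions for the example.

```objc
#import <Foundation/Foundation.h>

// Hypothetical class: raw doubles inside, read-only indexed KVC outside.
@interface Samples : NSObject {
    NSMutableData *data;
}
- (id)initWithElements:(NSUInteger)count;
- (double *)values;                               // raw C access
- (NSUInteger)countOfElements;                    // KVC: number of elements
- (id)objectInElementsAtIndex:(NSUInteger)index;  // KVC: one element
@end

@implementation Samples
- (id)initWithElements:(NSUInteger)count {
    if ((self = [super init])) {
        data = [[NSMutableData alloc] initWithLength:count * sizeof(double)];
    }
    return self;
}

- (void)dealloc {
    [data release];
    [super dealloc];
}

- (double *)values {
    return [data mutableBytes];
}

- (NSUInteger)countOfElements {
    return [data length] / sizeof(double);
}

- (id)objectInElementsAtIndex:(NSUInteger)index {
    const double *d = [data bytes];
    return [NSNumber numberWithDouble:d[index]];  // type conversion for the UI
}
@end

// Hypothetical helper: fills one value through the buffer, reads it via KVC.
BOOL IndexedKVCDemo(void) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    Samples *s = [[Samples alloc] initWithElements:3];
    [s values][1] = 2.5;
    NSArray *elements = [s valueForKey:@"elements"];  // proxy array
    BOOL ok = ([elements count] == 3)
              && ([[elements objectAtIndex:1] doubleValue] == 2.5);
    [s release];
    [pool release];
    return ok;
}
```

Number-crunching code keeps using the raw buffer, while bindings or a table view can go through the KVC methods.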
Can't remember...
Thanks for this installment of the tutorial. Memory management still seems a bit unintuitive. In the accessor methods part, I didn't quite catch where the release corresponding to the retain in the setData: method is.
numberWithDouble
I was just revisiting the tutorials... In regard to tutorials 3 and 4: when I create an NSNumber object with numberWithDouble:, I don't have to explicitly release it like I do if I use alloc and init, right? I am under the impression that with numberWithDouble:, the NSNumber gets deallocated sometime later in the future, so I don't have to match it with a release call in dealloc.
Convenience initializers
Hi Ken,
You are right:
numberWithDouble: is a convenience initializer. It allocs and inits the object, but it also autoreleases it. Conceptually, it works like this:
+(NSNumber *)numberWithDouble:(double)d {
    return [[[NSNumber alloc] initWithDouble:d] autorelease];
}
There are quite a few classes that have these, including NSArray and NSString. If you use one of these methods, you don't have to release the object, because you didn't call retain, copy or alloc to begin with. You only need to call release when you call one of those methods yourself.
What you need to keep in mind is that if you want to keep the number around, you will need to retain it, otherwise it will disappear. And if you retain it, you are then responsible for releasing it.
Drew
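Putting the three cases side by side, the sketch below (manual reference counting assumed; OwnershipDemo is a made-up name) releases the number it created with alloc, does nothing for the one from numberWithDouble:, and retains a second convenience-created number so it can still be used after the pool is gone.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: each ownership case handled according to the rule.
double OwnershipDemo(void) {
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSNumber *owned = [[NSNumber alloc] initWithDouble:1.5];    // alloc: we must release it
    NSNumber *borrowed = [NSNumber numberWithDouble:2.5];       // autoreleased: no release
    NSNumber *kept = [[NSNumber numberWithDouble:4.0] retain];  // retained: we must release it
    double sum = [owned doubleValue] + [borrowed doubleValue];
    [owned release];            // balances the alloc
    [pool release];             // borrowed must not be used after this point
    sum += [kept doubleValue];  // still valid thanks to our retain
    [kept release];             // balances the retain
    return sum;                 // 1.5 + 2.5 + 4.0
}
```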
Method argument symbols
In the Interface for the method declarations
-(id)initWithData:(NSData *)data;
-(void)setData:(NSData *)data;
the symbol of the arguments is 'data'. Is there a reason why in the Implementation
-(id)initWithData:(NSData *)newData {
...
}
-(void)setData:(NSData *)newData {
...
}
the symbol used for the arguments changes to 'newData'?
Data
The variable names used in the declaration (header file) are not important, and need not be the same as in the implementation file. I used 'newData' in the implementation in order to avoid a name clash with the instance variable 'data'.
Drew
---------------------------
Drew McCormack
http://www.maccoremac.com
http://www.macanics.net
http://www.macresearch.org