Cocoa for Scientists (Part XXI): Multi-Threading Your App with NSOperation and NSOperationQueue

Author Drew McCormack
Web Site: www.mentalfaculty.com

There has been some discussion about the future of parallelism on MacResearch recently, and whether the current course set by Intel and others toward multi-core systems is the best approach for the long term. Like it or not, industry players — including Apple — are following Intel’s lead and designing their software and hardware around the shared memory model. Mac OS X 10.5 (Leopard) introduces some significant improvements in the realm of multi-threading, including the Cocoa classes NSOperation and NSOperationQueue. Today, I am going to show you how these classes can be incorporated painlessly into your applications to provide task-level parallelism for expensive calculations.

Threading Made Easy

Before Leopard, if you wanted to add threading to your app for the purpose of improving performance on multi-core systems, you had to take care of everything yourself, from spawning and killing threads, to scheduling tasks to achieve good load balancing and correct execution order. The NSOperation and NSOperationQueue classes alleviate all of that, allowing you to simply define your tasks, set any dependencies that exist, and fire them off. The NSOperationQueue class takes care of all the threading issues, from determining how many threads should be spawned, to ensuring your tasks — which each correspond to an instance of NSOperation — are initiated in the appropriate order.

Expression Tree Very Pretty...

To demonstrate how you use these two classes, I’m going to create a simple engine for evaluating arbitrary matrix expressions. I will define some classes that represent floating-point matrices, and some basic operations such as multiplication and addition. Instances of these classes can be combined to form an expression tree, which can be evaluated to give the value of the expression. You can download the source code for this tutorial and follow along, or try out the finished application.

The root class of the expression inheritance tree will be MatrixExpression. The leaves of the tree will represent literal matrices, and belong to the class LiteralMatrixExpression. Operators, such as multiplication and addition, will be of the class CompoundMatrixExpression, and will represent branch nodes in the expression tree. Below is a class diagram showing the matrix expression classes.

Matrix Expression Class Diagram.Matrix Expression Class Diagram.

The application we are going to build, Operative, will only evaluate a single matrix expression:

A.A.A + B.(A.B + B)

(The points delineate matrix multiplication.) This expression could be represented by the following expression tree.

Matrix Expression Tree.Matrix Expression Tree.

All matrix expression classes will derive from the NSOperation class, which will allow them to be queued up for evaluation. Dependencies will be added between parent expressions, such as a multiplication operator, and its children. This will ensure that the leaves are evaluated first, followed by their parents, and so forth, until the complete expression has been evaluated. Other than setting these dependencies, no other information will be passed to the NSOperationQueue about how things should be evaluated, and in what order. We will leave all of that to Cocoa.

Expression Classes

The base expression class is MatrixExpression. Here is its interface:

#import <Cocoa/Cocoa.h>

@interface MatrixExpression : NSOperation {
    NSData *elements;
}

@property (readonly, assign) NSData *elements;
@property (readonly, assign) NSUInteger numberOfRows;
@property (readonly, assign) NSUInteger numberOfColumns;

@end

As you can see, I am using the new properties of Objective-C 2.0, which were also introduced in Leopard. In fact, I am going to adopt Obj-C 2.0 completely, and use garbage collection. If you need to know more about this, I recommend taking a look in Apple’s documentation, and reading through Scott Stevenson’s blog postings on garbage collection and properties.

A MatrixExpression has a number of properties, namely the number of rows and columns in the matrix, and the values of the elements of the matrix. MatrixExpression is an abstract class, because we cannot provide any useful implementation for two of the properties — numberOfColumns and numberOfRows. The MatrixExpression.m file reflects this

#import "MatrixExpression.h"

@implementation MatrixExpression

@dynamic numberOfRows;
@dynamic numberOfColumns;
@synthesize elements;

@end

The @synthesize keyword indicates to the compiler that accessor methods setElements: and elements should be generated for the instance variable elements. We don’t have instance variables for numberOfColumns and numberOfRows, so we use the @dynamic keyword to tell the compiler that implementations will be supplied elsewhere. (We will add implementations in the subclasses.)

A literal matrix is represented by the LiteralMatrixExpression class.

#import <Cocoa/Cocoa.h>
#import "MatrixExpression.h"

@interface LiteralMatrixExpression : MatrixExpression {
    NSUInteger numberOfRows;
    NSUInteger numberOfColumns;
}

-(id)initWithFloatElementData:(NSData *)elements numberOfRows:(NSUInteger)numberOfRows numberOfColumns:(NSUInteger)numberOfColumns;

@end

Here we do find instance variables to store the number of rows and columns of the matrix, along with an initializer to set them. The implementation file (LiteralMatrixExpression.m) overrides some of the properties so that they have a private setter method.

@interface LiteralMatrixExpression ()
@property (readwrite, assign) NSUInteger numberOfRows;
@property (readwrite, assign) NSUInteger numberOfColumns;
@property (readwrite, assign) NSData *elements;
@end

@implementation LiteralMatrixExpression

@synthesize numberOfRows;
@synthesize numberOfColumns;
@synthesize elements;

-(id)initWithFloatElementData:(NSData *)newElements numberOfRows:(NSUInteger)newNumberOfRows numberOfColumns:(NSUInteger)newNumberOfColumns 
{
    if ( self = [super init] ) {
        self.numberOfRows = newNumberOfRows;
        self.numberOfColumns = newNumberOfColumns;
        self.elements = newElements;
    }
    return self;
}

@end

Overriding of properties in this way is well discussed in Scott’s blog post; in any case, it is not really necessary to understand this in detail to follow the remainder of the tutorial.

Compound expressions are ones that operate on other expressions, like a multiplication operator. To represent these, we introduce the CompoundMatrixExpression class.

#import <Cocoa/Cocoa.h>
#import "MatrixExpression.h"

@interface CompoundMatrixExpression : MatrixExpression {
    MatrixExpression *leftExpression;
    MatrixExpression *rightExpression;
}

@property (readonly, assign) MatrixExpression *leftExpression;
@property (readonly, assign) MatrixExpression *rightExpression;

-(id)initWithLeftExpression:(MatrixExpression *)newLeftExpression andRightExpression:(MatrixExpression *)newRightExpression;

@end

Just as the LiteralMatrixExpression, it subclasses MatrixExpression. The initializer takes a left and a right expression. (We make the simplifying assumption that all operators are binary in this app.) The CompoundMatrixExpression.m file stores these expressions in instance variables, and also adds a dependency to each with the addDependency: method inherited from NSOperation:

#import "CompoundMatrixExpression.h"

// Private
@interface CompoundMatrixExpression ()

@property (readwrite, assign) MatrixExpression *leftExpression;
@property (readwrite, assign) MatrixExpression *rightExpression;

@end

@implementation CompoundMatrixExpression

@synthesize leftExpression;
@synthesize rightExpression;

-(id)initWithLeftExpression:(MatrixExpression *)newLeftExpression andRightExpression:(MatrixExpression *)newRightExpression {
    if ( self = [super init] ) {
        self.leftExpression = newLeftExpression;
        self.rightExpression = newRightExpression;
        [self addDependency:newLeftExpression];
        [self addDependency:newRightExpression];
    }
    return self;
}

@end

MultiplyMatrixExpression and AddMatrixExpression are two concrete subclasses of CompoundMatrixExpression. They don’t define any new properties, but they do implement the operation they represent. For example, the MultiplyMatrixExpression.h file is practically empty

#import <Cocoa/Cocoa.h>
#import "CompoundMatrixExpression.h"

@interface MultiplyMatrixExpression : CompoundMatrixExpression {
}
@end

The MultiplyMatrixExpression.m does all the work

#import "MultiplyMatrixExpression.h"
#import <Accelerate/Accelerate.h>


@interface MultiplyMatrixExpression ()

@property (readwrite, assign) NSData *elements;

@end


@implementation MultiplyMatrixExpression

@synthesize elements;

-(NSUInteger)numberOfRows {
    return self.leftExpression.numberOfRows;
}

-(NSUInteger)numberOfColumns {
    return self.rightExpression.numberOfColumns;
}

-(void)main {
    if ( self.isCancelled ) return;
    NSMutableData *newElements = [NSMutableData dataWithLength:(self.numberOfRows*self.numberOfColumns*sizeof(float))];
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, self.numberOfRows, self.numberOfColumns, self.rightExpression.numberOfRows, 1.0f, leftExpression.elements.bytes, leftExpression.numberOfColumns, rightExpression.elements.bytes, rightExpression.numberOfColumns, 0.0f, newElements.mutableBytes, self.numberOfColumns);
    self.elements = newElements;
}

@end

Accessor methods are defined to return the number of rows and columns based on the sub-expressions, and a main method is included to actually evaluate the result of the matrix multiplication. main comes from NSOperation and should be overridden to perform the task at hand. In this case, a BLAS routine from the Accelerate framework has been used to actually carry out the task of multiplying the left expression by the right expression, and putting the result in the elements variable.

Note also that a check is made to see if isCancelled is YES. The reason for this is that you are responsible for canceling any tasks when an operation is cancelled. You should check isCancelled wherever convenient, and cleanup and exit if it evaluates true.

Controller and View

That’s a quick tour of the expression classes used for the calculations, but how do you put it all together into an expression tree, and then evaluate the tree in parallel? We’ll do that in the Controller class. Here’s Controller.h:

#import <Cocoa/Cocoa.h>

@class MatrixExpression;

@interface Controller : NSObject {
    BOOL isRunning;
    NSOperationQueue *operationQueue;
    NSTimeInterval time;
    NSDate *startDate;
    NSTimer *timer;
    NSUInteger numberOfCores;
    MatrixExpression *completeExpression;
    IBOutlet NSWindow *window;
}

@property (readwrite, assign) BOOL isRunning;
@property (readwrite, assign) NSTimeInterval time;
@property (readwrite, assign) NSUInteger numberOfCores;

-(IBAction)toggleRunning:(id)sender;

@end

We won’t go into any detail here, because most of it is related to the UI, and has little to do with NSOperation. Controller.m includes the interesting code:

#import "Controller.h"
#import "MatrixExpression.h"
#import "MultiplyMatrixExpression.h"
#import "AddMatrixExpression.h"
#import "LiteralMatrixExpression.h"


@implementation Controller

@synthesize isRunning;
@synthesize time;
@synthesize numberOfCores;

+(void)initialize {
    setenv("VECLIB_MAXIMUM_THREADS", "1", 1);
}

-(id)init
{
    self = [super init];
    if (self != nil) {
        self.numberOfCores = 1;
    }
    return self;
}

-(void)toggleRunning:(id)sender 
{
    // If already running, cancel and exit
    if ( ! self.isRunning ) {
        [completeExpression removeObserver:self forKeyPath:@"isFinished"];
        [operationQueue cancelAllOperations];
        [timer invalidate];
        operationQueue = nil;
        completeExpression = nil;
        self.isRunning = NO;
        return;
    }

    // If not running, start up the operation queue
    self.isRunning = YES;
    startDate = [NSDate date];
    self.time = 0.0;
    timer = [NSTimer scheduledTimerWithTimeInterval:1.0 target:self selector:@selector(updateTime) userInfo:nil repeats:YES];

    // Begin by initializing A and B
    NSUInteger n = 3000;
    NSMutableData *aData = [NSMutableData dataWithLength:n*n*sizeof(float)];
    NSMutableData *bData = [NSMutableData dataWithLength:n*n*sizeof(float)];
    float *aValues = aData.mutableBytes;
    float *bValues = bData.mutableBytes;
    NSUInteger i;
    for ( i = 0; i < n; ++i ) {
        aValues[i] = 0.1;
        bValues[i] = 0.2;
    }
    LiteralMatrixExpression *a = [[LiteralMatrixExpression alloc] initWithFloatElementData:aData numberOfRows:n numberOfColumns:n];
    LiteralMatrixExpression *b = [[LiteralMatrixExpression alloc] initWithFloatElementData:bData numberOfRows:n numberOfColumns:n];

    // Create an expression tree for A * A * A + B * ( A * B + B )
    MatrixExpression *aByA = [[MultiplyMatrixExpression alloc] initWithLeftExpression:a andRightExpression:a];
    MatrixExpression *leftTerm = [[MultiplyMatrixExpression alloc] initWithLeftExpression:aByA andRightExpression:a];
    MatrixExpression *aByB = [[MultiplyMatrixExpression alloc] initWithLeftExpression:a andRightExpression:b];
    MatrixExpression *aByBPlusB = [[AddMatrixExpression alloc] initWithLeftExpression:aByB andRightExpression:b];
    MatrixExpression *rightTerm = [[MultiplyMatrixExpression alloc] initWithLeftExpression:b andRightExpression:aByBPlusB];
    completeExpression = [[AddMatrixExpression alloc] initWithLeftExpression:leftTerm andRightExpression:rightTerm];

    // Add operations to a operation queue
    operationQueue = [NSOperationQueue new];
    [operationQueue setMaxConcurrentOperationCount:numberOfCores];
    [operationQueue addOperation:a];
    [operationQueue addOperation:b];
    [operationQueue addOperation:aByA];
    [operationQueue addOperation:leftTerm];
    [operationQueue addOperation:aByB];
    [operationQueue addOperation:aByBPlusB];
    [operationQueue addOperation:rightTerm];
    [operationQueue addOperation:completeExpression];

    // Observe the complete expression to see when it is done
    [completeExpression addObserver:self forKeyPath:@"isFinished" options:0 context:NULL];
}

-(void)observeValueForKeyPath:(NSString *)keyPath ofObject:(id)object change:(NSDictionary *)change context:(void *)context
{
    if ( [keyPath isEqual:@"isFinished"] && completeExpression == object ) {
        self.isRunning = NO;
        [completeExpression removeObserver:self forKeyPath:@"isFinished"];
        self.time = [[NSDate date] timeIntervalSinceDate:startDate];
        operationQueue = nil;
        completeExpression = nil;
        [timer invalidate];
    }
    else {
        [super observeValueForKeyPath:keyPath ofObject:object change:change context:context];
    }
}

-(void)updateTime 
{
    self.time = round([[NSDate date] timeIntervalSinceDate:startDate]);
}

@end

The toggleRunning: method has the meat of the calculation setup. It begins by creating two 3000x3000 matrices, a and b, and initializes them.

    // Begin by initializing A and B
    NSUInteger n = 3000;
    NSMutableData *aData = [NSMutableData dataWithLength:n*n*sizeof(float)];
    NSMutableData *bData = [NSMutableData dataWithLength:n*n*sizeof(float)];
    float *aValues = aData.mutableBytes;
    float *bValues = bData.mutableBytes;
    NSUInteger i;
    for ( i = 0; i < n; ++i ) {
        aValues[i] = 0.1;
        bValues[i] = 0.2;
    }
    LiteralMatrixExpression *a = [[LiteralMatrixExpression alloc] initWithFloatElementData:aData numberOfRows:n numberOfColumns:n];
    LiteralMatrixExpression *b = [[LiteralMatrixExpression alloc] initWithFloatElementData:bData numberOfRows:n numberOfColumns:n];

For each matrix, we create a LiteralMatrixExpression object. These objects are then used to build up the expression tree.

    // Create an expression tree for A * A * A + B * ( A * B + B )
    MatrixExpression *aByA = [[MultiplyMatrixExpression alloc] initWithLeftExpression:a andRightExpression:a];
    MatrixExpression *leftTerm = [[MultiplyMatrixExpression alloc] initWithLeftExpression:aByA andRightExpression:a];
    MatrixExpression *aByB = [[MultiplyMatrixExpression alloc] initWithLeftExpression:a andRightExpression:b];
    MatrixExpression *aByBPlusB = [[AddMatrixExpression alloc] initWithLeftExpression:aByB andRightExpression:b];
    MatrixExpression *rightTerm = [[MultiplyMatrixExpression alloc] initWithLeftExpression:b andRightExpression:aByBPlusB];
    completeExpression = [[AddMatrixExpression alloc] initWithLeftExpression:leftTerm andRightExpression:rightTerm];

The instance variable completeExpression represents the whole expression that we wish to evaluate. Now we add all of these expressions to an NSOperationQueue queue, which will run them all, taking care of dependencies and spawning threads as appropriate.

    // Add operations to a operation queue
    operationQueue = [NSOperationQueue new];
    [operationQueue setMaxConcurrentOperationCount:numberOfCores];
    [operationQueue addOperation:a];
    [operationQueue addOperation:b];
    [operationQueue addOperation:aByA];
    [operationQueue addOperation:leftTerm];
    [operationQueue addOperation:aByB];
    [operationQueue addOperation:aByBPlusB];
    [operationQueue addOperation:rightTerm];
    [operationQueue addOperation:completeExpression];

Note that you don’t have to start the queue calculating: it automatically starts when you give it an operation to work on. The method setMaxConcurrentOperationCount: has been used here to set the number of threads to use in the calculation, so that it can be controlled in the interface. Generally, you wouldn’t do this, and instead let NSOperationQueue decide how many threads should be spawned for the system it is running on.

The rest of Controller.m simply updates the user interface, so its left as an exercise for the user.

Tearing Down

You can now download the finished application and run a few tests. Test how long the calculation takes as you vary the number of threads. I was surprised to find that my 1.83GHz MacBook Pro is around twice as slow as my older 2GHz Powermac G5. (I had assumed they would be comparable, but those PPC chips are floating point monsters!)

Hopefully that gives you some insight into the workings of NSOperation and NSOperationQueue. They are pretty straightforward to use, and certainly simplify the problem of task-level multi-threading considerably. We also got a glimpse of some of the enhancements in Objective-C 2.0, including properties and garbage collection. All in all, Leopard has plenty of nice stuff for Scientific Cocoa developers.

AttachmentSize
OperativeSourceCode.zip75.74 KB
Operative.zip55.16 KB

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Back in the old days...

Great article Drew!
Regarding your G5 vs the MacBook Pro: it all brings the old stevenotes back to mind with the Phil Schiller Mac vs PC Photoshop shootouts, I guess they really were faster after all ;-)

great tutorial

Drew, this is a very nice tutorial (it seems I always say that every time you publish something, I should make that a keyboard shortcut!).
I like that you made everything subclasses of NSOperation, and made all the code so slim. This shows how powerful this NSOperation concept is.
Thanks!

So it would be a little

So it would be a little limited to have my task hierarchy dictated to me so presumably I can create instances of NSOperation to wrap my objects/tasks that can inherit from anything right?

Inherit from NSOperation

Yes. This tutorial was just to show one example of how you could use it. You should come up with classes for your particular problem.

Note that you can also use NSOperation without subclassing at all, by using delegate methods. This may also be a good solution for you.

Drew

---------------------------
Drew McCormack
http://www.maccoremac.com
http://www.macanics.net
http://www.macresearch.org

Setting stacksize in NSOperation family of classes?

With OS 10.5 Apple introduced a -setStackSize method for NSThreads, something that is very useful for scientific computation. But NSOperation threads appear to be restricted to small stacks. Is there a way to increase the stacksize for threads running in the NSOperationQueue?

Stacksize

I have never tried to do this, but you could probably do it by using non-concurrent operations, that is, by overriding the start method and creating your own threads, rather than leaving it to Cocoa.

---------------------------
Drew McCormack
http://www.maccoremac.com
http://www.macanics.net
http://www.macresearch.org

Thanks, that works.

Thanks, Drew, that works nicely. There appears to be a bit of a performance hit from manually creating threads this way, but they no longer crash due to insufficient stack space. (Apple's Threading Programming Guide says "letting operation objects create threads for you is often more efficient than doing it yourself. Operation queues work directly with the kernel ... this kernel support also extends to the creation of the threads themselves, which are often maintained in thread pools to reduce the startup costs associated with creating new threads.")

Brad

Missing graphic

Just a quick note:

The reference to the matrix expression class diagram in "Cocoa for Scientists (Part XXI)" is incorrect.

It is coded as...
http://www.macresearch.org/files/images/ExpressionClassDiagram.preview.png

but should be...
http://www.macresearch.org/files/images/ExpressionClassDiagram.png

How to make mac OS X work on

How to make mac OS X work on a computer other than mac? I have a toshiba m300 laptop with vista, and i have virtual box on it. I have a mac os x install dvd and it will not install on virtual box. Can someone please help me!

NSOperations with Eigen library

I am using the Eigen 2 library to make my code more readable (than BLAS) and, in trial apps, this has worked very well. However, I have been extremely disappointed in the multithreading performance I get even with Snow Leopard.

I have tried every variation of NSOperation, etc. that I can think of to no avail. Some threading variants are even *slower* than their single-thread counterparts.

Currently, I am investigating tcmalloc as a possible improvement. This compiles via makefile but I have not been able to get its output to work in an Xcode/Objective-C++ environment. Has anyone managed to do this -- esp., is there some sample code out there anywhere?

Thanks for any tips.

Michael P. McLaughlin