optimising C/C++ code on G5s etc
Hi there everyone
I have just joined up on this "board" and thought I'd initiate myself by posting a question/some thoughts
I have a discrete element code (in C) which I have been running very successfully on my G5s, but, like all these things, I'd obviously like to run it quicker/more often/at higher resolution etc.
At the moment I have a "tight" piece of code - by that I mean that I have spent many hours really paring it down so that it is absolutely true to the physics of the problem but lean and mean :)
I currently compile this in Xcode with the highest optimizations etc.
The resulting code is fast, really fast, and a huge improvement, but I was wondering...
Are there any basic tips/flags that I may not be aware of? (I am going to take the leap into parallelization fairly soon.) And what pitfalls should I be aware of?
any help appreciated
Stuart Hardy



Did you see this post on compiler flags?
http://www.macresearch.org/getting_the_most_out_of_your_apps_knowing_how_to_use_the_g4_and_g5_compiler_flags_for_best_performance
Re: optimising C/C++ code on G5s etc
A couple of things to try.
1) Have you profiled your code with Shark? It's part of the CHUD tools available from the Apple site.
2) There is a mailing list that Apple hosts called performance-optimization (lists.apple.com). If you aren't on it already, you may wish to post there. A lot of experts monitor that list (including people from the performance group at Apple), so you can often get good answers there. Of course the first thing they'll ask is to see a Shark trace :)
3) In terms of compilers, what are you using? The IBM compilers will in some cases (many in my hands) produce code that is up to 2x faster than gcc, although I've found the performance gains between gcc 4.0 and XLC to be smaller than those between gcc 3.3 and XLC.
But the first thing to do is profile your application in Shark (if you haven't already got a trace).
Hope that helps,
Dave
optimising c/c++ etc
Thanks for the comments ...they were really quite useful...some thoughts/answers below..
1) I am using Xcode/gcc 4.0 - and overall am quite happy with the quality/fidelity of the code optimizations etc.
2) I don't have xlc - am I correct in thinking that it's broken under 10.4.6 - or is that just one glass of wine too many?
3) As regards profiling and CHUD tools etc - the answer is no, but I am actively following this up.
Maybe the kinds of codes I write are a little bit different, but often I don't need optimized libraries etc for matrix or other solvers... these discrete element codes often just need brute force speed and hand-crafted leanness...
In many ways I am really coming to the realization that I have to take the plunge into MPI/parallelizing my code.
anyone else followed this path and have any tips??
best regards, Stuart
G5 Cluster (DP 2.0GHz x 4) used for simulation of geological processes....
10.4.6 x3, 10.4.6 server x1
optimising c/c++ etc
Hi Stuart,
Regarding:
2) Correct. XLF/XLC are not supported under 10.4. Both will work to a degree: XLF works fully, while XLC will compile but not link. However, you can use XLF to link XLC-compiled code. There are other things you can do to "fix" the XLC portion, but that requires modifying some system files, which you may not want to do.
3) I would highly recommend using Shark (as part of CHUD), since, as is often the case, the places where we think a program is spending its time are not where it really is. Shark is probably one of the best (and easiest to use) profiling tools for performance optimization.
Before spending any time trying to parallelize via threading or MPI I'd very much recommend making sure that all obvious things in your scalar code are running as well as possible. Again, Shark is your friend and what we think may be an issue often is not, and things we don't think to look at are big problems.
Regards,
Dave
Python
Hi,
My comment is somewhat related to the topic; that's why I'm sticking to this forum.
Is there any smart way I can include and test Python scripts using Xcode?
I'm pretty new to Xcode, and I have always used Python as a basis for calling
C programs as subroutines. That's because I have a major interest in both
speed and saving myself the time and trouble of writing C.
I'd also like to have an Xcode-internal command line connected to the
debugger and couldn't find one yet: for example, I would like to use the debugger
when I call a program with different argv arguments.
Best regards, Justus
RE: Python and Xcode
Have you seen these?
http://pythonmac.org/wiki/XcodeIntegration
http://ulaluma.com/pyx/archives/2004/02/running_python.html
update on optimising etc
Well I thought I'd update you all, after your useful advice and comments.
I have run Shark over my code and (surprise, surprise) some of my major issues
with speed etc come from trig functions... I use cos, sin, atan, atan2 *a lot*, and so in many ways (and I am sure I am wrong...) I suppose it's difficult to do much with library functions, because I need lots of precision etc.
As I understand it there are "fast math" flags with gcc, but these have to be used with care...
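To make the "used with care" point concrete, here is a minimal sketch of the kind of routine that fast-math flags can affect. It is not taken from the discrete element code; `naive_sum` and `kahan_sum` are illustrative names. Compensated summation relies on the compiler keeping floating-point operations in the order written. gcc's -ffast-math permits reassociation, under which the correction term may be simplified away and the extra precision lost.

```c
#include <assert.h>
#include <stddef.h>

/* Plain running sum: rounding error can accumulate with every addition. */
double naive_sum(const double *x, size_t n)
{
    double s = 0.0;
    size_t i;
    for (i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Kahan (compensated) summation: c carries the low-order bits that were
 * lost in the previous addition and feeds them back into the next one. */
double kahan_sum(const double *x, size_t n)
{
    double s = 0.0, c = 0.0, y, t;
    size_t i;
    assert(n == 0 || x != NULL);
    for (i = 0; i < n; ++i) {
        y = x[i] - c;      /* apply the stored correction */
        t = s + y;         /* low-order bits of y may be lost here */
        c = (t - s) - y;   /* algebraically zero; in IEEE arithmetic it
                              recovers the rounding error of s + y */
        s = t;
    }
    return s;
}
```

Compiled without fast-math, `kahan_sum` should track the exact sum more closely than `naive_sum` on long arrays. If a code leans on tricks like this, or on NaN/infinity checks, it is worth comparing results with and without the flag before trusting it.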
the interesting thing is that, in using shark and standing back objectively and looking at my code and tweaking it, the code itself is now much better/easier to understand...
still not much faster mind you :)
thanks to everyone for their useful comments...I'll keep you all posted
Best Regards - Stuart
Re: update on optimising etc
Hi Stuart,
Not all is lost if you are finding that trig functions are slowing you down. If you can rearrange your code (or if you are already doing a lot of trig functions at the same time), you may be able to vectorize it. Better yet, the Accelerate library already has many high-performance versions of trig functions that operate on single vector quantities, and if you are on 10.4, vForce (also part of Accelerate) can operate on arrays of data to do a lot of similar trig functions at once.
If you require double precision or greater, then Accelerate won't necessarily work. But there are still options, including writing your own transcendentals that can be pipelined/inlined, as opposed to using the standard math library versions (which usually have to guarantee a right answer, sometimes at the expense of speed).
Just something to think about.
Regards,
Dave