LAPACK/BLAS from FORTRAN on 10.6

Dear all,

since updating to 10.6, I've been struggling with strange crashes in connection with LAPACK/BLAS routines used in my C++ code, when using the Apple LAPACK and BLAS. They are strange in the sense that I get a "Bus error" reproducibly, but whether it appears depends on whether optimization is enabled. Further, even adding an innocent statement such as cout << "Hi" may prevent the crashes, which probably means that there is some subtle memory corruption issue. So far, it's hard for me to tell whether it's my own fault or Apple's, but the same code runs fine on a variety of Linux machines. (Desperately waiting for valgrind on 10.6 ...)

Anyway, during my search I finally found a reproducible problem (not susceptible to optimization). Consider the following Fortran code:

PROGRAM TEST

COMPLEX*16 A(1), B(1),C
COMPLEX*16 ZDOTC
EXTERNAL ZDOTC

A(1)=1;
B(1)=2;

C=ZDOTC( 1, A, 1, B, 1 )

WRITE (*,*) C

END

(It is essentially just a complicated way of calculating CONJG(A(1))*B(1), which equals A(1)*B(1) here because A(1) is real.)
If I compile this with gfortran (4.5 from hpc.sourceforge.net) or g95 (current version from www.g95.org) and link against the Apple BLAS (-lblas), I always get a Bus error (gfortran) or segfault (g95) at C=ZDOTC ...

Note that this does not happen if I use the manually compiled version of BLAS from netlib.org.

I was wondering if anybody else could reproduce this problem, just to make sure that I didn't overlook something simple ...

confirmed...

I don't have a solution either, but I can confirm that the above code also crashes with a Bus error on my machine when linked against Apple's BLAS. My machine runs Mac OS X 10.6.1; I tried gfortran-4.5 as well as gfortran-4.3 from MacPorts. The code works fine when linked against a self-compiled BLAS from netlib. The crashes seem to be specific to the complex dot products:

* COMPLEX*16 + ZDOTC --> Bus error
* COMPLEX + CDOTC --> Segmentation fault
* REAL + SDOT --> works
* DOUBLE PRECISION + DDOT --> works

Crashes occur irrespective of the optimization level used. Seems to be a bug in Apple's BLAS on 10.6. Did you already file a bug report on https://bugreport.apple.com/?

I haven't filed a bug yet,

I haven't filed a bug yet, but I will now. Thanks for the confirmation!

Another test

I have now filed a bug with Apple. It seems there was something like this before on PowerPC: http://developer.apple.com/hardwaredrivers/ve/errata.html (Note that this page is several years old, and g77 is essentially dead.)

However, could I ask you for another test? I have singled out another problem, this time a more subtle one. This involves a crash that only appears when compiler optimization is on, but the crash happens for both the Apple gcc4.2 and the MacPorts gcc4.4. Hence, I assume that it's not really a compiler bug, but rather a problem of Apple's LAPACK:

Download the test program (it is somewhat large, but that's how much I could break it down) from

http://dl.getdropbox.com/u/769003/test.cc

and compile with

g++ test.cc -O3 -llapack -lblas

For me the program fails with a bus error if optimization is on and the target is 64 bits. However, the program does not fail using the reference LAPACK from netlib, nor on several different Linux machines that I can access.

Again, I'd only like to see if you experience the same problem; I don't expect any solutions.

bus error, too ...

The code also fails with a bus error in zunmhr on my machine:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00007fff5fc00000
0x00007fff8018ec9a in s_cat ()
(gdb) where
#0 0x00007fff8018ec9a in s_cat ()
#1 0x00007fff805449d7 in zunmhr_ ()
#2 0x2020202020202020 in ?? ()

In my case, it crashes irrespective of the optimization setting. It does not crash when linked against netlib's BLAS/LAPACK. Same results with Apple's gcc-4.2 and MacPorts' gcc-4.3.

Thanks again

Thanks again, steja. I'll file another bug. My work depends crucially on a functional LAPACK and BLAS, and I always get nervous when I find a crash there - could just as well be a bug in my code. Your help is truly appreciated.

Are you sure though about the optimization? When I compile with -O0, the code doesn't crash. (But that's not so important, I guess)

The strange thing about the crash is that it happens in s_cat: that's a function that just concatenates strings ...

So, I had the same problem

So, I had the same problem too with LAPACK and BLAS. The problem also persisted with a self-compiled LAPACK and BLAS library until I moved that self-compiled library to my home directory, in which I also compiled the program.
(The library was previously located in my /opt/.. directory)

Now I don't have any problems any more. It's probably an issue with user rights and not with the Mac OS X BLAS and LAPACK framework.

User rights?

Dear David,

which problem exactly did you have? This information would certainly help a lot.

In addition, in what respect did the self-compiled LAPACK and BLAS differ when installed in different directories? If it has something to do with user rights, there should be a difference, right?

Any updates

Are there any updates as to the compatibility of GCC/Fortran/gfortran/g95 and Snow Leopard?
I am waiting to upgrade until this issue is settled.

As always, thank you in advance!

marc

update to SL

Could you specify the issues you are experiencing in more detail? I installed Snow Leopard about two weeks after it came out, and I have been using gcc (from MacPorts as well as the Apple Developer tools) and gfortran since then without any problems.

Regarding gfortran, I should add that I still use it regularly to compile some of my old (tried and tested) code, but I don't use Fortran anymore for actual software development, so more delicate issues could have escaped my notice.

I had exactly the same

I had exactly the same problem with the ZDOTC routine of LAPACK and BLAS, as mentioned in the first post of this thread. When using that routine, the program would crash. There was no difference in the compilation options for the LAPACK routines. The only discernible difference is the library's location on disk, and hence also the user rights.
With the library in my home directory there were no problems at all; with it in the "/opt/sw/lin" directory it did not work.

In the meantime I ran the "fix user rights" option in the hard disk system utility of Mac OS X and I also upgraded to 10.6.2. Today, I tested my program with the Mac OS X LAPACK and BLAS libraries using "-framework Accelerate" as a linking option and now had NO problems at all. The last time I tried this, ZDOTC did not work.
I can't tell if this is due to the update or to fixing user rights in the system utility program.

More on LAPACK/BLAS from FORTRAN on 10.6

This seems to be a problem with the underlying interface definition between the Fortran and C routines. I seem to remember running into this before when I compiled SLaPACK a couple of years ago. I will dig up my notes, which provided a workaround.

If I try the fortran code above, I get the same crash. However, if one uses the strictly C interface and the code snippet

#include <Accelerate/Accelerate.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
    __CLPK_doublecomplex a[1],b[1],c;
    a[0].r = 1;
    b[0].r = 2;
    a[0].i = 0;
    b[0].i = 0;

    cblas_zdotc_sub(1,a,1,b,1,&c);

    fprintf(stderr,"%g,%g\n",c.r,c.i);
    return(0);
}

Then there is no problem. Again, this is likely to be caused not by the libraries themselves, but by the odd way the Fortran entry points are defined on top of the C versions. As Fortran is not the actual code used for the accelerated versions of LAPACK and BLAS furnished by Apple, I'm not surprised. I do know that in the end I used ATLAS instead of Apple's version to compile SLaPACK. It proved slightly faster and worked.

Hope this helps.

10.5.8 gfortran and ifort tests

On 10.5.8 all versions of gfortran fail (4.3, 4.4 and 4.5), but ifort works
[mac27:~] bmcinnes% gfortran-mp-4.3 test.f90 -lblas
[mac27:~] bmcinnes% ./a.out
Segmentation fault
[mac27:~] bmcinnes% gfortran-mp-4.4 test.f90 -lblas
[mac27:~] bmcinnes% ./a.out
Segmentation fault
[mac27:~] bmcinnes% gfortran test.f90 -lblas
[mac27:~] bmcinnes% ./a.out
Segmentation fault
[mac27:~] bmcinnes% gfortran -v
Using built-in specs.
Target: i386-apple-darwin8.10.1
Configured with: /tmp/gfortran-20090604/ibin/../gcc/configure --prefix=/usr/local/gfortran --enable-languages=c,fortran --with-gmp=/tmp/gfortran-20090604/gfortran_libs --enable-bootstrap
Thread model: posix
gcc version 4.5.0 20090604 (experimental) [trunk revision 148180] (GCC)
[mac27:~] bmcinnes% ifort test.f90 -lblas
[mac27:~] bmcinnes% ./a.out
(2.00000000000000,0.000000000000000E+000)
[mac27:~] bmcinnes%

ABI mismatch

The issue here is that gfortran (incorrectly) assumes that the library functions are using the gfortran calling conventions, whereas they are actually using the g77/f2c calling conventions, which were the de facto standard for OS X fortran compilers at the time that the Accelerate.framework was published.

In the g77 calling convention, functions that return a complex value behave as though they were SUBROUTINEs with an additional first argument in which the result is to be returned. At the ABI level, the result is stored to a memory location pointed to by this implicit first argument; they behave as though they were C functions declared like:

void function(complex double *result, other arguments ... )

In the gfortran calling convention, on the other hand, functions with a complex return type return the value in registers, and do not have an implicit memory location argument.

Unfortunately, Apple cannot change the calling convention in use by the Accelerate.framework, as this would break binary compatibility with existing working applications. (This is why languages and compilers that need to interface with system libraries should NEVER change calling conventions on a platform -- you break it, you buy it).

There are two workarounds, neither one of which is particularly satisfying:

(a) explicitly treat *DOTC and *DOTU as subroutines:

PROGRAM TEST
COMPLEX*16 A(1), B(1),C
EXTERNAL ZDOTC
A(1)=1;
B(1)=2;
CALL ZDOTC(C, 1, A, 1, B, 1)
WRITE (*,*) C
END

(b) compile your gfortran codes with the -ff2c flag, which will cause gfortran to use the old calling convention.

If you make only a few calls to *DOTC/*DOTU, and do not require cross-platform compatibility, (a) is likely the better solution. If you need to have portable code and want a quick fix, (b) should work. A better solution is probably to write a wrapper that does the calling convention translation for you, so you don't need to change your program itself.
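The wrapper idea can be sketched in C. This is only a minimal illustration, not code from this thread: zdotcw_ is a hypothetical name (it should be callable from gfortran as ZDOTCW under its default name mangling), and zdotc_f2c_standin is a stand-in for the library routine so that the sketch compiles on its own. In real use you would drop the stand-in, declare the library's zdotc_ with the same hidden-first-argument prototype, and link against the system BLAS. The sketch also assumes that gfortran's default convention for COMPLEX*16 results matches the C ABI for double complex.

```c
#include <complex.h>

/* Stand-in (assumption) for an f2c/g77-style ZDOTC: the result is written
   through a hidden first argument, and all other arguments are passed by
   reference, as Fortran does. Only positive strides are handled here. */
static void zdotc_f2c_standin(double complex *result, const int *n,
                              const double complex *x, const int *incx,
                              const double complex *y, const int *incy)
{
    double complex sum = 0.0;
    for (int k = 0; k < *n; ++k)
        sum += conj(x[k * *incx]) * y[k * *incy];
    *result = sum;
}

/* Hypothetical wrapper: takes the usual ZDOTC arguments and returns the
   complex result by value, so a caller using the by-value convention
   never sees the hidden result argument. */
double complex zdotcw_(const int *n,
                       const double complex *x, const int *incx,
                       const double complex *y, const int *incy)
{
    double complex result;
    zdotc_f2c_standin(&result, n, x, incx, y, incy);
    return result;
}
```

On the Fortran side, the program would then declare COMPLEX*16 ZDOTCW and EXTERNAL ZDOTCW, and write C = ZDOTCW(1, A, 1, B, 1), compiled without -ff2c. The rest of the program stays unchanged, which is the point of using a wrapper.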

The best solution would probably be for gfortran to provide a means to decorate functions to indicate that they use the g77/f2c calling conventions.

One more thing

If you think there is a bug in OS X, or other Apple software, please file a bug! You need to have an ADC account, but a free account works just fine.

Some Apple engineers do read the forums here, but in general, word of your troubles will reach the person who can respond to them much more quickly if you report the issue straight away.

Make sure to request features you have need for, too!

The -ff2c flag seems to

The -ff2c flag seems to work, but in practice only with -m32, as SDOT fails with -ff2c -m64 (always returns 0).

So, with -m64, the only way SDOT works correctly is with the newer calling convention (no -ff2c), and the only way CDOTU works correctly is with the older calling convention (with -ff2c). AFAIK, they are both in the same library (/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib)!

Here is the test program I used:

program main

real sdot,a(1),b(1),w
external sdot
complex cdotu,ac(1),bc(1),wc
external cdotu

a(1) = 1e0
b(1) = 2e0
w = sdot(1,a,1,b,1)
PRINT *,a(1),b(1),a(1)*b(1),w

ac(1) = cmplx(1e0,1e0)
bc(1) = cmplx(1e0,2e0)
wc = cdotu(1,ac,1,bc,1)
PRINT *,ac(1),bc(1),ac(1)*bc(1),wc

if (w .ne. a(1)*b(1)) stop 1

if (wc .ne. ac(1)*bc(1)) stop 1

end

And here is the output (same results with gfortran-4.2, gfortran-mp-4.3, and gfortran-mp-4.5):

$ gfortran-4.2 -ff2c -m64 my_conftest2.f -lblas
$ ./a.out
1.000000 2.000000 2.000000 0.000000
( 1.000000 , 1.000000 ) ( 1.000000 , 2.000000 ) ( -1.000000 , 3.000000 ) ( -1.000000 , 3.000000 )
STOP 1
$ gfortran-4.2 -m64 my_conftest2.f -lblas
$ ./a.out
1.000000 2.000000 2.000000 2.000000
Segmentation fault
$ otool -L ./a.out
./a.out:
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 219.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.0.1)
$

Bug Report filed

Bug Report filed 4/10/2010

Summary

libBLAS.dylib (at /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/), when called from FORTRAN, requires different calling conventions for different functions.

- CDOTU (library symbol _cdotu_) requires the "C calling convention" (compiler option -ff2c). Without this option the program always produces a segmentation fault.
- SDOT (library symbol _sdot_) requires the normal Fortran calling convention (no compiler option). With -ff2c this function always returns 0 when compiled for 64 bits (-m64).

Steps to Reproduce:

1. Create file my_conftest2.f (spaces at the beginning of lines are significant for FORTRAN):
program main

real sdot,a(1),b(1),w
external sdot
complex cdotu,ac(1),bc(1),wc
external cdotu

a(1) = 1e0
b(1) = 2e0
w = sdot(1,a,1,b,1)
PRINT *,a(1),b(1),a(1)*b(1),w

ac(1) = cmplx(1e0,1e0)
bc(1) = cmplx(1e0,2e0)
wc = cdotu(1,ac,1,bc,1)
PRINT *,ac(1),bc(1),ac(1)*bc(1),wc

if (w .ne. a(1)*b(1)) stop 1

if (wc .ne. ac(1)*bc(1)) stop 1

end

2. Install GNU Fortran 4.2.4 for Mac OS X 10.6 (Snow Leopard) from http://r.research.att.com/tools/

Direct link to dmg: http://r.research.att.com/gfortran-42-5646.pkg

Installation warns about overwriting cc1 (/usr/libexec/gcc/i686-apple-darwin10/4.2.1/cc1), so you may want to save the original, if using the latest OSX SDK (3.2.2).

(same results also with macports gfortran 4.3 and 4.5)

3. Compile with different options and inspect results:

3.1 64-bit

3.1.1 64-bit with -ff2c

$ gfortran-4.2 -ff2c -m64 my_conftest2.f -lblas
$ ./a.out
1.000000 2.000000 2.000000 0.000000
( 1.000000 , 1.000000 ) ( 1.000000 , 2.000000 ) ( -1.000000 , 3.000000 ) ( -1.000000 , 3.000000 )
STOP 1

(STOP 1 means that an incorrect result was computed: 2.0 expected vs. 0.0 returned)

3.1.2 64-bit without -ff2c

$ gfortran-4.2 -m64 my_conftest2.f -lblas
$ ./a.out
1.000000 2.000000 2.000000 2.000000
Segmentation fault

(Call to CDOTU causes a segmentation fault)

3.2 32-bit:

3.2.1 32-bit with -ff2c

$ gfortran-4.2 -m32 -ff2c my_conftest2.f -lblas
$ ./a.out
1.000000 2.000000 2.000000 2.000000
( 1.000000 , 1.000000 ) ( 1.000000 , 2.000000 ) ( -1.000000 , 3.000000 ) ( -1.000000 , 3.000000 )
$

(This demonstrates the expected outcome == no error)

3.2.2 32-bit without -ff2c

$ gfortran-4.2 -m32 my_conftest2.f -lblas
$ ./a.out
1.000000 2.000000 2.000000 2.000000
Segmentation fault
$

(same failure as with 64-bit without -ff2c)

3.3 Verify the libraries referenced by the program:

$ otool -L ./a.out
./a.out:
/System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib (compatibility version 1.0.0, current version 219.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 125.0.1)
$

Expected Results

The test program given above is designed to verify that SDOT and CDOTU give correct results, see 3.2.1 above.

Actual Results

In 32-bit mode, using -ff2c provides an effective workaround for CDOTU producing a segmentation fault. However, the same does not work in 64-bit mode, as -ff2c causes SDOT to always return zero. SDOT works correctly without -ff2c.

It seems that the 64-bit ABI for libBLAS is broken, at least when called from FORTRAN, using portable FORTRAN code. The program can be modified to emulate the f2c calling convention just for the complex functions (such as CDOTU), by converting the CDOTU function call to a subroutine call, where the result is the 1st parameter. That kind of code, however, is not portable, and also requires other changes in the program code (i.e., the declaration for CDOTU must be changed as well).

Regression

As summarized above, an effective workaround (compiler option) exists for 32-bit code, but not for 64-bit code.

Second thoughts... It seems

Second thoughts...

It seems odd that -ff2c calling convention should have anything to do with how SDOT is called, since it is not returning a complex type. So maybe this is a bug in gfortran, not the libblas.

The issue with SDOT on

The issue with SDOT on 64-bit Intel is a legitimate bug with the BLAS not properly obeying the g77/f2c calling conventions. Specifically, the g77/f2c convention is for single precision results to be returned as double precision, but the OS X BLAS is incorrectly returning the result in single precision. When that single precision result is interpreted as a double precision value by the calling routine, the result is a very tiny (denormal) double-precision value, which rounds to exactly zero when converted to single.

Thanks for reporting the bug.
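The SDOT effect described above can be reproduced with plain bit manipulation. The following C sketch is only a model of the mismatch, not the actual BLAS or gfortran code: misread_float_return is a hypothetical helper, and the sketch assumes that the upper 32 bits of the 64-bit return value happen to be zero.

```c
#include <stdint.h>
#include <string.h>

/* Models a caller that expects a double-precision return value (the
   g77/f2c convention for REAL functions) while the callee actually left
   a single-precision value in the low 32 bits.
   Assumption of this sketch: the upper 32 bits are zero. */
float misread_float_return(float returned)
{
    uint32_t float_bits;
    memcpy(&float_bits, &returned, sizeof float_bits);

    /* Place the float's bit pattern in the low half of a 64-bit value. */
    uint64_t raw = float_bits;

    /* The caller interprets all 64 bits as a double. For ordinary float
       values this is a tiny denormal double. */
    double misread;
    memcpy(&misread, &raw, sizeof misread);

    /* Narrowing the denormal double back to single precision underflows. */
    return (float)misread;
}
```

For example, 2.0f has the bit pattern 0x40000000. Read as the low half of a double with a zero upper half, that is a denormal of roughly 5e-315, far below the smallest single-precision value, so the conversion back to float gives zero. This matches the "always returns 0" behaviour reported for SDOT with -ff2c -m64.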

Dear all, since i just

Dear all,
since I just wanted to start coding Fortran on OS X, I tried to build my own BLAS using ATLAS to avoid using Apple's own BLAS. I built ATLAS with the -O3 -m64 options and the full LAPACK, and the "Segmentation fault" still persists using the code provided by user jarno!

MacBook-Pro:~ sbarthel$ gfortran -O3 -m64 check.f95 -L/usr/local/atlas/lib/ -latlas
MacBook-Pro:~ sbarthel$ ./a.out
1.0000000 2.0000000 2.0000000 2.0000000
Segmentation fault
MacBook-Pro:~ sbarthel$

Please do not hesitate to correct me (I am new to this field), but if this were not compiler related, shouldn't the error vanish with my custom build? My compiler version is the gcc 4.5.0 package.

Hello, I think you may still

Hello,

I think you may still be linking to the original Apple-provided ATLAS library with -latlas. See "Problems with linking/missing LAPACK routines on OS X" in the ATLAS Errata document on the ATLAS SourceForge website. I would post the link myself, but apparently that triggers the spam protection.

You may need to tell the compiler explicitly which ATLAS .a file to link, as it searches the Apple library directories first.

jarno, any word back from

jarno, any word back from Apple concerning your bug report?

I'm trying to build a ScaLAPACK on x86_64 that is linked to the Accelerate framework and this issue is making it extremely difficult. If everything is compiled with -ff2c the library is mostly functional. However, some PBLAS level 1 single precision routines fail:

PSDOT
PSASUM
PSCASUM

Running the BLAS testsuite from http://www.netlib.org/blas/ using -ff2c reveals more problems in the level 1 BLAS:

Failed single precision real routines:
SDOT
SNRM2
SASUM

Failed single precision complex routines:
SCASUM
SCNRM2

The double precision routines appear to be just fine in the presence of -ff2c.

As a side note, does anyone know of a good shootout between Apple's Accelerate framework and ATLAS? It looks like I may have to abandon Accelerate as sorting out where and where not to use -ff2c with ScaLAPACK is going to be a nightmare for bigger software packages.

yeah, me

yeah, me too!