OS X server 10.4/5, NFS and Intel
Hello,
We are experiencing odd behavior on two of our Intel Macs running OS X Server 10.4 and 10.5. These systems are development platforms used for nightly regression tests of a complex framework of codes. Both are configured into our NIS environment and use automounts for source and users' home directories. Other than the NIS hooks, they are stock OS X Server installs with the latest updates, except that they also serve up the Intel suite of compilers.
During the night, when these regression tests run, we lose our NFS mounts, and on the 10.5 system SecurityAgent.app goes nuts, with the SecurityAgent and authorizationhost processes continuously respawning. The only resolution is to reboot the servers.
Unfortunately, the logs do not provide much in the way of clues. I don't understand Apple's logging mechanisms well enough to increase the logging levels, and I don't know whether other logging information is kept outside the /var/log directory.
The only suspect I have at this point is a possible link with the Intel compilers that run on these boxes. It has been reported that the licensing permits only one compile operation at a time, and that running multiple compiles at once, as might typically happen from a cron job or other script, will hose the system to the point where the only resolution is a reboot.
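If concurrent compiles really are the trigger, one way to test the theory would be to force every compile through a single lock so that only one runs at a time. The following is a minimal sketch, not something we have in place: the lock file path and the `run_serialized` helper are hypothetical names, and it assumes a Python interpreter is available on the build hosts.

```python
import fcntl
import subprocess

# Hypothetical lock file location; any path on a local (non-NFS) disk would do.
LOCK_PATH = "/tmp/compiler.lock"

def run_serialized(cmd, lock_path=LOCK_PATH):
    """Run cmd (a list of arguments) while holding an exclusive file lock.

    A second caller blocks in flock() until the first command has finished,
    so at most one wrapped command runs at a time on this host.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            return subprocess.call(cmd)  # returns the command's exit status
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

# Example use from a build script (the compiler command line is a placeholder):
#   status = run_serialized(["ifort", "-c", "module.f90"])
```

If the mounts survive a night with the compiles serialized this way, that would at least point toward the licensing theory; if they still drop, the compilers are probably not the cause.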
Has anyone in this forum experienced similar symptoms in a comparable setup?
Thanks in advance.
Joe



Losing NFS mounts
Hello,
We are also suffering from a similar problem. Ours is a little different in that our NFS mounts disappear at seemingly random times but reappear immediately afterwards, with no reboot required. We are also not using the Intel compilers.
We have a G5 Xserve as a head node that serves out a couple of shares via NFS. It also acts as an Xgrid controller, managing agents that include a cluster of G5 Xserves and a variety of desktop machines (G5 Mac Pros, Intel Mac Pros, and Intel iMacs). These machines all mount the NFS shares using automount as provided by Open Directory. The grid is set up with Kerberos authentication, but the NFS mounts are not.
The jobs we submit to our grid have between 10 and 2000 tasks each. These tasks are mainly CPU intensive, but a couple are I/O intensive (essentially data-gathering jobs). The jobs usually run without problems, but occasionally we see a task fail. When we look into the failed task, it claims that it cannot find the NFS mounts. This can happen with both CPU-intensive and I/O-intensive tasks.
When we log onto the node that had the failure, the mounts exist. What is even more odd about this is that the node in question will have had other similar tasks running at the same time, and they do not report any problem with the mounts.
We are somewhat stuck at this point. We have monitored both the server and the network. Both see increases in activity, but remain well below saturation. We have seen no problems in the logs, either on the NFS server or on the client. The failures do not seem to be associated with either CPU or I/O load. The NFS mounts just disappear (and reappear) silently.
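Since nothing shows up in the logs, one thing a task could do is check for the mount itself before starting work, record a timestamp whenever the path is missing, and retry a few times. This is only a rough sketch of that idea, not a fix: `wait_for_path` and the example share path are made-up names, and the retry counts are arbitrary.

```python
import os
import time

def wait_for_path(path, attempts=5, delay=2.0, log=print):
    """Return True as soon as path is visible as a directory.

    Each failed check is reported through log() with a timestamp, so the
    gaps can later be lined up against server and network monitoring.
    Returns False if the path is still missing after the last attempt.
    """
    for attempt in range(1, attempts + 1):
        if os.path.isdir(path):
            return True
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        log("%s missing: %s (attempt %d/%d)" % (stamp, path, attempt, attempts))
        if attempt < attempts:
            time.sleep(delay)  # give the automounter time to bring it back
    return False

# Example use at the top of a grid task (the share path is a placeholder):
#   if not wait_for_path("/Network/Servers/head/data"):
#       raise SystemExit("NFS share unavailable, giving up")
```

Even if the retries only hide the failures, the timestamps would tell us how long the mounts are actually gone and whether the gaps cluster around particular jobs.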
So, we add our voice to Joe's in looking for answers to this problem.
Dale