Introducing Remote Activity: Mac-Native Job Monitoring

Around a year ago I decided I would finally bite the bullet and learn Core Data, a relatively new framework for storing data in Cocoa. I find the best way to learn something is to jump in the deep end with a new project, so I thought one up. I contacted a couple of guys who knew more about Core Data than I did (Alexander Griekspoor and Charles Parnot) and gave them no choice but to help me in my quest. They graciously obliged, and after much stopping and starting throughout the year, the application in question is now ready for a public alpha release.

Remote Activity is that application. It is like a distributed version of Activity Monitor: you add your remote hosts (e.g. clusters, supercomputers, workstations, grids) to it, and it queries them regularly for running jobs and summarizes the status of all of your calculations in one window. Out of the box it supports a number of batch systems (SGE, LSF, PBS, and Xgrid), and it can also be used with the ps command to monitor interactive runs.

Batch system specifications are loaded via a plugin architecture, so you can even write your own if your batch system is not included by default. Any scripting language or executable will work, as long as it runs on the remote machine; you do not need to program plugins in Cocoa.

In addition to basic monitoring, Remote Activity can also generate notifications when a job changes status. For example, if you have a long-running job, you can ask to receive an email or have Remote Activity pop up a dialog box when it finishes.

Remote Activity has a summary feature that can be very useful. Click the summary button, and the window shrinks down, only showing job totals for each remote machine. Click the button again, and the window expands to reveal full details of running jobs. This makes it very easy to keep an eye on changes without having to leave the Remote Activity window taking up valuable screen real estate.

To get started using Remote Activity, just download it and add some remote hosts. At this point, you need to be able to access your remote hosts without typing a password, which means you need to have SSH or RSH configured for passwordless login.
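
If you want to check a host before adding it, one option is to ask ssh to fail rather than prompt for a password. The following is a minimal sketch and not part of Remote Activity: it assumes an OpenSSH client (whose BatchMode option disables password prompts), and the function names and the host name are made up for illustration.

```python
import subprocess


def batch_ssh_command(host, remote_command="true", timeout_seconds=10):
    """Build an ssh command line that fails instead of prompting for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                      # never ask for a password
        "-o", f"ConnectTimeout={timeout_seconds}",  # give up on unreachable hosts
        host,
        remote_command,
    ]


def can_login_without_password(host):
    """Return True if a trivial remote command runs with no interactive prompt."""
    try:
        result = subprocess.run(
            batch_ssh_command(host), capture_output=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


# Example (hypothetical host name):
#   can_login_without_password("me@cluster.example.org")
```

If this returns False for a host you can otherwise reach, the usual cause is that key-based login is not set up for that account.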

I will probably release the source code under a BSD license when the application is a bit more stable. If you would be interested in having Remote Activity open sourced, please let me know.

Comments

Xgrid in Remote Activity

About Xgrid in Remote Activity
--------------------------

Username
---------

Even though a username is not needed by Xgrid, enter a bogus username in the field. Otherwise, Remote Activity does not work.
(Drew: bug report!)

Password
--------

There is one undocumented feature that lets you use Xgrid controller passwords, and also use the Keychain to store the password:

CASE 1 : VERY EASY
If you have used GridStuffer, Xgrid FUSE or xgridstatus to set a Keychain password, it will be available automagically!! You will first get a message to allow Remote Activity to use the Keychain item. And you don't need to do anything else.

CASE 2 : EASY

In the environment variables for a host, you can set the password with:
XGRID_CONTROLLER_PASSWORD

The problem with the above is that the password is then "in the clear".

CASE 3 : SECURE

Another option is to set the password with:
XGRID_CONTROLLER_KEYCHAIN
Then, on the first connection by Remote Activity, the password will be stored in the keychain. You can then remove the variable from the Host settings in Remote Activity. The password in the keychain will be used automatically if present.

Shell: local or remote?
------------------
For Xgrid, you would usually use 'bash' as the Shell, so the local user Keychain will be used. I am not sure what will happen when using ssh to connect to an Xgrid from a remote host (e.g. to get past a firewall thru port 22 and then access the Xgrid controller).

Plain Awesome

Thanks Drew, that's really wonderful. I can't say how much that's going to help as I set up my new cluster. I already have accounts on other queues, so it really saves a lot of work.

Fantastic!

I'll give it some beta testing in the next few weeks.

Gatekeepers and Log Files

Hi Drew,

Looks great. Any suggestions for accessing a batch system behind a gatekeeper? That is, I can't SSH directly to a machine that can query the LSF batch system. ssh-agent makes it all pretty seamless for interactive work, but I don't see how I could configure Remote Activity to make the jump.

Do you have plans to let users access stdout and stderr from the batch jobs? This would be extremely useful if something looks funny while monitoring a job.

Neat idea

I ran into two problems using ps. First, it only shows jobs belonging to the current user (this would be fine if, say, there was a way to integrate loadavg / swap stats into the system status display). Second, it shows all of its own jobs, so three sshd/ps/python jobs show up every time it refreshes. In the general case this seems like a great example of why you'd want custom probe options, but I doubt it's ever useful to sample the probe itself. That could be avoided by filtering out all processes descended from the sshd, which requires walking the process tree, but since you're already using python that's pretty easy to do:

http://improbable.org/chris/Software/Enhanced%20Process%20Status%20%28ps%29
http://improbable.org/chris/Software/Enhanced%20Process%20Status%20%28ps%29.diff

This also specifies the field list completely to avoid system-dependent variations (e.g. on our Linux systems all of the processes were listed as elapsed_cpu+command) and has a filter list for processes which we really don't care about (e.g. bash, top, sshd). This is primitive now but could easily be extended by e.g. slurping the contents of /etc/shells.
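
For readers who just want the idea, the tree walk can be sketched roughly as follows. This is not the linked script, only an illustration: it assumes the process list comes from `ps -A -o pid,ppid,comm`, and the function names are invented here.

```python
def parse_ps(output):
    """Parse `ps -A -o pid,ppid,comm` output into (pid, ppid, command) tuples."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue
        pid, ppid, command = line.split(None, 2)
        rows.append((int(pid), int(ppid), command.strip()))
    return rows


def descendants(rows, root_pid):
    """Return the pids of every process below root_pid in the process tree."""
    children = {}
    for pid, ppid, _ in rows:
        children.setdefault(ppid, []).append(pid)
    found, stack = set(), [root_pid]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def without_probe(rows, probe_root_pid):
    """Drop the probe's own sshd and everything it spawned from the listing."""
    hidden = descendants(rows, probe_root_pid) | {probe_root_pid}
    return [row for row in rows if row[0] not in hidden]
```

How the probe finds the pid of its own sshd is left out here; walking up from the script's own parent pid is one possibility.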

Nice Script

Nice work with the ps script. I'll test and include it, if you don't mind, and add you to the credits. This is why I want to open source the program, and why having simple plugins is such a great idea: I don't have time to support all of these batch systems well, and others can easily chip in.

Thanks again,
Drew

---------------------------
Drew McCormack
http://www.maccoremac.com
http://www.macanics.net
http://www.macresearch.org

Gatekeepers

That's a good question. How do you normally access the machine? Via a second machine?

I wonder if something could be done with an SSH tunnel. Perhaps I could add an option for the SSH port to use, and you could setup a tunnel via a second machine.

In theory, you could also create a custom batch system script. It would run on the second machine, and issue its own ssh command to get to the LSF machine and run the standard LSF script. An ugly hack, but it would probably work.
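
As a rough illustration of that hack, the script on the second machine could do little more than forward the query over its own ssh call. This is only a sketch under assumptions: the host name and function names are hypothetical, `bjobs -u all` stands in for whatever the standard LSF script actually runs, and it needs Python 3.8 or later for shlex.join.

```python
import shlex
import subprocess


def forwarded_command(inner_host, query):
    """Build the ssh call the second machine makes to the LSF machine."""
    # ssh hands the trailing argument to the remote shell, so the query
    # is quoted into a single string first.
    return ["ssh", "-o", "BatchMode=yes", inner_host, shlex.join(query)]


def run_forwarded(inner_host, query):
    """Run the query on the inner host and return whatever it printed."""
    result = subprocess.run(
        forwarded_command(inner_host, query), capture_output=True, text=True
    )
    return result.stdout


# Example (hypothetical host name):
#   print(run_forwarded("lsf-head.example.org", ["bjobs", "-u", "all"]))
```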

If anyone else has ideas, please chime in.

Drew

SSH tunnels

Actually, I do already have a spot for ssh options in the Host info, but it turns out I forgot to actually do anything with what you enter there. That's alpha software for you.

I'll fix this, so that it should be possible to use a different port for SSH, and thereby a tunnel via the gatekeeper. Does that seem like it would work?
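
To make the idea concrete, here is a sketch of the two ssh invocations involved, assuming an OpenSSH client. The host names, the port number 2222 and the function names are placeholders, and none of this is something Remote Activity does today.

```python
def tunnel_command(gatekeeper, target_host, local_port, target_port=22):
    """ssh call that forwards a local port to the target via the gatekeeper."""
    # -N: run no remote command, just keep the forwarding open
    # -L: local_port on this machine leads to target_host:target_port
    return ["ssh", "-N", "-L", f"{local_port}:{target_host}:{target_port}", gatekeeper]


def via_tunnel_command(user, local_port, remote_command):
    """ssh call that reaches the target through the forwarded local port."""
    return ["ssh", "-p", str(local_port), f"{user}@localhost", remote_command]


# Example (hypothetical names):
#   tunnel_command("gate.example.org", "lsf-head", 2222)
#   via_tunnel_command("adam", 2222, "bjobs")
```

The first command stays running in the background; the second is what a host entry pointing at localhost with a custom SSH port would amount to.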

Drew

Sparkle

Drew: it would be a good idea to add Sparkle to the application asap. I just did that for Xgrid FUSE and it is very easy. Not only does it work well, but the integration process and usage are very well explained by Andy M. (sorry, still can't spell his last name!).

* Add the framework (link and copy)
* Instantiate an SUUpdater object in the main nib
* Create an appcast on a web site
* Create an info/changelog page on the web

Maybe I could do it for you.

Re: Sparkle

Hi Charles,

Thanks for the suggestion. I recently added Sparkle to Mental Case, and it was pretty easy.
It's just a question of finding the time. Even little things like that. There are lots of them.

Growl would also be nice.

Drew

It's worth a shot; now about SGE ...

Sounds like it's worth a shot; I'm glad I didn't work too hard on trying to set up the tunneling in the current version :-) . At the moment, though, I find myself unable to monitor a remote SGE cluster accessible directly via SSH. Remote Activity appears to connect successfully, but it doesn't find my running jobs. I'd like to try to debug this before attempting to set up the more complicated LSF monitor. What do you use to query SGE? Some wrapper around qstat?

Feel free to PM or email me if you'd prefer to take this offline. Thanks, Adam

P.S. The Xgrid monitoring set up was very smooth indeed. It's great to see GridEZ starting to catch on.

RockOS and SGE

I'm also not able to display the running jobs on our cluster, which runs Rocks 4.2.1 as the operating system and SGE 6.0u8.

I tried to connect via SSH on the frontnode as a normal user, but it doesn't display anything.
---
I'm not telling you that you should believe me. Learn the facts, and the origins behind the facts, and make up your own damn mind. That's why you have one.

Neither can I... Debian Cluster

Neither can I... Debian cluster and SGE 6.1

F.J M.

SGE

Hi Guys,
I'm working with Adam on this, and if we find the problem, we will let you know.
Drew

OpenSSL

One thing I should point out is that at this point Remote Activity requires that openssl is available on the remote machine. I may change this in future, but for now you need to be able to run the openssl command on the remote machine.

Drew

Feel free to include it

Also - I had to use a web search to find this article again. If you're going to open source it, maybe it'd be good to use something like Google Code which has RSS feeds?

Remote Activity is on Google Code

Actually, it is already on Google Code. I announced that last week on MR. You can find the project here:

http://code.google.com/p/remoteactivity/

Anyone wanting to contribute is most welcome.

Drew

Managing other Unix boxes (Sun, Linux, etc.)

Is there any way to use this software to monitor other Unix flavors like Sun?

Other Unixes

There is nothing in Remote Activity stopping you from monitoring other Unix systems; that's what it is designed for. If your Sun box is running SGE, for example, Remote Activity should be able to show you your running calculations.
The ps system is written for Mac OS X, but the others should work with any system.

Drew

Other Unixes and Xgrid

And, of course, the Xgrid plug-in will also not work on non-OS X machines :-)