AppleScript Tutorial 8: Rendering chemical structures embedded in graphics files

Author: drc
Web Site: http://www.macinchem.org

Rich Apodaca has been discussing embedding molecular information in images of molecules, such as a PNG file depicting a 2D structure. As we move to a more web-centric view of the world, it is apparent that much research information will only be available via the web. Whilst images of chemical structures are usually adequate for a human viewer, the chemical structure they depict cannot be indexed and subsequently searched. In a follow-up article Rich showed a method of extracting the embedded information as text. In this tutorial I'm going to show how to use AppleScript to extract the information from the PNG file and then display the structure, in an editable form, in a couple of chemical display packages.

This script requires a few things: ChemBioDraw (aka ChemDraw), MacPyMOL (http://pymol.sourceforge.net/), and the excellent ExifTool by Phil Harvey (http://www.sno.phy.queensu.ca/~phil/exiftool/). ExifTool is a platform-independent Perl library plus a command-line application for reading, writing and editing meta information in image, audio and video files. You will also need a couple of image files.
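Because `do shell script` runs commands with a minimal default PATH, it may be worth checking that the script can actually find ExifTool before going any further. The sketch below is illustrative only: `find_exiftool` is a hypothetical helper, not part of the original script, and it assumes ExifTool was installed into /usr/local/bin or another directory already on that default PATH.

```applescript
-- Hypothetical helper, not part of the original script: returns the full path
-- to exiftool, or "" if it cannot be found.
-- Assumption: exiftool was installed into /usr/local/bin or a directory
-- already on the default PATH that do shell script uses.
on find_exiftool()
	try
		-- command -v exits with a non-zero status when exiftool is missing,
		-- which makes do shell script raise an error that is caught below
		return do shell script "PATH=/usr/local/bin:$PATH; command -v exiftool"
	on error
		return ""
	end try
end find_exiftool

set exiftool_path to find_exiftool()
if exiftool_path is "" then
	log "ExifTool not found; install it before running the tutorial script"
else
	log "Using ExifTool at " & exiftool_path
end if
```

If the helper reports a path outside the default PATH, the `exiftool` commands in the main script may need that full path instead.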

The first, lipitor1.png, is an image of Lipitor generated by Geoff Hutchison; it contains the chemical information embedded in both SMILES and molfile format.

The second, rosiglitazone.png, is an image of rosiglitazone from Rich Apodaca; it has the chemical information embedded in molfile format only. You can drag these images to your desktop to work with.

The first part of the script simply asks the user to choose the image file and then creates the POSIX path to the file, since ExifTool needs it.
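Because the POSIX path is pasted into a shell command, a path containing spaces (a folder called "my structures", say) would be split by the shell into several arguments. A minimal sketch of the usual remedy, AppleScript's `quoted form of`, using a made-up path that is not from the original tutorial:

```applescript
-- Hypothetical example path containing a space (not from the original tutorial)
set posix_path to "/Users/me/Desktop/my structures/lipitor1.png"

-- Without quoting, the shell would split this path into two arguments at the
-- space; quoted form of wraps it in single quotes so it stays one argument
set theScript to "exiftool -SMILES -b " & quoted form of posix_path
log theScript
-- theScript is now: exiftool -SMILES -b '/Users/me/Desktop/my structures/lipitor1.png'
```

`quoted form of` also escapes any apostrophes in the path, which plain string concatenation would not.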
The next part creates a three-button dialog box that lets the user choose the application used to view the resulting structure.
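One way to make the dialog step a little more robust is to store the clicked button in a variable straight away and to handle the Cancel button explicitly. The sketch below is an illustration rather than the tutorial's own code; the variable name `chosen_app` is hypothetical, and it relies on the standard behaviour that a button named "Cancel" raises error -128.

```applescript
try
	-- Read the chosen button once and keep it in a variable instead of
	-- relying on "the result" still holding the dialog's reply later
	set chosen_app to button returned of (display dialog "How would you like to display the structure?" buttons {"Cancel", "ChemDraw", "MacPyMOL"} default button "ChemDraw")
on error errMsg number errNum
	-- A button named "Cancel" raises error -128 (user cancelled): stop quietly
	if errNum is -128 then return
	-- Anything else is unexpected, so pass it on
	error errMsg number errNum
end try

if chosen_app is "ChemDraw" then
	log "Structure will be shown in ChemBioDraw"
else
	log "Structure will be shown in MacPyMOL"
end if
```

Because Cancel never reaches the `if` test, only the two application buttons need to be handled there.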

The main part of the script then uses ExifTool to extract the metadata. For ChemDraw, we can generate the structure from either the SMILES string or the molfile data. As written, the script first extracts the SMILES and then checks whether a string was returned. If there is no SMILES, it gets the molfile information instead. To use the molfile, we need to save the data to a file called temp.mol (actually saved in the temporary items folder) with the write_to_file routine; this file is then opened in ChemBioDraw. If SMILES data is present, the script copies it to the clipboard and uses menu item scripting within ChemBioDraw to create the structure with the "SMILES" item of the "Paste Special" menu. The metadata sometimes contains tabs, so the replace_chars find-and-replace routine removes them first.
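The SMILES-first, molfile-fallback logic can be sketched as a few small handlers. This is an illustrative restructuring, not the original script: `read_tag`, `strip_tabs` and `structure_data` are hypothetical names, and `read_tag` assumes exiftool is on the PATH used by `do shell script`. Unlike replace_chars, `strip_tabs` restores whatever text item delimiters were in effect before it ran.

```applescript
-- Hypothetical handler (illustrative, not the original script's code):
-- returns the raw value of one tag, or "" when the image lacks that tag.
-- Assumes exiftool is on the PATH that do shell script uses.
on read_tag(tag_name, posix_path)
	set theScript to "exiftool -b -" & tag_name & " " & quoted form of posix_path
	-- keep the output's line breaks as-is; by default do shell script
	-- would convert them to carriage returns
	return do shell script theScript without altering line endings
end read_tag

-- Remove every tab character, restoring the caller's delimiters afterwards
on strip_tabs(this_text)
	set saved_delims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to tab
	set the_parts to text items of this_text
	set AppleScript's text item delimiters to ""
	set this_text to the_parts as text
	set AppleScript's text item delimiters to saved_delims
	return this_text
end strip_tabs

-- Prefer SMILES (to be pasted via the clipboard); fall back to the molfile
on structure_data(posix_path)
	set smiles to read_tag("SMILES", posix_path)
	if smiles is not "" then return {data_format:"SMILES", structure_text:strip_tabs(smiles)}
	return {data_format:"molfile", structure_text:read_tag("molfile", posix_path)}
end structure_data

log strip_tabs("CC(=O)" & tab & "O")
-- result: CC(=O)O
```

Returning a record that names the format keeps the choice between "paste as SMILES" and "save temp.mol and open it" in one place for the caller.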

Alternatively, MacPyMOL can be used to display the structures. Since MacPyMOL cannot convert SMILES to structures (this will be possible using the next version of OpenBabel, which will be released early 2008), we can only use the molfile information.
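As a variation on reading the molfile into AppleScript and writing it back out with write_to_file, the shell can redirect ExifTool's output straight into temp.mol. This is a sketch under stated assumptions, not the tutorial's method: the image location on the desktop is a hypothetical example, and exiftool is assumed to be on the PATH used by `do shell script`.

```applescript
-- Sketch only: the image location is a hypothetical example, and exiftool is
-- assumed to be on the PATH that do shell script uses.
set image_path to POSIX path of (path to desktop folder) & "lipitor1.png"
set target_file to (path to temporary items folder as text) & "temp.mol"
set mol_path to POSIX path of target_file

-- -b prints the raw tag value; the shell's > sends it straight into temp.mol,
-- so the molfile text never passes through AppleScript at all
do shell script "exiftool -b -molfile " & quoted form of image_path & " > " & quoted form of mol_path

tell application "MacPyMOL"
	activate
	open file target_file
end tell
```

Letting the shell write the file sidesteps both the line-ending conversion done by `do shell script` and any text-encoding choices made by AppleScript's `write` command.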

If we now run the script, choose the rosiglitazone.png file and then select ChemBioDraw for display, you should get this result.

Using lipitor1.png and MacPyMOL, you should see this.

You can download a copy of the script here.

set theMetadata to ""

set theFile to (choose file with prompt "Choose an image file")
set the_path to theFile as string
--display dialog the_path
set posix_path to POSIX path of the_path


display dialog "How would you like to display the structure? " buttons {"Cancel", "ChemDraw", "MacPyMol"} default button 1
if the button returned of the result is "ChemDraw" then
	--set theScript to "exiftool -SMILES -b  /Users/username/Desktop/lipitor1.png"
	
	-- quoted form of protects paths that contain spaces or apostrophes
	set theScript to "exiftool -SMILES -b " & quoted form of posix_path
	
	set theMetadata to (do shell script theScript)
	
	if theMetadata is "" then
		set theScript to "exiftool -molfile -b " & quoted form of posix_path
		
		-- keep the molfile's line breaks as-is; by default do shell script converts them to carriage returns
		set theMetadata to (do shell script theScript without altering line endings)
		
		
		set target_file to (path to temporary items folder as string) & "temp.mol"
		write_to_file(theMetadata, target_file, false)
		
		tell application "CS ChemBioDraw Ultra"
			activate
			open file target_file
		end tell
	else
		
		set this_text to (replace_chars(theMetadata, tab, ""))
		set the clipboard to this_text as text
		
		tell application "CS ChemBioDraw Ultra"
			activate
			
			if enabled of menu item "Paste" then do menu item "SMILES" of menu "Paste Special" of menu "Edit"
			
		end tell
		
	end if
else if the button returned of the result is "MacPyMol" then
	
	set theScript to "exiftool -molfile -b " & quoted form of posix_path
	
	-- keep the molfile's line breaks as-is; by default do shell script converts them to carriage returns
	set theMetadata to (do shell script theScript without altering line endings)
	
	
	set target_file to (path to temporary items folder as string) & "temp.mol"
	write_to_file(theMetadata, target_file, false)
	
	tell application "MacPyMOL"
		activate
		open file target_file
	end tell
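else
	-- not normally reached: the "Cancel" button raises error -128 and stops the script;
	-- return ends the script without quitting the host application (e.g. Script Editor)
	return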
end if

--Routines
on replace_chars(this_text, search_string, replacement_string)
	set AppleScript's text item delimiters to the search_string
	set the item_list to every text item of this_text
	set AppleScript's text item delimiters to the replacement_string
	set this_text to the item_list as string
	set AppleScript's text item delimiters to ""
	return this_text
end replace_chars

on write_to_file(this_data, target_file, append_data)
	try
		set the target_file to the target_file as text
		set the open_target_file to open for access file target_file with write permission
		if append_data is false then set eof of the open_target_file to 0
		write this_data to the open_target_file starting at eof
		close access the open_target_file
		return true
	on error
		try
			close access file target_file
		end try
		return false
	end try
end write_to_file