AppleScript Tutorial 3

Author: Chris Swain
Website: http://www.macinchem.org

Reading, Writing and using Lists

The following AppleScript uses ChemDraw to calculate a variety of molecular properties and then stores them as individual values. These can then be used, as demonstrated rather trivially by the display dialog command. The script can be downloaded here.

tell application "CS ChemDraw Ultra"
	
	set the_SMILES to SMILES of selection
	set Elem_Anal to Elemental Analysis of selection
	set Exact_mass to Exact Mass of selection
	set Mol_Form to Molecular Formula of selection
	set Mol_weight to Molecular Weight of selection
	
	
	set Chem_props to "SMILES " & the_SMILES & return & "Chem Analysis " ¬
		& Elem_Anal & return & "Molecular Formula " ¬
		& Mol_Form & return & "Molecular Weight " & Mol_weight
	
	
	
	display dialog Chem_props
end tell

This is fine if all you have to do is calculate the properties for a single molecule, but what if you want to perform the calculation on a list of structures? Suppose you have a file containing a series of structures in SMILES format. The file should look like this: a tab-delimited list with each SMILES string followed by the compound name.

    c1ccccc1 benzene

    Ic1ccccc1 iodobenzene

    O=C1CCCCC1 cyclohexanone

    NC1CCCCC1 cyclohexamine

    CN(C)c1cccnc1 3-dimethylaminopyridine

    N1(c2ccccc2)CCNCC1 phenylpiperazine

You can download the file here: temp_mac.txt.zip (control-click on the link and choose "Download linked file ...."). What we need to do now is have the user choose a file, read the contents and then store the data in a list. A list is just a group of values between braces, for example {1,2,3} or {1,"b","hello",{1,3,5}}. As you can see, you can mix types and even have a list within a list. In the script below we first define the list we will read the molecules into, then get the user to choose a file, and finally read the contents of the file into theData.

set mol_list to {}
set theData to ""

set theFile to (choose file with prompt "Select the file:" of type {"TEXT"}) as alias

open for access theFile

set theData to read theFile using delimiter return

close access theFile

If you copy and paste the above text into Script Editor, compile it, select "Event Log" and click "Run", you can choose the temp_mac.txt file and you should see a result like the one shown below. Each line of the file is read into the list as a separate value:-

{"c1ccccc1 benzene", "Ic1ccccc1 iodobenzene", "O=C1CCCCC1 cyclohexanone", "NC1CCCCC1 cyclohexamine", "CN(C)c1cccnc1 3-dimethylaminopyridine", "N1(c2ccccc2)CCNCC1 phenylpiperazine"}
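
The read step can be tried without the download by writing a small sample file first. The following is only a sketch: the file name smiles_demo.txt and its location in the temporary items folder are assumptions made for illustration, and the try blocks are one possible way of making sure the file is closed again if something goes wrong.

```applescript
-- Hypothetical demo file in the temporary items folder (not one of the tutorial's files)
set demoPath to (path to temporary items as text) & "smiles_demo.txt"
set sampleText to "c1ccccc1" & tab & "benzene" & return & "Ic1ccccc1" & tab & "iodobenzene"

-- Write the sample text, closing the file even if the write fails
set fileRef to open for access file demoPath with write permission
try
	set eof of fileRef to 0 -- empty any earlier copy of the demo file
	write sampleText to fileRef
	close access fileRef
on error errMsg
	close access fileRef
	error errMsg
end try

-- Read it back: "using delimiter return" gives one list item per line
set fileRef to open for access file demoPath
try
	set theData to read fileRef using delimiter return
	close access fileRef
on error errMsg
	close access fileRef
	error errMsg
end try

-- The usual list commands work on the result
set lineCount to count of theData
set firstLine to item 1 of theData
```

Here lineCount and firstLine simply show that the result is an ordinary list that can be counted and indexed.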

Having read the file, we will of course want to write out the results at some point, so this seems a good time to think about the file we will be saving to. We want to save the results in the same folder as the file we read in, and we do this with the help of a simple sub-routine. We pass "theFile" to the sub-routine, which returns the folder in which it resides. It is then a simple task to append the output file name.


set the_file_path to GetParentPath(theFile)

set theSaveFile to the_file_path & "test2.smi"

on GetParentPath(theFile)
	tell application "Finder" to return container of theFile as text
end GetParentPath
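
If you would rather not involve the Finder, the parent folder can also be worked out from the path text alone. This is a hedged sketch rather than the tutorial's method: the handler name parentPathOf and the sample path are made up for illustration, and it assumes a colon-delimited path to a file (not a folder).

```applescript
-- Hypothetical helper: returns the parent folder of a colon-delimited file path
on parentPathOf(hfsPath)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ":"
	set pathParts to text items of hfsPath
	-- drop the last component (the file name) and rejoin with colons
	set parentParts to items 1 thru -2 of pathParts
	set parentPath to (parentParts as text) & ":"
	set AppleScript's text item delimiters to oldDelims
	return parentPath
end parentPathOf

-- Made-up sample path, used only to show the result
set samplePath to "Macintosh HD:Users:chris:temp_mac.txt"
set saveFilePath to parentPathOf(samplePath) & "test2.smi"
```

With the sample path above, saveFilePath ends up as "Macintosh HD:Users:chris:test2.smi".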

Now that we have all the data in a list we can begin to manipulate it. First we need to get the SMILES strings. At the moment the first item in the list is "c1ccccc1 benzene", so we need to separate the two terms. First change the text item delimiter to tab, then a simple repeat loop takes each item in theData, splits it into its text items and copies the resulting pair to the end of a new list called "mol_list". Remember to change the delimiter back!

set text item delimiters to tab
repeat with i from 1 to count of theData
	set theLine to text items of item i of theData
	copy theLine to the end of mol_list
end repeat
set text item delimiters to ""


The result is a list of lists:-

{{"c1ccccc1", "benzene"}, {"Ic1ccccc1", "iodobenzene"}, {"O=C1CCCCC1", "cyclohexanone"}, {"NC1CCCCC1", "cyclohexamine"}, {"CN(C)c1cccnc1", "3-dimethylaminopyridine"}, {"N1(c2ccccc2)CCNCC1", "phenylpiperazine"}}
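
One way to make sure the delimiter is always changed back is to wrap the split in a small handler. The sketch below is an illustration only: splitText is a hypothetical helper name, and the two sample lines are typed in rather than read from a file. It also shows how to reach a value inside the nested list.

```applescript
-- Hypothetical helper: split text on a delimiter and restore the old delimiters
on splitText(theText, theDelimiter)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelimiter
	set theItems to text items of theText
	set AppleScript's text item delimiters to oldDelims
	return theItems
end splitText

-- Two sample lines standing in for the contents of theData
set theData to {"c1ccccc1" & tab & "benzene", "Ic1ccccc1" & tab & "iodobenzene"}
set mol_list to {}
repeat with thisLine in theData
	set end of mol_list to splitText(contents of thisLine, tab)
end repeat

-- Reaching into the list of lists: the name of the second compound
set secondName to item 2 of item 2 of mol_list
```

Here secondName is "iodobenzene", and the text item delimiters are left exactly as they were before the handler was called.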

We can select both the "SMILES" and "name" of each item of "mol_list" and use ChemDraw to calculate the properties. The variable i is the index of a repeat loop over mol_list.

set the_compound to item i of mol_list
set the_SMILES to item 1 of the_compound
set the_name to item 2 of the_compound
--display dialog the_SMILES
--display dialog the_name
set the clipboard to the_SMILES

However, getting ChemDraw to create the chemical structure from the SMILES string is not straightforward, as there is no "Paste SMILES" command in the AppleScript dictionary. Instead we script the menus to paste the SMILES from the clipboard. The rest of the ChemDraw commands you have seen before. We then combine all the different data items for a single compound into a list, "mol_props_list", and add that list to the end of "all_mols_list".

tell application "CS ChemDraw Ultra"
	
	activate
	
	if enabled of menu item "Paste" then do menu item "SMILES" of menu "Paste Special" of menu "Edit"
	
	set the_CD_SMILES to SMILES of selection
	set Elem_Anal to Elemental Analysis of selection
	set Exact_mass to Exact Mass of selection
	set Mol_Form to Molecular Formula of selection
	set Mol_weight to Molecular Weight of selection
	
	
	copy the_SMILES to the end of mol_props_list
	copy the_name to the end of mol_props_list
	copy the_CD_SMILES to the end of mol_props_list
	copy Elem_Anal to the end of mol_props_list
	copy Exact_mass to the end of mol_props_list
	copy Mol_Form to the end of mol_props_list
	copy Mol_weight to the end of mol_props_list
	
	if enabled of menu item "Paste" then do menu item "Clear" of menu "Edit"
	--display dialog (item 3 of mol_props_list)
end tell
copy mol_props_list to the end of all_mols_list
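
The excerpts above do not show the surrounding loop or where "mol_props_list" and "all_mols_list" are first set up, so the following sketch shows one way the pieces might fit together. It makes assumptions: the ChemDraw calls are replaced by a hypothetical handler, placeholderProperties, that returns dummy values, and mol_props_list is emptied at the start of each pass so that values from one compound do not carry over to the next.

```applescript
-- Hypothetical stand-in for the ChemDraw block: returns five dummy values
-- (ChemDraw SMILES, elemental analysis, exact mass, formula, weight)
on placeholderProperties(theSMILES)
	return {theSMILES, "n/a", "n/a", "n/a", "n/a"}
end placeholderProperties

set mol_list to {{"c1ccccc1", "benzene"}, {"Ic1ccccc1", "iodobenzene"}}
set all_mols_list to {}
set num_compounds to count of mol_list

repeat with i from 1 to num_compounds
	set the_compound to item i of mol_list
	set the_SMILES to item 1 of the_compound
	set the_name to item 2 of the_compound
	
	-- start with an empty list for every compound
	set mol_props_list to {}
	copy the_SMILES to the end of mol_props_list
	copy the_name to the end of mol_props_list
	set mol_props_list to mol_props_list & placeholderProperties(the_SMILES)
	
	copy mol_props_list to the end of all_mols_list
end repeat
```

In the real script the placeholder handler would be replaced by the tell block that talks to ChemDraw.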

It only remains to convert the list to tab delimited text and then save the result. The repeat loop does the conversion and the sub-routine adds each line to the file. It is probably worth mentioning that having regularly used snippets of code as sub-routines certainly helps the cut and paste school of programming!

repeat with i from 1 to num_compounds
	set mol_list to item i of all_mols_list
	-- convert list to text
	set old_delim to AppleScript's text item delimiters
	set AppleScript's text item delimiters to tab
	set mol_list to mol_list as text
	--set mol_list to mol_list & "\n"  needs UNIX line endings
	set mol_list to mol_list & "
"
	set AppleScript's text item delimiters to old_delim
	my write_to_file(mol_list, theSaveFile, true)
end repeat



on write_to_file(this_data, target_file, append_data)
	try
		set the target_file to the target_file as text
		set the open_target_file to ¬
			open for access file target_file with write permission
		if append_data is false then ¬
			set eof of the open_target_file to 0
		write this_data to the open_target_file starting at eof
		close access the open_target_file
		return true
	on error
		try
			close access file target_file
		end try
		return false
	end try
end write_to_file
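
The list-to-text conversion can also be kept in a sub-routine of its own. This is a sketch under assumptions rather than part of the original script: joinList is a hypothetical helper name and the sample data holds only three values per compound. It builds the whole output as one piece of text, which could then be written with a single call to write_to_file instead of one call per line.

```applescript
-- Hypothetical helper: join the items of a list with a delimiter
on joinList(theList, theDelimiter)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelimiter
	set joinedText to theList as text
	set AppleScript's text item delimiters to oldDelims
	return joinedText
end joinList

set lf to ASCII character 10 -- UNIX line ending
-- Shortened sample data standing in for all_mols_list
set all_mols_list to {{"c1ccccc1", "benzene", "C6H6"}, {"Ic1ccccc1", "iodobenzene", "C6H5I"}}

set outputText to ""
repeat with thisCompound in all_mols_list
	set outputText to outputText & joinList(contents of thisCompound, tab) & lf
end repeat
```

Each compound becomes one tab-delimited line ending in a line feed.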

The result should look something like this:-

    c1ccccc1 benzene c1ccccc1 C, 92.26; H, 7.74 78.0469501926 C6H6 78.11184

    Ic1ccccc1 iodobenzene Ic1ccccc1 C, 35.32; H, 2.47; I, 62.21 203.9435931605 C6H5I 204.00837

    O=C1CCCCC1 cyclohexanone O=C1CCCCC1 C, 73.43; H, 10.27; O, 16.30 98.0731649431 C6H10O 98.143

    NC1CCCCC1 cyclohexamine NC1CCCCC1 C, 72.66; H, 13.21; N, 14.12 99.1047994225 C6H13N 99.17412

    CN(C)c1cccnc1 3-dimethylaminopyridine CN(c1cnccc1)C C, 68.82; H, 8.25; N, 22.93 122.0843983314 C7H10N2 122.1677

    N1(c2ccccc2)CCNCC1 phenylpiperazine N1(CCNCC1)c2ccccc2 C, 74.03; H, 8.70; N, 17.27 162.1156984598 C10H14N2 162.23156

The complete script is available here: Chem_Props_Mac.scpt.

UNIX rears its head again

The problem is that SMILES files often arrive as UNIX files, and there are two different line-ending conventions in Mac OS X: Mac-style (lines end with a return: "\r" or ASCII character 13) and Unix-style (lines end with a line feed: "\n" or ASCII character 10). So if we try to read a Unix file, available here: temp_unix.txt.zip (control-click on the link and choose "Download linked file ...."), we have a problem. Because the script splits the file on return and a Unix file contains none, the entire text is read in as a single value.
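
As a small illustration of the difference, and of one possible way round it, AppleScript's "paragraphs" element treats both return and line feed as line breaks in current versions. The sketch below is not necessarily the approach the next tutorial takes. The handler name nonEmptyLines is hypothetical, and the sample text is typed in rather than read from the downloadable files.

```applescript
-- Hypothetical helper: split text into lines whatever the line ending,
-- skipping the empty item left by a trailing line break
on nonEmptyLines(theText)
	set lineList to {}
	repeat with thisLine in paragraphs of theText
		if contents of thisLine is not "" then set end of lineList to contents of thisLine
	end repeat
	return lineList
end nonEmptyLines

set lf to ASCII character 10
-- The same two compounds with Unix-style and Mac-style line endings
set unixText to "c1ccccc1" & tab & "benzene" & lf & "Ic1ccccc1" & tab & "iodobenzene" & lf
set macText to "c1ccccc1" & tab & "benzene" & return & "Ic1ccccc1" & tab & "iodobenzene" & return

set unixLines to nonEmptyLines(unixText)
set macLines to nonEmptyLines(macText)
```

Both calls give the same two-item list, so text read from either kind of file could be fed to the rest of the script unchanged.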

The next tutorial will deal with this type of issue.
