Applescript Tutorial 3
Author: Chris Swain
Website: http://www.macinchem.org
Reading, Writing and using Lists
The following AppleScript uses ChemDraw to calculate a variety of molecular properties and stores them as individual values. These values can then be used elsewhere in the script, as demonstrated (rather trivially) by the display dialog command. The script can be downloaded here.
tell application "CS ChemDraw Ultra"
set the_SMILES to SMILES of selection
set Elem_Anal to Elemental Analysis of selection
set Exact_mass to Exact Mass of selection
set Mol_Form to Molecular Formula of selection
set Mol_weight to Molecular Weight of selection
set Chem_props to "SMILES " & the_SMILES & return & "Elemental Analysis " ¬
	& Elem_Anal & return & "Exact Mass " & Exact_mass & return ¬
	& "Molecular Formula " & Mol_Form & return ¬
	& "Molecular Weight " & Mol_weight
display dialog Chem_props
end tell
This is fine if all you need is the properties of a single molecule, but what if you want to run the calculation on a list of structures? Suppose you have a file containing a series of structures in SMILES format. The file should be a tab-delimited list, with each SMILES string followed by the compound name, like this:
c1ccccc1 benzene
Ic1ccccc1 iodobenzene
O=C1CCCCC1 cyclohexanone
NC1CCCCC1 cyclohexamine
CN(C)c1cccnc1 3-dimethylaminopyridine
N1(c2ccccc2)CCNCC1 phenylpiperazine
You can download the file here: temp_mac.txt.zip (control-click the link and choose "Download Linked File..."). Next, the user chooses a file, the script reads its contents, and the data is stored in a list. A list is simply a group of values enclosed in braces {}, for example {1,2,3} or {1,"b","hello",{1,3,5}}. As the second example shows, a list can mix types and can even contain another list. The script below first defines the list we will read the molecules into, then asks the user to choose a file, and finally reads the contents of that file into theData.
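The list operations used in the rest of this tutorial (indexing, nested lists, adding to the end) can be tried on their own in Script Editor. The sketch below uses only illustrative values and does not need ChemDraw or a file. Note that AppleScript numbers list items from 1, and negative indices count back from the end.

```applescript
-- Lists can mix types and can even contain other lists
set mixed_list to {1, "b", "hello", {1, 3, 5}}

-- Items are numbered from 1, not 0
set first_item to item 1 of mixed_list -- 1
set inner_list to item 4 of mixed_list -- {1, 3, 5}
set inner_value to item 2 of inner_list -- 3
set last_item to item -1 of mixed_list -- negative indices count from the end

-- Two equivalent ways of adding a value to the end of a list
set mol_list to {}
copy "c1ccccc1" to the end of mol_list
set end of mol_list to "Ic1ccccc1"

set num_items to count of mol_list -- 2
```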
set mol_list to {}
set theData to ""
set theFile to (choose file with prompt "Select the file:" of type {"TEXT"}) as alias
-- open for access returns a file reference; use it for read and close access
set fileRef to open for access theFile
set theData to read fileRef using delimiter return
close access fileRef
Copy and paste the above text into Script Editor, click "Compile", select the "Event Log" tab and click "Run". When prompted, choose the temp_mac.txt file. You should see a result like the one below, in which each line of the file has been read in as a separate item of the list:-
{"c1ccccc1 benzene", "Ic1ccccc1 iodobenzene", "O=C1CCCCC1 cyclohexanone", "NC1CCCCC1 cyclohexamine", "CN(C)c1cccnc1 3-dimethylaminopyridine", "N1(c2ccccc2)CCNCC1 phenylpiperazine"}
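If anything goes wrong during the read (for example, the file is empty), the file can be left open. One way to guard against this is to wrap the reading step in a handler that always closes the file. The sketch below does that. ReadLines is a hypothetical name and is not part of the original script. Like the original, it assumes a Mac-style (return-delimited) file.

```applescript
-- Hypothetical handler: read a Mac-style (return-delimited) text file
-- into a list of lines, closing the file even if the read fails.
on ReadLines(the_file)
	set file_ref to open for access the_file
	try
		set the_lines to read file_ref using delimiter return
		close access file_ref
		return the_lines
	on error err_msg number err_num
		-- close the file, then pass the original error on to the caller
		close access file_ref
		error err_msg number err_num
	end try
end ReadLines
```

With this handler in the script, the reading step becomes `set theData to ReadLines(theFile)`.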

Having read the file, we will of course want to write out the results at some point, so this is a good time to think about the file we will be saving to. We want to save the results in the same folder as the file we read in, and a simple sub-routine handles this: we pass theFile to it, and it returns the path of the folder containing that file as text. Appending the output file name is then a simple task.
set the_file_path to GetParentPath(theFile)
set theSaveFile to the_file_path & "test2.smi"

on GetParentPath(theFile)
	tell application "Finder" to return container of theFile as text
end GetParentPath
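If you would rather not involve the Finder, you can get the same folder path by treating the file's HFS path (the colon-separated form you get from `theFile as text`) as a string and dropping its last component. This is a sketch of an alternative technique, not the author's method. GetParentPathText and the example path are hypothetical.

```applescript
-- Hypothetical alternative to GetParentPath that avoids the Finder:
-- split a colon-separated HFS path and drop the last component (the file name).
on GetParentPathText(hfs_path)
	set old_delims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ":"
	set path_parts to text items of hfs_path
	-- rejoin everything except the file name, using ":" as the separator
	set parent_path to (items 1 thru -2 of path_parts) as text
	set AppleScript's text item delimiters to old_delims
	return parent_path & ":"
end GetParentPathText

-- example path only; in the tutorial you would pass (theFile as text)
set the_file_path to GetParentPathText("Macintosh HD:Users:chris:temp_mac.txt")
set theSaveFile to the_file_path & "test2.smi"
```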
Now that all the data is in a list, we can begin to manipulate it. First we need to get at the SMILES strings. At the moment the first item in the list is "c1ccccc1 benzene", so we need to separate the two terms. We first set the text item delimiters to tab. A simple repeat loop then takes each item in theData, splits it into its text items and copies the result to the end of a new list called mol_list. Remember to change the delimiters back afterwards!
set text item delimiters to tab
repeat with i from 1 to count of theData
	set theLine to text items of item i of theData
	copy theLine to the end of mol_list
end repeat
set text item delimiters to ""
The result is a list of lists:-
{{"c1ccccc1", "benzene"}, {"Ic1ccccc1", "iodobenzene"}, {"O=C1CCCCC1", "cyclohexanone"}, {"NC1CCCCC1", "cyclohexamine"}, {"CN(C)c1cccnc1", "3-dimethylaminopyridine"}, {"N1(c2ccccc2)CCNCC1", "phenylpiperazine"}}
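The splitting loop can also be packaged as a handler that restores whatever delimiters were in force before it ran. It can also skip lines that do not contain both a SMILES string and a name, such as an empty last line when the file ends with a return. The sketch below does both. SplitLines is a hypothetical name, and the two-line theData simply stands in for the file contents.

```applescript
-- Hypothetical handler (not part of the original script): split each line of
-- the_lines on the_delim, returning a list of lists. Lines with fewer than two
-- fields (e.g. an empty last line) are skipped.
on SplitLines(the_lines, the_delim)
	set old_delims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to the_delim
	set split_list to {}
	repeat with a_line in the_lines
		-- "contents of" turns the loop reference into the actual string
		set line_fields to text items of (contents of a_line)
		if (count of line_fields) > 1 then set end of split_list to line_fields
	end repeat
	set AppleScript's text item delimiters to old_delims
	return split_list
end SplitLines

set theData to {"c1ccccc1" & tab & "benzene", "Ic1ccccc1" & tab & "iodobenzene", ""}
set mol_list to SplitLines(theData, tab)
-- mol_list is {{"c1ccccc1", "benzene"}, {"Ic1ccccc1", "iodobenzene"}}
```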
For each item of mol_list we can pick out the SMILES string and the name, and then place the SMILES on the clipboard so that ChemDraw can calculate the properties.
set the_compound to item i of mol_list
set the_SMILES to item 1 of the_compound
set the_name to item 2 of the_compound
--display dialog the_SMILES
--display dialog the_name
set the clipboard to the_SMILES
However, getting ChemDraw to create the chemical structure from the SMILES string is not straightforward, because there is no "Paste SMILES" command in its AppleScript dictionary. Instead, we script the menus to paste the SMILES. You have seen the rest of the ChemDraw commands before. We then combine all the data items for a single compound into a list, mol_props_list, and add that list to the end of all_mols_list.
set mol_props_list to {} -- start a fresh list for each compound
tell application "CS ChemDraw Ultra"
	activate
	if enabled of menu item "Paste" then do menu item "SMILES" of menu "Paste Special" of menu "Edit"
	set the_CD_SMILES to SMILES of selection
	set Elem_Anal to Elemental Analysis of selection
	set Exact_mass to Exact Mass of selection
	set Mol_Form to Molecular Formula of selection
	set Mol_weight to Molecular Weight of selection
	copy the_SMILES to the end of mol_props_list
	copy the_name to the end of mol_props_list
	copy the_CD_SMILES to the end of mol_props_list
	copy Elem_Anal to the end of mol_props_list
	copy Exact_mass to the end of mol_props_list
	copy Mol_Form to the end of mol_props_list
	copy Mol_weight to the end of mol_props_list
	if enabled of menu item "Paste" then do menu item "Clear" of menu "Edit"
	--display dialog (item 3 of mol_props_list)
end tell
copy mol_props_list to the end of all_mols_list
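The list-building pattern can be checked without ChemDraw. In the sketch below, the ChemDraw calculations are replaced by a hypothetical placeholder string. It shows why mol_props_list is reset for every compound: without the reset, each compound's values would be appended to the previous ones. `copy` adds an independent copy of the list to all_mols_list.

```applescript
-- Sketch of the list-building pattern only; the ChemDraw step is replaced by
-- a hypothetical placeholder string so this runs without ChemDraw.
set mol_list to {{"c1ccccc1", "benzene"}, {"Ic1ccccc1", "iodobenzene"}}
set all_mols_list to {}
repeat with i from 1 to count of mol_list
	set the_compound to item i of mol_list
	-- start a fresh list for each compound, otherwise values pile up
	set mol_props_list to {}
	copy item 1 of the_compound to the end of mol_props_list
	copy item 2 of the_compound to the end of mol_props_list
	copy "(ChemDraw results here)" to the end of mol_props_list
	copy mol_props_list to the end of all_mols_list
end repeat
-- all_mols_list now holds one three-item list per compound
```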
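It only remains to convert the list to tab-delimited text and save the result. The repeat loop does the conversion, and the write_to_file sub-routine adds each line to the file. Because write_to_file is called with append_data set to true, each run adds lines to the end of test2.smi rather than replacing its contents. Delete the file between runs if you do not want duplicate entries. It is probably worth mentioning that keeping regularly used snippets of code as sub-routines certainly helps the cut-and-paste school of programming!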
set num_compounds to count of all_mols_list
repeat with i from 1 to num_compounds
	set mol_line to item i of all_mols_list
	-- convert list to tab-delimited text
	set old_delim to AppleScript's text item delimiters
	set AppleScript's text item delimiters to tab
	set mol_line to mol_line as text
	--set mol_line to mol_line & "\n" needs UNIX line endings
	set mol_line to mol_line & return -- Mac-style (CR) line ending
	set AppleScript's text item delimiters to old_delim
	my write_to_file(mol_line, theSaveFile, true)
end repeat

on write_to_file(this_data, target_file, append_data)
	try
		set the target_file to the target_file as text
		set the open_target_file to ¬
			open for access file target_file with write permission
		if append_data is false then ¬
			set eof of the open_target_file to 0
		write this_data to the open_target_file starting at eof
		close access the open_target_file
		return true
	on error
		try
			close access file target_file
		end try
		return false
	end try
end write_to_file
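The list-to-text conversion inside the loop can also be factored out into a sub-routine of its own, in the same cut-and-paste spirit. The sketch below relies on the fact that coercing a list to text joins its items with the current text item delimiters. It restores the previous delimiters before returning. JoinList and the sample row are hypothetical.

```applescript
-- Hypothetical helper: join a list of values into one string using a delimiter,
-- restoring the caller's text item delimiters afterwards.
on JoinList(the_list, the_delim)
	set old_delims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to the_delim
	set joined_text to the_list as text
	set AppleScript's text item delimiters to old_delims
	return joined_text
end JoinList

set mol_row to {"c1ccccc1", "benzene", "c1ccccc1", "C6H6"}
-- one tab-delimited line with a Mac-style line ending, ready for write_to_file
set the_line to JoinList(mol_row, tab) & return
```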
The result should look something like this:-
c1ccccc1 benzene c1ccccc1 C, 92.26; H, 7.74 78.0469501926 C6H6 78.11184
Ic1ccccc1 iodobenzene Ic1ccccc1 C, 35.32; H, 2.47; I, 62.21 203.9435931605 C6H5I 204.00837
O=C1CCCCC1 cyclohexanone O=C1CCCCC1 C, 73.43; H, 10.27; O, 16.30 98.0731649431 C6H10O 98.143
NC1CCCCC1 cyclohexamine NC1CCCCC1 C, 72.66; H, 13.21; N, 14.12 99.1047994225 C6H13N 99.17412
CN(C)c1cccnc1 3-dimethylaminopyridine CN(c1cnccc1)C C, 68.82; H, 8.25; N, 22.93 122.0843983314 C7H10N2 122.1677
N1(c2ccccc2)CCNCC1 phenylpiperazine N1(CCNCC1)c2ccccc2 C, 74.03; H, 8.70; N, 17.27 162.1156984598 C10H14N2 162.23156
The complete script is available here Chem_Props_Mac.scpt.
UNIX rears its head again
The problem is that SMILES files often arrive as UNIX files, and there are two different line-ending conventions in Mac OS X. Mac-style files end each line with a carriage return ("\r", ASCII character 13), while Unix-style files end each line with a line feed ("\n", ASCII character 10). So if we try to read the Unix file available here, temp_unix.txt.zip (control-click the link and choose "Download Linked File..."), we have a problem. The script splits the text on return, which never occurs in a Unix file, so the entire text is read in as a single value.
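The effect can be reproduced without a file. The sketch below builds a short Unix-style string and splits it first on return, then on linefeed. The `linefeed` constant is available from AppleScript 2.0 (Mac OS X 10.5); on older systems `ASCII character 10` gives the same character. The sample text is illustrative only.

```applescript
-- Two lines of SMILES as they might arrive from a Unix machine (LF endings)
set unix_text to "c1ccccc1" & tab & "benzene" & linefeed & "Ic1ccccc1" & tab & "iodobenzene"

set old_delims to AppleScript's text item delimiters
-- Splitting on return (ASCII 13) finds no line breaks at all
set AppleScript's text item delimiters to return
set mac_split to text items of unix_text -- one item: the whole text
-- Splitting on linefeed (ASCII 10) recovers the two lines
set AppleScript's text item delimiters to linefeed
set unix_split to text items of unix_text -- two items
set AppleScript's text item delimiters to old_delims
```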

The next tutorial will deal with this type of issue.
Download a PDF version of this article with full syntax colouring 


