PERL
NCSAstore.pl
Description:
Append this script to the end of your NCSA batch files to upload output to mass storage in parallel when a run finishes. It should be possible to adapt the script to other supercomputers too.
Sample usage:
# Designate storage directory on MSS
set STORAGE_DIR="Runs/Flash25/bhl/exe01"
# Make the file list
\ls -1 --hide-control-chars *hdf* *log flash.dat > files.txt
Author's Email:
rge21@pas.rochester.edu
Author's Full Name:
Richard Edgar
Author's Homepage:
http://www.pas.rochester.edu/~rge21/computing/programs/ncsastore/index.shtml
Script File:
NCSAstore.pl (2.8 KB)
EnTuned.pl
Description:
EnTuned.pl converts Ensembl numbers to Entrez and UniGene numbers. It reads a newline-delimited list of Ensembl numbers from a text file and queries the ensembl.org website to find the corresponding Entrez and UniGene numbers. The output is a comma-delimited text file with 7 fields: the ensembl.org URL for the specific Ensembl number, the Ensembl number, the URL for the Entrez number, the Entrez number, the URL for the UniGene number, the UniGene number, and the description of the gene from the ensembl.org web page.
Author's Email:
paul_a_wilson@mac.com
Author's Full Name:
Paul A. Wilson, Ph.D., if a title must be used, author prefers motorcyclist over doctor
Author's Homepage:
homepage.mac.com/paul_a_wilson
Script File:
EnTuned.pl.tar.gz (2.74 KB)
alpha.pl: translate Excel column headers into numerical rank
Description:
alpha.pl:
Translates an alphabetical string of one or two letters into a numerical value using modulo-26 addition; use it to find the column order when parsing very wide Excel input.
Usage:
alpha.pl /excel column header/
alpha.pl AB
Script File:
alpha.pl.gz (571 bytes)
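The conversion described above can be sketched as follows. This is a minimal illustration in Python, not the Perl source of alpha.pl; the function name column_rank and the 1-based numbering (A = 1) are assumptions, since the listing does not say whether the script counts from 0 or 1.

```python
def column_rank(header: str) -> int:
    """Return the position of an Excel column header, counting A as 1.

    Each letter is treated as a digit in a base-26 system without a zero,
    so Z is 26, AA is 27 and AB is 28.
    """
    rank = 0
    for letter in header.strip().upper():
        if not "A" <= letter <= "Z":
            raise ValueError(f"not a column letter: {letter!r}")
        # Shift the running value one "digit" left, then add this letter.
        rank = rank * 26 + (ord(letter) - ord("A") + 1)
    return rank


print(column_rank("AB"))  # 28
```

Subtract 1 from the result if a 0-based array index is needed when parsing the row.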
Dinucleotide shuffle with Altschul&Erickson Algorithm
Description:
This script is an implementation of the Altschul&Erickson algorithm for exact dinucleotide shuffling.
The following modules should be installed: Graph, Bio::DB::Fasta, Bio::Seq, Bio::SeqIO. All of them are available at CPAN (http://search.cpan.org)
Author's Full Name:
Diego Mauricio Riaño-Pachón
Author's Homepage:
http://www.geocities.com/dmrp.geo
Script File:
dishuffleseq.pl.gz (3.84 KB)
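For readers unfamiliar with the approach, here is a compact sketch of a dinucleotide-preserving shuffle in the spirit of Altschul & Erickson. It is written in Python, uses no BioPerl or Graph modules, and is not taken from dishuffleseq.pl; the function names are hypothetical. The idea: treat each adjacent pair of letters as an edge, pick a random "last edge" for every letter so that all letters lead to the final letter, shuffle the remaining edges, and walk the result.

```python
import random
from collections import Counter, defaultdict


def _reaches_end(start, last_edge, end):
    """Follow the chosen last edges from start; True if they lead to end."""
    seen = set()
    node = start
    while node != end:
        if node in seen:  # walked into a cycle that avoids the end letter
            return False
        seen.add(node)
        node = last_edge[node]
    return True


def dinucleotide_shuffle(seq, rng=random):
    """Return a shuffled copy of seq with identical dinucleotide counts."""
    if len(seq) < 3:
        return seq
    # successors[a] lists every letter that follows an occurrence of a.
    successors = defaultdict(list)
    for a, b in zip(seq, seq[1:]):
        successors[a].append(b)
    end = seq[-1]
    others = [v for v in successors if v != end]
    # Redraw the last edges until every letter is connected to the end letter.
    while True:
        last_edge = {v: rng.choice(successors[v]) for v in others}
        if all(_reaches_end(v, last_edge, end) for v in others):
            break
    # Shuffle each successor list, keeping the chosen last edge in final place.
    order = {}
    for v, nexts in successors.items():
        rest = list(nexts)
        if v != end:
            rest.remove(last_edge[v])
        rng.shuffle(rest)
        if v != end:
            rest.append(last_edge[v])
        order[v] = iter(rest)
    # Walk the edges in their new order, starting from the original first letter.
    out = [seq[0]]
    for _ in range(len(seq) - 1):
        out.append(next(order[out[-1]]))
    return "".join(out)


original = "ACGTACGGTCAAGT"
shuffled = dinucleotide_shuffle(original)
# The dinucleotide composition is unchanged by construction.
print(Counter(zip(original, original[1:])) == Counter(zip(shuffled, shuffled[1:])))
```

The shuffled sequence keeps the first and last letters of the input, which is a property of this kind of edge-walk shuffle rather than a bug.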
combine/permute a list from a file or pipe
Description:
combo -[pc]
Perform combinatoric transformations on a list of elements
separated by newline. Input may be a filename, or '-' to
read from STDIN. Combinations/Permutations are written to
STDOUT, one per line, with elements separated by tab.
Options:
-p permute list; this is the default behavior
-c combine list; this parameter requires an integer value
for how many of the list elements should be included
in the combination
Author's Email:
allenday@ucla.edu
Author's Full Name:
Allen Day
Author's Homepage:
http://search.cpan.org/~allenday
Script File:
combo.gz (746 bytes)
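The two transformations can be illustrated with a short sketch. It is written in Python with the standard itertools module rather than Perl, and the function names permute and combine are hypothetical; only the tab-separated, one-result-per-line output format is taken from the description above.

```python
import itertools


def permute(items):
    """Return every ordering of items, each as one tab-separated line."""
    return ["\t".join(p) for p in itertools.permutations(items)]


def combine(items, k):
    """Return every k-element combination of items, one tab-separated line each."""
    return ["\t".join(c) for c in itertools.combinations(items, k)]


elements = ["a", "b", "c"]  # stands in for the newline-separated input list
print("\n".join(permute(elements)))     # behaviour of -p
print("\n".join(combine(elements, 2)))  # behaviour of -c 2
```

Note that the number of permutations grows factorially with the list length, so long input lists quickly become impractical.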
extract_genes.pl
Description:
extract_genes.pl - extract genomic sequences from NCBI files using BioPerl. This script is a simple solution to the problem of
extracting genomic regions corresponding to genes. There are other solutions; this particular approach uses genomic sequence
files from NCBI and gene coordinates from Entrez Gene.
Author's Email:
osborne1@optonline.net
Author's Full Name:
Brian Osborne
Author's Homepage:
http://bioperl.org
Script File:
extract_genes.pl.zip (1.76 KB)
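The core step, cutting a gene out of a longer genomic sequence by its coordinates, can be sketched as below. This is a Python illustration without BioPerl and is not the script's own code; the function name extract_region, the 1-based inclusive coordinates, and the use of -1 to mark the reverse strand are all assumptions made for the example.

```python
# Translation table used to complement a DNA string.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def extract_region(genome, start, end, strand=1):
    """Return genome[start..end] (1-based, inclusive).

    For strand == -1 the reverse complement of the region is returned,
    so the result reads 5' to 3' along the gene.
    """
    if not 1 <= start <= end <= len(genome):
        raise ValueError("coordinates fall outside the sequence")
    region = genome[start - 1:end]
    if strand == -1:
        region = region.translate(_COMPLEMENT)[::-1]
    return region


print(extract_region("AAACCCGGGTTT", 2, 5))      # AACC
print(extract_region("AAACCCGGGTTT", 2, 5, -1))  # GGTT
```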
Updates NCBI Blast Databases (e.g. for cron job)
Description:
fetch_ncbi_db.pl is a script I wrote to automatically update the blast databases from NCBI. We regularly need to make sure the databases are up to date, so we set up a weekly cron job to download them. It does not check if the files have been updated before downloading them, since the databases we use are updated very regularly.
Author's Full Name:
Alexander Richter
Script File:
fetch_ncbi_db.pl.gz (1 KB)
Shannon-Wiener Calculator
Description:
Give this script a data matrix of species abundances in a comma-separated file (each species is a column, each plot is a row), and it will output a file containing your original data plus two additional columns: the total abundance of all species and the Shannon-Wiener Diversity Index.
Script File:
shanon.pl.gz (228 bytes)
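The calculation can be sketched as follows, using the usual definition of the index, H' = -sum(p_i * ln p_i), where p_i is the share of species i in a plot. This is a Python illustration rather than the script's own Perl; the function names, the use of the natural logarithm, the absence of a header row, and the four-decimal output format are assumptions.

```python
import csv
import io
import math


def shannon_index(abundances):
    """Shannon-Wiener index of one plot, using natural logarithms."""
    total = sum(abundances)
    if total <= 0:
        return 0.0
    # p * ln(1/p) equals -p * ln(p); species with zero abundance add nothing.
    return sum((n / total) * math.log(total / n) for n in abundances if n > 0)


def add_diversity_columns(csv_text):
    """Append total abundance and the index to each row of a headerless CSV."""
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue
        counts = [float(value) for value in row]
        rows.append(row + [f"{sum(counts):g}", f"{shannon_index(counts):.4f}"])
    return rows


for row in add_diversity_columns("10,10\n5,0\n"):
    print(",".join(row))
```

A plot with two equally abundant species gives ln 2 (about 0.6931), and a plot with a single species gives 0.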
dp.pl
Description:
This Perl script reads a text file containing a list of PDB IDs. The script downloads the FASTA-formatted sequence file for each PDB ID in the text file from the RCSB website. The domain information for each PDB ID is downloaded from the NCBI MMDB website. PSIPRED and the domain prediction program PPRODO are run on each PDB ID. The results of the domain prediction and the domain information from MMDB are written to a text file. This script was written for testing domain prediction software. It may serve as a good example for writing similar scripts. PSIPRED, PPRODO, and a network connection are required by this script. PPRODO can be found at http://gene.kias.re.kr/~jlee/pprodo/
Author's Email:
paul_a_wilson@mac.com
Author's Full Name:
Paul A. Wilson
Author's Homepage:
http://homepage.mac.com/paul_a_wilson/
Script File:
dp.pl.tar.gz (4.55 KB)


