batch_blast.pl
batch_blast.pl - Do NCBI-BLAST searches for a set of fasta files
This documentation refers to batch_blast version $Rev: 562 $
batch_blast.pl -i DirToProcess -o OutDir -d DbDir -c ConfigFile
-i, --indir    # Directory of fasta files to process
-d, --db-dir   # Directory containing the BLAST formatted databases
-o, --outdir   # Path to the base output directory
-c, --config   # Path to the config file
Given a directory of softmasked fasta files, this will BLAST the files against the set of BLAST formatted databases specified in the configuration file.
All of the BLAST output files for an input file will be stored in a directory named for that input file, in a subdirectory named blast (for example, /home/myhome/infile/blast/infile_blastdb.blo).
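The output layout described above can be sketched as a small path-building helper. This is only an illustration in Python, not code from batch_blast.pl: the function name and its parameters are hypothetical, and the layout (base directory, input file name, blast subdirectory, then input name and database name joined by an underscore) is inferred from the single example path given above.

```python
from pathlib import PurePosixPath

def blast_output_path(outdir, fasta_file, db_name, suffix):
    """Build <outdir>/<name>/blast/<name>_<db_name>.<suffix>.

    Hypothetical helper: the layout is inferred from the example path
    in this manual, not taken from the batch_blast.pl source.
    """
    # Strip the directory and the .fasta extension: "infile.fasta" -> "infile"
    name = PurePosixPath(fasta_file).stem
    return PurePosixPath(outdir) / name / "blast" / f"{name}_{db_name}.{suffix}"

print(blast_output_path("/home/myhome", "infile.fasta", "blastdb", "blo"))
# prints /home/myhome/infile/blast/infile_blastdb.blo
```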
Path of the directory containing the sequences to process.
Path of the directory to place the program output.
Path of the directory containing the blast formatted databases.
Path to the batch_blast config file. This is a tab delimited text file indicating the required information for each of the databases to blast against. Lines beginning with # are ignored.
Full path to the NCBI blastall program. Default is blastall.
Path to a file that will be used to log program status. If the file already exists, new information will be appended to the existing file.
Short overview of how to use program from command line.
Show program usage with summary of options.
Show program version.
Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
Run the program with maximum output.
Run the program with minimal output.
Run the program without doing the system commands.
Error messages generated by this program and possible solutions are listed below.
The input directory does not contain fasta files in the expected format. This could happen because you gave an incorrect path or because your sequence files do not have the expected *.fasta extension in the file name.
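Before running a large batch, it can help to confirm that the input directory contains files with the expected extension. The following Python sketch is a hypothetical pre-flight check, not part of batch_blast.pl; it assumes that only files ending in .fasta are picked up, as the message above suggests.

```python
from pathlib import Path

def find_fasta_files(indir):
    """Return the sorted *.fasta files in indir, or raise if there are none.

    Hypothetical pre-flight check; batch_blast.pl performs its own test.
    """
    indir = Path(indir)
    if not indir.is_dir():
        raise NotADirectoryError(f"Not a directory: {indir}")
    # Only regular files whose name ends in .fasta are considered
    files = sorted(p for p in indir.iterdir()
                   if p.is_file() and p.suffix == ".fasta")
    if not files:
        raise ValueError(f"No *.fasta files found in {indir}")
    return files
```

Calling find_fasta_files on the directory you plan to pass to --indir will raise an error early if the path is wrong or if the files use another extension such as .fa.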
The output directory could not be created at the path you specified. This could be because the parent directory in which you are trying to place your base output directory does not exist, or because you do not have write permission for the directory in which you want to place your output.
The location of the configuration file is indicated by the --config option at the command line. This is a tab delimited text file indicating required information for each of the databases to blast against. Lines beginning with # are ignored, and data are in six columns as shown below:
The blastall program to use. DAWG-PAWS supports blastn, tblastx, and blastx.
This is the suffix which will be added to the end of your blast output file. You can use this option to set different extensions for different types of blast. For example *.bln for blastn output and *.blx for blastx output.
DAWG-PAWS supports output in the default pairwise format (0), the tabular format (8), and the tabular format with comments (9). See the blastall documentation for a full list of all possible alignment output options.
The maximum e-value to include in the blast report.
The name of the database that is being blasted against.
This is the place to indicate additional options in your BLAST command, such as using multiple processors (-a 2) or lowercase filtering (-U). Options should be space separated. For a list of all options available in blast, type blastall --help at the command line.
An example config file:
#-----------------------------+
# BLASTN: TIGR GIs            |
#-----------------------------+
blastn	bln	8	1e-5	TaGI_10	-a 2 -U
blastn	bln	8	1e-5	AtGI_13	-a 2 -U
blastn	bln	8	1e-5	ZmGI_17	-a 2 -U
#-----------------------------+
# TBLASTX: TIGR GIs           |
#-----------------------------+
tblastx	blx	8	1e-5	TaGI_10	-a 2 -U
tblastx	blx	8	1e-5	AtGI_13	-a 2 -U
tblastx	blx	8	1e-5	ZmGI_17	-a 2 -U
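To make the six-column layout concrete, here is a minimal Python sketch that reads a config in this format and turns one row into a blastall argument list. It is an illustration under stated assumptions, not the logic of batch_blast.pl: the field names and both function names are hypothetical, and the way the real program assembles its command may differ. The flags used (-p, -i, -d, -o, -e, -m) are standard blastall options.

```python
import shlex

# Hypothetical names for the six columns described above
FIELDS = ("program", "suffix", "align_view", "evalue", "db_name", "options")

def parse_config(text):
    """Parse tab-delimited config text, skipping blank lines and # comments."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != len(FIELDS):
            raise ValueError(
                f"line {line_no}: expected {len(FIELDS)} tab-delimited "
                f"columns, got {len(cols)}")
        rows.append(dict(zip(FIELDS, (c.strip() for c in cols))))
    return rows

def blastall_command(row, query, db_dir, out_file, blast_path="blastall"):
    """Assemble an argument list for one config row (assumed mapping)."""
    return [blast_path,
            "-p", row["program"],               # blast program
            "-i", query,                        # query fasta file
            "-d", f"{db_dir}/{row['db_name']}", # database to search
            "-o", out_file,                     # report file
            "-e", row["evalue"],                # maximum e-value
            "-m", row["align_view"],            # alignment output format
            *shlex.split(row["options"])]       # extra space-separated options

example = "# BLASTN: TIGR GIs\nblastn\tbln\t8\t1e-5\tTaGI_10\t-a 2 -U\n"
rows = parse_config(example)
print(blastall_command(rows[0], "infile.fasta", "/db", "infile_TaGI_10.bln"))
```

Raising an error on a line with the wrong number of columns gives early feedback when a config has been edited with spaces in place of tabs.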
The latest version of the NCBI blastall program can be downloaded from: ftp://ftp.ncbi.nih.gov/blast/executables/LATEST
This module is required to copy the BLAST results.
This module is required to accept options at the command line.
If you find a bug with this software, file a bug report on the DAWG-PAWS Sourceforge website: http://sourceforge.net/tracker/?group_id=204962
The current version is limited to using the NCBI version of BLAST.
The config file must have UNIX formatted line endings. Because of this, any config file that has been edited in a program such as MS Word must be converted to a UNIX compatible text format before being used with batch_blast.
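One way to normalize line endings is sketched below in Python. The function name is hypothetical, and the sketch assumes the config has already been saved as plain text; it converts DOS (CRLF) and old Mac (CR) line endings to UNIX (LF) and does not extract text from a word processor document.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    # Replace CRLF first so that it is not turned into two newlines
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

print(to_unix_line_endings(b"blastn\tbln\r\ntblastx\tblx\r\n"))
# prints b'blastn\tbln\ntblastx\tblx\n'
```

Working on bytes leaves the tab characters and all other content untouched, so the six-column layout of the config file is preserved.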
The batch_blast.pl program is part of the DAWG-PAWS package of genome annotation programs. See the DAWG-PAWS web page ( http://dawgpaws.sourceforge.net/ ) or the Sourceforge project page ( http://sourceforge.net/projects/dawgpaws ) for additional information about this package.
A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:
JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/
GNU GENERAL PUBLIC LICENSE, VERSION 3
http://www.gnu.org/licenses/gpl.html
THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.
James C. Estill <JamesEstill at gmail.com>
STARTED: 07/23/2007
UPDATED: 03/24/2009
VERSION: $Rev: 562 $