batch_blast.pl


NAME

batch_blast.pl - Do NCBI-BLAST searches for a set of fasta files


VERSION

This documentation refers to batch_blast version $Rev: 562 $


SYNOPSIS

Usage

    batch_blast.pl -i DirToProcess -o OutDir -d DbDir -c ConfigFile

Required Arguments

    -i, --indir    # Directory of fasta files to process
    -d, --db-dir   # Directory containing the BLAST formatted databases
    -o, --outdir   # Path to the base output directory
    -c, --config   # Path to the config file


DESCRIPTION

Given a directory of softmasked fasta files, this program will BLAST each file against the set of BLAST formatted databases specified in the configuration file.

All of the BLAST output files are stored in a directory named for the input file, in a subdirectory named blast (e.g. /home/myhome/infile/blast/infile_blastdb.blo).
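
The naming scheme above can be sketched in a few lines. This is a Python illustration, not the Perl code of batch_blast.pl; the function name blast_output_path and its parameters are hypothetical, and the layout is inferred only from the example path given above.

```python
from pathlib import PurePosixPath

def blast_output_path(outdir, fasta_file, db_name, ext):
    """Build the output path for one query file and one database.

    Assumed layout (taken from the example path in the text):
    <outdir>/<input name>/blast/<input name>_<database>.<extension>
    """
    stem = PurePosixPath(fasta_file).stem  # "infile.fasta" -> "infile"
    return PurePosixPath(outdir) / stem / "blast" / f"{stem}_{db_name}.{ext}"

print(blast_output_path("/home/myhome", "infile.fasta", "blastdb", "blo"))
# prints /home/myhome/infile/blast/infile_blastdb.blo
```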


REQUIRED ARGUMENTS

-i,--indir

Path of the directory containing the sequences to process.

-o,--outdir

Path of the directory in which to place the program output.

-d,--db-dir

Path of the directory containing the blast formatted databases.

-c, --config

Path to the batch_blast config file. This is a tab delimited text file indicating the required information for each of the databases to blast against. Lines beginning with # are ignored.


OPTIONS

--blast-path

Full path to the NCBI blastall program. Default is blastall.

--logfile

Path to a file that will be used to log program status. If the file already exists, new information will be appended to it.

--usage

Short overview of how to use the program from the command line.

--help

Show program usage with summary of options.

--version

Show program version.

--man

Show the full program manual. This uses the perldoc command to print the POD documentation for the program.

--verbose

Run the program with maximum output.

-q,--quiet

Run the program with minimal output.

--test

Run the program without executing the system commands.


DIAGNOSTICS

Error messages generated by this program and possible solutions are listed below.

ERROR: No fasta files were found in the input directory

The input directory does not contain fasta files in the expected format. This could happen because you gave an incorrect path or because your sequence files do not have the expected *.fasta extension in the file name.
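
To check an input directory before running the program, a lookup like the following may help. It is a Python sketch under the assumption that only files ending in .fasta directly inside the directory are processed; find_fasta_files is a hypothetical helper, not part of batch_blast.pl.

```python
from pathlib import Path

def find_fasta_files(indir):
    """Return the sorted *.fasta files found directly inside indir."""
    indir = Path(indir)
    if not indir.is_dir():
        raise NotADirectoryError(f"Input directory not found: {indir}")
    # Assumption: only the .fasta extension is recognized, no recursion.
    files = sorted(p for p in indir.glob("*.fasta") if p.is_file())
    if not files:
        raise FileNotFoundError(f"No fasta files were found in {indir}")
    return files
```

If this raises FileNotFoundError for a directory that does hold sequence files, the files probably use a different extension such as .fa or .fas and would need to be renamed.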

ERROR: Could not create the output directory

The output directory could not be created at the path you specified. This could be because the parent directory in which you are trying to create the output directory does not exist, or because you do not have write permission for that parent directory.
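
The two causes above can be told apart with a small check. This Python sketch is only a diagnostic aid; check_output_dir is a hypothetical name and the program itself may test these conditions differently.

```python
import os
from pathlib import Path

def check_output_dir(outdir):
    """Return None if outdir looks usable, else a short explanation."""
    outdir = Path(outdir)
    if outdir.is_dir():
        if os.access(outdir, os.W_OK):
            return None
        return f"no write permission for {outdir}"
    # The directory does not exist yet, so inspect its parent.
    parent = outdir.resolve().parent
    if not parent.is_dir():
        return f"parent directory does not exist: {parent}"
    if not os.access(parent, os.W_OK | os.X_OK):
        return f"no write permission for parent directory: {parent}"
    return None
```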


CONFIGURATION AND ENVIRONMENT

Configuration File

The location of the configuration file is indicated by the --config option at the command line. This is a tab delimited text file indicating required information for each of the databases to blast against. Lines beginning with # are ignored, and data are in six columns as shown below:

Col 1. Blast program to use [ tblastx | blastn | blastx ]

The blastall program to use. DAWG-PAWS supports blastn, tblastx, and blastx.

Col 2. Extension to add to blast output file (e.g. bln)

This is the suffix which will be added to the end of your blast output file. You can use this option to set different extensions for different types of blast. For example *.bln for blastn output and *.blx for blastx output.

Col 3. Alignment output options (-m options from blast)

DAWG-PAWS supports output in the default pairwise format (0), the tabular format (8), and the tabular format with comments (9). See the blastall documentation for a full list of all possible alignment output options.

Col 4. Evalue threshold

The maximum e-value to include in the blast report.

Col 5. Database name

The name of the database that is being blasted against.

Col 6. Additional blast command line options

This is the place to indicate additional options for your BLAST command, such as multiple processors (-a 2) or lowercase filtering (-U). Options should be space separated. For a list of all options available in blast, type blastall --help at the command line.
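
As a rough guide to how the six columns could map onto a blastall command line, consider the Python sketch below. The exact command that batch_blast.pl runs is not shown in this manual, so the mapping is an assumption based on the standard blastall flags (-p program, -i query, -d database, -o output, -e e-value, -m alignment view); build_blast_command and its parameters are hypothetical names.

```python
import shlex

def build_blast_command(program, align_fmt, evalue, db_path, query, out_file,
                        extra_opts="", blast_path="blastall"):
    """Assemble an argument list for one blastall run (assumed mapping)."""
    cmd = [
        blast_path,
        "-p", program,         # Col 1: blast program
        "-i", query,           # query fasta file
        "-d", db_path,         # Col 5: database, located under --db-dir
        "-o", out_file,        # output file, named with the Col 2 extension
        "-e", str(evalue),     # Col 4: e-value threshold
        "-m", str(align_fmt),  # Col 3: alignment output option
    ]
    cmd += shlex.split(extra_opts)  # Col 6: space separated extra options
    return cmd

print(" ".join(build_blast_command(
    "blastn", 8, "1e-5", "/db/TaGI_10", "seq.fasta", "seq_TaGI_10.bln",
    "-a 2 -U")))
```

Returning a list rather than a single string keeps each argument separate, which avoids quoting problems when paths contain spaces.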

An example config file:

 #-----------------------------+
 # BLASTN: TIGR GIs            |
 #-----------------------------+
 blastn   bln  8  1e-5  TaGI_10  -a 2 -U
 blastn   bln  8  1e-5  AtGI_13  -a 2 -U
 blastn   bln  8  1e-5  ZmGI_17  -a 2 -U
 #-----------------------------+
 # TBLASTX: TIGR GIs           |
 #-----------------------------+
 tblastx  blx  8  1e-5  TaGI_10  -a 2 -U
 tblastx  blx  8  1e-5  AtGI_13  -a 2 -U
 tblastx  blx  8  1e-5  ZmGI_17  -a 2 -U
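
A config file in this format can be read with a short parser. The Python sketch below follows the rules stated above (tab-delimited fields, lines beginning with # ignored, six columns); the names BlastConfigEntry and parse_config are hypothetical, and treating the sixth column as optional is an assumption rather than documented behavior.

```python
from collections import namedtuple

BlastConfigEntry = namedtuple(
    "BlastConfigEntry",
    ["program", "extension", "align_fmt", "evalue", "db_name", "options"],
)

def parse_config(text):
    """Parse tab-delimited config text into a list of entries."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t", 5)
        if len(fields) == 5:
            fields.append("")  # assumption: Col 6 may be left empty
        if len(fields) != 6:
            raise ValueError(
                f"line {line_no}: expected 6 tab-delimited columns, "
                f"found {len(fields)}")
        entries.append(BlastConfigEntry(*(f.strip() for f in fields)))
    return entries
```

A line that was typed with spaces instead of tabs will raise ValueError here, which is a common cause of config files being misread.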


DEPENDENCIES

Required Software

Required Perl Modules


BUGS AND LIMITATIONS

Bugs

Limitations


SEE ALSO

The batch_blast.pl program is part of the DAWG-PAWS package of genome annotation programs. See the DAWG-PAWS web page ( http://dawgpaws.sourceforge.net/ ) or the Sourceforge project page ( http://sourceforge.net/projects/dawgpaws ) for additional information about this package.


REFERENCE

A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:

JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/


LICENSE

GNU GENERAL PUBLIC LICENSE, VERSION 3

http://www.gnu.org/licenses/gpl.html

THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.


AUTHOR

James C. Estill <JamesEstill at gmail.com>


HISTORY

STARTED: 07/23/2007

UPDATED: 03/24/2009

VERSION: $Rev: 562 $
