batch_hmmer.pl


NAME

batch_hmmer.pl - Run hmmsearch in batch mode


VERSION

This documentation refers to program version $Rev: 561 $


SYNOPSIS

Usage

    batch_hmmer.pl -i InDir -o OutDir -c Config.cfg [--gff]

Required Arguments

    --indir         # Path to the input dir containing fasta files
    --outdir        # Path to the base output directory
    --config        # Configuration file for running hmmer
    --gff           # Optional flag: also write results in GFF format


DESCRIPTION

Given a configuration file, this program runs hmmsearch once for every combination of a parameter set in the configuration file and a fasta sequence file in the input directory. The results are also converted to GFF format if requested.
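The batch described above can be pictured as a nested loop over sequence files, parameter sets, and models. The following Python sketch is illustrative only and is not the program's own code. The file names, the in-memory shape of a parameter set, and the loop order are assumptions. The command layout follows the general hmmsearch usage of options, then the hmm file, then the sequence file.

```python
from itertools import product

def plan_jobs(fasta_files, param_sets):
    """Yield one (fasta, set_name, model, argv) tuple per hmmsearch run."""
    for fasta, (name, models, args) in product(fasta_files, param_sets):
        for model in models:
            # hmmsearch usage: hmmsearch [options] <hmmfile> <sequence file>
            yield fasta, name, model, ["hmmsearch", *args, model, fasta]

# Hypothetical inputs: two sequence files and two parameter sets.
fasta_files = ["seq1.fasta", "seq2.fasta"]
param_sets = [
    ("tpase", ["tpase_a.hmm", "tpase_b.hmm"], ["-E", "0.00001"]),
    ("pfam", ["pfam_x.hmm"], ["-E", "0.00001"]),
]

jobs = list(plan_jobs(fasta_files, param_sets))
for fasta, name, model, argv in jobs:
    print(name, " ".join(argv))
```

With two sequence files and three models in total, the sketch plans six searches, which shows how quickly the number of runs grows with the size of the input directory.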


REQUIRED ARGUMENTS

-i,--indir

Path of the input directory. This is the directory that contains the fasta files to be searched.

-o,--outdir

Path of the base output directory.

-c,--config

Path of the configuration file specifying the parameters used to run the hmmer program. This config file includes (1) the name of the parameter set, (2) the directory of the hmm models, and (3) command line arguments for hmmsearch.


OPTIONS

--gff

Produce a GFF output file for each query sequence for each of the parameter sets in the configuration file.

-q,--quiet

Run the program with minimal output.

--verbose

Run the program in verbose mode.

--version

Show program version.

--usage

Short overview of how to use the program from the command line.

--help

Show program usage with summary of options.

--man

Show the full program manual. This uses the perldoc command to print the POD documentation for the program.


DIAGNOSTICS

Error messages generated by this program and possible solutions are listed below.

ERROR: No fasta files were found in the input directory

The input directory does not contain fasta files in the expected format. This could happen because you gave an incorrect path or because your sequence files do not have the expected *.fasta extension in the file name.
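To check ahead of time whether a directory would trigger this error, a minimal Python sketch along the following lines can list the files with the expected extension. The function name and the temporary test directory are hypothetical, and the program itself may match files differently (for example with respect to letter case).

```python
import tempfile
from pathlib import Path

def find_fasta_files(in_dir):
    """Return the sorted *.fasta files found directly inside in_dir."""
    return sorted(p for p in Path(in_dir).glob("*.fasta") if p.is_file())

# Demonstration on a throwaway directory with mixed file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.fasta", "a.fasta", "notes.txt", "c.fa"):
        (Path(tmp) / name).write_text(">seq\nACGT\n")
    found = [p.name for p in find_fasta_files(tmp)]

print(found)
```

In this demonstration only a.fasta and b.fasta are reported. The file c.fa is skipped, which is the kind of extension mismatch the error message refers to.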

ERROR: Could not create the output directory

The output directory could not be created at the path you specified. This could be due to the fact that the parent directory in which you are trying to place your base output directory does not exist, or because you do not have write permission in that parent directory.
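The two causes named above can be told apart before running the program. The Python sketch below is an illustration under stated assumptions: the function name and messages are hypothetical, and it assumes that only the final path component needs to be created.

```python
import os
import tempfile
from pathlib import Path

def diagnose_outdir(out_dir):
    """Return a reason out_dir cannot be created, or None if it looks fine."""
    out = Path(out_dir)
    if out.is_dir():
        return None  # already exists, nothing to create
    parent = out.parent
    if not parent.is_dir():
        return f"parent directory does not exist: {parent}"
    # Creating an entry requires write and search permission on the parent.
    if not os.access(parent, os.W_OK | os.X_OK):
        return f"no write permission on: {parent}"
    return None

# Demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    ok = diagnose_outdir(Path(tmp) / "results")
    missing = diagnose_outdir(Path(tmp) / "no_such_dir" / "results")

print(ok)
print(missing)
```

The first call returns None because the parent exists and is writable. The second reports the missing parent directory.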


CONFIGURATION AND ENVIRONMENT

Configuration File

The location of the configuration file is indicated by the --config option at the command line. This is a tab-delimited text file. The columns of data represent the following information:

col 1. Name of the parameter set

col 2. Dir of the hmmer models

All hmm profile models in this directory will be searched using the hmmer arguments in col 3.

col 3. Arguments for hmmer

The arguments for hmmer should be separated by spaces and can include the following arguments:

-A <n>

sets alignment output limit to <n> best domain alignments

-E <x>

sets E value cutoff (globE) to <= x

-T <x>

sets T bit threshold (globT) to >= x

-Z <n>

sets Z (# seqs) for E-value calculation

--compat

make best effort to use last version's output style

--cpu <n>

run <n> threads in parallel (if threaded)

--cut_ga

use Pfam GA gathering threshold cutoffs

--cut_nc

use Pfam NC noise threshold cutoffs

--cut_tc

use Pfam TC trusted threshold cutoffs

--domE <x>

sets domain Eval cutoff (2nd threshold) to <= x

--domT <x>

sets domain T bit thresh (2nd threshold) to >= x

--forward

use the full Forward() algorithm instead of Viterbi

--informat <s>

sequence file is in format <s>

--null2

turn OFF the post hoc second null model

--pvm

run on a Parallel Virtual Machine (PVM)

--xnu

turn ON XNU filtering of target protein sequences

An example configuration that runs hmmsearch against four directories of hmm models using an E-value threshold of 0.00001 follows:

    rice_mite  /$HOME/HMMData/db/hmm/rice_mite_models/  -E 0.00001
    rice_mule  /$HOME/HMMData/db/hmm/rice_mule_models/  -E 0.00001
    tpase      /$HOME/HMMData/db/hmm/tpase_models/      -E 0.00001
    pfam       /$HOME/HMMData/db/hmm/pfam/              -E 0.00001
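To make the three-column layout concrete, the following Python sketch parses configuration text of this shape. It is not the program's own parser. The function name, the skipping of blank lines, and the extra --cpu argument in the sample are assumptions made for illustration. Columns are split on tabs, and the third column is split on whitespace into individual hmmsearch arguments.

```python
import shlex

def parse_config(text):
    """Parse tab-delimited config text into (name, model_dir, args) tuples."""
    param_sets = []
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are ignored
        # Split into exactly three columns; a short line raises ValueError.
        name, model_dir, args = (f.strip() for f in line.split("\t", 2))
        param_sets.append((name, model_dir, shlex.split(args)))
    return param_sets

# Sample text: columns separated by single tab characters.
sample = (
    "rice_mite\t/$HOME/HMMData/db/hmm/rice_mite_models/\t-E 0.00001\n"
    "pfam\t/$HOME/HMMData/db/hmm/pfam/\t-E 0.00001 --cpu 2\n"
)

for name, model_dir, args in parse_config(sample):
    print(name, model_dir, args)
```

A line with fewer than three tab-separated fields fails loudly in this sketch, which is a convenient way to catch a configuration file that was saved with spaces instead of tabs.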


DEPENDENCIES

Required Software

Required Perl Modules


BUGS AND LIMITATIONS

Bugs

Limitations


SEE ALSO

The batch_hmmer.pl program is part of the DAWG-PAWS package of genome annotation programs. See the DAWG-PAWS web page ( http://dawgpaws.sourceforge.net/ ) or the Sourceforge project page ( http://sourceforge.net/projects/dawgpaws ) for additional information about this package.


REFERENCE

A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:

JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/


LICENSE

GNU GENERAL PUBLIC LICENSE, VERSION 3

http://www.gnu.org/licenses/gpl.html

THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.


AUTHOR

James C. Estill <JamesEstill at gmail.com>


HISTORY

STARTED: 09/17/2007

UPDATED: 03/24/2009

VERSION: $Rev: 561 $
