batch_hmmer.pl |
batch_hmmer.pl - Run hmmsearch in batch mode
This documentation refers to program version $Rev: 561 $
batch_hmmer.pl -i InDir -o OutDir -c Config.cfg [--gff]
--indir    # Path to the input dir containing fasta files
--outdir   # Path to the base output directory
--config   # Configuration file for running hmmer
--gff      # Produce GFF output files
Given a config file this will run the hmmsearch program for each parameter set in the configuration file for each fasta sequence file in the input directory. This will also convert the results to GFF format if requested.
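The looping order described above can be sketched in Python (the program itself is a Perl script, so this is only an illustration). The function name `plan_searches`, the `*.hmm` model file extension, and the in-memory tuples standing in for config rows are assumptions. The sketch only builds command lines of the form `hmmsearch [options] <hmmfile> <seqfile>`; it does not run them or reproduce the real output directory layout.

```python
from pathlib import Path


def plan_searches(in_dir, param_sets):
    """Build one hmmsearch command per (fasta file, parameter set, model).

    param_sets is a list of (name, model_dir, args) tuples, where args is
    a list of hmmsearch options such as ["-E", "0.00001"].
    Assumption: model files end in .hmm; the real script may differ.
    """
    commands = []
    for fasta in sorted(Path(in_dir).glob("*.fasta")):
        for name, model_dir, args in param_sets:
            for model in sorted(Path(model_dir).glob("*.hmm")):
                # hmmsearch takes options, then the model, then the sequences
                commands.append(
                    (name, ["hmmsearch", *args, str(model), str(fasta)])
                )
    return commands
```

With two fasta files, one parameter set, and one model, this yields two commands, one per sequence file.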
Path of the input directory. This is the directory that contains the fasta files to search.
Path of the base output directory.
Path of the configuration file specifying the parameters used to run the hmmer program. This config file includes (1) the name of the parameter set, (2) the directory of the hmm models, and (3) command line arguments for hmmsearch.
Produce a GFF output file for each query sequence and each parameter set in the configuration file.
Run the program with minimal output.
Run the program in verbose mode.
Show program version.
Short overview of how to use program from command line.
Show program usage with summary of options.
Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
Error messages generated by this program and possible solutions are listed below.
The input directory does not contain fasta files in the expected format. This could happen because you gave an incorrect path or because your sequence files do not have the expected *.fasta extension in the file name.
The output directory could not be created at the path you specified. This could be because the parent directory in which you are trying to create the base output directory does not exist, or because you do not have write permission for that directory.
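The two conditions above can be checked before a long batch run. The following Python sketch is not part of batch_hmmer.pl; the function name `preflight` and the wording of its messages are hypothetical, and it assumes that only the `*.fasta` extension is recognized.

```python
import os
from pathlib import Path


def preflight(in_dir, out_dir):
    """Return a list of problems that would trigger the errors above."""
    problems = []
    in_path = Path(in_dir)
    if not in_path.is_dir():
        problems.append(f"input directory not found: {in_dir}")
    elif not any(in_path.glob("*.fasta")):
        problems.append(f"no *.fasta files in: {in_dir}")

    out_path = Path(out_dir)
    if not out_path.is_dir():
        # The output directory must be created, so inspect its parent
        parent = out_path.resolve().parent
        if not parent.is_dir():
            problems.append(
                f"parent of output directory does not exist: {parent}"
            )
        elif not os.access(parent, os.W_OK):
            problems.append(f"no write permission for: {parent}")
    return problems
```

An empty list means neither condition was detected; it does not guarantee that the run will succeed.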
The location of the configuration file is indicated by the --config option at the command line. This is a tab-delimited text file. The columns of data represent the following information:
Name of the parameter set
All hmm profile models in this directory will be searched using the hmmer arguments in col 3.
The arguments for hmmer should be separated by spaces and can include the following:
sets alignment output limit to <n> best domain alignments
sets E value cutoff (globE) to <= x
sets T bit threshold (globT) to >= x
sets Z (# seqs) for E-value calculation
make best effort to use last version's output style
run <n> threads in parallel (if threaded)
use Pfam GA gathering threshold cutoffs
use Pfam NC noise threshold cutoffs
use Pfam TC trusted threshold cutoffs
sets domain Eval cutoff (2nd threshold) to <= x
sets domain T bit thresh (2nd threshold) to >= x
use the full Forward() algorithm instead of Viterbi
sequence file is in format <s>
turn OFF the post hoc second null model
run on a Parallel Virtual Machine (PVM)
turn ON XNU filtering of target protein sequences
An example that runs hmmsearch against four directories of hmm models using an E-value threshold of 0.00001 follows:
rice_mite   /$HOME/HMMData/db/hmm/rice_mite_models/   -E 0.00001
rice_mule   /$HOME/HMMData/db/hmm/rice_mule_models/   -E 0.00001
tpase       /$HOME/HMMData/db/hmm/tpase_models/       -E 0.00001
pfam        /$HOME/HMMData/db/hmm/pfam/               -E 0.00001
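To make the three-column layout concrete, the following Python sketch reads config text of this shape. It is an illustration and not the parser used by batch_hmmer.pl: the function name `parse_config` is hypothetical, and skipping blank lines and rejecting rows without exactly three columns are assumptions of the sketch.

```python
def parse_config(text):
    """Parse tab-delimited config text into parameter sets.

    Each non-blank line holds three tab-separated columns: the parameter
    set name, the directory of hmm models, and the hmmsearch arguments.
    """
    param_sets = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines (an assumption of this sketch)
        columns = line.split("\t")
        if len(columns) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 tab-delimited columns, "
                f"found {len(columns)}"
            )
        name, model_dir, args = (column.strip() for column in columns)
        # Arguments in column 3 are separated by spaces
        param_sets.append((name, model_dir, args.split()))
    return param_sets


example = (
    "rice_mite\t/$HOME/HMMData/db/hmm/rice_mite_models/\t-E 0.00001\n"
    "pfam\t/$HOME/HMMData/db/hmm/pfam/\t-E 0.00001\n"
)
for name, model_dir, args in parse_config(example):
    print(name, model_dir, args)
```

A row whose columns are separated by spaces instead of tabs raises a `ValueError` that names the offending line, which is a common mistake when the file is edited by hand.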
The hmmer program is required: http://hmmer.janelia.org/. Specifically this program requires the hmmsearch program.
This module is required to copy the hmmsearch results.
This module is required to accept options at the command line.
This module allows for word wrapping and hanging indents of printed text. This allows for a more readable output for long strings.
This module is part of the BioPerl package. This allows for parsing of the output from the hmmsearch program.
If you find a bug with this software, file a bug report on the DAWG-PAWS Sourceforge website: http://sourceforge.net/tracker/?group_id=204962
The DAWG-PAWS package currently does not include programs for creating hmm profile models. These must be created externally using the hmmer package of programs.
The config file must have UNIX formatted line endings. Because of this, any config files that have been edited in programs such as MS Word must be converted to a UNIX compatible text format before being used with batch_hmmer.pl.
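One way to do this conversion is sketched below in Python. The function names are hypothetical, and the sketch assumes the config has already been saved as plain text; it does not read word processor document formats.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to UNIX (LF)."""
    # Replace CRLF first so the lone-CR pass does not double the newlines
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(src_path, dst_path):
    """Write a UNIX-formatted copy of src_path to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_unix_line_endings(data))
```

Working on bytes leaves the tab delimiters and all other characters untouched.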
The batch_hmmer.pl program is part of the DAWG-PAWS package of genome annotation programs. See the DAWG-PAWS web page ( http://dawgpaws.sourceforge.net/ ) or the Sourceforge project page ( http://sourceforge.net/projects/dawgpaws ) for additional information about this package.
A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:
JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/
GNU GENERAL PUBLIC LICENSE, VERSION 3
http://www.gnu.org/licenses/gpl.html
THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.
James C. Estill <JamesEstill at gmail.com>
STARTED: 09/17/2007
UPDATED: 03/24/2009
VERSION: $Rev: 561 $