batch_findmite.pl


VERSION

This documentation refers to batch_findmite version $Rev: 566 $


SYNOPSIS

Usage

    batch_findmite.pl -i InDir -o OutDir -c ConfigFile [--gff]

Required Arguments

    -i, --indir    # Directory of fasta files to process
    -o, --outdir   # Path to the base output directory
    -c, --config   # Path to the config file
    --gff          # Optional: produce output in GFF format


DESCRIPTION

The batch_findmite program runs a FINDMITE analysis for every combination of a parameter set in your configuration file and a query sequence in your input directory. FINDMITE results have a very high false positive rate, so you will need to evaluate the results further to identify the true MITEs in your query sequence.
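
The pairing of query sequences with parameter sets described above can be sketched as follows. This is only an illustration in Python, not the program's own Perl code; the function name plan_runs and the file names are hypothetical, and the order in which batch_findmite actually processes the pairs is an assumption.

```python
from itertools import product


def plan_runs(fasta_files, param_names):
    """Return one (fasta file, parameter set name) pair per FINDMITE run."""
    # Every query sequence is paired with every parameter set.
    return list(product(sorted(fasta_files), param_names))


# Hypothetical input: two query sequences and the two example parameter sets.
runs = plan_runs(["seq2.fasta", "seq1.fasta"], ["TA_11", "TA_12"])
for fasta, params in runs:
    print(f"{fasta} -> {params}")
print(len(runs), "runs in total")
```

With N fasta files and M parameter sets this gives N x M FINDMITE runs, which is worth keeping in mind when sizing the configuration file for a large input directory.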


REQUIRED ARGUMENTS

-i,--indir

Path of the input directory. This is a directory that contains the fasta files to process.

-o,--outdir

Path of the base output directory.

-c,--config

Path to the configuration file that includes the parameter sets for running the FINDMITE program. These parameters represent the answers to the series of questions that must be answered when running FINDMITE. Any lines starting with # are ignored.

EXAMPLE

  #------------------------------------------------------------------
  # Name        Rep     TIR  Mis AT GC ATTA     2Base   Min     Max
  #------------------------------------------------------------------
  TA_11         TA      11   1   y  y  y        85      30      700
  TA_12         TA      12   1   y  y  y        85      30      700

For a detailed description, see the Configuration File heading under the CONFIGURATION AND ENVIRONMENT section of the full program documentation.


OPTIONS

-f,--fasta

Create a fasta file of all predicted MITEs. A separate fasta file is created for each parameter set name. Currently, results are appended to any existing fasta file in the output directory.

--gff

Produce a GFF output file that indicates where the predicted MITEs are on the query sequence.

-q,--quiet

Run the program with minimal output. Does not require user interaction.

--verbose

Run the program with maximal output.

--usage

Show a short overview of how to use the program from the command line.

--help

Show program usage with summary of options.

--version

Show program version.

--man

Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
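
If you drive the program from another script, the documented arguments and options can be assembled into a command line. The following is a minimal Python sketch; the helper name build_command and the example paths are hypothetical, and it assumes batch_findmite.pl is executable and on your PATH.

```python
def build_command(indir, outdir, config, gff=False, fasta=False, quiet=False):
    """Assemble a batch_findmite.pl command line from the documented flags."""
    cmd = ["batch_findmite.pl",
           "-i", str(indir),    # directory of fasta files to process
           "-o", str(outdir),   # base output directory
           "-c", str(config)]   # parameter set configuration file
    if gff:
        cmd.append("--gff")     # also write GFF output
    if fasta:
        cmd.append("--fasta")   # also write fasta files of predicted MITEs
    if quiet:
        cmd.append("--quiet")   # minimal output, no user interaction
    return cmd


# Hypothetical paths, for illustration only.
command = build_command("seqs/", "results/", "findmite_params.txt",
                        gff=True, quiet=True)
print(" ".join(command))
```

The resulting list can be passed to subprocess.run(command, check=True). Using --quiet is a reasonable choice for unattended runs because that option does not require user interaction.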


DIAGNOSTICS

Error messages generated by this program and possible solutions are listed below.

ERROR: No fasta files were found in the input directory

The input directory does not contain fasta files in the expected format. This could happen because you gave an incorrect path or because your sequence files do not have the expected *.fasta extension in the file name.

ERROR: Could not create the output directory

The output directory could not be created at the path you specified. This could be because the parent directory in which you are trying to create the base output directory does not exist, or because you do not have write permission for that directory.
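
Both error conditions above can be checked before starting a long run. The sketch below is a hypothetical Python pre-flight check, not part of batch_findmite itself; it assumes that input files are recognized by the *.fasta extension, as described above, and the function names are illustrative.

```python
import os
from pathlib import Path


def find_fasta_files(indir):
    """Return the *.fasta files directly inside indir, sorted by name."""
    return sorted(p for p in Path(indir).glob("*.fasta") if p.is_file())


def can_create_outdir(outdir):
    """Return (ok, reason) for whether outdir exists or could be created."""
    out = Path(outdir)
    if out.is_dir():
        if os.access(out, os.W_OK):
            return True, "output directory already exists"
        return False, f"no write permission on: {out}"
    parent = out.resolve().parent
    if not parent.is_dir():
        return False, f"parent directory does not exist: {parent}"
    if not os.access(parent, os.W_OK):
        return False, f"no write permission on: {parent}"
    return True, "output directory can be created"
```

An empty list from find_fasta_files corresponds to the first error, and a False result from can_create_outdir corresponds to the second, with the reason string indicating which of the two causes applies.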


CONFIGURATION AND ENVIRONMENT

Configuration File

The path to the configuration file is indicated at the command line with -c or --config.

This file is a space-delimited text file that indicates the parameters to use when running the FINDMITE program. These parameters represent the answers to the series of questions that must be answered when running FINDMITE.

EXAMPLE

  #------------------------------------------------------------------
  # Name        Rep     TIR  Mis AT GC ATTA     2Base   Min     Max
  #------------------------------------------------------------------
  TA_11         TA      11   1   y  y  y        85      30      700
  TA_12         TA      12   1   y  y  y        85      30      700

The columns above represent the following information:

Col. 1

Base name to assign to putative MITEs

Col. 2

Direct Repeat

Col. 3

Length of the Terminal Inverted Repeat (TIR)

Col. 4

Number of mismatches

Col. 5

Boolean to filter A/T. This must be set to y or n.

Col. 6

Boolean to filter C/G. This must be set to y or n.

Col. 7

Boolean to filter AT/TA. This must be set to y or n.

Col. 8

Proportion of 2Base to filter. This must be an integer between 0 and 100.

Col. 9

Minimum distance between TIRs. This must be an integer.

Col. 10

Maximum distance between TIRs. This must be an integer.
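
The column rules above can be checked before a run with a small validator. The following Python sketch is an illustration only and is not how batch_findmite itself reads the file; the names ParamSet and parse_config are hypothetical. It assumes that blank lines may be skipped and that leading whitespace before # is allowed.

```python
from typing import NamedTuple


class ParamSet(NamedTuple):
    name: str          # Col. 1: base name for putative MITEs
    repeat: str        # Col. 2: direct repeat
    tir_len: int       # Col. 3: length of the TIR
    mismatches: int    # Col. 4: number of mismatches
    filter_at: bool    # Col. 5: filter A/T
    filter_cg: bool    # Col. 6: filter C/G
    filter_atta: bool  # Col. 7: filter AT/TA
    two_base_pct: int  # Col. 8: proportion of 2Base to filter (0 to 100)
    min_dist: int      # Col. 9: minimum distance between TIRs
    max_dist: int      # Col. 10: maximum distance between TIRs


def _yes_no(value, col):
    """Convert y or n to a boolean, rejecting anything else."""
    if value not in ("y", "n"):
        raise ValueError(f"column {col} must be y or n, got {value!r}")
    return value == "y"


def parse_config(text):
    """Parse configuration text into a list of ParamSet records."""
    sets = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines (assumed) and comment lines
        f = line.split()
        if len(f) != 10:
            raise ValueError(f"line {lineno}: expected 10 columns, got {len(f)}")
        pct = int(f[7])
        if not 0 <= pct <= 100:
            raise ValueError(f"line {lineno}: column 8 must be between 0 and 100")
        sets.append(ParamSet(f[0], f[1], int(f[2]), int(f[3]),
                             _yes_no(f[4], 5), _yes_no(f[5], 6), _yes_no(f[6], 7),
                             pct, int(f[8]), int(f[9])))
    return sets


EXAMPLE = """\
# Name        Rep     TIR  Mis AT GC ATTA     2Base   Min     Max
TA_11         TA      11   1   y  y  y        85      30      700
TA_12         TA      12   1   y  y  y        85      30      700
"""
param_sets = parse_config(EXAMPLE)
for ps in param_sets:
    print(ps.name, ps.tir_len, ps.min_dist, ps.max_dist)
```

A ValueError from parse_config points at the offending line or column, which is usually easier to interpret than a failed FINDMITE run.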


DEPENDENCIES

Required Software

Required Perl Modules


BUGS AND LIMITATIONS

Limitations


SEE ALSO

The batch_findmite.pl program is part of the DAWG-PAWS package of genome annotation programs. See the DAWG-PAWS web page ( http://dawgpaws.sourceforge.net/ ) or the Sourceforge project page ( http://sourceforge.net/projects/dawgpaws ) for additional information about this package.


REFERENCE

A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:

JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/


LICENSE

GNU GENERAL PUBLIC LICENSE, VERSION 3

http://www.gnu.org/licenses/gpl.html

THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.


AUTHOR

James C. Estill <JamesEstill at gmail.com>


HISTORY

STARTED: 08/30/2007

UPDATED: 03/24/2009

VERSION: $Rev: 566 $
