batch_findmite.pl
This documentation refers to batch_findmite version $Rev: 566 $
batch_findmite.pl -i InDir -o OutDir -c ConfigFile [--gff]

    -i, --indir    # Directory of fasta files to process
    -o, --outdir   # Path to the base output directory
    -c, --config   # Path to the config file
    --gff          # Produce output in GFF format
The batch_findmite program will do a FINDMITE analysis for each parameter set in your configuration file for each query sequence in your input directory. The results from FINDMITE have a VERY high false positive rate so you will need to further evaluate your results to find the true MITEs in your query sequence.
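The batch structure described above amounts to a nested loop over query sequences and parameter sets. The following Python sketch only enumerates the planned runs; it is not the program's own Perl code. The helper name `plan_runs` is hypothetical, and the sketch assumes that query files carry the *.fasta extension.

```python
from pathlib import Path


def plan_runs(indir, param_names):
    """Return one (fasta_file_name, parameter_set_name) pair per planned run.

    Hypothetical helper: it only enumerates the work and never calls FINDMITE.
    """
    # Assumption: query sequences are the *.fasta files directly inside indir.
    fasta_files = sorted(p.name for p in Path(indir).glob("*.fasta"))
    # Every query sequence is paired with every parameter set.
    return [(fasta, name) for fasta in fasta_files for name in param_names]
```

With two fasta files and the two parameter sets TA_11 and TA_12, this yields four runs, which is why the number of result files grows quickly with the size of the configuration file.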
Path of the input directory. This is a directory that contains the fasta files to process.
Path of the base output directory.
Path to the configuration file that includes the parameter sets for running the FINDMITE program. These parameters represent the answers to the series of questions that must be answered when running FINDMITE. Any lines starting with # are ignored.
EXAMPLE
    #------------------------------------------------------------------
    # Name   Rep  TIR  Mis  AT  GC  ATTA  2Base  Min  Max
    #------------------------------------------------------------------
    TA_11    TA   11   1    y   y   y     85     30   700
    TA_12    TA   12   1    y   y   y     85     30   700

For a detailed description, see the Parameters File heading under the CONFIGURATION AND ENVIRONMENT section of the full program documentation.
Create a fasta file of all predicted MITEs. A different fasta file will be created for each of the parameter set names. This will currently append to the existing fasta data file in the output directory.
Produce a GFF output file that indicates where the predicted MITEs are on the query sequence.
Run the program with minimal output. Does not require user interaction.
Run the program with maximal output.
Short overview of how to use program from command line.
Show program usage with summary of options.
Show program version.
Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
Error messages generated by this program and possible solutions are listed below.
The input directory does not contain fasta files in the expected format. This could happen because you gave an incorrect path or because your sequence files do not have the expected *.fasta extension in the file name.
The output directory could not be created at the path you specified. This could be because the directory in which you are trying to place your base output directory does not exist, or because you do not have write permission for that directory.
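Both error conditions above can be checked before starting a long batch run. The following Python sketch shows one way to do that; the helper `preflight` and its messages are hypothetical and are not the messages that batch_findmite.pl itself prints.

```python
import os
from pathlib import Path


def preflight(indir, outdir):
    """Return a list of problems that would likely stop a batch run.

    Hypothetical helper; the message texts are illustrative only.
    """
    problems = []

    in_path = Path(indir)
    if not in_path.is_dir():
        problems.append(f"input directory {indir} does not exist")
    elif not any(in_path.glob("*.fasta")):
        problems.append(f"no *.fasta files found in {indir}")

    out_path = Path(outdir)
    if not out_path.is_dir():
        # The output directory would have to be created inside its parent.
        parent = out_path.resolve().parent
        if not parent.is_dir():
            problems.append(f"parent directory {parent} does not exist")
        elif not os.access(parent, os.W_OK):
            problems.append(f"no write permission for {parent}")

    return problems
```

An empty list means neither condition was detected; it does not guarantee that the run itself will succeed.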
The path to the configuration file is indicated at the command line with -c or --config.
This file is a space-delimited text file that indicates the parameters to use when running the FINDMITE program. These parameters represent the answers to the series of questions that must be answered when running FINDMITE.
EXAMPLE
    #------------------------------------------------------------------
    # Name   Rep  TIR  Mis  AT  GC  ATTA  2Base  Min  Max
    #------------------------------------------------------------------
    TA_11    TA   11   1    y   y   y     85     30   700
    TA_12    TA   12   1    y   y   y     85     30   700
The columns above represent the following information:
Name: Base name to assign to putative MITEs.
Rep: Direct repeat.
TIR: Length of the Terminal Inverted Repeat (TIR).
Mis: Number of mismatches.
AT: Boolean to filter A/T. This must be set to y or n.
GC: Boolean to filter C/G. This must be set to y or n.
ATTA: Boolean to filter AT/TA. This must be set to y or n.
2Base: Proportion of 2Base to filter. This must be an integer between 0 and 100.
Min: Minimum distance between TIRs. This must be an integer.
Max: Maximum distance between TIRs. This must be an integer.
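The column constraints above can be checked before a batch run. The following Python sketch parses and validates one line of the parameters file. It is an illustration, not the parser used by batch_findmite.pl: the function name `parse_param_line` and the dictionary keys are hypothetical, and it assumes each parameter set has exactly the ten columns shown in the example.

```python
def parse_param_line(line):
    """Parse one config line into a dict, or return None for a line to skip.

    Raises ValueError when a field violates the documented constraints.
    """
    stripped = line.strip()
    # Comment lines start with '#'; blank lines are skipped as well (assumption).
    if not stripped or stripped.startswith("#"):
        return None

    fields = stripped.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, found {len(fields)}")
    name, rep, tir, mis, at, gc, atta, two_base, dmin, dmax = fields

    # The three filter columns must be exactly y or n.
    for label, flag in (("AT", at), ("GC", gc), ("ATTA", atta)):
        if flag not in ("y", "n"):
            raise ValueError(f"{label} must be y or n, got {flag!r}")

    # int() raises ValueError for any non-integer value.
    params = {
        "name": name, "rep": rep,
        "tir": int(tir), "mis": int(mis),
        "at": at == "y", "gc": gc == "y", "atta": atta == "y",
        "two_base": int(two_base), "min": int(dmin), "max": int(dmax),
    }
    if not 0 <= params["two_base"] <= 100:
        raise ValueError("2Base must be an integer between 0 and 100")
    return params
```

Calling `parse_param_line("TA_11 TA 11 1 y y y 85 30 700")` returns a dictionary whose `name` is `"TA_11"` and whose `tir` is `11`, while a line such as `# Name Rep ...` returns `None`.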
The batch_findmite.pl program is dependent on the FINDMITE program. A version of FINDMITE compiled for RedHat Linux is available at: http://jaketu.biochem.vt.edu/dl_software.htm
This module is required to copy the results files.
This module is required to accept options at the command line.
If you find a bug with this software, file a bug report on the DAWG-PAWS Sourceforge website: http://sourceforge.net/tracker/?group_id=204962
The config file must have UNIX formatted line endings. Because of this, any config files that have been edited in programs such as MS Word must be converted to a UNIX-compatible text format before being used with batch_findmite.
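For a config file that is already plain text but was saved with Windows or classic Mac line endings, the conversion can be sketched in Python as follows. The function names are hypothetical, and this handles line endings only: a file saved in a word processor's native format must first be exported as plain text.

```python
from pathlib import Path


def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) and classic Mac (CR) line endings to UNIX (LF)."""
    # Replace CRLF first so that its CR is not converted a second time.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_config(path):
    """Rewrite the file at path in place with UNIX line endings."""
    config = Path(path)
    config.write_bytes(to_unix_line_endings(config.read_bytes()))
```

Working on bytes rather than text avoids any automatic newline translation by Python while the file is read and written.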
The batch_findmite.pl program is part of the DAWG-PAWS package of genome annotation programs. See the DAWG-PAWS web page ( http://dawgpaws.sourceforge.net/ ) or the Sourceforge project page ( http://sourceforge.net/projects/dawgpaws ) for additional information about this package.
A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:
JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/
GNU GENERAL PUBLIC LICENSE, VERSION 3
http://www.gnu.org/licenses/gpl.html
THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.
James C. Estill <JamesEstill at gmail.com>
STARTED: 08/30/2007
UPDATED: 03/24/2009
VERSION: $Rev: 566 $