batch_findgaps.pl - Annotate gaps in a fasta file
This documentation refers to batch_findgaps version $Rev: 581 $
batch_findgaps.pl -i DirToProcess -o OutDir
-i, --indir # Directory of fasta files to process
-o, --outdir # Path to the base output directory
Searches each FASTA file in the input directory for gaps, that is, runs of the gap character that are at least as long as the minimum gap length, and writes the gap annotations in the GFF format. The GFF output can optionally be converted to the game XML format for visualization in the Apollo genome annotation program.
Path of the directory containing the sequences to process.
Path of the directory to place the program output.
Path to a file that will be used to log program status. If the file already exists, additional information will be concatenated to the existing file.
The character that is treated as the gap character. By default this is N. This option takes a single character as its argument.
The minimum gap length. This option takes an integer as its argument.
Use this option to generate a game XML file of the output. This option requires Apollo on your local machine, because the program uses Apollo to translate from GFF to game XML.
Specify the path to your local installation of Apollo.
Short overview of how to use program from command line.
Show program usage with summary of options.
Show program version.
Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
Run the program with minimal output.
Run the program without doing the system commands.
The input directory does not contain fasta files in the expected format. This could happen because you gave an incorrect path or because your sequence files do not have the expected *.fasta extension in the file name.
The output directory could not be created at the path you specified. This could be because the directory in which you are trying to place your base directory does not exist, or because you do not have write permission for that directory.
No configuration files or environmental variables are required to use this program.
This program requires the Apollo Genome Annotation Curation tool to convert the GFF output to the game.xml format. This can be obtained at http://apollo.berkeleybop.org/current/index.html. Apollo is only needed for the conversion to game.xml; the batch_findgaps.pl program can be used without Apollo to generate GFF format files.
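As an illustration of what a gap annotation in GFF looks like, the following Python sketch formats one gap as a nine-column, tab-delimited GFF line. It is not taken from batch_findgaps.pl: the function name gap_to_gff, the source label "findgaps", the feature label "gap", and the choice to leave score, strand, and frame as "." are all assumptions made for the example.

```python
def gap_to_gff(seq_id, start, end, source="findgaps", feature="gap"):
    """Format one gap as a tab-delimited, nine-column GFF line.

    start and end are 1-based, inclusive coordinates. The source and
    feature labels are hypothetical placeholders, not the labels that
    batch_findgaps.pl actually writes.
    """
    columns = [
        seq_id,      # column 1: sequence name
        source,      # column 2: program that produced the annotation
        feature,     # column 3: feature type
        str(start),  # column 4: start position
        str(end),    # column 5: end position
        ".",         # column 6: score (not applicable to a gap)
        ".",         # column 7: strand (a gap has no orientation)
        ".",         # column 8: frame (not applicable to a gap)
        feature,     # column 9: attribute / group
    ]
    return "\t".join(columns)


print(gap_to_gff("my_seq", 5, 9))
```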
This module is required to copy the BLAST results.
This module is required to accept options at the command line.
Early versions of this script would assign an incorrect end position to the 3' end of a gap.
If you find a bug with this software, file a bug report on the DAWG-PAWS Sourceforge website: http://sourceforge.net/tracker/?group_id=204962
Due to the way that regular expressions are coded in Perl, the characters that can be used to indicate gaps must be hard coded. The characters that are currently hard coded for recognition by batch_findgaps are n, N, x, and X. If there are additional characters you would like to add as a recognized gap character, file a Feature Request on the DAWG-PAWS project page on Sourceforge ( http://sourceforge.net/tracker/?group_id=204962&atid=991722 ).
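The general technique of locating gaps with a regular expression can be sketched in Python as follows. This is a minimal illustration and not the Perl code used by batch_findgaps.pl; the function name find_gaps and its parameters are hypothetical. It assumes 1-based, inclusive coordinates (the GFF convention) and a minimum gap length of at least 1.

```python
import re


def find_gaps(sequence, gap_chars="nNxX", min_len=1):
    """Return (start, end) pairs for each run of gap characters.

    Coordinates are 1-based and inclusive. min_len must be >= 1.
    """
    # re.escape keeps characters such as '-' or ']' literal inside the class.
    pattern = re.compile("[" + re.escape(gap_chars) + "]{" + str(min_len) + ",}")
    gaps = []
    for match in pattern.finditer(sequence):
        # match.start() is 0-based, so add 1. match.end() is exclusive in
        # 0-based terms, which already equals the 1-based inclusive end.
        gaps.append((match.start() + 1, match.end()))
    return gaps


print(find_gaps("ACGTNNNNNACGTNNAC", min_len=3))  # prints [(5, 9)]
```

The end coordinate is the step where an off-by-one error is easiest to make, since the match end is exclusive while the reported gap end is inclusive.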
BLAST output files must currently end with blo, bln, or blx. For example, a BLASTx output may be named BlastOut.blx while a BLASTN output may be named BlastOut.bln. FASTA files must end with a fasta or fa extension. For example, files must have names like my_seq.fasta or my_seq.fa.
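To check in advance which files in a directory satisfy the FASTA naming rule above, a small Python sketch such as the following can be used. It is independent of batch_findgaps.pl; the names FASTA_EXTENSIONS and list_fasta_files are hypothetical, and the sketch assumes the extension match is case-sensitive.

```python
from pathlib import Path

# Extensions named in the text above; matching is assumed to be case-sensitive.
FASTA_EXTENSIONS = {".fasta", ".fa"}


def list_fasta_files(indir):
    """Return the sorted FASTA files found directly inside indir."""
    return sorted(
        path
        for path in Path(indir).iterdir()
        if path.is_file() and path.suffix in FASTA_EXTENSIONS
    )
```

An empty result for the intended input directory suggests either an incorrect path or sequence files that lack the expected extension.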
The batch_findgaps.pl program is part of the DAWG-PAWS package of genome annotation programs. See the DAWG-PAWS web page ( http://dawgpaws.sourceforge.net/ ) or the Sourceforge project page ( http://sourceforge.net/projects/dawgpaws ) for additional information about this package.
A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:
JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/
GNU GENERAL PUBLIC LICENSE, VERSION 3
http://www.gnu.org/licenses/gpl.html
James C. Estill <JamesEstill at gmail.com>
STARTED: 08/01/2007
UPDATED: 09/29/2008
VERSION: $Rev: 581 $