batch_mask.pl - Run RepeatMasker and parse results to a GFF format file.
This documentation refers to program version $Rev: 323 $
batch_mask.pl -i DirToProcess -o OutDir -c ConfigFile
-i, --indir # Directory of fasta files to process
-o, --outdir # Path to the base output directory
-c, --config # Path to the config file
Runs the RepeatMasker program for a set of input
FASTA files against a set of repeat library files,
then converts each RepeatMasker *.out file into the
GFF format and, when the --apollo option is given, to the game XML
format for visualization by the Apollo genome annotation program.
- -i,--indir
-
Path of the directory containing the sequences to process.
- -o,--outdir
-
Path of the directory to place the program output.
- -c, --config
-
Configuration file that lists the database names and paths of the
fasta files to use as masking databases.
- -p,--num-proc
-
The number of processors to use for RepeatMasker. Default is one.
- --engine
-
The RepeatMasker search engine to use: [crossmatch|wublast|decypher].
The default is to use the crossmatch engine.
- --apollo
-
Use the Apollo program to convert the file from GFF to game XML.
The default is not to use Apollo.
- --rm-path
-
The full path to the RepeatMasker binary.
- --logfile
-
Path to a file that will be used to log program status.
If the file already exists, new information will be appended
to the existing file.
- -q,--quiet
-
Run the program with minimal output.
- --test
-
Run the program without executing the system commands.
- --usage
-
Short overview of how to use program from command line.
- --help
-
Show program usage with summary of options.
- --version
-
Show program version.
- --man
-
Show the full program manual. This uses the perldoc command to print the
POD documentation for the program.
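As an illustration of how the options above combine, the following is a minimal sketch of a dry-run invocation. The function name mask_dry_run_cmd and the paths ./fasta, ./masked, and ./repeat_dbs.txt are placeholders chosen for this example and are not part of the program. The sketch assumes batch_mask.pl is on your PATH and only runs it if it is found.

```shell
# Print a dry-run command line for batch_mask.pl.
# $1 = input directory, $2 = output directory, $3 = config file
# (hypothetical helper, not part of the DAWG-PAWS package)
mask_dry_run_cmd() {
    printf 'batch_mask.pl -i %s -o %s -c %s --test\n' "$1" "$2" "$3"
}

# Placeholder paths; replace with your own.
cmd=$(mask_dry_run_cmd ./fasta ./masked ./repeat_dbs.txt)
echo "$cmd"

# Only run the command when batch_mask.pl is actually on the PATH.
# $cmd is left unquoted so it splits into words; this assumes the
# paths contain no spaces.
if command -v batch_mask.pl >/dev/null 2>&1; then
    $cmd
fi
```

Once the --test run prints the commands you expect, the same command line without --test performs the actual masking.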
Error messages generated by this program and possible solutions are listed
below.
- ERROR: No fasta files were found in the input directory
-
The input directory does not contain fasta files in the expected format.
This could happen because you gave an incorrect path or because your sequence
files do not have the expected *.fasta extension in the file name.
- ERROR: Could not create the output directory
-
The output directory could not be created at the path you specified.
This could be because the parent directory in which you are trying
to place your base directory does not exist, or because you do not
have write permission for that parent directory.
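Both errors above can be checked before starting a long run. The following is a minimal sketch of such a check, assuming the *.fasta extension described above. The function names has_fasta_files and can_create_outdir are hypothetical helpers written for this example, and they do not reproduce the program's own tests.

```shell
# Return 0 when the directory in $1 holds at least one *.fasta file.
has_fasta_files() {
    for f in "$1"/*.fasta; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -f "$f" ] && return 0
    done
    return 1
}

# Return 0 when the output directory in $1 is usable: either it exists
# and is writable, or its parent exists and is writable so that the
# directory can be created.
can_create_outdir() {
    if [ -d "$1" ]; then
        [ -w "$1" ]
        return
    fi
    parent=$(dirname "$1")
    [ -d "$parent" ] && [ -w "$parent" ]
}
```

For example, running `has_fasta_files ./fasta || echo "no fasta files found"` reports the first problem without invoking RepeatMasker.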
The major configuration file for this program is the list of
databases indicated by the -c flag.
This file is a tab delimited text file.
Lines beginning with # are ignored.
EXAMPLE
#-------------------------------------------------------------
# DBNAME DB_PATH
#-------------------------------------------------------------
TREP_9 /db/repeats/Trep9.nr.fasta
TIGR_Trit /db/repeats/TIGR_Triticum_GSS_Repeats.v2.fasta
# END
The columns above represent the following:
- Col. 1
-
The name of the repeat library.
This will be used to name the output files from the analysis
and to name the data tracks that will be used by Apollo.
- Col. 2
-
The path to the fasta format file containing the repeats.
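To make the file format concrete, the following is a minimal sketch of how such a config file could be read, assuming exactly two tab-separated columns as described above. The function name list_mask_dbs is a hypothetical helper for this example; it is not how batch_mask.pl itself parses the file.

```shell
# Print "NAME<TAB>PATH" for each database entry in the config file $1,
# skipping blank lines and lines that begin with '#'.
# A warning goes to stderr when the library path is not an existing file.
list_mask_dbs() {
    tab=$(printf '\t')
    # The "|| [ -n ... ]" part keeps a last line that lacks a newline.
    while IFS="$tab" read -r db_name db_path || [ -n "$db_name" ]; do
        case "$db_name" in
            ''|'#'*) continue ;;
        esac
        if [ ! -f "$db_path" ]; then
            echo "warning: $db_name: no file at $db_path" >&2
        fi
        printf '%s\t%s\n' "$db_name" "$db_path"
    done < "$1"
}
```

Running `list_mask_dbs ConfigFile` before a batch job is a quick way to spot a mistyped library path or a line that uses spaces where a tab is required.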
- File::Copy
This module is required to copy the RepeatMasker results.
- Getopt::Long
This module is required to accept options at the command line.
- Limited RepeatMasker version testing
This program has been tested with RepeatMasker v 3.1.6.
- No Env Options
Currently this program does not make use of variables in the user
environment. However, it would be useful to define program paths
and some common options in the environment.
The batch_mask.pl program is part of the DAWG-PAWS package of genome
annotation programs. See the DAWG-PAWS web page
( http://dawgpaws.sourceforge.net/ )
or the Sourceforge project page
( http://sourceforge.net/projects/dawgpaws )
for additional information about this package.
GNU GENERAL PUBLIC LICENSE, VERSION 3
http://www.gnu.org/licenses/gpl.html
James C. Estill <JamesEstill at gmail.com>
STARTED: 04/10/2006
UPDATED: 12/10/2007
VERSION: $Rev: 323 $