batch_mask.pl


NAME

batch_mask.pl - Run RepeatMasker and parse results to a GFF format file.


VERSION

This documentation refers to program version $Rev: 323 $


SYNOPSIS

Usage

    batch_mask.pl -i DirToProcess -o OutDir -c ConfigFile

Required Arguments

    -i, --indir    # Directory of fasta files to process
    -o, --outdir   # Path to the base output directory
    -c, --config   # Path to the config file


DESCRIPTION

Runs the RepeatMasker program on a set of input FASTA files against a set of repeat library files. The RepeatMasker *.out file is then converted to the GFF format and, optionally, to the game XML format for visualization in the Apollo genome annotation program.
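
The conversion from a RepeatMasker *.out file to GFF can be pictured with the short Python sketch below. It is an illustration only, not the code used by batch_mask.pl: the function name rm_out_to_gff, the source and feature labels, and the example record are all assumptions. It relies only on the standard whitespace-separated column order of the RepeatMasker *.out table (score, divergence, deletions, insertions, query name, begin, end, bases left, strand, repeat name, repeat class).

```python
def rm_out_to_gff(out_lines, source="repeatmasker"):
    """Convert RepeatMasker *.out rows to tab-delimited GFF-style lines.

    The source label and the "repeat" feature type are hypothetical;
    batch_mask.pl may label its GFF columns differently.
    """
    gff_lines = []
    for line in out_lines:
        fields = line.split()
        # Data rows start with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        score, query, begin, end = fields[0], fields[4], fields[5], fields[6]
        # RepeatMasker reports complement-strand matches as "C".
        strand = "-" if fields[8] == "C" else "+"
        repeat_name = fields[9]
        gff_lines.append("\t".join(
            [query, source, "repeat", begin, end, score, strand, ".", repeat_name]
        ))
    return gff_lines


# Invented example input: two header lines, a blank line, and one record.
example_out = [
    "   SW  perc perc perc  query    position in query      matching repeat",
    "score  div. del. ins.  sequence begin  end    (left)   repeat   class/family",
    "",
    " 1320 15.6  6.2  0.0  contig1  6563  6781 (22462) C  MER7A  DNA/MER2_type  (0)  336  103  1",
]
gff = rm_out_to_gff(example_out)
```

Each returned line holds the nine GFF columns: sequence name, source, feature, start, end, score, strand, frame, and attribute.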


REQUIRED ARGUMENTS

-i,--indir
Path of the directory containing the sequences to process.

-o,--outdir
Path of the directory to place the program output.

-c, --config
Configuration file that lists the database names and paths of the fasta files to use as masking databases.


OPTIONS

-p,--num-proc
The number of processors to use for RepeatMasker. Default is one.

--engine
The RepeatMasker engine to use: [crossmatch|wublast|decypher]. The default is the crossmatch engine.

--apollo
Use the Apollo program to convert the file from GFF to game XML. The default is not to use Apollo.

--rm-path
The full path to the RepeatMasker binary.

--logfile
Path to a file that will be used to log program status. If the file already exists, new information will be appended to the existing file.

-q,--quiet
Run the program with minimal output.

--test
Run the program without executing the system commands.

--usage
Short overview of how to use the program from the command line.

--help
Show program usage with summary of options.

--version
Show program version.

--man
Show the full program manual. This uses the perldoc command to print the POD documentation for the program.


DIAGNOSTICS

Error messages generated by this program and possible solutions are listed below.

ERROR: No fasta files were found in the input directory
The input directory does not contain fasta files in the expected format. This could happen because you gave an incorrect path or because your sequence files do not have the expected *.fasta extension in the file name.
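
To check an input directory before running the program, a small Python sketch such as the following may help. It is not part of batch_mask.pl; the function name find_fasta_files is hypothetical, and it assumes that only files ending in .fasta are recognized, as described above.

```python
import tempfile
from pathlib import Path


def find_fasta_files(indir):
    """Return the sorted *.fasta files in indir, or raise if there are none."""
    directory = Path(indir)
    if not directory.is_dir():
        raise FileNotFoundError(f"input directory not found: {indir}")
    fasta_files = sorted(
        p for p in directory.iterdir() if p.is_file() and p.suffix == ".fasta"
    )
    if not fasta_files:
        raise FileNotFoundError(f"no *.fasta files found in: {indir}")
    return fasta_files


# Demonstration on a temporary directory with one FASTA file and one other file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "seq1.fasta").write_text(">seq1\nACGT\n")
    Path(tmp, "notes.txt").write_text("not a sequence file\n")
    found = [p.name for p in find_fasta_files(tmp)]
```

A file named with a different extension, such as seq1.fa, would not be listed by this check and would need to be renamed.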

ERROR: Could not create the output directory
The output directory could not be created at the path you specified. This could be because the parent directory in which you are trying to place the output directory does not exist, or because you do not have write permission for that parent directory.
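
The two causes above can be told apart with a check like the Python sketch below. This is an illustration under stated assumptions, not code from batch_mask.pl; the function name explain_outdir_problem is hypothetical.

```python
import os
import tempfile


def explain_outdir_problem(outdir):
    """Return None if outdir exists or looks creatable, else a short reason."""
    outdir = os.path.abspath(outdir)
    if os.path.isdir(outdir):
        return None
    parent = os.path.dirname(outdir)
    if not os.path.isdir(parent):
        return f"parent directory does not exist: {parent}"
    # Creating a subdirectory needs write and search permission on the parent.
    if not os.access(parent, os.W_OK | os.X_OK):
        return f"no write permission on: {parent}"
    return None


# Demonstration inside a temporary base directory.
with tempfile.TemporaryDirectory() as base:
    creatable = explain_outdir_problem(os.path.join(base, "out"))
    missing_parent = explain_outdir_problem(os.path.join(base, "missing", "out"))
```

Here creatable is None because the parent exists and is writable, while missing_parent reports that the intermediate directory is absent.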


CONFIGURATION AND ENVIRONMENT

Configuration File

The main configuration file for this program is the list of databases specified by the -c flag.

This file is a tab-delimited text file. Lines beginning with # are ignored.

EXAMPLE

  #-------------------------------------------------------------
  # DBNAME       DB_PATH
  #-------------------------------------------------------------
  TREP_9         /db/repeats/Trep9.nr.fasta
  TIGR_Trit      /db/repeats/TIGR_Triticum_GSS_Repeats.v2.fasta
  # END

The columns above represent the following:

Col. 1
The name of the repeat library. This will be used to name the output files from the analysis and to name the data tracks that will be used by Apollo.

Col. 2
The path to the fasta format file containing the repeats.
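
The file format described above can be read with a few lines of code. The Python sketch below is one possible reading of the format, not the parser used by batch_mask.pl: the function name parse_repeat_config is hypothetical, and it assumes that leading whitespace before # is allowed and that blank lines are skipped.

```python
import io


def parse_repeat_config(handle):
    """Return a list of (db_name, db_path) tuples from a config file handle."""
    libraries = []
    for line in handle:
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected two tab-separated columns: {line!r}")
        libraries.append((fields[0].strip(), fields[1].strip()))
    return libraries


# The example configuration from above, with real tab characters.
example = io.StringIO(
    "# DBNAME\tDB_PATH\n"
    "TREP_9\t/db/repeats/Trep9.nr.fasta\n"
    "TIGR_Trit\t/db/repeats/TIGR_Triticum_GSS_Repeats.v2.fasta\n"
    "# END\n"
)
libraries = parse_repeat_config(example)
```

A line whose columns are separated by spaces instead of a tab raises a ValueError here, which is a common cause of a config file that looks correct but is not read as expected.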


DEPENDENCIES

Required Software

Required Perl Modules


BUGS AND LIMITATIONS

Bugs

Limitations


SEE ALSO

The batch_mask.pl program is part of the DAWG-PAWS package of genome annotation programs. See the DAWG-PAWS web page ( http://dawgpaws.sourceforge.net/ ) or the Sourceforge project page ( http://sourceforge.net/projects/dawgpaws ) for additional information about this package.


LICENSE

GNU GENERAL PUBLIC LICENSE, VERSION 3

http://www.gnu.org/licenses/gpl.html


AUTHOR

James C. Estill <JamesEstill at gmail.com>


HISTORY

STARTED: 04/10/2006

UPDATED: 12/10/2007

VERSION: $Rev: 323 $
