batch_ltrseq.pl
batch_ltrseq.pl - Run the LTR_seq program in batch mode.
This documentation refers to program version $Rev: 559 $
batch_ltrseq.pl -i in_dir -o out_dir
-i    # Input directory of files to process
      # These must be in fasta file format
-o    # Base output directory
This program runs the LTR_seq program for each file in the input directory. An optional configuration file can also be used to run the LTR_seq program with multiple LTR_seq parameter combinations. If no configuration file is provided, batch_ltrseq.pl will run LTR_seq using default parameters.
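The batch behavior described above can be pictured as one LTR_seq run per sequence file per parameter set. The following is a minimal Python sketch of that pairing only; it is not the program's own code, and the function name plan_runs and the file names are hypothetical. The label def for the default parameter set follows the naming used in this documentation.

```python
def plan_runs(seq_files, param_set_names=None):
    """Pair every sequence file with every parameter set name.

    With no parameter sets given, each file gets a single run labeled
    "def", standing in for the LTR_seq default parameters.
    """
    names = list(param_set_names) if param_set_names else ["def"]
    return [(seq, name) for seq in sorted(seq_files) for name in names]


# Two hypothetical fasta files and three parameter sets give six runs.
for seq, name in plan_runs(["seq1.fasta", "seq2.fasta"], ["def", "long", "old"]):
    print(seq, name)
```

The number of runs grows as (number of files) x (number of parameter sets), which is worth keeping in mind when adding sets to a configuration file.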
Path of the directory containing the sequences to process.
Path of the directory to place the program output.
Path to a configuration file. This config file allows the batch_ltrseq.pl program to run the LTR_seq program with multiple parameter combinations for every fasta file in the input sequence directory. If no configuration file is used, the LTR_seq program will be run with default parameters.
The configuration file is a two-column, tab-delimited text file with the following columns:
This is the name that will be used in the gff file to tag the configuration set. For example, def for default options, or old for a parameter set intended to find old LTR retrotransposon insertions.
Path to the LTR_seq config file describing the parameter set to use. This file should follow the format used for LTR_seq config files. If this value is set to DEF, then batch_ltrseq.pl will run the LTR_seq program with default parameters.
An example configuration file is shown below:
def	DEF
long	long_ltr.cfg
old	old_ltr.cfg
This configuration file would run three parameter sets for every sequence in the input directory.
Run the batch_ltrseq.pl program in test mode. This will create the required directory structure, test for the existence of files, and exercise other program functions. However, it will not actually run the LTR_seq program.
Show a short overview of how to use the program from the command line.
Show program usage with summary of options.
Show program version.
Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
Run the program with minimal output.
The program could not create the directory needed to hold the output for the sequence file. It is possible that you do not have the proper access to create a directory in the path that you specified. It is also possible that the parent directory in which you are trying to create this subdirectory does not exist.
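The two causes named above (a missing parent directory, or insufficient permissions) can be checked before starting a batch run. The following Python sketch is one way to do that check by hand; it is not part of batch_ltrseq.pl, and the function name check_can_create is hypothetical.

```python
import os


def check_can_create(dir_path):
    """Return None if dir_path looks creatable, otherwise a short reason."""
    parent = os.path.dirname(os.path.abspath(dir_path))
    if not os.path.isdir(parent):
        return f"parent directory does not exist: {parent}"
    # Creating an entry in a directory needs write and search permission on it.
    if not os.access(parent, os.W_OK | os.X_OK):
        return f"no write permission in: {parent}"
    return None


# Hypothetical output path; prints None when the directory could be created.
print(check_can_create("out_dir/seq1"))
```

The check is advisory only: permissions or directories can change between the check and the actual run.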
This program can optionally make use of a configuration file. The path to the configuration file is declared with the --config option at the command line. This config file allows the batch_ltrseq.pl program to run the LTR_seq program with multiple parameter combinations for every fasta file in the input sequence directory. If no configuration file is used, the LTR_seq program will be run with default parameters.
The configuration file is a two-column, tab-delimited text file with the following columns:
This is the name that will be used in the gff file to tag the configuration set. For example, def for default options, or old for a parameter set intended to find old LTR retrotransposon insertions.
Path to the LTR_seq config file describing the parameter set to use. This file should follow the format used for LTR_seq config files. If this value is set to DEF, then batch_ltrseq.pl will run the LTR_seq program with default parameters.
An example configuration file is shown below:
def	DEF
long	long_ltr.cfg
old	old_ltr.cfg
This configuration file would run three sets of LTR_seq configurations for every sequence in the input directory. These configurations would include (1) the default parameter set, (2) the parameter set named long that is specified by the LTR_seq config file long_ltr.cfg, and (3) the parameter set named old that is specified by the LTR_seq config file old_ltr.cfg.
Please see the LTR_seq documentation for specific information on the LTR_seq configuration files. The configuration files in LTR_seq allow you to specify the following parameters:
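To make the two-column format concrete, the following Python sketch reads such a file's text into (name, path) pairs. It illustrates the format only and is not the parser used by batch_ltrseq.pl; the function name parse_batch_config is hypothetical, and skipping blank lines is an assumption.

```python
def parse_batch_config(text):
    """Parse two-column, tab-delimited text into (name, config_path) pairs.

    A config_path of None stands for the DEF keyword, meaning that
    LTR_seq should be run with its default parameters.
    """
    param_sets = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated columns, "
                f"got {len(fields)}"
            )
        name, path = (field.strip() for field in fields)
        param_sets.append((name, None if path == "DEF" else path))
    return param_sets


example = "def\tDEF\nlong\tlong_ltr.cfg\nold\told_ltr.cfg\n"
print(parse_batch_config(example))
```

A line whose columns are separated by spaces instead of a tab raises an error here, which is a common cause of trouble with tab-delimited files.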
The starting positions of the 5' and 3' LTRs should be separated by a minimum of this distance in base pairs.
The starting positions of the 5' and 3' LTRs should be separated by a maximum of this distance in base pairs.
The 5' and 3' LTRs can individually span a maximum of this distance in base pairs.
The 5' and 3' LTRs can individually span a minimum of this distance in base pairs.
The 5' and 3' LTRs should contain an exact match of this length in base pairs.
Dynamic programming score for a match (alignment).
Dynamic programming score for a mismatch (alignment).
Dynamic programming score for a gap opening (alignment).
Dynamic programming score for a gap continuation (alignment).
Dynamic programming score for aligning an `N' with any other base (alignment).
Dynamic programming score threshold in percentage. This allows up to a 15% difference between the score of the optimal alignment and the best possible alignment score given 100% identity.
Length of the target site repeat.
The LTR_seq documentation specifies that this parameter should not be changed and is for internal use only.
An example LTR_seq config file is shown below:
window 12
Dmin 600
Dmax 15000
LTRmax 2000
match 2
mismatch -5
gap -1
hgap -6
AlignmentWithN -5
MaxScoreRatioThreshold 15
LTRmin 100
TSR_len 6
LTRminExactMatch 30
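Before using a hand-edited parameter file in a batch run, it can help to check it. The following Python sketch reads the example values into a dictionary and checks that the minimum values do not exceed the maximum values. It assumes one whitespace-separated name and integer value per line, which is an assumption about the file layout and not something taken from the LTR_seq documentation; the function names are hypothetical.

```python
EXAMPLE = """\
window 12
Dmin 600
Dmax 15000
LTRmax 2000
match 2
mismatch -5
gap -1
hgap -6
AlignmentWithN -5
MaxScoreRatioThreshold 15
LTRmin 100
TSR_len 6
LTRminExactMatch 30
"""


def parse_ltrseq_config(text):
    """Read 'name value' lines into a dict of integers (assumed layout)."""
    params = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # assumption: blank lines are ignored
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected 'name value'")
        name, value = parts
        params[name] = int(value)
    return params


def check_ranges(params):
    """Return a list of problems where a minimum exceeds its maximum."""
    problems = []
    for low_key, high_key in (("Dmin", "Dmax"), ("LTRmin", "LTRmax")):
        if low_key in params and high_key in params:
            if params[low_key] > params[high_key]:
                problems.append(f"{low_key} is greater than {high_key}")
    return problems


params = parse_ltrseq_config(EXAMPLE)
print(params["Dmin"], params["Dmax"], check_ranges(params))
```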
This program does not make use of variables set in the user environment.
The following software is required for the batch_ltrseq.pl program to run properly.
The LTR_seq program is the latest incarnation of the ltr_par program described by Kalyanaraman and Aluru (J. Bioinform Comput Biol 4: 197-216). A binary of this program is available upon contacting the program author: ananth <at> eecs.wsu.edu. The author's current homepage is: http://www.eecs.wsu.edu/~ananth/
This module is required to accept options at the command line.
If you find a bug with this software, file a bug report on the DAWG-PAWS Sourceforge website: http://sourceforge.net/tracker/?group_id=204962
The config file must have UNIX formatted line endings. Because of this, any config files that have been edited in programs such as MS Word must be converted to a UNIX compatible text format before being used with batch_ltrseq.pl.
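One way to perform this conversion on a plain-text config file is sketched below in Python. It rewrites Windows (CRLF) and classic Mac (CR) line endings as UNIX (LF) line endings. The function name to_unix_line_endings is hypothetical, and the sketch only handles plain text: a file saved in a word processor's native format must first be exported as plain text.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    # Replace CRLF first so the CR in each pair is not converted twice.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


windows_text = b"def\tDEF\r\nold\told_ltr.cfg\r\n"
print(to_unix_line_endings(windows_text))
```

To convert a file in place, read it in binary mode, pass the bytes through the function, and write the result back in binary mode so that no further newline translation takes place.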
A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:
JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/
GNU GENERAL PUBLIC LICENSE, VERSION 3
http://www.gnu.org/licenses/gpl.html
THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.
James C. Estill <JamesEstill at gmail.com>
STARTED: 09/06/2007
UPDATED: 03/24/2009
VERSION: $Rev: 559 $