cnv_repmask2gff.pl |
cnv_repmask2gff.pl - Convert RepeatMasker output to the gff format.
This documentation refers to program version $Rev: 600 $
cnv_repmask2gff.pl -i infile.out -o outfile.gff
-i      # Path to the repeatmasker file to convert
-o      # Path to the gff output file
This program will convert the output from RepeatMasker to the standard gff file format.
Path to the input file: the RepeatMasker out file to convert to the gff format. If this option is not specified, the program will expect input from STDIN.
Path to the gff output file. If an outfile path is not specified, the program will write the output to STDERR.
The parameter name to append to the source program name. This information will be appended to the second column of the gff output file. This is used to specify if you masked with the same database using a different parameter set or if you used a different database.
The program name to use. This is the data in the second column of the gff output file. By default, this is set to 'repeatmasker'. This option allows you to specify other program names if desired.
Identifier for the sequence file that was masked with repeatmasker. The out file from repeatmasker may have truncated your original file name, and this option allows you to use the full sequence name.
Write all of the annotations to be in the positive (plus) strand. By default the program will interpret strand results reported as 'C' to be in the negative strand orientation. Setting the --plus flag will report all RepeatMasker hit results as occurring in the plus strand orientation.
Append the results to an existing gff file. This must be used in conjunction with the --outfile option.
Run the program with minimal output.
Run the program without doing the system commands.
Short overview of how to use program from command line.
Show program usage with summary of options.
Show program version.
Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
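The strand handling described for the --plus flag above can be sketched in a few lines. This is an illustration in Python, not the program's own Perl code; the function name gff_strand and the force_plus argument are hypothetical names chosen for this example.

```python
def gff_strand(rm_strand: str, force_plus: bool = False) -> str:
    """Map a RepeatMasker strand code to a GFF strand symbol.

    RepeatMasker reports hits on the complementary strand as 'C'.
    force_plus is a hypothetical stand-in for the --plus flag.
    """
    if force_plus:
        # With --plus, every hit is reported on the plus strand
        return "+"
    # Default behavior: 'C' becomes '-', anything else is treated as '+'
    return "-" if rm_strand == "C" else "+"
```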
The typical use of this program will be to convert a RepeatMasker output file to the gff format.
cnv_repmask2gff.pl -i HEX3045G05_TREP.out -o HEX3045G05_TREP.gff
This will produce a GFF file similar to the following:
HEX3045G05 repeatmasker exon 469 493 25 - . AT_rich
HEX3045G05 repeatmasker exon 716 754 25 + . AT_rich
HEX3045G05 repeatmasker exon 1764 2069 469 + . TREP20
HEX3045G05 repeatmasker exon 1816 2105 507 + . TREP58
HEX3045G05 repeatmasker exon 1920 2248 450 + . TREP214
...
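To make the column mapping behind this output concrete, here is a minimal sketch in Python of how one line of a RepeatMasker out file could be turned into one gff line. It is not the program's Perl implementation. It assumes the usual whitespace-delimited out file layout (score, three percentage columns, query sequence, begin, end, left, strand, repeat name, ...), and the sample input line is made up for illustration, reusing values from the output above. The names repmask_line_to_gff and SAMPLE_LINE are hypothetical.

```python
# Made-up RepeatMasker out line for illustration; only the score, sequence
# name, coordinates, strand and repeat name matter for the conversion.
SAMPLE_LINE = (
    "   25  0.0  0.0  0.0  HEX3045G05  469  493 (104000) C  "
    "AT_rich  Low_complexity  (0)  25  1  1"
)


def repmask_line_to_gff(line, source="repeatmasker"):
    """Convert one RepeatMasker out line to a tab-delimited gff line.

    Returns None for header or blank lines, which do not start with
    a numeric score.
    """
    fields = line.split()
    if len(fields) < 10 or not fields[0].isdigit():
        return None
    score, seq_id, start, end = fields[0], fields[4], fields[5], fields[6]
    # 'C' marks a hit on the complementary strand
    strand = "-" if fields[8] == "C" else "+"
    repeat_name = fields[9]
    # The feature type is written as 'exon', as in the sample output
    return "\t".join(
        [seq_id, source, "exon", start, end, score, strand, ".", repeat_name]
    )


print(repmask_line_to_gff(SAMPLE_LINE))
```

Header lines such as the one beginning with "SW" return None and can simply be skipped by the caller.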
It may also be useful to run RepeatMasker against a number of different databases, in which case you will want to record the database used in your gff output file. The --param option does this by adding the database name as a parameter tag. For example, if you used the TREP database as your database for masking:
cnv_repmask2gff.pl -i rm_result.out -o rm_resout.gff --param TREP
This will append the parameter tag 'TREP' to the source column (col 2) of the gff output file and will produce a GFF file similar to the following:
HEX3045G05 repeatmasker:TREP exon 469 493 25 - . AT_rich
HEX3045G05 repeatmasker:TREP exon 716 754 25 + . AT_rich
HEX3045G05 repeatmasker:TREP exon 1764 2069 469 + . TREP20
HEX3045G05 repeatmasker:TREP exon 1816 2105 507 + . TREP58
HEX3045G05 repeatmasker:TREP exon 1920 2248 450 + . TREP214
...
It may also be useful to specify a different program source name depending on the needs of your individual pipeline. You can do this using the --program option. For example, to shorten the full name repeatmasker to 'RM', you could use the following command:
cnv_repmask2gff.pl -i rm_result.out -o rm_result.gff --program RM
This will change the source id in the second column of the output to RM and will result in a GFF output file similar to the following:
HEX3045G05 RM exon 469 493 25 - . AT_rich
HEX3045G05 RM exon 716 754 25 + . AT_rich
HEX3045G05 RM exon 1764 2069 469 + . TREP20
HEX3045G05 RM exon 1816 2105 507 + . TREP58
HEX3045G05 RM exon 1920 2248 450 + . TREP214
...
This can also be used in conjunction with the param tag:
cnv_repmask2gff.pl -i result.out -o result.gff --program RM --param TREP
This will result in a GFF file similar to the following:
HEX3045G05 RM:TREP exon 469 493 25 - . AT_rich
HEX3045G05 RM:TREP exon 716 754 25 + . AT_rich
HEX3045G05 RM:TREP exon 1764 2069 469 + . TREP20
HEX3045G05 RM:TREP exon 1816 2105 507 + . TREP58
...
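The way the --program and --param values combine in the source column can be summarized with a small Python sketch. It only reproduces the pattern visible in the sample outputs (program name, then a colon and the parameter tag when one is given); the function name gff_source is hypothetical and this is not the program's own code.

```python
def gff_source(program="repeatmasker", param=None):
    """Build the gff source column (col 2) from a program name and an
    optional parameter tag, joined by a colon as in the sample output."""
    if param:
        return f"{program}:{param}"
    return program


for program, param in [("repeatmasker", None), ("repeatmasker", "TREP"),
                       ("RM", None), ("RM", "TREP")]:
    print(gff_source(program, param))
```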
It is possible that the repeatmasker out file will truncate the name of your source sequence. You can restore this to the original name using the --name option. For example, if your full name was HEX3045G05_A001 you could specify this as:
cnv_repmask2gff.pl -i result.out -o result.gff --name HEX3045G05_A001
This will result in a GFF output file similar to the following:
HEX3045G05_A001 repeatmasker exon 469 493 25 - . AT_rich
HEX3045G05_A001 repeatmasker exon 716 754 25 + . AT_rich
HEX3045G05_A001 repeatmasker exon 1764 2069 469 + . TREP20
HEX3045G05_A001 repeatmasker exon 1816 2105 507 + . TREP58
HEX3045G05_A001 repeatmasker exon 1920 2248 450 + . TREP214
...
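If you already have a gff file with a truncated sequence name, the same effect can be approximated after the fact by rewriting the first column. The Python sketch below assumes a tab-delimited gff line; the function name rename_gff_sequence is hypothetical, and this is a workaround rather than a description of how the --name option is implemented.

```python
def rename_gff_sequence(gff_line, name):
    """Replace the sequence name (col 1) of a tab-delimited gff line."""
    columns = gff_line.rstrip("\n").split("\t")
    columns[0] = name
    return "\t".join(columns)


truncated = "HEX3045G05\trepeatmasker\texon\t469\t493\t25\t-\t.\tAT_rich"
print(rename_gff_sequence(truncated, "HEX3045G05_A001"))
```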
The error messages that can be generated will be listed here.
You will see this message if you did not specify an input file with the -i or --input options.
This program does not make use of a configuration file or variables set in the user's environment.
This program is designed to parse output files produced by the RepeatMasker program. RepeatMasker is available for download from: http://www.repeatmasker.org/
This program does not make use of Perl modules outside of the normal suite of modules present in a typical installation of perl.
If you find a bug with this software, file a bug report on the DAWG-PAWS Sourceforge website: http://sourceforge.net/tracker/?group_id=204962
This program has been tested with RepeatMasker v 3.1.6. If you find that this program is not compatible with other versions of RepeatMasker, please file a bug report and include an output file that could not be parsed.
A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:
JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/
GNU General Public License, Version 3
http://www.gnu.org/licenses/gpl.html
THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.
James C. Estill <JamesEstill at gmail.com>
STARTED: 04/10/2006
UPDATED: 03/24/2009
VERSION: $Rev: 600 $