cnv_genemark2gff.pl
cnv_genemark2gff.pl - Convert genemark output to gff format
This documentation refers to program version $Rev: 592 $
cnv_genemark2gff.pl -i infile.genemark -o outfile.gff
--infile    # Path to the input file to translate
            # If not provided, assumes input from STDIN
--outfile   # Path to the output gff file
            # If not provided, writes output to STDOUT
--seqname   # The id of the sequence analyzed
Converts the output from the genemark.hmm program to the gff format. This has been tested to work with gmhmme2 and gmhmme3. All exons are currently tagged as 'exon' in the gff output file. This is for compatibility with the Apollo genome annotation curation program.
Path to the genemark file to translate to gff. If an infile is not specified, then the program will expect input from standard input.
Path to the gff output file. If an outfile is not specified, the program will write the gff file to standard output.
This is the value listed as the source sequence in the gff output file. This option is not strictly required; if it is omitted, the value defaults to unknown_src. It will generally be set to the BAC ID or contig ID.
This is the source program name used in the gff output file. By default this is set to be GeneMarkHMM. This option allows you to set the source program to any value that you would want.
Short overview of how to use program from command line.
Show program usage with summary of options.
Show program version.
Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
Run the program with maximum output.
Run the program without doing the system commands.
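As a minimal sketch of how the --seqname and --program defaults described above end up in a gff record, the following Python function builds one nine-column line. It is not the script's own code: the function name format_gff_exon and its parameters are hypothetical, the defaults are taken from the example output in this manual, and tab-delimited output is assumed because that is the usual gff convention.

```python
def format_gff_exon(start, end, strand, model_id,
                    seqname="unknown_src", program="GeneMarkHMM"):
    """Return one tab-delimited, nine-column gff line for an exon.

    Hypothetical helper for illustration only; the default values mirror
    the example output shown in this manual.
    """
    fields = [
        seqname,     # column 1: source sequence (set with --seqname)
        program,     # column 2: source program (set with --program)
        "exon",      # column 3: all features are tagged as 'exon'
        str(start),  # column 4: start coordinate
        str(end),    # column 5: end coordinate
        ".",         # column 6: score, shown as '.' in the example output
        strand,      # column 7: strand, '+' or '-'
        ".",         # column 8: frame, shown as '.' in the example output
        model_id,    # column 9: gene model identifier, e.g. RNA0001
    ]
    return "\t".join(fields)


if __name__ == "__main__":
    print(format_gff_exon(683, 1393, "+", "RNA0001", seqname="HEX2493A05"))
```

Keeping the defaults in the function signature makes it easy to see which columns the two options control and which are fixed.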
The typical use of this program will be to parse a file produced by the genemark.hmm program.
cnv_genemark2gff.pl -i HEX2493A05_genemark_hv.out --seqname HEX2493A05 -o HEX2493A05_genemark_hv.gff
This will produce a gff output file similar to the following:
HEX2493A05 GeneMarkHMM exon 683 1393 . + . RNA0001
HEX2493A05 GeneMarkHMM exon 1736 2084 . + . RNA0001
HEX2493A05 GeneMarkHMM exon 2195 2515 . + . RNA0001
HEX2493A05 GeneMarkHMM exon 2696 2803 . + . RNA0001
HEX2493A05 GeneMarkHMM exon 2918 3035 . + . RNA0001
HEX2493A05 GeneMarkHMM exon 3058 3131 . + . RNA0001
HEX2493A05 GeneMarkHMM exon 3219 3502 . + . RNA0001
HEX2493A05 GeneMarkHMM exon 3552 3559 . + . RNA0002
HEX2493A05 GeneMarkHMM exon 3711 3801 . + . RNA0002
HEX2493A05 GeneMarkHMM exon 3947 4711 . + . RNA0002
...
The --seqname option used above allows you to specify the value written in the first column of the gff file. If --seqname is not specified, as in the following:
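Because every feature is tagged as 'exon', the gene model each exon belongs to is carried only in the last column. A short Python sketch of regrouping exons by that identifier is given here. The function name exons_by_model is hypothetical, and the sketch assumes whitespace-separated records with exactly nine fields and no spaces inside any field, as in the output above.

```python
from collections import defaultdict


def exons_by_model(gff_text):
    """Group exon coordinates by the gene model id in column 9.

    Hypothetical helper: assumes nine whitespace-separated fields per
    record and skips comments, blank lines and anything else (such as a
    trailing '...').
    """
    models = defaultdict(list)
    for line in gff_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 9:
            continue  # not a complete gff record
        start, end, model_id = int(fields[3]), int(fields[4]), fields[8]
        models[model_id].append((start, end))
    return dict(models)


if __name__ == "__main__":
    example = (
        "HEX2493A05 GeneMarkHMM exon 3219 3502 . + . RNA0001\n"
        "HEX2493A05 GeneMarkHMM exon 3552 3559 . + . RNA0002\n"
        "HEX2493A05 GeneMarkHMM exon 3711 3801 . + . RNA0002\n"
    )
    for model_id, exons in exons_by_model(example).items():
        print(model_id, exons)
```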
cnv_genemark2gff.pl -i HEX2493A05_genemark_hv.out -o HEX2493A05_genemark_hv.gff
The gff output would be similar to the following:
unknown_src GeneMarkHMM exon 683 1393 . + . RNA0001
unknown_src GeneMarkHMM exon 1736 2084 . + . RNA0001
unknown_src GeneMarkHMM exon 2195 2515 . + . RNA0001
unknown_src GeneMarkHMM exon 2696 2803 . + . RNA0001
unknown_src GeneMarkHMM exon 2918 3035 . + . RNA0001
unknown_src GeneMarkHMM exon 3058 3131 . + . RNA0001
unknown_src GeneMarkHMM exon 3219 3502 . + . RNA0001
unknown_src GeneMarkHMM exon 3552 3559 . + . RNA0002
unknown_src GeneMarkHMM exon 3711 3801 . + . RNA0002
unknown_src GeneMarkHMM exon 3947 4711 . + . RNA0002
...
It is also possible to set the second column of the gff output file using the --program option. This can be used to record the training data used for gene predictions, which allows you to later separate gene models produced from different training data sets. For example, if you used the wheat training matrix, you could run the following:
cnv_genemark2gff.pl -i HEX2493A05_genemark_hv.out --seqname HEX2493A05 -o HEX2493A05_genemark_hv.gff --program GeneMark:wheat
This will produce output similar to the following:
HEX2493A05 GeneMark:wheat exon 683 1393 . + . RNA0001
HEX2493A05 GeneMark:wheat exon 1736 2084 . + . RNA0001
HEX2493A05 GeneMark:wheat exon 2195 2515 . + . RNA0001
HEX2493A05 GeneMark:wheat exon 2696 2803 . + . RNA0001
HEX2493A05 GeneMark:wheat exon 2918 3035 . + . RNA0001
HEX2493A05 GeneMark:wheat exon 3058 3131 . + . RNA0001
HEX2493A05 GeneMark:wheat exon 3219 3502 . + . RNA0001
HEX2493A05 GeneMark:wheat exon 3552 3559 . + . RNA0002
HEX2493A05 GeneMark:wheat exon 3711 3801 . + . RNA0002
HEX2493A05 GeneMark:wheat exon 3947 4711 . + . RNA0002
...
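Once several gff files with different --program values have been combined, the gene models can be separated again by reading the second column. The Python sketch below shows one way this might be done. The function name split_by_source and the GeneMark:barley label in the usage example are hypothetical, and records are again assumed to have nine whitespace-separated fields.

```python
def split_by_source(gff_lines):
    """Bucket gff records by the source program named in column 2.

    Hypothetical helper: lines without exactly nine whitespace-separated
    fields are ignored.
    """
    by_source = {}
    for line in gff_lines:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank lines, comments and truncated records
        by_source.setdefault(fields[1], []).append(line.rstrip("\n"))
    return by_source


if __name__ == "__main__":
    merged = [
        "HEX2493A05 GeneMark:wheat exon 683 1393 . + . RNA0001\n",
        "HEX2493A05 GeneMark:barley exon 690 1393 . + . RNA0001\n",  # hypothetical label
    ]
    for source, records in split_by_source(merged).items():
        print(source, len(records))
```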
The error messages that can be generated will be listed here.
This program does not make use of a configuration file or any variables set in the user environment.
This program parses output from the genemark.hmm program. This program is available from http://opal.biology.gatech.edu/GeneMark/
This module is part of BioPerl. Information on installing BioPerl is available from: http://bioperl.open-bio.org/wiki/Installing_BioPerl
If you find a bug with this software, file a bug report on the DAWG-PAWS Sourceforge website: http://sourceforge.net/tracker/?group_id=204962
This program is known to work with output produced from gmhmme2 and gmhmme3.
The program is part of the DAWG-PAWS package of genome annotation programs. See the DAWG-PAWS web page ( http://dawgpaws.sourceforge.net/ ) or the Sourceforge project page ( http://sourceforge.net/projects/dawgpaws ) for additional information about this package.
A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:
JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/
GNU General Public License, Version 3
http://www.gnu.org/licenses/gpl.html
THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.
James C. Estill <JamesEstill at gmail.com>
STARTED: 10/30/2007
UPDATED: 03/24/2009
VERSION: $Rev: 592 $