Convert fgenesh gene predictions to GFF format
This documentation refers to program version $Rev: 591 $
    -i infile.txt -o outfile.gff
    --infile     # Path to fgenesh result to convert
    --outfile    # Path to the gff format output
This program converts output from the fgenesh program to the gff format. If the fgenesh output file appears to be saved from the web, the program will attempt to first strip the HTML tags from the text before converting to the GFF format.
Path of the input file. This should be a text file containing the result of the fgenesh gene prediction program. If an input file is not specified, the program will expect input from STDIN.
Path of the gff file that is produced by the program. If an output file is not specified, the program will write output to STDOUT.
Use this to convert the output from the softberry website if you saved the text in html format.
The label used to describe the parameter set used for the annotation program. This identifier will be appended to the source column (col 2) in the GFF output.
This is the name of the sequence that was annotated. This will be used in the sequence name column (col 1) of the gff output file. By default, the program will use the name of the sequence as specified in the fgenesh output file.
Short overview of how to use the program from the command line.
Show program usage with summary of options.
Show program version.
Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
Run the program with minimal output.
Typically you will be using this program to convert the fgenesh annotation output for an individual sequence file to the gff format:
    -i fgenesh_result.txt -o fgenesh_result.gff
This will result in a GFF result similar to the following:
    HEX3045G05  fgenesh  exon  961   1456  12.02  +  .  gene_1
    HEX3045G05  fgenesh  exon  1702  2725  0.04   +  .  gene_1
    HEX3045G05  fgenesh  exon  3619  3982  10.41  +  .  gene_1
    HEX3045G05  fgenesh  exon  6960  7273  13.70  +  .  gene_2
    HEX3045G05  fgenesh  exon  7435  7789  21.29  +  .  gene_2
    HEX3045G05  fgenesh  exon  7904  8091  1.14   +  .  gene_2
    HEX3045G05  fgenesh  exon  8248  9163  13.79  +  .  gene_2
    HEX3045G05  fgenesh  exon  9206  9587  8.00   +  .  gene_2
    ...
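The nine whitespace-separated columns in this output can be read back with a few lines of code. The following is a minimal Python sketch, not part of the program itself; the field names (`seqname`, `source`, and so on) and the helper names `parse_gff_line` and `gene_spans` are illustrative assumptions, and the sketch assumes the score column is always numeric, as it is in the example output.

```python
from collections import namedtuple

# Illustrative labels for the nine GFF columns (names are assumptions).
GffRecord = namedtuple(
    "GffRecord",
    "seqname source feature start end score strand frame attribute",
)

def parse_gff_line(line):
    """Split one whitespace-delimited GFF line into a GffRecord."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, got {len(fields)}: {line!r}")
    seqname, source, feature, start, end, score, strand, frame, attribute = fields
    # Assumes a numeric score, as in the example output above.
    return GffRecord(seqname, source, feature, int(start), int(end),
                     float(score), strand, frame, attribute)

def gene_spans(lines):
    """Return {gene label: (smallest start, largest end)} over all records."""
    spans = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        rec = parse_gff_line(line)
        lo, hi = spans.get(rec.attribute, (rec.start, rec.end))
        spans[rec.attribute] = (min(lo, rec.start), max(hi, rec.end))
    return spans

example = """\
HEX3045G05 fgenesh exon 961 1456 12.02 + . gene_1
HEX3045G05 fgenesh exon 1702 2725 0.04 + . gene_1
HEX3045G05 fgenesh exon 6960 7273 13.70 + . gene_2
"""
print(gene_spans(example.splitlines()))
```

Grouping on the last column works here because each exon line carries the label of the gene it belongs to.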
Generally the program will use the label for the sequence as reported in the fgenesh report file. Otherwise, you can specify the source sequence name using the -n or --name flag. For example:
    -i result.txt -o result.gff -n wheat_1
This will result in a GFF file like the following:
    wheat_1  fgenesh  exon  961   1456  12.02  +  .  gene_1
    wheat_1  fgenesh  exon  1702  2725  0.04   +  .  gene_1
    wheat_1  fgenesh  exon  3619  3982  10.41  +  .  gene_1
    wheat_1  fgenesh  exon  6960  7273  13.70  +  .  gene_2
    wheat_1  fgenesh  exon  7435  7789  21.29  +  .  gene_2
    wheat_1  fgenesh  exon  7904  8091  1.14   +  .  gene_2
    wheat_1  fgenesh  exon  8248  9163  13.79  +  .  gene_2
    wheat_1  fgenesh  exon  9206  9587  8.00   +  .  gene_2
    ...
This option allows you to change the name of the sequence source without having to run the fgenesh program again.
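To make the effect of the name option concrete, the sketch below rewrites the first column of GFF lines that already exist. It is an independent Python illustration of the idea, not the program's own code; the function name `rename_sequence` is hypothetical, and the output is joined with tabs on the assumption that downstream tools expect tab-delimited GFF.

```python
def rename_sequence(lines, new_name):
    """Yield GFF lines with column 1 (the sequence name) replaced."""
    for line in lines:
        fields = line.split()
        if len(fields) != 9:
            # Leave comments, blank lines, or malformed lines untouched.
            yield line
            continue
        fields[0] = new_name
        yield "\t".join(fields)

original = [
    "HEX3045G05 fgenesh exon 961 1456 12.02 + . gene_1",
    "HEX3045G05 fgenesh exon 1702 2725 0.04 + . gene_1",
]
for row in rename_sequence(original, "wheat_1"):
    print(row)
```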
It is often useful to run a program using different parameter sets. The program therefore allows you to specify the label for a set of parameters to be able to distinguish multiple prediction results from the same program using different parameter combinations. This parameter set label will be added to the second column of the gff output file.
For example, running the program with parameter set one:
    -i result.txt -o result.gff -p set_1
This will result in a GFF file like the following:
    HEX3045G05  fgenesh:set_1  exon  961   1456  12.02  +  .  gene_1
    HEX3045G05  fgenesh:set_1  exon  1702  2725  0.04   +  .  gene_1
    HEX3045G05  fgenesh:set_1  exon  3619  3982  10.41  +  .  gene_1
    HEX3045G05  fgenesh:set_1  exon  6960  7273  13.70  +  .  gene_2
    HEX3045G05  fgenesh:set_1  exon  7435  7789  21.29  +  .  gene_2
    HEX3045G05  fgenesh:set_1  exon  7904  8091  1.14   +  .  gene_2
    HEX3045G05  fgenesh:set_1  exon  8248  9163  13.79  +  .  gene_2
    HEX3045G05  fgenesh:set_1  exon  9206  9587  8.00   +  .  gene_2
    ...
Then running the program with parameter set two:
    -i result.txt -o result.gff -p set_2
This will result in a GFF file like the following:
    HEX3045G05  fgenesh:set_2  exon  961   1456  12.02  +  .  gene_1
    HEX3045G05  fgenesh:set_2  exon  1702  2725  0.04   +  .  gene_1
    HEX3045G05  fgenesh:set_2  exon  3619  3982  10.41  +  .  gene_1
    ...
This will allow you to later distinguish between the results for parameter set one and the results for parameter set two.
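One way to use the parameter set label afterwards is to split the source column on the colon and group records by label. The Python sketch below assumes the outputs of both runs have been combined into a single list of lines; the helper names `split_source` and `group_by_param_set` are hypothetical and not part of the program.

```python
def split_source(source):
    """Split 'fgenesh:set_1' into ('fgenesh', 'set_1').

    A source without a label, such as 'fgenesh', gives ('fgenesh', None).
    """
    program, sep, param_set = source.partition(":")
    return program, (param_set if sep else None)

def group_by_param_set(lines):
    """Group nine-column GFF lines by the label in column 2."""
    groups = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank or malformed lines
        _, param_set = split_source(fields[1])
        groups.setdefault(param_set, []).append(line)
    return groups

combined = [
    "HEX3045G05 fgenesh:set_1 exon 961 1456 12.02 + . gene_1",
    "HEX3045G05 fgenesh:set_2 exon 961 1456 12.02 + . gene_1",
    "HEX3045G05 fgenesh:set_1 exon 1702 2725 0.04 + . gene_1",
]
groups = group_by_param_set(combined)
for label, rows in groups.items():
    print(label, len(rows))
```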
When working at the unix command line, it is often useful to pipe the output from one program to another. For that reason, the program can accept input from STDIN. For example, given a text file named result.txt, you can send its contents to the program using the cat command and the pipe '|':
cat result.txt |
Since an output file is not specified, the result will be printed to STDOUT and will appear on the screen.
Since the program can write output to STDOUT, it is possible to load the GFF output directly into your database. For example, if you have a script that loads GFF into your database, you can pipe the GFF results to that script directly:
    -i result.txt |
This will load the result to your database without generating a copy of the GFF file on your hard drive.
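The receiving end of such a pipe only needs to read GFF lines from a stream. The following Python sketch shows what a loader might look like; it is a hypothetical stand-in, not a script shipped with the program. The table name `feature`, the column names, and the use of an in-memory SQLite database are all assumptions made for illustration, and the demo feeds the loader from a string buffer where a real script would pass `sys.stdin`.

```python
import io
import sqlite3

def load_gff(stream, conn):
    """Insert each nine-column GFF line from stream into a feature table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feature ("
        "seqname TEXT, source TEXT, feature TEXT, start_pos INTEGER, "
        "end_pos INTEGER, score REAL, strand TEXT, frame TEXT, attribute TEXT)"
    )
    count = 0
    for line in stream:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank or malformed lines
        conn.execute("INSERT INTO feature VALUES (?,?,?,?,?,?,?,?,?)", fields)
        count += 1
    conn.commit()
    return count

# Demo input; in a pipeline this would be sys.stdin.
demo = io.StringIO(
    "HEX3045G05 fgenesh exon 961 1456 12.02 + . gene_1\n"
    "HEX3045G05 fgenesh exon 1702 2725 0.04 + . gene_1\n"
)
conn = sqlite3.connect(":memory:")  # illustration only, nothing is persisted
loaded = load_gff(demo, conn)
print(loaded)
```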
Saving the output from the fgenesh webpage will include the copyright statement from the softberry website. You will get the following warning:
--------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is ?1999,2009,<,://,/>
---------------------------------------------------
This warning is only written to STDERR, and should not affect the gff output of the program. However, you can remove the offending line of fgenesh output using the grep command before piping the text to the program.
grep -v '' fgenesh.txt |
It is also possible to parse output from the softberry website that was saved in HTML format by using the --html option. The program will attempt to strip the HTML and save a temporary local copy in plain text, which is then parsed:
    -i infile.txt --html
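For readers curious what stripping HTML involves, the sketch below removes tags with Python's standard `html.parser` module and keeps only the text. This is an assumption-laden illustration of the general technique, not how the program implements the --html option; the names `TextExtractor` and `strip_html` are hypothetical, and the sample string is made up rather than real fgenesh output.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, discarding tags."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def strip_html(text):
    """Return text with HTML tags removed and entities decoded."""
    parser = TextExtractor()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)

sample = "<html><body><pre>line one &amp; two</pre></body></html>"
print(strip_html(sample))
```

Note that this simple approach also keeps the contents of any script or style elements, so a saved web page may need further cleanup.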
The following lists some typical error messages and solutions:
This generally will be seen when the fgenesh text file includes the copyright statement from the softberry web site.
You may see something like this if you are trying to parse a result you saved from the softberry web site in the html format. The solution to this problem is to save the result as text. Alternatively, you can strip the html from the result using the --html option.
You will see this message if the program detects that the fgenesh output you are trying to parse is in HTML format. If this is the case, the program will attempt to save a copy of the fgenesh result as a normal text file before converting to GFF format.
This program does not make use of a configuration file or variables defined in the user's environment.
This program is designed to parse ab initio gene annotation results generated by the Fgenesh program. These results can be generated from a local copy of the Fgenesh program, or can be results obtained from the Fgenesh web service provided by softberry.
This program requires the perl module Bio::Tools::Fgenesh. This module is part of the bioperl package.
Other modules or software that the program is dependent on.
If you find a bug with this software, file a bug report on the DAWG-PAWS Sourceforge website:
This program has only been tested with output from the softberry website and has not been tested with the Fgenesh binary. If you find that this program does work with the standalone program, please contact the author and let me know.
A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:
JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes.
GNU General Public License, Version 3
James C. Estill <JamesEstill at>
STARTED: 01/31/2009
UPDATED: 03/24/2009
VERSION: $Rev: 591 $