cnv_fgenesh2gff.pl


NAME

cnv_fgenesh2gff.pl - Convert fgenesh gene predictions to gff format


VERSION

This documentation refers to program version $Rev: 591 $


SYNOPSIS

Usage

    cnv_fgenesh2gff.pl -i infile.txt -o outfile.gff

Required Arguments

    --infile        # Path to fgenesh result to convert
    --outfile       # Path to the GFF format output


DESCRIPTION

This program converts output from the fgenesh program to the GFF format. If the fgenesh output was saved from the Softberry website in HTML format, the program will first attempt to strip the HTML tags from the text and then convert the result to GFF.
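The conversion amounts to reading each predicted exon from the fgenesh report and writing it out as one nine-column GFF line. The awk sketch below is not the program's Perl code. It is a rough illustration that assumes fgenesh exon lines list the gene number, strand, exon number, exon type (CDSf, CDSi, CDSl or CDSo), start, a dash, end and score, in that order. The helper name fgenesh_to_gff and the sample input lines are hypothetical, so check the column layout against your own fgenesh output.

```shell
# fgenesh_to_gff SEQNAME [PARAM]  (hypothetical helper, not cnv_fgenesh2gff.pl)
# Reads fgenesh text on STDIN and prints one tab-delimited GFF exon line
# per coding-exon line. ASSUMED exon line layout (verify on your data):
#   gene strand exon_no type start - end score [any further columns ignored]
fgenesh_to_gff() {
  seqname=$1
  # Mimic the -p label: the source is "fgenesh" or "fgenesh:PARAM".
  src_label="fgenesh${2:+:$2}"
  awk -v seq="$seqname" -v src="$src_label" '
    BEGIN { OFS = "\t" }
    # Keep only CDS exon lines; header lines and other feature types are skipped.
    $4 ~ /^CDS/ && $6 == "-" {
      print seq, src, "exon", $5, $7, $8, $2, ".", "gene_" $1
    }'
}

# Made-up input in the assumed layout, reusing coordinates from the
# example output in this manual.
printf '%s\n' \
  '  G Str   Feature   Start        End    Score' \
  '    1 +    1 CDSf    961 -   1456   12.02' \
  '    1 +    2 CDSi   1702 -   2725    0.04' |
  fgenesh_to_gff HEX3045G05
```

With these two sample exon lines the sketch prints the first two gene_1 lines of the Typical Use example, tab-delimited; the header line is ignored.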


REQUIRED ARGUMENTS

-i,--infile

Path of the input file. This should be a text file containing the result of the fgenesh gene prediction program. If an input file is not specified, the program will expect input from STDIN.

-o,--outfile

Path of the gff file that is produced by the program. If an output file is not specified, the program will write output to STDOUT.


OPTIONS

--html

Use this option to convert output from the Softberry website that was saved in HTML format.

-p,--param

The label used to describe the parameter set used for the annotation program. This identifier will be appended to the source column (col 2) of the GFF output, separated by a colon (for example, fgenesh:set_1).

-s,--seqname

This is the name of the sequence that was annotated. It will be used in the sequence name column (col 1) of the GFF output file. By default, the program uses the name of the sequence as reported in the fgenesh output file.

--usage

Short overview of how to use the program from the command line.

--help

Show program usage with summary of options.

--version

Show program version.

--man

Show the full program manual. This uses the perldoc command to print the POD documentation for the program.

-q,--quiet

Run the program with minimal output.


EXAMPLES

Typical Use

Typically you will use this program to convert the fgenesh annotation output for an individual sequence file to the GFF format.

  cnv_fgenesh2gff.pl -i fgenesh_result.txt -o fgenesh_result.gff

This will produce GFF output similar to the following:

 HEX3045G05  fgenesh   exon     961     1456    12.02   +     . gene_1
 HEX3045G05  fgenesh   exon     1702    2725    0.04    +     . gene_1
 HEX3045G05  fgenesh   exon     3619    3982    10.41   +     . gene_1
 HEX3045G05  fgenesh   exon     6960    7273    13.70   +     . gene_2
 HEX3045G05  fgenesh   exon     7435    7789    21.29   +     . gene_2
 HEX3045G05  fgenesh   exon     7904    8091    1.14    +     . gene_2
 HEX3045G05  fgenesh   exon     8248    9163    13.79   +     . gene_2
 HEX3045G05  fgenesh   exon     9206    9587    8.00    +     . gene_2
 ...
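Each output line describes a single exon, and column 9 carries the fgenesh gene label. As a quick check of a converted file, the following awk sketch collapses the exon lines into one span and exon count per gene. It is not part of cnv_fgenesh2gff.pl, the helper name gff_gene_summary is hypothetical, and it assumes column 9 is a single word such as gene_1.

```shell
# gff_gene_summary  (hypothetical helper)
# Reads GFF on STDIN; prints the gene label, lowest start, highest end and
# number of exon lines for each gene, in order of first appearance.
gff_gene_summary() {
  awk 'BEGIN { OFS = "\t" }
    # Skip comment lines and lines with fewer than nine columns.
    $1 ~ /^#/ || NF < 9 { next }
    {
      g = $9
      if (!(g in count)) { order[++k] = g; lo[g] = $4; hi[g] = $5 }
      if ($4 + 0 < lo[g] + 0) lo[g] = $4
      if ($5 + 0 > hi[g] + 0) hi[g] = $5
      count[g]++
    }
    END { for (i = 1; i <= k; i++) { g = order[i]; print g, lo[g], hi[g], count[g] } }'
}

# The exon lines from the example above.
cat <<'EOF' | gff_gene_summary
HEX3045G05  fgenesh   exon     961     1456    12.02   +     . gene_1
HEX3045G05  fgenesh   exon     1702    2725    0.04    +     . gene_1
HEX3045G05  fgenesh   exon     3619    3982    10.41   +     . gene_1
HEX3045G05  fgenesh   exon     6960    7273    13.70   +     . gene_2
HEX3045G05  fgenesh   exon     7435    7789    21.29   +     . gene_2
HEX3045G05  fgenesh   exon     7904    8091    1.14    +     . gene_2
HEX3045G05  fgenesh   exon     8248    9163    13.79   +     . gene_2
HEX3045G05  fgenesh   exon     9206    9587    8.00    +     . gene_2
EOF
```

For the lines shown, this reports gene_1 spanning 961 to 3982 with 3 exons and gene_2 spanning 6960 to 9587 with 5 exons.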

Specify the Sequence ID

Generally, cnv_fgenesh2gff.pl will use the sequence label reported in the fgenesh report file. Alternatively, you can specify the source sequence name using the -n or --name flag. For example:

  cnv_fgenesh2gff.pl -i result.txt -o result.gff -n wheat_1

This will result in a GFF file like the following:

 wheat_1    fgenesh     exon    961     1456    12.02   +     . gene_1
 wheat_1    fgenesh     exon    1702    2725    0.04    +     . gene_1
 wheat_1    fgenesh     exon    3619    3982    10.41   +     . gene_1
 wheat_1    fgenesh     exon    6960    7273    13.70   +     . gene_2
 wheat_1    fgenesh     exon    7435    7789    21.29   +     . gene_2
 wheat_1    fgenesh     exon    7904    8091    1.14    +     . gene_2
 wheat_1    fgenesh     exon    8248    9163    13.79   +     . gene_2
 wheat_1    fgenesh     exon    9206    9587    8.00    +     . gene_2
 ...

This option allows you to change the name of the sequence source without having to run the fgenesh program again.
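The name option only changes column 1 of each line. If you already have a tab-delimited GFF file and just want to relabel that column, the same effect can be sketched with awk. This is an illustration rather than how cnv_fgenesh2gff.pl implements the option, and the helper name gff_rename_seq is hypothetical.

```shell
# gff_rename_seq NEWNAME  (hypothetical helper)
# Rewrites column 1 (the sequence name) of tab-delimited GFF read on STDIN.
gff_rename_seq() {
  awk -v name="$1" 'BEGIN { FS = OFS = "\t" }
    # Pass comment/header lines and blank lines through unchanged.
    /^#/ || NF == 0 { print; next }
    { $1 = name; print }'
}

printf 'HEX3045G05\tfgenesh\texon\t961\t1456\t12.02\t+\t.\tgene_1\n' |
  gff_rename_seq wheat_1
```

Setting the field separator to a tab keeps any column that contains spaces intact, which matters for GFF attribute columns.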

Specify the Parameter Set

It is often useful to run a program with different parameter sets. The cnv_fgenesh2gff.pl program therefore lets you specify a label for a parameter set, so that you can distinguish multiple prediction results produced by the same program with different parameter combinations. This label is appended to the source column (col 2) of the GFF output file.

For example, to label the results from parameter set one:

  cnv_fgenesh2gff.pl -i result_set1.txt -o result_set1.gff -p set_1

This will result in a GFF file like the following:

 HEX3045G05  fgenesh:set_1  exon   961    1456  12.02   +    .  gene_1
 HEX3045G05  fgenesh:set_1  exon   1702   2725  0.04    +    .  gene_1
 HEX3045G05  fgenesh:set_1  exon   3619   3982  10.41   +    .  gene_1
 HEX3045G05  fgenesh:set_1  exon   6960   7273  13.70   +    .  gene_2
 HEX3045G05  fgenesh:set_1  exon   7435   7789  21.29   +    .  gene_2
 HEX3045G05  fgenesh:set_1  exon   7904   8091  1.14    +    .  gene_2
 HEX3045G05  fgenesh:set_1  exon   8248   9163  13.79   +    .  gene_2
 HEX3045G05  fgenesh:set_1  exon   9206   9587  8.00    +    .  gene_2
 ...

Then, to label the results from parameter set two:

  cnv_fgenesh2gff.pl -i result_set2.txt -o result_set2.gff -p set_2

This will result in a GFF file like the following:

 HEX3045G05  fgenesh:set_2  exon   961    1456  12.02   +    .  gene_1
 HEX3045G05  fgenesh:set_2  exon   1702   2725  0.04    +    .  gene_1
 HEX3045G05  fgenesh:set_2  exon   3619   3982  10.41   +    .  gene_1
 ...

This will allow you to later distinguish between the results for parameter set one and those for parameter set two.
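Because the label lives in column 2, results from several parameter sets can be kept in one file and separated again later. A minimal sketch, assuming tab-delimited GFF and using the hypothetical helper name gff_select_source:

```shell
# gff_select_source LABEL  (hypothetical helper)
# Prints only the tab-delimited GFF lines whose source column (col 2)
# exactly matches LABEL, e.g. fgenesh:set_2.
gff_select_source() {
  awk -v src="$1" 'BEGIN { FS = "\t" } $2 == src'
}

# Two made-up lines standing in for concatenated set_1 and set_2 output.
{
  printf 'HEX3045G05\tfgenesh:set_1\texon\t961\t1456\t12.02\t+\t.\tgene_1\n'
  printf 'HEX3045G05\tfgenesh:set_2\texon\t961\t1456\t12.02\t+\t.\tgene_1\n'
} | gff_select_source fgenesh:set_2
```

With real files you might run, for example, cat set1.gff set2.gff | gff_select_source fgenesh:set_2. An exact string match is used rather than a regular expression, so a label such as set_1 does not also match set_10.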

Accepting Input from STDIN

It is often useful when working at the Unix command line to pipe the output from one program to another. For that reason, cnv_fgenesh2gff.pl can accept input from STDIN. For example, given a text file named result.txt, you can send its contents to cnv_fgenesh2gff.pl using the cat command and the pipe character '|':

  cat result.txt | cnv_fgenesh2gff.pl

Since an output file is not specified, the result will be printed to STDOUT and will appear on the screen.

Writing Output to STDOUT

Since the program can write output to STDOUT, you can load the GFF output directly into your database. For example, if you have a script called load_gff2mydb.pl, you can pipe the GFF results to it directly:

  cnv_fgenesh2gff.pl -i result.txt | load_gff2mydb.pl

This will load the result to your database without generating a copy of the GFF file on your hard drive.

Removing Text That Throws Warnings

Output saved from the fgenesh webpage will include the copyright statement from the Softberry website, which causes the program to print the following warning:

  --------------------- WARNING ---------------------
  MSG: seq doesn't validate, mismatch is ?1999,2009,<,://,/>
  ---------------------------------------------------

This warning is only written to STDERR, and should not affect the gff output of the program. However, you can remove the offending line of fgenesh output using the grep command before piping the text to the cnv_fgenesh2gff.pl program.

  grep -v 'www.softberry.com' fgenesh.txt | cnv_fgenesh2gff.pl

Strip HTML Tags

Output from the Softberry website that was saved in HTML format can also be parsed by using the --html option. The program will attempt to strip the HTML tags, save a temporary plain-text copy locally, and then parse that copy:

  cnv_fgenesh2gff.pl -i infile.txt --html
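If you would rather clean a saved page yourself before conversion, a crude tag-stripping pass can be sketched with sed. This is an illustration under stated assumptions, not the method the --html option uses: it assumes no HTML tag spans more than one line and decodes only a few common entities. The helper name strip_html and the file names in the note below are hypothetical.

```shell
# strip_html  (hypothetical helper)
# Crude HTML-to-text filter: removes single-line tags first, then decodes
# a few common entities. &amp; is decoded last so that "&amp;lt;" becomes
# the literal text "&lt;" rather than "<".
strip_html() {
  sed -e 's/<[^>]*>//g' \
      -e 's/&lt;/</g' \
      -e 's/&gt;/>/g' \
      -e 's/&nbsp;/ /g' \
      -e 's/&amp;/\&/g'
}

printf '<pre><b>    1 +    1 CDSf</b>    961 -   1456   12.02</pre>\n' | strip_html
```

For example, strip_html < fgenesh.html > fgenesh.txt would write a plain-text copy that could then be passed to cnv_fgenesh2gff.pl with -i. Tags are removed before entities are decoded so that escaped text such as &lt;y&gt; is not mistaken for a tag.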


DIAGNOSTICS

The following lists some typical error messages and solutions:


CONFIGURATION AND ENVIRONMENT

This program does not make use of a configuration file or variables defined in the user's environment.


DEPENDENCIES

Required Software

Required Perl Modules

Other modules or software that the program is dependent on.


BUGS AND LIMITATIONS

Bugs

Limitations


REFERENCE

A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:

JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/


LICENSE

GNU General Public License, Version 3

http://www.gnu.org/licenses/gpl.html

THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.


AUTHOR

James C. Estill <JamesEstill at gmail.com>


HISTORY

STARTED: 01/31/2009

UPDATED: 03/24/2009

VERSION: $Rev: 591 $
