cnv_blast2gff.pl

DESCRIPTION
REQUIRED ARGUMENTS
OPTIONS
EXAMPLES

Typical Use
Piping BLAST Result Directly to the Conversion Utility
Combining Blast Results in GFF format
Specify the Sequence ID with --seqname
Specify the Database name with --database

DIAGNOSTICS
CONFIGURATION AND ENVIRONMENT
DEPENDENCIES

Required Software
Required Perl Modules

BUGS AND LIMITATIONS

Bugs
Limitations

REFERENCE
LICENSE
AUTHOR
HISTORY

NAME

cnv_blast2gff.pl - Convert blast output to GFF

VERSION

This documentation refers to program version $Rev: 598 $

SYNOPSIS

Usage

    cnv_blast2gff.pl -i blast_report.bln -o blast_out.gff -s seq_id

Required Arguments

    -i      # Path to the input file
            # If not specified the program will expect input from STDIN
    -o      # Path to the output file
            # If not specified the program will write to STDOUT
    -s      # Name of the query sequence
            # If a name is not specified, the name will be
            # extracted from the blast report, or "seq" will be used

DESCRIPTION

This program translates a BLAST report for a single query sequence into the GFF format. Because it uses the general BioPerl BLAST parser, it should also be able to parse output from BLAT or WU-BLAST. It works best when converting a BLAST report of a single query sequence searched against a single database.

REQUIRED ARGUMENTS

-i,--infile: Path of the input file. If an input file is not specified, the program will expect input from STDIN.
-o,--outfile: Path of the output file. If an output file is not specified, the program will write output to STDOUT.

OPTIONS

-s,--seqname: Identifier of the sequence that has been used as the query sequence in the blast report. This will be used as the first column in the gff output file.
-p,--program: The blast program used. This will be used to identify the source program in the second column of the GFF output file. Examples of valid values include blastn, blastx, or wublast.
--feature: The type of feature. By default, this is set to exon to facilitate using this blast report in Apollo. It is also possible to set this to an ontology compliant name such as match or expressed_sequence_match.
-d,--database: The name for the database that was blasted against. If provided, this will be appended to the program variable in the second column of the GFF output file.
-m,--align: The alignment format used in the BLAST report to be parsed. The program will assume that you are using the default alignment format for blast. Otherwise, you can specify 'tab' or '8' or '9' for tab delimited blast.
-e,--maxe: The maximum e value threshold to accept.
-l,--min-len: The minimum length to accept.
--verbose: Run the program with maximum reporting of error and status messages.
--usage: Short overview of how to use program from command line.
--help: Show program usage with summary of options.
--version: Show program version.
--man: Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
-q,--quiet: Run the program with minimal output.
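
The --maxe and --min-len thresholds can be pictured as a row filter over the BLAST hits. The following is a minimal sketch of that idea applied to tab-delimited NCBI-BLAST output, where column 4 is the alignment length and column 11 is the e-value. The function name filter_blast_tab and the sample rows are made up for illustration, and whether cnv_blast2gff.pl treats its thresholds as inclusive is an assumption.

```shell
# Hypothetical helper: keep tab-delimited BLAST hits (-m 8 layout) whose
# e-value is at most max_e and whose alignment length is at least min_len.
# Whether cnv_blast2gff.pl treats its thresholds as inclusive is an assumption.
filter_blast_tab() {
    max_e=$1
    min_len=$2
    awk -F '\t' -v max_e="$max_e" -v min_len="$min_len" \
        '($11 + 0) <= (max_e + 0) && ($4 + 0) >= (min_len + 0)'
}

# Made-up sample rows: query, subject, %identity, length, mismatches,
# gap opens, q.start, q.end, s.start, s.end, e-value, bit score.
sample_hits=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    HEX3045G05 rire1 95.00 131 6 0 8537 8667 1 131 1e-50 200 \
    HEX3045G05 rire1 90.00 40 4 0 9911 9950 1 40 0.5 30 \
    HEX3045G05 rire1 98.00 60 1 0 10025 10084 1 60 1e-10 90)

# Keep hits with e-value <= 1e-5 and alignment length >= 50.
filtered_hits=$(printf '%s\n' "$sample_hits" | filter_blast_tab 1e-5 50)
printf '%s\n' "$filtered_hits"
```

With these sample rows, the second hit is dropped because both its e-value and its length fall outside the cutoffs.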

EXAMPLES

The following are examples on how to use the cnv_blast2gff.pl program.

Typical Use

The typical use of this program will be to convert an existing blast output to the GFF file format:

  cnv_blast2gff.pl -i blast_result.bln -o parsed_result.gff

This will generate a GFF format file named parsed_result.gff.

Piping BLAST Result Directly to the Conversion Utility

It is also possible to directly send the blast result to the cnv_blast2gff.pl program using the standard streams.

  blastall -p blastn .. | cnv_blast2gff.pl -o blast_result.gff

This will take the blast output from NCBI's blastall program and convert the output to the gff format.

Combining Blast Results in GFF format

Since the cnv_blast2gff.pl program will write the results to the standard output stream if no output file path is specified, it is possible to use standard unix commands to combine results. Consider the following set of commands:

  cnv_blast2gff.pl -i blast_result01.bln > combined_results.gff
  cnv_blast2gff.pl -i blast_result02.bln >> combined_results.gff
  cnv_blast2gff.pl -i blast_result03.bln >> combined_results.gff

This will combine the blast results from the 01, 02 and 03 searches into a single gff file named combined_results.gff.
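
When there are many reports, the same append pattern can be wrapped in a loop. The following is a minimal sketch that relies only on the documented behavior of writing to STDOUT when no output file is given. The function name combine_blast_gff and the CNV_BLAST2GFF variable are hypothetical conveniences, not part of the program.

```shell
# Path to the converter; override CNV_BLAST2GFF if it is not on your PATH.
# (CNV_BLAST2GFF is a hypothetical variable used only by this sketch.)
CNV_BLAST2GFF=${CNV_BLAST2GFF:-cnv_blast2gff.pl}

# Hypothetical helper: convert each report given after the output path and
# append the GFF lines to that output file, in the order given.
combine_blast_gff() {
    combined=$1
    shift
    : > "$combined"                      # start from an empty output file
    for report in "$@"; do
        "$CNV_BLAST2GFF" -i "$report" >> "$combined" || return 1
    done
}
```

For the three reports above, this could be called as combine_blast_gff combined_results.gff blast_result01.bln blast_result02.bln blast_result03.bln. The function stops at the first report that fails to convert, so a partial combined file should be treated with care.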

Specify the Sequence ID with --seqname

The first column in the GFF output file indicates the id of the sequence that is being annotated. By default, the cnv_blast2gff.pl program will attempt to extract this ID from the blast result. It is also possible to specify this from the command line using the --seqname option. For example, consider a blast report converted with the command

  cnv_blast2gff.pl -i bl_result.bln -o gff_result.gff

that generated a gff file like the following

 HEX3045G05   blast:mips   exon     8537    8667    39   +    .  rire1
 HEX3045G05   blast:mips   exon     9911    9996    38   +    .  rire1
 HEX3045G05   blast:mips   exon     10025   10191   36   +    .  rire1
 HEX3045G05   blast:mips   exon     76161   76235   35   +    .  rire1
 HEX3045G05   blast:mips   exon     81151   81200   34   +    .  rire1
 ...

where HEX3045G05 indicates the sequence id. This sequence identifier could be modified using the --seqname option:

  cnv_blast2gff.pl -i bl_result.bln -o gff_result.gff --seqname HEX001

This would give the following result:

 HEX001   blast:mips   exon     8537    8667    39       +    .  rire1
 HEX001   blast:mips   exon     9911    9996    38       +    .  rire1
 HEX001   blast:mips   exon     10025   10191   36       +    .  rire1
 HEX001   blast:mips   exon     76161   76235   35       +    .  rire1
 HEX001   blast:mips   exon     81151   81200   34       +    .  rire1
 ...
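
If a GFF file has already been generated, a comparable result can be obtained by rewriting its first column rather than rerunning the conversion. The following is a minimal sketch of that alternative, assuming the GFF file is tab-delimited. The function name rename_gff_seq is hypothetical.

```shell
# Hypothetical helper: replace the sequence ID (column 1) of tab-delimited
# GFF lines read from STDIN; comment lines starting with '#' pass through.
rename_gff_seq() {
    awk -F '\t' -v OFS='\t' -v id="$1" '/^#/ { print; next } { $1 = id; print }'
}

# Two of the rows from the example above, written with tab separators.
sample_gff=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    HEX3045G05 blast:mips exon 8537 8667 39 + . rire1 \
    HEX3045G05 blast:mips exon 9911 9996 38 + . rire1)

# Rewrite the sequence ID to HEX001, leaving all other columns untouched.
renamed_gff=$(printf '%s\n' "$sample_gff" | rename_gff_seq HEX001)
printf '%s\n' "$renamed_gff"
```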

Specify the Database name with --database

By default the cnv_blast2gff.pl program will identify the database in the second column as a suffix to the blast program, separated by a colon. For example, consider the command

  cnv_blast2gff.pl -i bl_result.bln -o gff_result.gff

that generated a gff file like the following

 HEX3045G05   blast:mips   exon     8537    8667    39   +    .  rire1
 HEX3045G05   blast:mips   exon     9911    9996    38   +    .  rire1
 HEX3045G05   blast:mips   exon     10025   10191   36   +    .  rire1
 HEX3045G05   blast:mips   exon     76161   76235   35   +    .  rire1
 HEX3045G05   blast:mips   exon     81151   81200   34   +    .  rire1
 ...

The database suffix could be modified using the --database option as follows:

  cnv_blast2gff.pl -i bl_result.bln -o gff_result.gff --database tes

This would modify the gff output to the following:

 HEX3045G05   blast:tes   exon     8537    8667    39    +    .  rire1
 HEX3045G05   blast:tes   exon     9911    9996    38    +    .  rire1
 HEX3045G05   blast:tes   exon     10025   10191   36    +    .  rire1
 HEX3045G05   blast:tes   exon     76161   76235   35    +    .  rire1
 HEX3045G05   blast:tes   exon     81151   81200   34    +    .  rire1
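
The way the second column is put together can be sketched as a small string join: the program name, followed by a colon and the database name when one is available. This is only an illustration of the output format shown above. The function name gff_source is hypothetical, and how cnv_blast2gff.pl builds the value internally is an assumption.

```shell
# Hypothetical helper: build the GFF source column from a program name and
# an optional database name, joined by a colon when the database is given.
gff_source() {
    program=$1
    database=${2:-}
    printf '%s\n' "${program}${database:+:$database}"
}

gff_source blast mips    # prints "blast:mips"
gff_source blast tes     # prints "blast:tes"
gff_source blastn        # prints "blastn"
```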

DIAGNOSTICS

Expecting input from STDIN
If you see this message, it may indicate that you did not properly specify the input file with the -i or --infile flag.

CONFIGURATION AND ENVIRONMENT

This program does not make use of a configuration file or any variables set in the user's environment.

DEPENDENCIES

Required Software

This program requires output from a BLAST program. Since this program makes use of the BioPerl blast parser, it should be possible to convert local alignment results from any of the following programs:

BLAST
PSIBLAST
PSITBLASTN
RPSBLAST
WUBLAST
bl2seq
WU-BLAST
BLASTZ
BLAT
Paracel
BTK

Required Perl Modules

The following perl modules are required for this program:

Bio::SearchIO
The SearchIO module is part of the BioPerl distribution.

BUGS AND LIMITATIONS

Bugs

No bugs currently known
If you find a bug with this software, file a bug report on the DAWG-PAWS Sourceforge website: http://sourceforge.net/tracker/?group_id=204962

Limitations

Limited BLAST testing
This program has only been tested with BLAST output from the NCBI-BLAST package using standard and tab delimited output. If you find limitations in this program that restrict your use, please email the author or file a bug report on the Sourceforge website: http://sourceforge.net/tracker/?group_id=204962

REFERENCE

A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:

JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/

LICENSE

GNU General Public License, Version 3

http://www.gnu.org/licenses/gpl.html

THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.

AUTHOR

James C. Estill <JamesEstill at gmail.com>

HISTORY

STARTED: 08/06/2007

UPDATED: 01/31/2009

VERSION: $Rev: 598 $
