cnv_blast2gff.pl |
cnv_blast2gff.pl - Convert blast output to GFF
This documentation refers to program version $Rev: 598 $
cnv_blast2gff.pl -i blast_report.bln -o blast_out.gff -n seq_id
-i    # Path to the input file
      # If not specified the program will expect input from STDIN
-o    # Path to the output file
      # If not specified the program will write to STDOUT
-s    # Name of the query sequence
      # If a name is not specified, the name will be
      # extracted from the blast report or use "seq"
This program will translate a blast report for a single query sequence into the GFF format. Since this uses the general bioperl BLAST parser code, this should also be able to parse output from BLAT or wublast. This code works best for converting a blast report of a single query sequence against a single database.
Path of the input file. If an input file is not specified, the program will expect input from STDIN.
Path of the output file. If an output file is not specified, the program will write output to STDOUT.
Identifier of the sequence that has been used as the query sequence in the blast report. This will be used as the first column in the gff output file.
The blast program used. This will be used to identify the source program in the second column of the GFF output file. Example of valid values include blastn, blastx, or wublast.
The type of feature. By default, this is set to exon to facilitate using this blast report in Apollo. It is also possible to set this to an ontology-compliant name such as match or expressed_sequence_match.
The name of the database that was blasted against. If provided, this will be appended to the program variable in the second column of the GFF output file.
The alignment format used in the BLAST report to be parsed. The program will assume that you are using the default alignment format for blast. Otherwise, you can specify 'tab' or '8' or '9' for tab delimited blast.
The maximum e value threshold to accept.
The minimum length to accept.
Run the program with maximum reporting of error and status messages.
Short overview of how to use program from command line.
Show program usage with summary of options.
Show program version.
Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
Run the program with minimal output.
The following are examples on how to use the cnv_blast2gff.pl program.
The typical use of this program will be to convert existing blast output to the GFF file format:
cnv_blast2gff.pl -i blast_result.bln -o parsed_result.gff
This will generate a GFF format file named parsed_result.gff.
It is also possible to directly send the blast result to the cnv_blast2gff.pl program using the standard streams.
blastall -p blastn .. | cnv_blast2gff.pl -o blast_result.gff
This will take the blast output from NCBI's blastall program and convert the output to the gff format.
Since the cnv_blast2gff.pl program will write the results to the standard output stream if no output file path is specified, it is possible to use standard unix commands to combine results. Consider the following set of commands:
cnv_blast2gff.pl -i blast_result01.bln > combined_results.gff
cnv_blast2gff.pl -i blast_result02.bln >> combined_results.gff
cnv_blast2gff.pl -i blast_result03.bln >> combined_results.gff
This will combine the blast results from the 01, 02 and 03 searches into a single GFF file named combined_results.gff.
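The same pattern extends to any number of reports with a loop. The following is a minimal sketch, not part of the program itself: combine_reports is a hypothetical helper name, the CNV variable is introduced here only so the converter command can be overridden, and it assumes cnv_blast2gff.pl is on your PATH and accepts -i as described above.

```shell
# Converter command to run (assumption: cnv_blast2gff.pl is on PATH).
CNV="${CNV:-cnv_blast2gff.pl}"

# combine_reports OUTFILE REPORT [REPORT ...]
# Hypothetical helper: converts each blast report and appends the
# resulting GFF lines to a single output file.
combine_reports() {
    out="$1"
    shift
    : > "$out"                          # start with an empty output file
    for report in "$@"; do
        "$CNV" -i "$report" >> "$out"   # converter writes GFF to STDOUT
    done
}
```

It could then be called as combine_reports combined_results.gff blast_result01.bln blast_result02.bln blast_result03.bln.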
The first column in the GFF output file indicates the ID of the sequence that is being annotated. By default, the cnv_blast2gff.pl program will attempt to extract this ID from the blast result. It is also possible to specify this from the command line using the --name option. For example, suppose you converted a blast report with the following command:
cnv_blast2gff.pl -i bl_result.bln -o gff_result.gff
that generated a GFF file like the following:
HEX3045G05 blast:mips exon 8537 8667 39 + . rire1
HEX3045G05 blast:mips exon 9911 9996 38 + . rire1
HEX3045G05 blast:mips exon 10025 10191 36 + . rire1
HEX3045G05 blast:mips exon 76161 76235 35 + . rire1
HEX3045G05 blast:mips exon 81151 81200 34 + . rire1
...
where HEX3045G05 indicates the sequence ID. This sequence identifier could be modified using the --name option:
cnv_blast2gff.pl -i bl_result.bln -o gff_result.gff --seqname HEX001
This would give the following result:
HEX001 blast:mips exon 8537 8667 39 + . rire1
HEX001 blast:mips exon 9911 9996 38 + . rire1
HEX001 blast:mips exon 10025 10191 36 + . rire1
HEX001 blast:mips exon 76161 76235 35 + . rire1
HEX001 blast:mips exon 81151 81200 34 + . rire1
...
By default the cnv_blast2gff.pl program will identify the database in the second column as a suffix to the blast program, separated by a colon. The command:
cnv_blast2gff.pl -i bl_result.bln -o gff_result.gff
generates a GFF file like the following:
HEX3045G05 blast:mips exon 8537 8667 39 + . rire1
HEX3045G05 blast:mips exon 9911 9996 38 + . rire1
HEX3045G05 blast:mips exon 10025 10191 36 + . rire1
HEX3045G05 blast:mips exon 76161 76235 35 + . rire1
HEX3045G05 blast:mips exon 81151 81200 34 + . rire1
...
The database suffix can be modified using the --database option as follows:
cnv_blast2gff.pl -i bl_result.bln -o gff_result.gff --database tes
This would modify the gff output to the following:
HEX3045G05 blast:tes exon 8537 8667 39 + . rire1
HEX3045G05 blast:tes exon 9911 9996 38 + . rire1
HEX3045G05 blast:tes exon 10025 10191 36 + . rire1
HEX3045G05 blast:tes exon 76161 76235 35 + . rire1
HEX3045G05 blast:tes exon 81151 81200 34 + . rire1
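Each record in the output shown above follows the standard nine-column GFF layout: sequence ID, source, feature type, start, end, score, strand, frame, and attribute. Because the columns are positional, the output is easy to post-process with standard unix tools. The following is a small sketch that keeps only features of at least a given length; gff_min_length is a hypothetical helper name, not an option of cnv_blast2gff.pl, and it assumes that no column contains embedded spaces.

```shell
# gff_min_length MIN
# Hypothetical helper: reads GFF from standard input and prints only the
# records whose feature length (end - start + 1) is at least MIN.
gff_min_length() {
    awk -v min="$1" '
        /^#/ { next }                     # drop comment lines
        NF >= 9 && ($5 - $4 + 1) >= min   # print records that are long enough
    '
}

# Example using the records from the output shown above
gff_min_length 100 <<'EOF'
HEX3045G05 blast:tes exon 8537 8667 39 + . rire1
HEX3045G05 blast:tes exon 9911 9996 38 + . rire1
HEX3045G05 blast:tes exon 10025 10191 36 + . rire1
HEX3045G05 blast:tes exon 76161 76235 35 + . rire1
HEX3045G05 blast:tes exon 81151 81200 34 + . rire1
EOF
```

With a minimum length of 100, only the records spanning 8537 to 8667 (131 bases) and 10025 to 10191 (167 bases) are printed.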
If you see this message, it may indicate that you did not properly specify the input file with the -i or --infile flag.
This program does not make use of a configuration file or any variables set in the user's environment.
This program requires output from a BLAST program. Since this program makes use of the BioPerl blast parser, it should be possible to convert local alignment results from any of the following programs:
The following perl modules are required for this program:
The SearchIO module is part of the bioperl module.
If you find a bug with this software, file a bug report on the DAWG-PAWS Sourceforge website: http://sourceforge.net/tracker/?group_id=204962
This program has only been tested with BLAST output from the NCBI-BLAST package using standard and tab delimited output. If you find that there are limitations to this program that restrict your use, please email the author or file a bug report on the Sourceforge website: http://sourceforge.net/tracker/?group_id=204962
A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:
JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/
GNU General Public License, Version 3
http://www.gnu.org/licenses/gpl.html
THIS SOFTWARE COMES AS IS, WITHOUT ANY EXPRESS OR IMPLIED WARRANTY. USE AT YOUR OWN RISK.
James C. Estill <JamesEstill at gmail.com>
STARTED: 08/06/2007
UPDATED: 01/31/2009
VERSION: $Rev: 598 $