fasta_shorten.pl


NAME

fasta_shorten.pl - Change headers in a fasta file to give shorter names.


VERSION

This documentation refers to fasta_shorten version $Rev: 608 $


SYNOPSIS

Usage

    fasta_shorten.pl -i InDir -o OutDir

Required Arguments

    -i, --indir    # Directory of fasta files to process
    -o, --outdir   # Path to the base output directory


DESCRIPTION

This program takes all of the fasta files in an input directory and shortens the name in each fasta header. Shorter names are often required by programs that impose a maximum length on fasta headers.


REQUIRED ARGUMENTS

-i,--indir

Path of the directory containing the sequences to process.

-o,--outdir

Path of the directory to place the program output.


OPTIONS

-l,--length

New length. The fasta header will be shortened to this length. Default length is 20.
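
As a minimal sketch of the truncation described above, the following helper keeps at most the requested number of characters. The name shorten_name is hypothetical and is not taken from the program source. The sketch assumes the leading '>' has already been removed from the header.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the program source): keep at most
# $max_len characters of a header name.
sub shorten_name {
    my ($name, $max_len) = @_;
    $max_len = 20 unless defined $max_len;   # documented default length
    return substr($name, 0, $max_len);       # shorter names pass through unchanged
}

print shorten_name('gi|123456|gb|AC123456.1|Zea_mays_BAC_clone'), "\n";
print shorten_name('AC123456.1', 8), "\n";
```

Note that truncation alone can produce duplicate names when several headers share a long common prefix.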

-d,--delim-char

The delimiting character to use. The program accepts only a fixed list of named delimiters. Valid choices are:

pipe [-d pipe]

The pipe character '|' as used by NCBI.

comma [-d comma]

A comma ',' is used to delimit the header line.

space [-d space]

To delimit by any whitespace, type the word space.

tab [-d tab]

To delimit by the tab character, type tab in the command line.
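
One way to picture the named delimiters above is as a lookup table from each name to a split pattern. This is only an illustrative sketch: the table %DELIM_REGEX and the helper split_header are hypothetical names, and the program's actual implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical lookup table: delimiter name => pattern used to split the header.
my %DELIM_REGEX = (
    pipe  => qr/\|/,    # '|' as used by NCBI
    comma => qr/,/,
    space => qr/\s+/,   # any run of whitespace
    tab   => qr/\t/,
);

# Split a header (without the leading '>') on a named delimiter.
sub split_header {
    my ($header, $delim_name) = @_;
    my $re = $DELIM_REGEX{$delim_name}
        or die "Unknown delimiter name: $delim_name\n";
    return split $re, $header;
}

my @fields = split_header('gi|123456|gb|AC123456.1', 'pipe');
print join(' / ', @fields), "\n";
```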

-p,--delim-pos

The position in the delimited set to use. By default this will be the first position in the split array. If delim-pos is greater than the number of fields produced by the split, the first position will be used and a message sent to STDERR.
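
The fallback behavior described above can be sketched as follows. The helper name pick_field is hypothetical, and the sketch assumes that positions are counted from 1; the documentation does not state whether the program counts from 0 or 1.

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: $pos is 1-based (1 = first field).
sub pick_field {
    my ($fields, $pos) = @_;
    $pos = 1 unless defined $pos;            # default: first position
    if ($pos < 1 || $pos > @$fields) {
        print STDERR "delim-pos $pos is out of range; using the first position\n";
        $pos = 1;
    }
    return $fields->[$pos - 1];
}

my @fields = ('gi', '123456', 'gb', 'AC123456.1');
print pick_field(\@fields, 4), "\n";   # fourth field
print pick_field(\@fields, 9), "\n";   # out of range, falls back to the first field
```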

--uppercase

Convert all sequence residues to uppercase.
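
A minimal sketch of this conversion is shown below. The helper name uppercase_residues is hypothetical, and the sketch assumes that header lines (those starting with '>') are left untouched so that only residues change case.

```perl
use strict;
use warnings;

# Hypothetical helper: uppercase one line of a fasta file unless it is a header.
# Assumption: header lines (starting with '>') keep their original case.
sub uppercase_residues {
    my ($line) = @_;
    return $line if $line =~ /^>/;
    return uc $line;
}

for my $line ('>seq1 sample', 'acgtnACGT') {
    print uppercase_residues($line), "\n";
}
```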

--rename

Rename the output fasta files to the new FASTA header name.

--usage

Short overview of how to use program from command line.

--help

Show program usage with summary of options.

--version

Show program version.

--man

Show the full program manual. This uses the perldoc command to print the POD documentation for the program.

-q,--quiet

Run the program with minimal output.


DIAGNOSTICS

Error messages generated by this program and possible solutions are listed below.

ERROR: No fasta files were found in the input directory

The input directory does not contain fasta files in the expected format. This could happen because you gave an incorrect path or because your sequence files do not have the expected *.fasta extension in the file name.
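
To check in advance whether a directory contains files with the expected extension, something like the following sketch can be used. The helper name find_fasta_files is hypothetical; only the *.fasta extension comes from the text above.

```perl
use strict;
use warnings;

# Hypothetical helper: list the file names ending in .fasta in a directory.
sub find_fasta_files {
    my ($indir) = @_;
    opendir(my $dh, $indir) or die "Can not open directory $indir: $!\n";
    my @files = sort grep { /\.fasta$/ } readdir $dh;
    closedir $dh;
    return @files;
}

my @found = find_fasta_files('.');
print scalar(@found), " fasta file(s) found in the current directory\n";
```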

ERROR: Could not create the output directory

The output directory could not be created at the path you specified. This could be because the parent directory in which you are trying to place your base directory does not exist, or because you do not have write permission for the directory in which you want to place your files.
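
The two failure causes above can be reproduced with a short sketch. The helper name ensure_outdir is hypothetical, and the sketch assumes the directory is created with a single non-recursive mkdir, which fails when the parent directory is missing.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: create the output directory if it does not exist.
# Assumption: a plain (non-recursive) mkdir, so the parent must already exist.
sub ensure_outdir {
    my ($outdir) = @_;
    return 1 if -d $outdir;
    mkdir($outdir)
        or die "ERROR: Could not create the output directory: $outdir ($!)\n";
    return 1;
}

my $base = tempdir(CLEANUP => 1);

# The parent exists, so this succeeds.
ensure_outdir(File::Spec->catdir($base, 'short'));

# The parent 'missing' does not exist, so mkdir fails and the helper dies.
my $ok = eval { ensure_outdir(File::Spec->catdir($base, 'missing', 'short')); 1 };
print $ok ? "created\n" : "failed: $@";
```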

WARNING: Outfile already exits

This occurs when you choose to rename the output fasta files and the new file name is not unique. The existing file will be overwritten by the new file. You can avoid this problem by keeping the original sequence name, by selecting a longer length (-l) to generate unique new names, or by choosing a different delimiter character to generate unique names.

This may also occur when you are writing new fasta files to a directory that already contains a fasta file with the same name.
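
The check behind this warning can be sketched as a simple file test before writing. The helper name outfile_path, the wording of the message, and the .fasta extension on the output file are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: build the output path for a shortened name and warn
# on STDERR if a file already exists there (it would be overwritten).
# Assumption: output files are named <short_name>.fasta.
sub outfile_path {
    my ($outdir, $short_name) = @_;
    my $path = File::Spec->catfile($outdir, "$short_name.fasta");
    print STDERR "WARNING: $path already exists and will be overwritten\n"
        if -e $path;
    return $path;
}

print outfile_path('.', 'AC123456'), "\n";
```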


CONFIGURATION AND ENVIRONMENT

This program does not currently make use of configuration files or settings in the user's environment.


DEPENDENCIES

Required Software

No external software is currently required to use this program.

Required Perl Modules


BUGS AND LIMITATIONS

Bugs

Limitations


SEE ALSO

The fasta_shorten.pl program is part of the DAWG-PAWS package of genome annotation programs. See the DAWG-PAWS web page ( http://dawgpaws.sourceforge.net/ ) or the Sourceforge project page ( http://sourceforge.net/projects/dawgpaws ) for additional information about this package.


REFERENCE

A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:

JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/


LICENSE

GNU GENERAL PUBLIC LICENSE, VERSION 3

http://www.gnu.org/licenses/gpl.html


AUTHOR

James C. Estill <JamesEstill at gmail.com>


HISTORY

STARTED: 07/17/2007

UPDATED: 04/26/2008

VERSION: $Rev: 608 $
