fasta_shorten.pl
fasta_shorten.pl - Change headers in a fasta file to give shorter names.
This documentation refers to fasta_shorten version $Rev: 608 $
fasta_shorten.pl -i InDir -o OutDir
-i, --indir   # Directory of fasta files to process
-o, --outdir  # Path to the base output directory
This program takes all of the fasta files in an input directory and shortens the names in their fasta headers. Shorter names are often required by programs that limit the length of the fasta headers they accept.
-i, --indir  Path of the directory containing the sequences to process.
-o, --outdir  Path of the directory in which to place the program output.
-l  New header length. Each fasta header is shortened to this many characters. The default length is 20.
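A minimal Python sketch of truncating a header to a fixed length (the script itself is Perl; this is not its code). It assumes the leading '>' does not count toward the length; `shorten_header` is a hypothetical name.

```python
def shorten_header(header, length=20):
    """Truncate a fasta header to at most `length` characters.

    Assumption: the leading '>' is not counted toward `length`; it is
    stripped before truncating and added back afterwards.
    """
    name = header.rstrip("\n")
    if name.startswith(">"):
        name = name[1:]
    return ">" + name[:length]


print(shorten_header(">gi|12345|gb|AC123456.1| example BAC clone"))
# prints ">gi|12345|gb|AC123456"
```

Plain truncation can make two different headers identical when they share a long prefix, which is one source of the name collisions described under the diagnostics.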
The delimiting character to use. Because the delimiter is used in a Perl regular expression, only the following values are accepted:
The pipe character '|' as used by NCBI.
The comma character ','.
To delimit by any whitespace, type the word space.
To delimit by the tab character, type the word tab on the command line.
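The delimiter choices above can be modeled as a lookup table from the accepted value to a regular-expression pattern. This is a hedged Python sketch, not the script's Perl code: the keys "|" and "," are assumptions about what the user types for pipe and comma (only the words space and tab are stated above), and `DELIMITERS` and `split_header` are hypothetical names.

```python
import re

# Hypothetical table mapping the accepted delimiter choices to regex
# patterns. re.escape() is needed for '|', which otherwise means alternation.
DELIMITERS = {
    "|": re.escape("|"),
    ",": re.escape(","),
    "space": r"\s+",  # any run of whitespace
    "tab": r"\t",     # a single tab character
}


def split_header(header, delim):
    """Split a header (leading '>' removed) on one of the accepted delimiters."""
    if delim not in DELIMITERS:
        raise ValueError("unsupported delimiter: %r" % delim)
    return re.split(DELIMITERS[delim], header.lstrip(">").strip())


print(split_header(">gi|12345|gb|AC123456.1", "|"))
```

Rejecting unknown values up front keeps the accepted set explicit, which matches the restricted list of choices described above.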
The position in the delimited set to use. By default this is the first position in the split array. If delim-pos is greater than the number of fields produced by the split, the first position is used and a message is sent to STDERR.
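A small Python sketch of the delim-pos fallback, assuming positions are counted from 1 (the text says only "first position"). It also falls back for positions below 1, which the description does not cover; `pick_field` is a hypothetical name.

```python
import sys


def pick_field(fields, delim_pos=1):
    """Return the field at 1-based position `delim_pos` (assumed 1-based).

    If the position is out of range, fall back to the first field and
    write a message to STDERR, as the delim-pos description states.
    """
    if 1 <= delim_pos <= len(fields):
        return fields[delim_pos - 1]
    sys.stderr.write("delim-pos %d is out of range for %d fields; "
                     "using the first field\n" % (delim_pos, len(fields)))
    return fields[0]


print(pick_field("gi|12345|gb|AC123456.1".split("|"), 4))
```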
Convert all sequence residues to uppercase.
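A Python sketch of uppercasing residues, assuming header lines (those starting with '>') are left unchanged because they do not contain residues; `uppercase_residues` is a hypothetical name.

```python
def uppercase_residues(lines):
    """Uppercase sequence lines; header lines starting with '>' are kept as-is."""
    return [line if line.startswith(">") else line.upper() for line in lines]


record = [">contig_1 partial", "acgtnACGT", "ttga"]
print(uppercase_residues(record))
```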
Rename the output fasta files to the new FASTA header name.
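One way to derive an output file name from the new header is sketched below in Python. The '.fasta' suffix and the rule that replaces characters such as '|' with '_' are assumptions for illustration, not documented behavior of fasta_shorten.pl; `output_path` is a hypothetical name.

```python
import os


def output_path(outdir, new_header, suffix=".fasta"):
    """Build an output path named after a shortened fasta header.

    Characters other than letters, digits, '.', '_' and '-' are replaced
    with '_' so the header can be used safely as a file name (assumption).
    """
    name = new_header.lstrip(">").strip()
    safe = "".join(c if c.isalnum() or c in "._-" else "_" for c in name)
    return os.path.join(outdir, safe + suffix)


print(output_path("OutDir", ">gi|12345"))
```

Sanitizing names this way can itself map two different headers to the same file name, so a collision check is still needed before writing.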
Short overview of how to use program from command line.
Show program usage with summary of options.
Show program version.
Show the full program manual. This uses the perldoc command to print the POD documentation for the program.
Run the program with minimal output.
Error messages generated by this program and possible solutions are listed below.
The input directory does not contain fasta files in the expected format. This could happen because you gave an incorrect path or because your sequence files do not have the expected *.fasta extension in the file name.
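A Python sketch of the input check described above: list the files ending in '.fasta' directly inside the input directory. An empty result corresponds to this error, whether the path is wrong or the files use another extension. `find_fasta_files` is a hypothetical name, and the match is case-sensitive on most platforms.

```python
import glob
import os


def find_fasta_files(indir):
    """Return sorted paths of files ending in '.fasta' directly inside `indir`.

    An empty list means either the path is wrong or no file in it has
    the .fasta extension.
    """
    return sorted(glob.glob(os.path.join(indir, "*.fasta")))
```

A caller would stop with an error message when the returned list is empty, rather than silently producing no output.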
The output directory could not be created at the path you specified. This could be because the directory in which you are trying to create your base output directory does not exist, or because you do not have write permission for that directory.
This occurs when you choose to rename the output fasta files and a new file name is not unique; the existing file is overwritten by the new file. You can avoid this problem by keeping the original sequence names, by selecting a longer length (-l) so that the new names are unique, or by choosing a different delimiter to generate unique names.
This may also occur when you write new fasta files to a directory that already contains a fasta file with the same name.
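Both overwrite cases above can be detected before any file is written. This Python sketch assumes output files are named `<new name>.fasta` in the output directory; `find_collisions` is a hypothetical helper, not part of the program.

```python
import os
from collections import Counter


def find_collisions(new_names, outdir, suffix=".fasta"):
    """Find new file names that would overwrite another output file.

    Returns two sorted lists: names produced more than once in this run,
    and names whose output file already exists in `outdir`.
    """
    counts = Counter(new_names)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    existing = sorted(name for name in counts
                      if os.path.exists(os.path.join(outdir, name + suffix)))
    return duplicates, existing


print(find_collisions(["AC1234", "AC1235", "AC1234"], "OutDir"))
```

Checking the full set of new names first lets the user change the length or delimiter before any existing file is lost.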
This program does not currently make use of configuration files or settings in the user's environment.
No external software is currently required to use this program.
This module is required to accept options at the command line.
If you find a bug with this software, file a bug report on the DAWG-PAWS Sourceforge website: http://sourceforge.net/tracker/?group_id=204962
Multi-record fasta files may be renamed to multiple files or may cause an error.
If you find limitations to your use of this software please email the author with information regarding your operating system and what limitations you are experiencing with your use of the software.
The fasta_shorten.pl program is part of the DAWG-PAWS package of genome annotation programs. See the DAWG-PAWS web page ( http://dawgpaws.sourceforge.net/ ) or the Sourceforge project page ( http://sourceforge.net/projects/dawgpaws ) for additional information about this package.
A manuscript is being submitted describing the DAWGPAWS program. Until this manuscript is published, please refer to the DAWGPAWS SourceForge website when describing your use of this program:
JC Estill and JL Bennetzen. 2009. The DAWGPAWS Pipeline for the Annotation of Genes and Transposable Elements in Plant Genomes. http://dawgpaws.sourceforge.net/
GNU GENERAL PUBLIC LICENSE, VERSION 3
http://www.gnu.org/licenses/gpl.html
James C. Estill <JamesEstill at gmail.com>
STARTED: 07/17/2007
UPDATED: 04/26/2008
VERSION: $Rev: 608 $