The Genome Factory: April 2012

Saturday, 28 April 2012

Prokka - rapid prokaryotic annotation

Prokka is a software tool I have written to annotate bacterial, archaeal and viral genomes. It is based on years of experience annotating bacterial genomes, both automatically and via manual curation.

Its main design considerations were to be:

fast

supports multi-threading
hierarchical search databases

simple to use

no compulsory parameters
bundled databases

clean

standards-compliant output files
pipeline-friendly interface

thorough

finds tRNA, rRNA, CDS, sig_peptide, tandem repeats, ncRNA
includes /gene and /EC_number where possible, not just /product
traceable annotation sources via /inference tags

useful

produces files close to ready for submission to GenBank
complete log file
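To make the "hierarchical search databases" and "/inference" points above a little more concrete, here is a minimal Python sketch of the general idea. It is not Prokka's actual code (Prokka is a Perl script), and everything in it is an assumption made for illustration: the tier names, the toy sequences, the annotations, and the wording of the inference string are all hypothetical. A real tool would use sequence similarity searches rather than the exact dictionary lookup used here as a stand-in.

```python
from typing import Dict, List, Tuple

Annotation = Dict[str, str]

# Hypothetical database tiers, ordered from smallest and most trusted to
# largest and most general. Names and contents are illustrative only.
DATABASE_TIERS: List[Tuple[str, Dict[str, Annotation]]] = [
    ("curated", {
        "MKTAYIAK": {"product": "example protein A", "gene": "exaA"},
    }),
    ("genus_specific", {
        "MSEQA": {"product": "example kinase B", "EC_number": "2.7.1.1"},
    }),
    ("general", {
        "MSEQA": {"product": "kinase family protein"},
        "MTTLK": {"product": "putative transporter"},
    }),
]


def annotate(protein: str) -> Annotation:
    """Return the annotation from the first tier that contains the protein.

    Exact matching stands in for a real similarity search. Searching stops
    at the first hit, so the larger tiers are only consulted when the
    smaller ones fail, which is where the speed-up would come from.
    """
    for tier_name, table in DATABASE_TIERS:
        hit = table.get(protein)
        if hit is not None:
            result = dict(hit)  # copy so the database entry is not mutated
            # Record where the annotation came from (illustrative wording).
            result["inference"] = f"similar to sequence in {tier_name}"
            return result
    # Nothing matched in any tier.
    return {"product": "hypothetical protein",
            "inference": "no hit in any database tier"}


if __name__ == "__main__":
    for seq in ("MKTAYIAK", "MSEQA", "MTTLK", "MAAAA"):
        print(seq, annotate(seq))
```

In this sketch, "MSEQA" appears in both the genus-specific and the general tier, and the more specific annotation (with its EC number) is chosen because that tier is searched first.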

The first release is a monolithic but readable Perl script. It uses only core Perl modules, but it has quite a few external tool dependencies, some of which I can't bundle due to licence restrictions. Eventually I hope to have a public web-server version, and a version of it in the Galaxy Toolshed.

It currently takes about 10 minutes on a quad-core Intel i7 to annotate a typical 4 Mbp genome.

You can download it from here and read the manual here.