Saturday, 30 March 2013

Bacterial genome annotation systems

Introduction

You get a bacterial isolate. You sequence it. You manage to get some contigs after mucking around with some de novo assembly software. Now what? Annotation of course! Your FASTA file is teeming with lifeless chunks of bacterial DNA yearning to be adorned with insightfully labelled features, so it can get some more attention from you, and maybe even be reunited with some old friends in Genbank/ENA. If this sounds familiar, then this blog post is for you.

What is genome annotation?

Genome annotation is the process of identifying features of interest on a genome sequence. Some of the   features relevant to bacterial genomes are protein coding genes, non-coding RNAs, and operons. Features can have all sorts of useful information associated with them in addition to their genomic location and feature type. For example, a protein-coding gene annotation could include items such as the predicted protein product, whether it has a signal peptide, a gene abbreviation and an enzyme classification number. The accuracy and richness of a genome annotation is important, and sometimes critical, to downstream biological interpretation.

In the old days, a basic ORF finder would be run over the contigs. Then the truly dedicated curators would comb over the ORFs, trim back to good looking start codon sites, delete spurious looking ORFs, and so on. Later gene predictor software and BLASTX helped bootstrap this process further. Now there are various "automatic annotation" systems which do a reasonably good job. Manual refinement of the automatic annotation can then be done using curation applications.

Below I list the tools I am aware of for performing and curating bacterial genome annotation. If I've missed any please let me know and I will add them.

Web submission systems


Standalone systems


Curation systems

  • Manatee (web interface + database backend)
  • Artemis (local app, can be combined with Chado SQL backend)
  • Apollo (local app, can be combined with Chado SQL backend)
  • Wasabi (my old non-public awful CGI/Perl/make mess that we still use internally)

Conclusion

Beware of systems claiming to do "microbial" annotation. Most of them are only designed for annotating bacteria. They will perform poorly on viruses, fungi and other microbes.