Monday 11 April 2016

What bacterial genome assemblers are people using?

Introduction

As of April 2016, there are about 70,000 genome assemblies in Genbank (draft and complete), with the majority being bacterial genomes. For genomes that were submitted in the NGS era, the COMMENT section of the Genbank file header has machine-readable information about the sequencing technology, depth of coverage, and software used.

For example, the entry for Enterococcus faecium OC2A-1 contains this:

##Genome-Assembly-Data-START##
Finishing Goal           :: High-Quality Draft
Current Finishing Status :: High-Quality Draft
Assembly Method          :: Velvet v. 1.1.06
Genome Coverage          :: 104x
Sequencing Technology    :: Illumina
##Genome-Assembly-Data-END##
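
As a rough illustration, a block like this can be turned into key/value pairs with a few lines of Python. This is only a sketch, not the script used for this post: the function name `parse_assembly_data` is made up for the example, and it assumes every field sits on a single `key :: value` line between the START and END markers (wrapped values are not handled).

```python
START = "##Genome-Assembly-Data-START##"
END = "##Genome-Assembly-Data-END##"


def parse_assembly_data(text):
    """Return the fields of the first Genome-Assembly-Data block as a dict.

    Hypothetical helper: returns an empty dict if no block is found.
    """
    fields = {}
    inside = False
    for line in text.splitlines():
        line = line.strip()  # COMMENT lines are indented in real files
        if line == START:
            inside = True
        elif line == END:
            break
        elif inside and "::" in line:
            key, value = line.split("::", 1)
            fields[key.strip()] = value.strip()
    return fields


example = """\
##Genome-Assembly-Data-START##
Finishing Goal           :: High-Quality Draft
Current Finishing Status :: High-Quality Draft
Assembly Method          :: Velvet v. 1.1.06
Genome Coverage          :: 104x
Sequencing Technology    :: Illumina
##Genome-Assembly-Data-END##
"""

fields = parse_assembly_data(example)
print(fields.get("Assembly Method", "Not provided"))
```

For a compressed file, the text could be read with `gzip.open(path, "rt")` before being passed to the function.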

Method

I decided to parse this header for all the bacterial .gbff.gz (GenBank File Format, aka .gbk) files available at NCBI FTP to see what genome assembly software is being used for bacterial genomes. Now, like any user-provided information, there is a lot of junk in this field, so I wrote some curated regexps to categorise the entries into cleaner bins. If more than one method was listed, I binned the entry into Hybrid/Mixed. If it was too minor or probably wrong, I binned it as Could not parse.
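
The binning step could look something like the Python sketch below. The patterns are illustrative guesses, not the curated regexps actually used, and `bin_method` is a hypothetical name. This simplified version also does not separate closely related bins such as Velvet and VelvetOptimiser.

```python
import re
from collections import Counter

# Illustrative patterns only (assumption): the real curated list is longer.
BINS = [
    ("AllPaths", re.compile(r"all\s*paths", re.I)),
    ("Newbler", re.compile(r"newbler", re.I)),
    ("Velvet", re.compile(r"velvet", re.I)),
    ("Spades", re.compile(r"spades", re.I)),
    ("CLC Genomics Workbench", re.compile(r"clc", re.I)),
]


def bin_method(method):
    """Map a free-text 'Assembly Method' value to a cleaner bin name."""
    if method is None or not method.strip():
        return "Not provided"
    hits = [name for name, pattern in BINS if pattern.search(method)]
    if len(hits) > 1:
        return "Hybrid/Mixed"  # more than one assembler listed
    if len(hits) == 1:
        return hits[0]
    return "Could not parse"  # nothing recognisable matched


methods = [
    "Velvet v. 1.1.06",
    "ALLPATHS-LG v. R44849",
    "SPAdes v. 3.1; Velvet v. 1.2",
    "MATLAB v. R2013a",
    "",
]
counts = Counter(bin_method(m) for m in methods)
for name, count in counts.most_common():
    print(count, name)
```

Checking the patterns in a fixed order and collecting every hit makes the Hybrid/Mixed rule fall out naturally: an entry that matches two bins is never silently assigned to whichever pattern happened to come first.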

Results

Count   Assembler Software
23725   Not provided
 9883   AllPaths
 5325   Newbler
 3783   Velvet
 3585   CLC Genomics Workbench
 3347   Spades
 2610   IDBA
 2477   Celera Assembler
 2082   ABYSS
 1815   CLC NGS Cell
 1782   SOAPdenovo
 1370   Could not parse
 1119   HGAP
  870   MaSuRCA
  853   MIRA
  793   A5-MiSeq
  308   Ray
  149   Phred/Phrap/Consed
  132   Geneious
  110   SeqMan
  109   HGAP3
   98   Edena
   69   Hybrid/Mixed
   59   DNAstar
   55   Platanus
   53   NextGene
   20   Arachne
   19   DISCOVAR
    9   VelvetOptimiser
    5   Falcon
    4   Megahit
66618   Total
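
To put the counts in proportion, each one can be divided by the total. The Python sketch below uses a few rows copied from the results above; the helper name `share` is made up for the example.

```python
TOTAL = 66618

# A few rows copied from the results; the rest work the same way.
counts = {
    "Not provided": 23725,
    "AllPaths": 9883,
    "Newbler": 5325,
    "Velvet": 3783,
}


def share(count, total=TOTAL):
    """Percentage of all assemblies that fall into a bin."""
    return 100.0 * count / total


for name, count in counts.items():
    print(f"{share(count):5.1f}%  {name}")
```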

Discussion

I was a little surprised to see ALLPATHS top the list due to its particular requirements for DNA library construction (overlapping PE + long mate pair), but the Broad Institute does do a lot of sequencing. A lot of people are using Velvet and Spades, but about as many are using CLC Workbench or the NGS Cell product.

The most disturbing and funniest entries in the Could not parse division are listed below.

in-house software v. 10/18/2012
Unknown program v. before 2013-07-02
Direct Sequencing
DNASTAR SeqMan NGen v. 4.0.0
GS Reference Mapper v. September 2013
Trimmomatic v. 0.32;
Ion Torrent PGM
Artimis v. 10.1 
artimist v. 10.1
De Bruijn graph v. Apr-2011
BCFtools Consensus
BLASTN v. actual
BOWTIE v. Version 2.1.0
BWA v. 0.5.1
BioNumerics v. 6.6
ELAND alignment algorithm
Galaxy v. May 2012
de Bruijn graphs v. Mar-2013
MAQ v. 0.7.1
MATLAB v. R2013a

At the top we have in-house software (with a version number!). The Direct Sequencing could be a single perfect read of a full chromosome from a really lucky Oxford Nanopore user. Is there anything Artimist (aka Artemis) cannot do? I need to upgrade my version of Trimmomatic and "actual" BLASTN too.

Conclusion

My main concern is the number of read aligners listed. Others and I have encountered some draft genomes where it appears the submitters have just aligned the reads to a close reference and submitted the consensus sequence as the assembly. These "genomes" sometimes cause problems in population studies, and I'd rather the reads be available instead.

11 comments:

  1. I have to check my Matlab as well for the assembler option ;-)
    Great post Torsten!

    For me, also very worrying is the 35% of Not Provided... On those we can only guess, and including them in population studies can be even more dangerous. While all the assemblies published should also be published (when NCBI finally fixes their annotation and submission system and rules), raw reads are the only way to validate the data... I would reject any paper that presents assemblies without raw reads.

    Replies
    1. It turns out that most of the 35% were done using "VelvetOptimiser" at the Sanger Institute. That metadata was somehow not required at the time of submission.

  2. That is really interesting! I wonder if you could post the data and include taxonomy. I think it could let us see the most used assemblers per species or other taxonomic unit.

    Replies
    1. That would be interesting Lee, but I don't have time. It was a bit of an ad hoc analysis! I know, I was naughty. Sorry.

  3. This is a nice post. your article writing style is good but it can be more better with some practice and revision.:) But still you did a wonderful job. Thanks for this information.
