The Genome Factory: Recent changes in the Velvet de novo ecosystem

Saturday, 9 June 2012

A few weeks ago we managed to lure an old colleague, Dave Powell, back into academia to join the VBC team. One of his many talents is software engineering, so I immediately got him working on some Velvet-related projects that I had really wanted to do but could never find the time for.

One of the plans was to write a GUI for Velvet. Browsing SourceForge and the like, I found that a few people had attempted it and failed. Last year I supervised a fourth-year software engineering group project whose students managed to get a basic prototype working, but the code was unmaintainable. So Dave and I planned it out, and he had a prototype done by the next day. He implemented it in JRuby, which can be distributed as a standard portable Java .jar file. We called it VAGUE.

However, there were a few issues with the existing command-line Velvet that were limiting the usefulness of VAGUE for its target audience of beginners:

Paired-end read files had to be interleaved first, even though Illumina produces them as separate files
Users needed to know what format their files were in, e.g. FASTQ or FASTA
It only supported GZIP compression, even though a lot of sequencing centres are using BZIP2
People didn't know what value of K to start with
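
Interleaving simply means alternating the records of the two files, so that each R1 read is immediately followed by its R2 mate. The Python sketch below is not Velvet code; it only illustrates the step users previously had to do by hand, and it assumes 4-line FASTQ records (no wrapped sequence lines), every line ending in a newline, and mates listed in the same order in both files.

```python
from itertools import islice, zip_longest
from typing import Iterable, Iterator, List


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def interleave(left: Iterable[str], right: Iterable[str]) -> Iterator[str]:
    """Yield the lines of R1 record 1, R2 record 1, R1 record 2, R2 record 2, ..."""
    for r1, r2 in zip_longest(fastq_records(left), fastq_records(right)):
        # Mates must pair up one-to-one; a leftover record means the files disagree.
        if r1 is None or r2 is None:
            raise ValueError("R1 and R2 files contain different numbers of reads")
        yield from r1
        yield from r2
```

With real files, open reads_R1.fq and reads_R2.fq for reading and an output file for writing, then call `out.writelines(interleave(r1, r2))`.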

We could have put these features into VAGUE, but it seemed much more sensible to put them directly into Velvet itself. So that's what Dave did! With a bit of code refactoring, Velvet can now do these things.

velveth now has a -separate option for paired-read files (default is -interleaved):
velveth dir 31 -shortPaired -fastq -separate reads_R1.fq reads_R2.fq
velveth now has a -fmtAuto option, which auto-detects the file type and compression type. Note that each file can have a different format and compression, which is very elegant.
velveth dir 31 -shortPaired -separate -fmtAuto left.fastq.gz right.fa.bz2
velveth now supports BZIP2-compressed files. If pbzip2 (parallel bzip2) is installed it will use that, and if pigz (parallel gzip) is installed it will use that too.
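
The auto-detection idea can be approximated by sniffing each file's first bytes: GZIP files start with the bytes 1f 8b, BZIP2 files start with "BZh", and after decompression a FASTQ file starts with "@" while a FASTA file starts with ">". The Python sketch below is an illustration under those assumptions, not Velvet's actual implementation; all function names are hypothetical.

```python
import bz2
import gzip
import shutil
from typing import List, Optional, Tuple


def detect_compression(head: bytes) -> str:
    """Classify a file by the magic number in its first bytes."""
    if head.startswith(b"\x1f\x8b"):  # GZIP magic number
        return "gzip"
    if head.startswith(b"BZh"):  # BZIP2 magic number
        return "bzip2"
    return "none"


def detect_format(first_char: str) -> str:
    """Classify decompressed text by its first non-blank character."""
    return {"@": "fastq", ">": "fasta"}.get(first_char, "unknown")


def sniff(path: str) -> Tuple[str, str]:
    """Return (format, compression) for one file; every file is checked independently."""
    with open(path, "rb") as fh:
        compression = detect_compression(fh.read(3))
    opener = {"gzip": gzip.open, "bzip2": bz2.open, "none": open}[compression]
    with opener(path, "rt") as fh:
        for line in fh:
            if line.strip():
                return detect_format(line.lstrip()[0]), compression
    return "unknown", compression


def decompress_command(compression: str) -> Optional[List[str]]:
    """Prefer the parallel tools pbzip2/pigz when they are on the PATH."""
    if compression == "bzip2":
        tool = "pbzip2" if shutil.which("pbzip2") else "bzip2"
    elif compression == "gzip":
        tool = "pigz" if shutil.which("pigz") else "gzip"
    else:
        return None
    return [tool, "-d", "-c"]  # decompress to standard output
```

For the mixed-format example above, sniff("left.fastq.gz") would be expected to return ("fastq", "gzip") and sniff("right.fa.bz2") ("fasta", "bzip2"), assuming those files exist and are well formed.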

To address the problem of choosing K, I wrote a very simple Perl script that counts the K-mers in your actual reads and tells you the K-mer coverage of your target genome for every possible value of K. It's called VelvetK, and it has a --best option, which we use in VAGUE to automatically choose a reasonable K-mer value to assemble with.
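
The coverage figure can be computed from read lengths alone: a read of length L contains L - K + 1 K-mers, so the expected K-mer coverage is Ck = (total K-mers in the reads) / (genome size). For reads of uniform length this reduces to Ck = C(L - K + 1)/L, where C is the ordinary base coverage. The Python sketch below illustrates that arithmetic only; it is not the VelvetK script, and it does not reproduce whatever rule --best uses to pick K.

```python
from itertools import islice
from typing import Dict, Iterable, List


def fastq_read_lengths(lines: Iterable[str]) -> List[int]:
    """Sequence lengths from 4-line FASTQ records (the 2nd line of each record)."""
    return [len(seq.rstrip("\n")) for seq in islice(lines, 1, None, 4)]


def kmer_count(read_lengths: Iterable[int], k: int) -> int:
    """Total K-mers across all reads; reads shorter than K contribute none."""
    return sum(length - k + 1 for length in read_lengths if length >= k)


def kmer_coverage(read_lengths: List[int], genome_size: int, k: int) -> float:
    """Expected K-mer coverage Ck = total K-mers / genome size."""
    return kmer_count(read_lengths, k) / genome_size


def coverage_table(read_lengths: List[int], genome_size: int) -> Dict[int, float]:
    """Ck for every odd K up to the longest read (velveth only accepts odd K)."""
    longest = max(read_lengths)
    return {k: kmer_coverage(read_lengths, genome_size, k)
            for k in range(1, longest + 1, 2)}
```

In practice the upper end of the table is also limited by the MAXKMERLENGTH value velveth was compiled with.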

I think these new features and tools will help introduce Velvet and de novo assembly to more people, and will hopefully shift power to the biologists and give bioinformaticians more time to develop better tools.

VAGUE - Velvet GUI
Velvet 1.2.07 - minimum version required for VAGUE
VelvetK - optional VAGUE add-on to estimate K using your data
VelvetAdvisor - web page to help choose K for your data

15 comments:

JAC, 12 June 2012 at 01:23
Hello Torsten
Great idea! I'm trying it now but having some problems. I've downloaded and compiled Velvet 1.2.07 and put the binaries in the same directory as VAGUE. When I try to run the jar (or the vague script provided), it doesn't run because it can't find the Velvet binaries.

Cannot run binary : Cannot run program "velveth" (in directory "/Users/jcarrico/Downloads/vague-1.0"): error=2, No such file or directory
Cannot run binary : Cannot run program "velvetg" (in directory "/Users/jcarrico/Downloads/vague-1.0"): error=2, No such file or directory
NoMethodError: undefined method `<' for nil:NilClass
update_velvet_binary at ./lib/velvet_gui.rb:557
initialize at ./lib/velvet_gui.rb:498
(root) at ./bin/vague.rb:31
load at org/jruby/RubyKernel.java:1058
(root) at file:/Users/jcarrico/Downloads/vague-1.0/vague.jar!/META-INF/main.rb:1
require at org/jruby/RubyKernel.java:1033
require at file:/Users/jcarrico/Downloads/vague-1.0/vague.jar!/META-INF/main.rb:36
(root) at <script>:3

Unknown, 12 June 2012 at 14:40
Hi JAC,

There is a bug in v1.0 that causes this if Velvet is not in the system PATH. It is fixed in v1.0.1, which will be on the website shortly. Until then, you can work around the problem by doing something like:

PATH=/Users/jcarrico/Downloads/vague-1.0:$PATH /Users/jcarrico/Downloads/vague-1.0/vague

Thanks for the report!

Cheers,

Torsten Seemann, 12 June 2012 at 14:43
Hi JAC, you can now download v1.0.1 from our website:
http://bioinformatics.net.au/software.vague.shtml

JAC, 12 June 2012 at 19:10
Great to read, David and Torsten! Thanks!
It now works, but I'm still having problems with the VelvetK script. I've made it executable just in case, but it still doesn't work.

[10:05:56] glaurung:~/Downloads/vague-1.0.1 jcarrico$ Failed to run : java.io.IOException: Cannot run program "velvetk.pl": error=2, No such file or directory
java/lang/ProcessBuilder.java:460:in `start'
lib/runner.rb:45:in `run'
lib/runner.rb:34:in `start'
org/jruby/RubyProc.java:270:in `call'
org/jruby/RubyProc.java:224:in `call'

Cheers

Torsten Seemann, 12 June 2012 at 21:25
Did you put velvetk.pl in your $PATH?

Is the #!/usr/bin/perl line in the script correct for where your Perl is installed?

Did you make it executable for all users? (chmod 755 velvetk.pl)
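
Those three checks can be scripted. The Python sketch below is a hypothetical helper, not part of VAGUE or VelvetK: it looks for the script in each PATH directory, tests the execute permission, and checks that the interpreter named on the #! line exists (including the #!/usr/bin/env perl form).

```python
import os
import shutil
from typing import List, Optional


def diagnose(script: str, search_path: Optional[str] = None) -> List[str]:
    """List likely reasons why `script` cannot be launched by name."""
    path = os.environ.get("PATH", "") if search_path is None else search_path
    # 1. Is there a file of that name in any PATH directory?
    candidates = [os.path.join(d, script) for d in path.split(os.pathsep)
                  if d and os.path.isfile(os.path.join(d, script))]
    if not candidates:
        return [f"{script} was not found in any PATH directory"]
    target = candidates[0]  # first match in PATH order
    problems = []
    # 2. Does the current user have execute permission?
    if not os.access(target, os.X_OK):
        problems.append(f"{target} is not executable (try: chmod 755 {target})")
    # 3. Does the #! line name an interpreter that exists?
    with open(target, "rb") as fh:
        first_line = fh.readline().decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        problems.append(f"{target} has no #! interpreter line")
        return problems
    parts = first_line[2:].split()
    if not parts:
        problems.append(f"{target} has an empty #! line")
    elif os.path.basename(parts[0]) == "env" and len(parts) > 1:
        # '#!/usr/bin/env perl' looks the interpreter up on the PATH instead.
        if shutil.which(parts[1]) is None:
            problems.append(f"interpreter {parts[1]!r} not found on PATH")
    elif not (os.path.isfile(parts[0]) and os.access(parts[0], os.X_OK)):
        problems.append(f"interpreter {parts[0]} does not exist or is not executable")
    return problems
```

An empty list means none of the three usual problems was found, for example `diagnose("velvetk.pl")` on a correctly installed system.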

Torsten Seemann, 13 June 2012 at 00:04
I cannot replicate your issue with the velvetk.pl --size option not being case-insensitive. The source code does a case-insensitive check:

$size =~ m/(.*)([KMG])$/i

And when I run it with 2.1M or 2.1m it does the same thing.

% velvetk.pl -s 2.1m reads.fq
Target genome size is 2100000 bp

% velvetk.pl -s 2.1M reads.fq
Target genome size is 2100000 bp
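
For anyone wanting the same behaviour in their own tools, here is a Python rendering of that case-insensitive suffix check. It is a sketch: the output above only shows that M means 1,000,000, so treating K as 1,000 and G as 1,000,000,000 is an assumption, and parse_genome_size is a hypothetical name.

```python
import re

# Decimal multipliers. M = 1,000,000 matches the velvetk.pl output above;
# K and G are assumed to follow the same convention.
MULTIPLIER = {"K": 1_000, "M": 1_000_000, "G": 1_000_000_000}


def parse_genome_size(size: str) -> int:
    """Convert '2.1m', '2.1M', '5k' or '5000' into a number of base pairs."""
    # Same pattern as the Perl m/(.*)([KMG])$/i; re.IGNORECASE plays the role of /i.
    match = re.match(r"(.*)([KMG])$", size.strip(), re.IGNORECASE)
    if match:
        number, suffix = match.group(1), match.group(2).upper()
        return round(float(number) * MULTIPLIER[suffix])
    return round(float(size))
```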

As for not being able to "Create Folder" from the Java file browser, that seems to be partly Java- and OS-dependent, but we'll dig deeper to see whether it can be enabled somehow. I agree it is pretty essential.

Torsten Seemann, 13 June 2012 at 12:28
Version 1.0.2 is now out:
http://bioinformatics.net.au/software.vague.shtml

Rajat Dhakal, 8 August 2013 at 14:23
Hi Torsten,

When we get contigs from VAGUE after draft genome assembly, do we know whether they are oriented 5'-3' or 3'-5'?

Regards,
Rajat

Torsten Seemann, 8 August 2013 at 15:38
No genome assembler can tell you which orientation the contigs should be in. DNA is a double-stranded molecule, and we can't tell which strand a contig came from. Sorry.
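
To make this concrete: a contig and its reverse complement describe the same double-stranded fragment, so an assembler has no basis for preferring one over the other. The Python sketch below (not part of Velvet or VAGUE) computes the reverse complement; the canonical() helper is a hypothetical convention that picks an arbitrary but consistent orientation, not a biological one.

```python
# Map each base to its Watson-Crick partner (N, an unknown base, maps to itself).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5'->3': complement, then reverse."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Hypothetical convention: report whichever strand sorts first alphabetically."""
    return min(seq, reverse_complement(seq))
```

"ATGCCG" and its reverse complement "CGGCAT" are equally valid ways to write the same contig; fixing a real orientation needs outside information, for example alignment to a related reference genome.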

Rajat Dhakal, 16 August 2013 at 11:19
Hi Torsten,
Thanks for answering. I have one more question to ask.

So when we design primers, for example in Primer-BLAST, we get primers written 5'-3' from both the plus and minus strands (please correct me if I am wrong). So once we have the primers, we know which strand is 5'-3' and which is 3'-5'. Is that enough to determine the strand orientation, or do we have to amplify with these primers and get a product before drawing conclusions about orientation? How is it done in your lab?

Thanks for taking the time to answer this question.

Regards,
Rajat

Anonymous, 14 September 2013 at 14:17
I seem to be having similar trouble. When I try to run VAGUE, it can't find the Velvet binaries, and I need to specify the path. How do I do this? I've tried a few things, but none of them seem to work.

Thanks,
Mike

Dudu, 8 July 2014 at 07:29
How do I run VAGUE in an Ubuntu VM on Windows, please?
I click the executable file and nothing happens. I tried to compile it with javac but I can't find a .java file.
Thanks