Saturday, 9 June 2012

Recent changes in the Velvet de novo ecosystem

A few weeks ago we managed to lure an old colleague, Dave Powell, back into academia to join the VBC team. One of his many talents is software engineering, so I immediately got him working on some projects related to the Velvet de novo assembler that I really wanted to do but could never find time.

One of the plans was to write a GUI for Velvet. Browsing SourceForge and the like, it seems a few people had attempted one and failed. Last year I supervised a 4th-year software engineering group project; the students got a basic prototype working, but the code was unmaintainable. So Dave and I planned it out, and he had a prototype done by the next day. He implemented it in JRuby, which can be distributed as a standard portable Java .jar file. We called it VAGUE.

However, there were a few issues with the current command-line Velvet that were limiting the usefulness of VAGUE to its beginner target audience:

  1. Paired-end read files have to be interleaved first, even though Illumina produce them as separate files
  2. The user needs to know what format they are in, e.g. fastq, fasta
  3. It only supports GZIP compression, even though a lot of sequencing centres are using BZIP2
  4. People don't know what K value to start with

We could have put these features into VAGUE, but it seemed much more sensible to put them directly into Velvet itself. So that's what Dave did! With a bit of code refactoring, Velvet can now do these things:

  • velveth now has a -separate option for paired-read files (default is -interleaved):
    velveth dir 31 -shortPaired -fastq -separate reads_R1.fq reads_R2.fq

  • velveth now has a -fmtAuto option which auto-detects file type and compression type. Note that each file can have a different format and compression; very elegant.
    velveth dir 31 -shortPaired -separate -fmtAuto left.fastq.gz right.fa.bz2

  • velveth now supports BZIP2 compressed files. If you have pbzip2 (parallel bzip2) installed it will use that, and likewise it will use pigz (parallel gzip) if that is installed.
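
To give a feel for what auto-detection involves, here is a minimal Python sketch. It is not Velvet's actual code (Velvet is written in C), and the function names are made up for this example. It assumes that compression can be recognised from a file's leading "magic" bytes and that the sequence format can be recognised from the first character of the first non-blank line.

```python
import bz2
import gzip

# Leading "magic" bytes that identify each compression format.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def detect_compression(path):
    """Return 'gzip', 'bzip2' or 'none' based on the file's first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(3)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    return "none"


def open_reads(path):
    """Open a read file as text, decompressing transparently if needed."""
    openers = {"gzip": gzip.open, "bzip2": bz2.open, "none": open}
    return openers[detect_compression(path)](path, "rt")


def detect_format(path):
    """Return 'fasta' or 'fastq' from the first non-blank line (hypothetical helper)."""
    with open_reads(path) as handle:
        for line in handle:
            if line.strip():
                if line.startswith(">"):
                    return "fasta"
                if line.startswith("@"):
                    return "fastq"
                break
    raise ValueError(f"cannot determine sequence format of {path}")
```

Because each file is inspected on its own, nothing stops the two files of a pair from using different formats or compression, which is the behaviour described above.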
For the issue of choosing K, I wrote a very simple Perl script which counts the K-mers in your actual reads and tells you the K-mer coverage of your target genome for all possible values of K. It's called VelvetK, and it has a --best option which we use in VAGUE to automatically choose a reasonable K value to assemble with.
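
The arithmetic behind K-mer coverage is simple enough to sketch. The Python below is an illustration only, not the VelvetK source (which is Perl). It assumes a read of length L contributes L - K + 1 K-mers, that only odd K values are considered (Velvet expects odd hash lengths), and that "best" means closest to a target coverage. The target of 25 and the function names are assumptions made for this example.

```python
def kmer_coverage(read_lengths, k, genome_size):
    """K-mer coverage: total k-mers in the reads divided by genome size.

    A read of length L contributes L - k + 1 k-mers (none if L < k).
    """
    total_kmers = sum(max(0, length - k + 1) for length in read_lengths)
    return total_kmers / genome_size


def best_k(read_lengths, genome_size, target=25.0):
    """Pick the odd k whose k-mer coverage is closest to `target`.

    The default target of 25 is an assumption for this sketch.
    """
    longest = max(read_lengths)
    candidates = range(1, longest + 1, 2)  # odd values only
    return min(
        candidates,
        key=lambda k: abs(kmer_coverage(read_lengths, k, genome_size) - target),
    )
```

For example, a thousand 100 bp reads against a 2100 bp target give a K-mer coverage of about 33.3 at K=31, and the sketch picks K=49 as the value closest to 25.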

I think these new features and tools will help introduce Velvet and de novo assembly to more people, and hopefully help move power to the biologists and give more time to the bioinformaticians to develop better tools.

15 comments:

  1. Hello Torsten
    Great idea! I'm trying it now but having some problems. I've downloaded and compiled velvet 1.2.07 and put the bins in the same dir as vague. Trying to run the jar (or using the vague script provided), it doesn't run as it doesn't find the velvet binaries.


    Cannot run binary : Cannot run program "velveth" (in directory "/Users/jcarrico/Downloads/vague-1.0"): error=2, No such file or directory
    Cannot run binary : Cannot run program "velvetg" (in directory "/Users/jcarrico/Downloads/vague-1.0"): error=2, No such file or directory
    NoMethodError: undefined method `<' for nil:NilClass
    update_velvet_binary at ./lib/velvet_gui.rb:557
    initialize at ./lib/velvet_gui.rb:498
    (root) at ./bin/vague.rb:31
    load at org/jruby/RubyKernel.java:1058
    (root) at file:/Users/jcarrico/Downloads/vague-1.0/vague.jar!/META-INF/main.rb:1
    require at org/jruby/RubyKernel.java:1033
    require at file:/Users/jcarrico/Downloads/vague-1.0/vague.jar!/META-INF/main.rb:36
    (root) at <script>:3

  2. Hi JAC,

    There is a bug in v1.0 that causes this if velvet is not in the system PATH. It is fixed in v1.0.1, which will be on the website shortly. Until then, you can work around the problem by doing something like:

    PATH=/Users/jcarrico/Downloads/vague-1.0:$PATH /Users/jcarrico/Downloads/vague-1.0/vague

    Thanks for the report!

    Cheers,

  3. Hi JAC - you can download v 1.0.1 now from our website:
    http://bioinformatics.net.au/software.vague.shtml

  4. Great to read David and Torsten! Thanks!
    It now works but I'm still having problems with the velvetK script. I've made it executable just in case and it still doesn't work.


    [10:05:56] glaurung:~/Downloads/vague-1.0.1 jcarrico$ Failed to run : java.io.IOException: Cannot run program "velvetk.pl": error=2, No such file or directory
    java/lang/ProcessBuilder.java:460:in `start'
    lib/runner.rb:45:in `run'
    lib/runner.rb:34:in `start'
    org/jruby/RubyProc.java:270:in `call'
    org/jruby/RubyProc.java:224:in `call'

    Cheers

  5. Did you put velvetk.pl in your $PATH ?

    Is the #!/usr/bin/perl line in the script correct for where your Perl is?

    You made it executable for all users? (chmod 755 velvetk.pl)

    Replies
    1. Only no for the first. I added it to the path and now it works.

      It also seems to be case sensitive to the input: 2.1M gives an error while 2.1m works.

      Just a suggestion: add the possibility of creating a dir when choosing the output directory. I'm running Mac OS X Lion and it doesn't offer that possibility (in case it has to do with some OS quirk in Java).

      Thanks!

  6. I cannot replicate your issue with the velvetk.pl --size option being case-sensitive. The source code does a case-insensitive check:

    $size =~ m/(.*)([KMG])$/i

    And when I run it with 2.1M or 2.1m it does the same thing.

    % velvetk.pl -s 2.1m reads.fq
    Target genome size is 2100000 bp

    % velvetk.pl -s 2.1M reads.fq
    Target genome size is 2100000 bp

    As for not being able to "Create Folder" from the Java file browser, that seems to be partially Java and OS dependent, but we'll dig into it deeper to see if it can be enabled somehow. I agree it is pretty essential.

    Replies
    1. The issue with velvetk.pl was when using it from within VAGUE. I haven't tried it on the command line.
      But I've tried it again and it works now. Maybe I did something wrong. Sorry for the false alarm!

      Thanks for the follow-up!

  7. Version 1.0.2 is now out:
    http://bioinformatics.net.au/software.vague.shtml

    Replies
    1. The create dir is 5*s. Thanks Torsten! :D

  8. Hi Torsten,

    When we get contigs from VAGUE after draft genome assembly, do we know whether they are oriented 5'-3' or 3'-5'?

    Regards,
    Rajat

  9. No genome assembler can tell you which orientation the contigs should be in. DNA is a double-stranded molecule and we can't tell which strand it came from. Sorry.

  10. Hi Torsten,
    Thanks for answering. I have one more question to ask.

    So when we design primers, for example in Primer-BLAST, we get primers which are 5'-3' from the plus and minus strands (please correct me if I am wrong). So by the time we get primers, we know which strand is 5'-3' and which is 3'-5'. Is that enough to determine the strand orientation, or do we have to amplify with these primers and get amplification before concluding anything about orientation? How is it done in your lab?

    Thanks for your time to answer this question.

    Regards,
    Rajat

  11. I seem to be having similar trouble. When trying to run VAGUE, I am unable to find velvet binaries, and need to specify the path. How do I do this? I've tried a few things but none of them seem to work.

    Thanks,
    Mike

  12. How do I run VAGUE in an Ubuntu VM in Windows, please?
    I click the executable file and nothing happens. I tried to compile it with javac but I can't find a .java file.
    Thanks
