Comments on The Genome Factory: Sorting FASTQ files by sequence length
Blog author: Torsten Seemann. 7 comments. Feed last updated 2024-03-27T16:30:10.302+11:00.

Shrutii Sarda, 2020-09-26T07:23:16.674+10:00
Thanks!

cv writing service, 2017-07-07T16:46:35.857+10:00
Great article for everyone to read especially for computer programmers to learn so much and try to avoid such nasty things.

Mark Ziemann, 2012-11-27T16:06:49.880+11:00
Thanks, quite helpful!

Torsten Seemann, 2012-11-08T10:22:49.093+11:00
Agder,
Yes you are correct, the recent usearch does have
"usearch -sortbylength in.fa -output out.fa" but I have three problems with it

1. The free version is limited to 32bit (which is about 3GB RAM on my Linux) which is too small for large files. I have 21M reads and it stops at 7M.

2. It only supports FASTA, not FASTQ

3. It crashes with an odd error on any reads.fa file I give it:
    usearch -sortbylength in.fa -output out.fa
    usearch_i86linux32 v6.0.152, 4.0Gb RAM
    http://drive5.com/usearch
    00:00 1.9Mb Reading in.fa, 3.0Gb
    00:10 3.0Gb 7803063 (7.8M) seqs
    ---Fatal error---
    Invalid byte 0x02x in FASTA file (null)

Torsten Seemann, 2012-11-08T10:12:47.112+11:00
Titus, not sure what happened to your comment. I've been away sick since I posted this. But thanks for the pointer! We're all still getting our head around the khmer toolset. Your recent 'handbook' is very helpful.

Titus Brown, 2012-11-07T14:01:58.839+11:00
Hmm, my comment went away?

If you're going to tack it on to diginorm or abundance trimming, why not sort the reads as you go through them? See https://github.com/ged-lab/khmer/blob/master/sandbox/filter-abund-output-by-length.py

Austin G. Davis-Richardson, 2012-11-07T08:44:17.086+11:00
Robert Edgar's UCLUST/USEARCH tools use the mergesort algorithm to sort large fasta files.
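
For readers following the thread, the length sort under discussion can be sketched in a few lines of Python for FASTQ input, which the usearch command above reportedly does not accept. This is a minimal illustration and not the method used by usearch, khmer, or the original post. The function names `read_fastq` and `sort_fastq_by_length` are hypothetical, the sketch assumes strict four-line FASTQ records, and it holds all reads in memory, so it would likely hit the same kind of RAM ceiling described above on very large files.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) tuples.

    Assumes strict 4-line FASTQ records (no wrapped sequence lines).
    """
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return  # clean end of input
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield tuple(record)


def sort_fastq_by_length(in_handle, out_handle, longest_first=False):
    """Write records from in_handle to out_handle ordered by sequence length.

    Python's list sort is stable, so reads of equal length keep their
    original relative order.
    """
    records = list(read_fastq(in_handle))  # whole file held in memory
    records.sort(key=lambda rec: len(rec[1]), reverse=longest_first)
    for rec in records:
        out_handle.write("\n".join(rec) + "\n")


if __name__ == "__main__":
    # Tiny made-up example input, for illustration only.
    demo = "@r1\nACGTAC\n+\nIIIIII\n@r2\nAC\n+\nII\n@r3\nACGT\n+\nIIII\n"
    out = io.StringIO()
    sort_fastq_by_length(io.StringIO(demo), out)
    print(out.getvalue(), end="")
```

For inputs that do not fit in RAM, the usual approach is an external merge sort: sort fixed-size chunks, write each to a temporary file, then merge the sorted chunks (for example with `heapq.merge`).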