Friday 21 September 2012

Problematic FASTQ output from Ion TorrentSuite 3.0

Yesterday we got an email from a Nesoni user who said that the SHRiMP aligner was failing on his FASTQ data. Today we got a report of the same issue from our trusted colleague Tim Stinear. The evidence was mounting for a bug in either Nesoni or the FASTQ files, rather than user error. Tim had recently upgraded his PGM sequencer to Torrent Suite v3.0 (point-zero release alarm bells!). Nesoni had saved the shrimp_log.txt file, and it contained this:


note: quality value format set to PHRED+33
done r/hr r/core-hr
The qv-offset might be set incorrectly! Currenty qvs are interpreted as PHRED+33 and a qv of 62 was observed. To disable this error, etiher set the offset correctly or disable this check (see README).

Wow! The PGM has improved dramatically, calling some bases at Q62; that's better than a 1-in-a-million error rate... Here's a breakdown of the Q values in the file:

Symbol ASCII Q(Phred+33) Frequency
+ 43 10 1105774
, 44 11 347753
- 45 12 1167099
. 46 13 673276
/ 47 14 137220
0 48 15 225893
1 49 16 1621714
2 50 17 1731775
3 51 18 2726736
4 52 19 4280447
5 53 20 4272951
6 54 21 2556953
7 55 22 7783535
8 56 23 5153631
9 57 24 2362016
: 58 25 2406869
; 59 26 2517450
< 60 27 5762153
= 61 28 4334469
> 62 29 7109066
? 63 30 11113780
@ 64 31 13934507
A 65 32 9227417
B 66 33 12758868
C 67 34 8228985
D 68 35 9935410
E 69 36 1459950
F 70 37 2358692
G 71 38 682190
H 72 39 158322
I 73 40 168311
J 74 41 269
K 75 42 199121
L 76 43 83457
M 77 44 4971
N 78 45 464143
O 79 46 8657
P 80 47 0
Q 81 48 0
R 82 49 0
S 83 50 0
T 84 51 0
U 85 52 0
V 86 53 0
W 87 54 0
X 88 55 0
Y 89 56 0
Z 90 57 0
[ 91 58 0
\ 92 59 0
] 93 60 0
^ 94 61 0
_ 95 62 746
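For anyone wanting to reproduce a breakdown like the one above, here is a minimal Python sketch. It assumes an unwrapped FASTQ file (exactly four lines per record) with Phred+33 qualities; the function names are our own and are not part of Nesoni or Torrent Suite. It also prints the error probability implied by each score, 10^(-Q/10), which for '_' (Q62) comes out at roughly 6.3e-07, i.e. better than one error in a million.

```python
import sys
from collections import Counter
from typing import Iterable, Iterator, Tuple

PHRED_OFFSET = 33  # the offset SHRiMP reported using (PHRED+33)


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str, str, str]]:
    """Yield (header, sequence, separator, quality) for each record.

    Assumes unwrapped FASTQ: exactly four lines per record."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        try:
            seq, sep, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record starting at {header!r}") from None
        yield header, seq, sep, qual


def quality_histogram(lines: Iterable[str]) -> Counter:
    """Count every quality symbol across all records."""
    counts: Counter = Counter()
    for _header, _seq, _sep, qual in fastq_records(lines):
        counts.update(qual)
    return counts


def phred_to_error_prob(q: int) -> float:
    """Error probability implied by a Phred score: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def print_table(counts: Counter, offset: int = PHRED_OFFSET) -> None:
    """Print one row per ASCII code between the lowest and highest symbol seen,
    including codes that never occur (frequency 0), as in the table above."""
    if not counts:
        return
    codes = [ord(c) for c in counts]
    print("Symbol\tASCII\tQ\tP(error)\tFrequency")
    for code in range(min(codes), max(codes) + 1):
        q = code - offset
        print(f"{chr(code)}\t{code}\t{q}\t{phred_to_error_prob(q):.1e}\t{counts[chr(code)]}")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        print_table(quality_histogram(fh))
```

Run it as `python quality_histogram.py reads.fastq` (the script name is arbitrary). Printing every code in the observed range, rather than only the symbols seen, is what makes the empty Q47 to Q61 gap before '_' stand out.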


OK, there really are 746 bases with Q62. Some grep work shows that they occur exactly once in each of 746 reads, always at position 1 of the read, like this:

@QWRK0:02580:02076
GGGAATCAAAACGCTGATTTTTGATGAACAGAATAACGAA
+
_8,59=<<@@6@BCCDDDFFF3FEEEFAEC@@;77-7777
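The grep work can equally be done with a short script that reports the read ID and 1-based position of every '_' in the quality strings. Again this is only a sketch, assuming four-line records; `find_symbol_positions` is a name invented here for illustration.

```python
import sys
from collections import Counter
from typing import Iterable, List, Tuple


def find_symbol_positions(lines: Iterable[str], symbol: str = "_") -> List[Tuple[str, int, str]]:
    """Return (read_id, 1-based position, sequence) for each occurrence of
    `symbol` in a quality line. Assumes unwrapped FASTQ, four lines per record."""
    hits = []
    it = (line.rstrip("\n") for line in lines)
    # zip(it, it, it, it) groups consecutive lines into records;
    # a truncated final record is silently dropped.
    for header, seq, _sep, qual in zip(it, it, it, it):
        for pos, ch in enumerate(qual, start=1):
            if ch == symbol:
                hits.append((header[1:], pos, seq))  # header[1:] drops the leading '@'
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        hits = find_symbol_positions(fh)
    print(f"{len(hits)} occurrences in {len({read_id for read_id, _, _ in hits})} reads")
    print("positions:", Counter(pos for _, pos, _ in hits))
```

If every hit is a separate read at position 1, the two counts printed will match and the position counter will have a single key, 1.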

The table also shows quite a few bases above Q41, which aren't typically seen in FASTQ files. These are probably OK (?), but the Q62 ones must surely be an artifact of, or a bug in, version 3.0 of Torrent Suite.
In the meantime, our solution has been this:

sed 's/^_/H/' < dodgy.fastq > fixed.fastq
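The sed one-liner works because, in this data, only a quality line can begin with '_': headers start with '@', sequence lines with a base, and separator lines with '+'. If you'd rather not rely on that, a record-aware sketch that only ever touches the fourth line of each record might look like the following. Like the sed command, it rewrites a leading '_' (Q62) as 'H' (Q39); the function name `fix_leading_q62` is hypothetical, and four-line records are assumed.

```python
import sys
from typing import Iterable, Iterator


def fix_leading_q62(lines: Iterable[str], bad: str = "_", replacement: str = "H") -> Iterator[str]:
    """Yield FASTQ lines unchanged, except that a quality line (every fourth
    line) starting with `bad` has that first symbol replaced by `replacement`."""
    for i, line in enumerate(lines):
        if i % 4 == 3 and line.startswith(bad):
            line = replacement + line[len(bad):]
        yield line


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(fix_leading_q62(src))
```

Usage mirrors the sed command: `python fix_fastq.py dodgy.fastq fixed.fastq` (script name arbitrary). The file is streamed line by line, so memory use stays flat regardless of run size.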

I'd be interested to hear whether others have encountered this problem.


2 comments:

  1. I'm seeing the exact same thing on two runs that just came off of TS 3.0. Hundreds of reads with '_' phred score in position one. Also all of the reads with the Q62 score start with 'GG'... I bet it has something to do with the last base of the key sequence also being a 'G'. I'll start a discussion on the Ion Community. Thanks for pointing this out.

  2. Thanks for passing it on Dave. I didn't notice the GG pattern but I think you are right.
