note: quality value format set to PHRED+33 done r/hr r/core-hr The qv-offset might be set incorrectly! Currenty qvs are interpreted as PHRED+33 and a qv of 62 was observed. To disable this error, etiher set the offset correctly or disable this check (see README).
Wow! The PGM has improved dramatically, calling some bases at Q62, that's better than 1 in 1 million error rate... Here's a breakdown of the Q values in the file:
Symbol ASCII Q+33 Frequency
+ 43 10 1105774
, 44 11 347753
- 45 12 1167099
. 46 13 673276
/ 47 14 137220
0 48 15 225893
1 49 16 1621714
2 50 17 1731775
3 51 18 2726736
4 52 19 4280447
5 53 20 4272951
6 54 21 2556953
7 55 22 7783535
8 56 23 5153631
9 57 24 2362016
: 58 25 2406869
; 59 26 2517450
< 60 27 5762153
= 61 28 4334469
> 62 29 7109066
? 63 30 11113780
@ 64 31 13934507
A 65 32 9227417
B 66 33 12758868
C 67 34 8228985
D 68 35 9935410
E 69 36 1459950
F 70 37 2358692
G 71 38 682190
H 72 39 158322
I 73 40 168311
J 74 41 269
K 75 42 199121
L 76 43 83457
M 77 44 4971
N 78 45 464143
O 79 46 8657
P 80 47 0
Q 81 48 0
R 82 49 0
S 83 50 0
T 84 51 0
U 85 52 0
V 86 53 0
W 87 54 0
X 88 55 0
Y 89 56 0
Z 90 57 0
[ 91 58 0
\ 92 59 0
] 93 60 0
^ 94 61 0
_ 95 62 746
Ok, there really are 746 bases with Q62. Some grep work shows me they are all occuring alone in 746 reads, all in position 1 in the read, like this:
@QWRK0:02580:02076
GGGAATCAAAACGCTGATTTTTGATGAACAGAATAACGAA
+
_8,59=<<@@6@BCCDDDFFF3FEEEFAEC@@;77-7777
The table also shows quite a few >Q41 bases, which aren't typically seen in FASTQ files. These ones are probably ok (?) but the Q62 ones surely must be some artifact or bug in the version 3.0 of Torrent Suite.
In the meantime, our solution has been this:
sed 's/^_/H/' < dodgy.fastq > fixed.fastq
Be interested to see if others have encountered this problem.
I'm seeing the exact same thing on two runs that just came off of TS 3.0. Hundreds of reads with '_' phred score in position one. Also all of the reads with the Q62 score start with 'GG'... I bet it has something to do with the last base of the key sequence also being a 'G'. I'll start a discussion on the Ion Community. Thanks for pointing this out.
ReplyDeleteThanks for passing it on Dave. I didn't notice the GG pattern but I think you are right.
ReplyDelete