The Genome Factory: A Unix one-liner to call bacterial variants

Sunday, 14 October 2018

Introduction

Variant finding is the generic term for finding differences between two genome sequences. These differences can take many forms, such as SNPs and small INDELs, large changes in DNA content caused by mobile elements, and structural changes like chromosomal inversions.

The genomes we want to compare could either be assemblies (complete or draft) or just sequencing reads (FASTQ files). Most microbial variant-finding tools focus on small differences (< 20 bp) and work by comparing a FASTQ sample to an assembled genome, typically called the "reference". A common use case is to sequence your isolate of interest and see how it differs from the type strain in GenBank.

Calling SNPs with a one-liner

Let us assume we have a reference genome in FASTA format in the REF variable, the paired Illumina FASTQ files in R1 and R2, and the number of CPU cores to use in CPUS. Then the following "one-liner" will generate a VCF file without writing any intermediate files.

CPUS=4
REF=ref.fa
R1=R1.fastq.gz
R2=R2.fastq.gz

minimap2 -a -x sr -t "$CPUS" "$REF" "$R1" "$R2" \
 | samtools sort -l 0 --threads "$CPUS" \
 | bcftools mpileup -Ou -B --min-MQ 60 -f "$REF" - \
 | bcftools call -Ou -v -m - \
 | bcftools norm -Ou -f "$REF" -d all - \
 | bcftools filter -Ov -e 'QUAL<40 || DP<10 || GT!="1/1"' \
 > variants.vcf

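The reads are aligned to the reference with minimap2 and sorted by coordinate with samtools. Instead of saving the BAM file, we pipe it directly through a series of BCFtools steps. Note the use of -l 0 and -Ou to keep the piped data uncompressed, avoiding repeated compression/decompression between steps. The --min-MQ 60 option ensures only uniquely mapped reads are used (60 is the highest mapping quality minimap2 assigns). bcftools call -v -m reports only variant sites using the multiallelic caller, and bcftools norm -d all left-aligns indels against the reference and drops duplicate records. The final filter step removes low-quality variant calls (QUAL < 40), heterozygous calls (bacteria are haploid), and any sites covered by fewer than 10 reads (DP < 10).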
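To make the final filter expression concrete, here is a small sketch that applies the same three conditions to a toy single-sample VCF using awk instead of bcftools. It is an illustration only: the keep_calls function and the toy records are hypothetical, and it assumes DP is present in the INFO column and GT is the first FORMAT key, as in bcftools call output.

```shell
#!/bin/sh
# Illustrative awk re-implementation of the one-liner's final filter:
#   bcftools filter -e 'QUAL<40 || DP<10 || GT!="1/1"'
# Hypothetical helper; real pipelines should use bcftools itself.
# Assumes a single-sample VCF with DP in INFO and GT as the first FORMAT key.
set -eu

# Print only the records that survive the filter (header lines are dropped).
keep_calls() {
  awk -F'\t' '
    /^#/ { next }
    {
      dp = 0
      n = split($8, info, ";")          # INFO: semicolon-separated key=value
      for (i = 1; i <= n; i++)
        if (info[i] ~ /^DP=/) dp = substr(info[i], 4) + 0
      split($10, smp, ":")              # sample column; GT comes first
      # Exclude the record if QUAL<40 || DP<10 || GT!="1/1"
      if ($6 + 0 < 40 || dp < 10 || smp[1] != "1/1") next
      print
    }'
}

# Hypothetical toy records, one per filter condition.
toy_vcf() {
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n'
  printf 'chr\t100\t.\tA\tG\t225\t.\tDP=50\tGT\t1/1\n'   # passes
  printf 'chr\t200\t.\tC\tT\t30\t.\tDP=50\tGT\t1/1\n'    # QUAL too low
  printf 'chr\t300\t.\tG\tA\t225\t.\tDP=6\tGT\t1/1\n'    # depth too low
  printf 'chr\t400\t.\tT\tC\t150\t.\tDP=40\tGT\t0/1\n'   # heterozygous
}

toy_vcf | keep_calls | cut -f2    # prints the POS of the surviving record: 100
```

Only the first toy record passes; each of the others trips exactly one of the three conditions. bcftools filter also handles details this sketch ignores, such as missing values and multi-sample files.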

Here is a summary of the results of running this one-liner on some Pasteurella multocida data, which you can download from these links: REF (.fa.gz), R1 (.fastq.gz) and R2 (.fastq.gz).

bcftools stats variants.vcf | grep '^SN' | cut -f3-

number of samples:                 1
number of records:                 36618
number of no-ALTs:                 0
number of SNPs:                    36408
number of MNPs:                    0
number of indels:                  210
number of others:                  0
number of multiallelic sites:      0
number of multiallelic SNP sites:  0

The one-liner found ~36,000 SNPs and 210 INDELs.
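The SNP and indel counts reported by bcftools stats can be approximated by classifying each record by the lengths of its REF and ALT alleles. The sketch below does this with awk; the count_types function and the toy records are hypothetical, and it assumes biallelic records only (no commas in ALT), which holds here since the summary reports no multiallelic sites.

```shell
#!/bin/sh
# Rough awk approximation of the SNP/MNP/indel counts from `bcftools stats`.
# Hypothetical helper; assumes biallelic records (no commas in ALT).
set -eu

count_types() {
  awk -F'\t' '
    /^#/ { next }                       # skip header lines
    {
      r = length($4); a = length($5)    # REF and ALT allele lengths
      if (r == 1 && a == 1) snps++      # single-base substitution
      else if (r == a)      mnps++      # multi-base substitution
      else                  indels++    # length change: insertion or deletion
    }
    END { printf "SNPs\t%d\nMNPs\t%d\nindels\t%d\n", snps, mnps, indels }'
}

# Hypothetical toy records: two SNPs, one deletion, one insertion.
toy_vcf() {
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr\t10\t.\tA\tG\t225\t.\tDP=40\n'
  printf 'chr\t20\t.\tC\tT\t225\t.\tDP=40\n'
  printf 'chr\t30\t.\tGA\tG\t200\t.\tDP=35\n'
  printf 'chr\t40\t.\tT\tTCC\t180\t.\tDP=30\n'
}

toy_vcf | count_types
# prints (tab-separated):
# SNPs    2
# MNPs    0
# indels  2
```

Running count_types < variants.vcf on the one-liner's output should roughly agree with the bcftools stats summary above, but bcftools stats remains the tool to trust.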

Existing software

This is not an exhaustive list; just the tools I have encountered that are useful for bacterial data.

Full pipelines

Snippy - this is my pipeline
NASP - Northern Arizona SNP Pipeline
CFSAN SNP Pipeline - used at CFSAN, FDA
LYVE-Set - used at CDC
breseq - Barrick Lab
SPANDx - Derek Sarovich and Erin Price

Variant caller only

VarScan - Dan Koboldt
GATK - The Broad Institute
FreeBayes - Erik Garrison
LoFreq - Andreas Wilm, Niranjan Nagarajan
SAMtools/BCFtools - The originals!

Conclusion

Variant calling in bacteria is both "easy" and "hard". If your sequencing data is high quality, not contaminated, and came from a pure colony, then finding most of the variants will be relatively easy. The problems crop up when alignments aren't filtered correctly, when the reads are a mix of isolates, or when the reference is too divergent from the isolate.
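Because bacteria are haploid, a genuine variant in a pure culture should be called homozygous alternate (1/1). A noticeable number of heterozygous-looking calls (0/1) before filtering is one rough hint that the reads may come from a mix of isolates. The one-liner's final GT!="1/1" filter silently discards these calls, so one option is to stop the pipeline after the norm step, e.g. ending it with | bcftools norm -Ov -f "$REF" -d all - > unfiltered.vcf (a hypothetical file name), and tally the genotypes. The gt_counts function and toy records below are hypothetical illustrations, not part of any existing pipeline.

```shell
#!/bin/sh
# Tally the GT values in a single-sample VCF (hypothetical helper).
# Assumes GT is the first FORMAT key, as in bcftools call output.
set -eu

gt_counts() {
  awk -F'\t' '
    /^#/ { next }                            # skip header lines
    { split($10, smp, ":"); n[smp[1]]++ }    # count each genotype string
    END { for (g in n) print g "\t" n[g] }' | sort
}

# Hypothetical toy records: two haploid-consistent calls and one "het".
toy_vcf() {
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample\n'
  printf 'chr\t10\t.\tA\tG\t225\t.\tDP=40\tGT\t1/1\n'
  printf 'chr\t20\t.\tC\tT\t225\t.\tDP=40\tGT\t1/1\n'
  printf 'chr\t30\t.\tG\tA\t90\t.\tDP=38\tGT\t0/1\n'
}

toy_vcf | gt_counts
# prints (tab-separated):
# 0/1  1
# 1/1  2
```

On real data this would be run as gt_counts < unfiltered.vcf; what counts as "too many" heterozygous calls is a judgement call that this sketch does not make.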

I have made many mistakes in SNP calling over the years. My own pipeline, Snippy, is now at version 4.3, and I'm still not completely happy with its performance, but it is "good enough" for its primary tasks: generating core-genome SNP phylogenies and finding differences between mutants and wildtype in mutagenesis experiments.

Should you use the one-liner presented above? Probably not - the various available SNP pipelines have done a lot of tuning and offer extra useful features. However, it is an excellent learning opportunity to examine each of the steps involved and the parameters that affect the output.

It's always a trade-off between false positives and false negatives!
