Sunday, 6 October 2013

Understanding SNPs and INDELs in microbial genomes

Introduction

Variants are differences between two genomes. Here I describe two important types of nucleotide-level variants (SNPs and INDELs) and how they affect microbial genomes.

SNPs

A SNP is a single nucleotide polymorphism (pronounced "snip"). This is when a single base differs between two genomes, and the DNA around that base is otherwise unchanged.

Genome 1 | DNA | ATGCTATAGTAAATCTGCGCTAGCT
Genome 2 | DNA | ATGCTATAGTAAATGTGCGCTAGCT
                               |
                           SNP(C=>G)  
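
If the two sequences are already aligned and the same length, spotting SNPs is just a base-by-base comparison. Here is a minimal Python sketch of that idea; the function name `find_snps` is my own invention for illustration, and real variant callers work from read alignments rather than two finished strings.

```python
def find_snps(ref, alt):
    """Return (position, ref_base, alt_base) for every mismatching base.

    Positions are 1-based. Both sequences must be the same length,
    i.e. already aligned with no insertions or deletions.
    """
    if len(ref) != len(alt):
        raise ValueError("sequences must be the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(ref, alt), start=1)
        if a != b
    ]


genome1 = "ATGCTATAGTAAATCTGCGCTAGCT"
genome2 = "ATGCTATAGTAAATGTGCGCTAGCT"

for pos, a, b in find_snps(genome1, genome2):
    print(f"SNP at position {pos}: {a}=>{b}")  # SNP at position 15: C=>G
```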

In coding-dense genomes like those of microbes, most SNPs will fall within protein-coding regions. Such a SNP changes a codon, and potentially the amino acid it codes for. If the amino acid does not change, it is called a synonymous SNP (as the new codon is a 'synonym' for the same amino acid). If it does change, it is called a non-synonymous SNP.

Genome 1 | DNA | ATG AAA GTT GAT GAC CAG CAT TCC CCA TGA
Genome 2 | DNA | ATG AAA GTC GAT GAC CAG CAT TAC CCA TGA
                         ..|                 .|.  
                       SNP(T=>C)          SNP(C=>A)
                         ..|                 .|.
Genome 1 |  AA |  M   K   V   D   D   Q   H   S   P   *
Genome 2 |  AA |  M   K   V   D   D   Q   H   Y   P   *
                          |                   |
                         SYN               NON-SYN
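
The SYN / NON-SYN call can be sketched in a few lines of Python: translate each codon in both genomes and compare the amino acids. This is only an illustration under some assumptions: the names `CODON_TABLE` and `classify_snps` are mine, the table is the standard genetic code (bacterial translation table 11 uses the same codon-to-amino-acid assignments), and both sequences are assumed to be in frame and the same length.

```python
from itertools import product

# Standard genetic code, laid out in the usual T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snps(ref_cds, alt_cds):
    """Compare two in-frame coding sequences codon by codon.

    Returns (codon_number, ref_codon, alt_codon, ref_aa, alt_aa, kind)
    for every codon that differs, where kind is "SYN" or "NON-SYN".
    """
    results = []
    for i in range(0, len(ref_cds) - 2, 3):
        ref_codon, alt_codon = ref_cds[i:i + 3], alt_cds[i:i + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        kind = "SYN" if ref_aa == alt_aa else "NON-SYN"
        results.append((i // 3 + 1, ref_codon, alt_codon, ref_aa, alt_aa, kind))
    return results


genome1 = "ATG AAA GTT GAT GAC CAG CAT TCC CCA TGA".replace(" ", "")
genome2 = "ATG AAA GTC GAT GAC CAG CAT TAC CCA TGA".replace(" ", "")

for row in classify_snps(genome1, genome2):
    print(row)
```

Run on the two genomes above, it reports codon 3 (GTT to GTC, V to V) as SYN and codon 8 (TCC to TAC, S to Y) as NON-SYN.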

A non-synonymous SNP can drastically alter the function of a protein, because sometimes a single amino acid difference is enough to modify the structure/shape of the protein. A SNP could even affect the RNA transcript itself, causing it to be translated at lower efficiency or not at all. SNPs in promoter regions (-35, -10) and the ribosome binding site (RBS) can have similar effects.

A good rule of thumb is that SNPs in the 3rd position of a codon are often synonymous, due to the particular pattern of degeneracy in the genetic code. If two or more SNPs occur right next to each other, the variant is sometimes called a multiple nucleotide polymorphism (MNP).
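
You can check the 3rd-position rule of thumb directly against the genetic code. The Python sketch below tries every possible single-base change in every sense codon and counts how often the amino acid stays the same, per codon position. It assumes the standard genetic code and treats every substitution as equally likely, which real genomes do not, so take it as a rough illustration only. The name `synonymous_fraction_by_position` is made up for this example.

```python
from itertools import product

# Standard genetic code, laid out in the usual T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction_by_position():
    """Fraction of single-base changes that are synonymous, per codon position.

    Only sense codons are mutated; a change that creates a stop codon
    counts as non-synonymous.
    """
    fractions = []
    for pos in range(3):
        synonymous = total = 0
        for codon, aa in CODON_TABLE.items():
            if aa == "*":
                continue  # skip stop codons as starting points
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                synonymous += CODON_TABLE[mutant] == aa
        fractions.append(synonymous / total)
    return fractions


for position, fraction in enumerate(synonymous_fraction_by_position(), start=1):
    print(f"codon position {position}: {fraction:.1%} of changes are synonymous")
```

The 3rd position comes out far ahead of the other two, and under this table no 2nd-position change between sense codons is synonymous at all.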

INDELs

An INDEL (INsertion/DELetion) is where a single base has been deleted from, or inserted into, one genome relative to another. It is a symmetrical relationship, as a deletion in one genome corresponds to an insertion in the other. I reckon it should be called a deletion/insertion polymorphism (DIP) too, so we can all snack on SNPs and DIPs :-)

                           DEL(A)
                             |
Genome 1 | DNA | ATGCTATAGTAA-TCTGCGCTAGCT
Genome 2 | DNA | ATGCTATAGTAAATCTGCGCTAGCT
                             |
                           INS(A)  
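
Given a gapped pairwise alignment like the one above, where '-' marks a missing base, finding the INDEL is a matter of scanning for gap columns. This is a minimal Python sketch with a hypothetical function name (`find_indels`); it assumes somebody else has already done the alignment. Note that inside a run of identical bases (the AAA here) the exact position of the gap is arbitrary, which is why variant callers usually normalise INDEL positions.

```python
def find_indels(aln1, aln2):
    """Scan a pairwise alignment for single-base gaps.

    Returns (column, base, missing_from) tuples, where column is 1-based
    and missing_from is 1 or 2 depending on which sequence has the gap.
    A gap in sequence 1 is a deletion in 1, or equally an insertion in 2.
    """
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must be the same length")
    events = []
    for col, (a, b) in enumerate(zip(aln1, aln2), start=1):
        if a == "-" and b != "-":
            events.append((col, b, 1))
        elif b == "-" and a != "-":
            events.append((col, a, 2))
    return events


genome1 = "ATGCTATAGTAA-TCTGCGCTAGCT"
genome2 = "ATGCTATAGTAAATCTGCGCTAGCT"

for col, base, missing_from in find_indels(genome1, genome2):
    other = 2 if missing_from == 1 else 1
    print(f"column {col}: DEL({base}) in genome {missing_from}, "
          f"INS({base}) in genome {other}")
```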

While a SNP will usually either change a protein slightly or not at all, an INDEL inside a gene will nearly always have a drastic effect on the protein. Because codons are groups of 3 nucleotides, removing or adding 1 nucleotide shifts every codon downstream of the change; this is called a frame-shift mutation. This usually results in the protein being either extended or truncated.

Genome 1 | DNA | ATG AAA GTT GAT GAC CAG CAT TCC CCA TGA
Genome 1 |  AA |  M   K   V   D   D   Q   H   S   P   *

Genome 2 | DNA | ATG AAA GTC -AT GAC CAG CAT TAC CCA TGA
                             |                          
                           DEL(G)          
                             |
Genome 2 | DNA | ATG AAA GTC ATG ACC AGC ATT ACC CAT GA? ??? ??? ???
Genome 2 |  AA |  M   K   V   M   T   S   I   T   H   X   X   X   X
                                                      |
                                            STOP Loss & read-through

In the previous case, the original stop codon was lost and translation continued in a new frame, giving the protein a different sequence downstream of the deletion and a different C-terminal end than normal. It will eventually hit another stop codon just by chance. In the case below, the frame-shift instead introduces a premature STOP codon, so we end up with a shorter reading frame.

Genome 3 | DNA | ATG AAA GTC GAAT GAC CAG CAT TAC CCA TGA
                               |
                             INS(A)
                               |
Genome 3 | DNA | ATG AAA GTC GAA  TGA CCA GCA TTA CCC ATG          
Genome 3 |  AA |  M   K   V   E    *   P   A   L   P   M
                                   |
                         STOP Gain & truncation
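
Both frame-shift outcomes can be reproduced with a tiny translator. The Python sketch below assumes the standard genetic code and uses names I made up for illustration (`CODON_TABLE`, `translate`); it simply reads codons from the first base and ignores any incomplete codon at the end.

```python
from itertools import product

# Standard genetic code, laid out in the usual T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, to_stop=False):
    """Translate from the first base; a trailing partial codon is ignored.

    With to_stop=True, translation ends at the first stop codon ('*').
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        protein.append(aa)
        if to_stop and aa == "*":
            break
    return "".join(protein)


original = "ATG AAA GTC GAT GAC CAG CAT TAC CCA TGA".replace(" ", "")

# DEL(G): remove the first base of codon 4 (GAT becomes AT...)
deletion = original[:9] + original[10:]

# INS(A): add an A inside codon 4 (GAT becomes GAAT)
insertion = original[:11] + "A" + original[11:]

print(translate(original))                 # MKVDDQHYP*
print(translate(deletion))                 # MKVMTSITH  (stop lost, reads through)
print(translate(insertion, to_stop=True))  # MKVE*      (premature stop)
```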

Because the terminator sequence is no longer where it needs to be, these genes may never be transcribed or translated. In that case they are called pseudo-genes.

If multiple adjacent bases are deleted (or inserted) together, it is sometimes called a micro-deletion (or micro-insertion), or collectively a micro-INDEL. A micro-INDEL of length 3 occasionally occurs in bacterial evolution, as it keeps the protein translation in frame.
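
The "length 3 stays in frame" point is easy to demonstrate: an INDEL only shifts the frame when its length is not a multiple of 3. In the Python sketch below, `is_frameshift` and `translate` are illustrative names of my own, and the table is again assumed to be the standard genetic code. Deleting one whole codon removes a single amino acid and leaves the rest of the protein untouched.

```python
from itertools import product

# Standard genetic code, laid out in the usual T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_frameshift(indel_length):
    """An INDEL shifts the reading frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


def translate(dna):
    """Translate from the first base; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


original = "ATG AAA GTC GAT GAC CAG CAT TAC CCA TGA".replace(" ", "")

# Delete codon 6 (CAG) in one go: a 3-base micro-deletion.
micro_deletion = original[:15] + original[18:]

print(is_frameshift(1), is_frameshift(3))  # True False
print(translate(original))                 # MKVDDQHYP*
print(translate(micro_deletion))           # MKVDDHYP*  (only the Q is lost)
```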

Structural Variation

SNPs and INDELs are about low-level genomic variation. It is also possible to look at structural variants which affect the genome at larger scales. Events like gene duplications, tandem repeats, transposon insertions, inversions, and other chromosomal rearrangements are all important to consider, but this post will leave those issues for another day.

Conclusion

SNPs and INDELs are small differences between genomes. They are important drivers of bacterial evolution, by modifying how or whether genes are transcribed and translated. In my next post I will introduce my new tool Snippy for discovering these differences efficiently.

5 comments:

  1. Hello Dr Torsten,
    I am working on SNP and indel detection for a non-model plant organism. After producing the VCF file I think I have hit a road block! The format is very complex and I am wondering where and how to proceed further with the .vcf files I possess.
    I am using these .vcf files in IGV, but IGV says that it fails to detect an index file and I get a blank screen... any suggestions?
    Thanks in advance

    Replies
    1. This web page explains how to index your VCF files for IGV:
      http://www.broadinstitute.org/igv/VCF

  2. Thanks! This was a helpful review. Loved the SNP DIP joke :0)

  3. My daughter has multiple CBS insertions and a COMT insertion. Trying to make sense of what this means....
