The Genome Factory: October 2015

These notes were taken during the 1st ASM Conference on Rapid NGS Bioinformatic Pipelines for Enhanced Molecular Epidemiologic Investigation of Pathogens, held at the Omni Shoreham Hotel in Washington DC, USA from 24-27 September 2015. The conference has the shorter nickname “ASM NGS” and used the Twitter hashtag #ASMNGS.

The notes are intended to be as objective as possible. Personal opinions or speculation are prefixed by the author’s initials below:

PA = Phil Ashton (Public Health England, UK) = @flashton2003
TS = Torsten Seemann (Uni. Melbourne, Australia) = @torstenseemann
FB = Fiona Brinkman (Simon Fraser University, Canada) = @fionabrinkman
EG = Emma Griffiths (Simon Fraser University, Canada) = @griffiemma
RL = Robyn Lee (McGill University, Canada) = @robyn_s_lee

Steve Musser - FDA (session chair)

Welcome remarks by Joseph Campos, Gary Procop, Eric Brown

George Weinstock - Jackson Labs

Microbial Genomics and Beyond

Applications of clinical microbial NGS

eg surge of O157:H7 outbreaks in St. Louis → salad bar at grocery chain
core genome O157 - 3.4 million nts (Leopold, 2009; see diagram of SNPs showing evolution)
eg Bacteremia in NICU (from diapers) → monitoring babies in real time with WGS, “identity” of culprits of infection found in gut microbiome, 101 spp sufficiently covered from stool samples to assess SNPs; depending on tissue, sample site etc, samples produce different spectrum of AMR genes
eg Daptomycin resistance via mutation, in Enterococcus
Daptomycin used for patients with VRE
mutations in GdpD, Cls, LiaF produce resistance, can profile in patients
eg estimating bla copies (compared to single copy MLST genes), not in plasmids but massive tandem array
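
A minimal sketch of the copy-number idea above: compare read depth over the bla gene with depth over single-copy MLST genes. The function name and the depth values are hypothetical, and this is not necessarily how the speaker computed it.

```python
from statistics import median

def estimate_copy_number(target_depths, single_copy_depths):
    """Estimate copies of a target gene as the ratio of its median read
    depth to the median depth across single-copy reference genes."""
    baseline = median(single_copy_depths)
    if baseline <= 0:
        raise ValueError("single-copy baseline depth must be positive")
    return median(target_depths) / baseline

# Hypothetical per-base depths: MLST loci near 50x, bla near 400x.
mlst_depths = [48, 52, 50, 49, 51, 50, 47]
bla_depths = [390, 410, 400, 405, 395]
copies = estimate_copy_number(bla_depths, mlst_depths)
print(f"estimated bla copies: {copies:.1f}")
```

Using the median rather than the mean keeps a few poorly mapped positions from skewing the estimate. A tandem array collapsed in the assembly would show up as a ratio well above 1.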

Metagenomics examples

virus detection by metagenomics (febrile children study)
mWGS vs RNA-Seq (looking for “Golden Microbiome” in elite athletes); mWGS=1% Methanobrevibacter (Archaea), RNA-Seq (48% Methanobrevibacter) → transcription may be more important than looking at “who’s there”
16S amplicon seq, cheaper, faster, high throughput, more comprehensive than PCR, culture
Pathogen Detection - hospital acquired diarrhea, pathogen abundance in clinical samples, acne associated skin microbiome (top 10 ribotypes differ by 1-2 SNPs, wouldn’t have detected these unless seq entire 16S gene)

Read Length distance: PacBio, full length 16S (Fichot & Norman, Microbiome, 2013, 1:10)
Finding long read full-length amplicon sequencing for microbiome analysis works. Yeay! (PacBio Nanopore)
10x coverage, 5x around circle
single organisms, polymorphisms in multiple copies
benchmarking 16S with MinION
HMP req’s metagenomic benchmarking
BLAST takes a long time to perform, rate limiting step in real time investigations, few tweaks were able to increase speed 10x (Big Iron NCSA Blue Waters, TeraGrid)

Lynn Bry - Brigham Women’s Hospital

(Late replacement for Julian Parkhill)

Sequencing foodborne and MDR clinical isolates at BWH.
1000 micro samples per day! >100 +ve cultures across kingdoms/phyla
50% diagnosis, 10% therapy, 40% screening/surveillance MRSA, VRE, GrpB Strep, Gram-
Lots of metadata: MIC, disk diff, E-TEST, Drug resistance, zone diam, R/I.S, ESBL, D-zone CLI/ERM
HIPAA de-identified data - Year only, no location
WHONET http://whonet.org/ open source to generate antibiograms in surveillance (Old, Windows s.w)
Crimson LIMS - prospective analysis of clinical samples & real time query
“Honest Broker” assigns new external IDs
SPAdes, QUAST, ResFinder, CARD, RAST, Mauve for extra chromosomal, BLAST for plasmid/transposon: where is the resistance gene? transposon - plasmid or chrom.
SNPs: bowtie2, mpileup, bcftools, custom filtering.
Kp CRE ST258 - found many different plasmids and transposons + point mutations - WGS revealed this detail
E. cloacae CRE - ampC on chrom + porin mutations , multiple mobile elements Tn4401b / Tn6901
Serratia marcescens CRE - SRT-2 Ampc_SME-4, AmpC and KPC-3 acquired, 3 year survey 2011-2014, 2 close events
Timelines: MiSeq (14 days), Bioinformatics (1 - 14 days), Epi (14 days)
Despite 3 week turnaround they are told it IS actionable
Rule out just as important as rule in
Mobile element analysis can refine relationship analysis
Curating new genomes and mobile elements takes the most time
Desire to use more principled methods for outbreak calling - SaTScan, Bayesian, likelihoods

Stephen Ostroff

Priming the Innovation Pump: FDA’s Role in Advancing and Using NGS

Did not use any slides.
Was FDA employee #1
WGS - gamechanger for “splitters”
WGS Identifies previously unknown clusters, provides surveillance/warning system
WGS allows rapid sharing between Ag, food, vet communities → evidence-based traceback and risk factors for identifying risks in food supply for targeted interventions
eg Foods Program
WGS → routine analysis allowing for automation
Provides lot of data relatively quickly
identifying stable genetic changes used to pinpoint contamination (genome provides 3-5 million data points/isolate; statistically robust, accurate and stable)
WGS has allowed cases to be solved that were unresolvable by epi investigations alone
WGS useful for identifying medical agents, drug discovery, druggable targets (targets), live virus vaccine database identifying most prevalent strains
WGS is agnostic, don’t need to know identity of organism before sequencing → although sensitivity becomes an issue

eg ensure safety of blood supply with HIV detection tests and variant identification
eg cystic fibrosis test

FDA collaborates with NIST (National Institute of Standards and Technology) to develop standards, need comprehensive repositories and sharing
GenomeTrakr (14 states and 9 regional labs)
NARMS - National AMR surveillance (meat, animal slaughter, human samples) → real time monitoring of drug and disinfectant resistance determinants

Marc Allard - FDA

GenomeTrakr: A Pathogen Database to Build a Global Genomic Network for Pathogen Traceback and Outbreak Detection

ref database: pathogen detection pipeline that can inform:

matching food/enviro isolates to clinical
track facility contamination
trace source of contamination (DB contains isolates from different geo_locs)
monitor AMR, virulence, pathogenicity

GenomeTrakr project - originally sold as PulseNet 2.0 using WGS
No surprise that 4.7 Mbp gives higher resolution than a few antigen genes
Source tracking is key application for WGS - statistically robust, high res, stable, accurate
FDA - genomics mapping, link between food & env & clinical
CDC - which clinical case inclusion / exclusion
FDA/CVM - antimicrobial resistance, phenotypic predictions from genotype
Showed some GIS phylogeography - made by http://www.supermap.com/en/html/products.html
Minimal pathogen metadata

eg spicy tuna outbreak, Salmonella Bareilly

common PFGE patterns worldwide, not enough resolving power for inspectors to investigate (also sushi has many ingredients→ geo_loc details could help refine which ingredient is the culprit resulting in earlier intervention)
SNP phylogeny identified Scrape Tuna (Indian isolates), cluster within 2-5 SNPs, phylogeographic analysis → tips of phylo trees mapped to India, 8km between location of sequenced isolate and source of food contamination
need for global DB for detecting leads like this
paper

eg S. Braenderup: Nut butter

outbreak cluster → only few SNP diffs
tree helped inform epi questionnaire = tool for IDing matches to drill down into number of cases faster (can point to particular foods from matching isolates)

as price for sequencing drops, number of isolates that can be sequenced increases
2014-15 was big year for WGS (rolled out in 2014; 2015 focus on standards, training and proficiency)
250 isolates/week, detecting 24 clusters/week, subset of clusters are actionable → weekly meetings to make PH decisions based on this info
regular epi curve shows spike in illnesses occurs 20-48 days into outbreak, WGS will help get ahead of the epi curve to avert illness
minimal metadata (describing who, what, when, why) provides context, key to real-time investigations, better metadata contributes to earlier interventions (industry, growers, distributors) → identify certain suppliers with contamination, also resident vs transient pathogens (require different interventions)

reduced # recalls
decrease sick patients
preserves brand names
improved farm practices (packing/processing)

PFGE with poorer resolution can falsely implicate industries
multi-ingredient products → can tease out endemic vs globally imported ingredient
industry needs access to data in 1-2 weeks to be effective
industries can use NCBI Genome Workbench, FDA analysis software themselves so the gov’t aren’t seen as “bad guys”, industry can understand the problems themselves

Validation efforts:

technical performance
intralab variation, seq platform
interlab
bioinformatics pipeline

Frank M. Aarestrup - DTU, Denmark

GMI - Global Microbial Identifier - Dream or Future?

Reviewing http://www.globalmicrobialidentifier.org/
Using the Battle of Austerlitz as a metaphor for WGS as a "common language" in our "war"
Infectious diseases is still #1 problem - 25% of global deaths
Increasingly they have global epidemiology
Real-time surveillance can’t work without real-time data sharing
Much easier to teach genome sequencing than teaching Salmonella serotyping! (apart from more people who know serotyping in a typical micro lab!)
Need to get people to trust to share.
FB: I think it’s key to encourage sharing first with v minimal metadata. Get everyone comfortable. Then more metadata can be added by those more comfortable and others will follow as they see the great benefit of doing so…
Need to engage people more widely around the world.
TS: This talk needs to be taken in context of Marc Allard politely imploring DTU to put all their data in GenomeTrakr, with the implication that they are holding stuff back? FB: There is a culture of some people v worried about sharing that needs to be overcome. Hopefully now that GenomeTrakr has shown you can do it without getting sued, this will change. TS: I don’t think it is legal worries, I still think it is publication novelty fear (which is reasonable given worsening academic funding in most countries where papers are key metric)

Marianne Kjeldsen - Statens Serum Institut, Copenhagen, DENMARK

Three Months of Surveillance of S. Typhimurium+S. 4, 5, 12:i:- (aka monophasic typhimurium) in Denmark Based on WGS and MLVA Typing
Salmonella 93.8M cases, 2500 serovars, 17% are serovar Typhimurium (notifiable)
PA: ST36 < 2000 SNPs from ST19/ST34, hmm, would be surprised, as different clonal complex
SNP trees included strains that were excluded based on MLVA, but broadly concurrent.
Higher discrimination than MLVA, especially for the monophasic ST34 strains.

Lessons:

WGS provided higher resolution than MLVA
reliable for outbreak detection, even with single ref strain
need to consider max SNP difference
investigations will always need to consider epi data
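
The "max SNP difference" lesson above amounts to choosing a threshold and grouping isolates that fall within it. A minimal sketch using single-linkage clustering follows; the isolate names, distances and threshold are hypothetical, and the talk did not state which clustering rule was used.

```python
def cluster_by_snp_threshold(distances, max_snps):
    """Single-linkage clustering: isolates joined by any chain of pairs
    whose SNP distance is <= max_snps end up in the same cluster."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        ra, rb = find(a), find(b)
        if d <= max_snps and ra != rb:
            parent[ra] = rb

    clusters = {}
    for isolate in parent:
        clusters.setdefault(find(isolate), set()).add(isolate)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical pairwise SNP distances between four isolates.
distances = {
    ("A", "B"): 2, ("B", "C"): 4, ("A", "C"): 6,
    ("A", "D"): 40, ("B", "D"): 41, ("C", "D"): 39,
}
print(cluster_by_snp_threshold(distances, max_snps=5))
print(cluster_by_snp_threshold(distances, max_snps=3))
```

Note how isolate C joins the cluster at a threshold of 5 even though it is 6 SNPs from A: single linkage chains through B. That is one reason the threshold has to be read alongside the epi data.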

Amy Gargis - CDC

Assuring the Quality of Next-Generation Sequencing in Clinical and Public Health Laboratories

Quality assurance, sequencing, lab developed tests, optimizing library prep per organism, DNA quality
One major issue is assuring quality of DNA extract
Publication here http://www.nature.com/nbt/journal/v33/n7/extref/nbt.3237-S1.pdf
Have to lock down bioinfx pipelines for quality control/assurance - strong difference from bioinfx community attitudes (PA)
Another paper - specific to variant calling http://journal.frontiersin.org/article/10.3389/fgene.2015.00235/abstract
clinical setting - CLIA regulates clinical labs performing tests on patient specimens → return results, CLIA ensures accurate results
This is not “exciting science” but an important part of public health genomics

Deborah Moine - Nestlé

Long Reads Sequencing for Better Short Reads SNP Analysis

Need to detect contamination
When ref genome very different from sample (high SNP diffs), increases non-mappable reads → risk of false positives
PacBio generates 20Kb libraries requiring no amplification (less bias)
SMRT Cell, 250 000 nanowell → 1 DNA molecule/well
10% error (random) rate, de novo assembly HGAP
< 15Kb is short read → used to correct longest PacBio read → get 1 contig representing whole genome
1 contig 4.3 Mb, 245x coverage for Salmonella (Nestle) study
20Kb library, 2SMRT cells, 1.4 Gb after filtering
SNP analysis using new ref genome, low number of non-mapping reads
Need to look at tree and SNP distance matrix
Ref genomes generated:

21 Salmonella
5 Listeria
25 Cronobacter

SNP analysis better on full length ref than draft
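
A minimal sketch of the SNP distance matrix mentioned above, computed from sequences that have already been aligned to the same reference. The isolate names and sequences are hypothetical; skipping gaps and N positions is an assumption, not something stated in the talk.

```python
from itertools import combinations

def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ, skipping
    positions where either has a gap or an ambiguous base."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for a, b in zip(seq_a, seq_b)
        if a != b and a not in ignore and b not in ignore
    )

def distance_matrix(alignment):
    """Return a symmetric {name: {name: distance}} matrix."""
    names = sorted(alignment)
    matrix = {n: {m: 0 for m in names} for n in names}
    for a, b in combinations(names, 2):
        d = snp_distance(alignment[a], alignment[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix

# Hypothetical 10 bp alignment of three isolates.
alignment = {
    "isolate1": "ACGTACGTAC",
    "isolate2": "ACGTACGTAT",
    "isolate3": "ACGAACNTAT",
}
for name, row in distance_matrix(alignment).items():
    print(name, row)
```

With a distant reference, more positions end up unmapped (N or gap) and are silently dropped from the count, which is one way a poor reference distorts the matrix.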

Roger Barrette; Plum Island Animal Disease Center (USDA/ APHIS)

Subtractive-hybridization for Enrichment of Non-host Nucleic Acid for Improvement of Sequence-based Detection of Pathogens

Rapid Ion Torrent sequencing of Flu from Swine, but 87% host DNA
need to decrease library bias → enrichment technique
Capture RNA oligonucleotide, biotinylated
isolate host RNA, fragments, ligation of biotinylated construct at 22 °C, reverse transcription
RNAse treatment, pull out target cDNA (w/ negative beads)
Goal: decrease host, increase viral reads
“Background subtractive hybridization method” - decreased total reads, but higher proportion of pathogen-specific reads

eg Proof-of-principle → Foot & Mouth
454 preps enriched vs not (by subtractive method)
34% genome covered with no enrichment vs 75% with enrichment
need a process to increase yields and automate
Summary: this DNA:cDNA pulldown method works!
decrease library bias, increasing likelihood of agent discovery → critical for testing primary tissues
Costs Less than a 454 Jr run presumably :)

Catherine Yoshida; Public Health Agency of Canada, Guelph, ON, CANADA

The Salmonella in silico Typing Resource (SISTR): Rapid Analysis of Salmonella Draft Genome Sequence Data

Salmonella in silico typing resource - https://lfz.corefacility.ca/sistr-app/
Genoserotyping, 1 day turn-around-time, high throughput (96 samples/day)
Non-subjective interpretation
O antigen (rfb cluster), somatic
H antigens (H1=fliC, H2=fljB), flagellar
SISTR can predict >2000 serovars
Incorporates Achtman Salmonella MLST
Classical MLST =7-9 genes, cgMLST=100’s to 1000’s core genes
SISTR cgMLST=330 genes → high assignability, low levels of “missing data”, will include international scheme when finished
SISTR interface → batch upload, on the fly typing, genome browser, visualization (can change min span tree according to selected metadata)
Under epi tab can select geographical visualization (by lat_lon or GPS co-ords) → click on node and table of metadata appears
Also temporal distribution of strains
Visualization only as strong as metadata provided
Does not seem to be a command line version available
FB: Ed Taboada said in person to me after this talk he’s interested in making a command line version available. There are some good docs at https://sistr-backend.readthedocs.org/en/latest/
cgMLST cluster is 86% correlation with its serovar (from metadata)
Phylogenetics can be used to make it 95% correlated
The last 5% due to bad or missing metadata
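
To illustrate how cgMLST profiles like those above are compared, here is a minimal sketch of an allele distance that ignores loci with missing calls. The locus names and allele numbers are hypothetical, and this is not SISTR's own code.

```python
def allele_distance(profile_a, profile_b, missing=None):
    """Number of loci with different allele calls, ignoring loci where
    either profile has no call. Returns (differences, loci_compared)."""
    shared = [
        locus for locus in profile_a
        if locus in profile_b
        and profile_a[locus] is not missing
        and profile_b[locus] is not missing
    ]
    diffs = sum(profile_a[locus] != profile_b[locus] for locus in shared)
    return diffs, len(shared)

# Hypothetical four-locus profiles; None marks a locus with no allele call.
genome_a = {"locus1": 1, "locus2": 4, "locus3": 2, "locus4": None}
genome_b = {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": 7}
print(allele_distance(genome_a, genome_b))
```

Returning the number of loci compared alongside the difference count makes the effect of "missing data" visible: a small distance over very few shared loci is weaker evidence than the same distance over all 330.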

Philip Ashton; Public Health England

Revolutionising Public Health Reference Microbiology Using Whole Genome Sequencing: A Case Study with Salmonella

WGS allows for digging deep into outbreaks and research trends
2500 serotypes -->99% clinical Enterica (50% Typhimurium & Enteritidis, other 50% other serotypes)
peak of cases in 90’s (30 000 cases /year), currently 7-8000 cases/year
rate of decrease of incidence slowing
Capacity of >3000 genomes per week with 2 miseq, 2 hiseq
pipeline: Kmer (18mer) → ID-->99.7% accurate subspeciation → can be used to detect contamination
MLST to predict serotype (for backwards compatibility) - 6887 isolates with WGS and phenotypic data -->96% match between genotype and phenotype (discrepancies due to 2 serotypes assigned to single serotype or eburst group, lab error, no ST/serotype lookup)
Method: short read seq typing → ST (& eburst grouping) → serotype
SISTR (PHAC)/SeqSero (CDC), SNP typing with most common serotypes, SnapperDB, FastQ → db eburst groups
eg 2014 14b outbreak (international), good traceback http://www.eurosurveillance.org/ViewArticle.aspx?ArticleId=21098
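
The k-mer identification step in the pipeline above can be sketched as a comparison of k-mer sets between a query and a panel of references. This is a simplified stand-in, not PHE's implementation: it uses Jaccard similarity, ignores reverse complements, and the sequences are toy data (so k is set to 4 here rather than 18).

```python
def kmers(sequence, k=18):
    """Return the set of all k-mers in a sequence."""
    seq = sequence.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard(a, b):
    """Shared k-mers divided by total distinct k-mers."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def best_match(query, references, k=18):
    """Return the best-scoring reference name and all scores."""
    q = kmers(query, k)
    scores = {name: jaccard(q, kmers(seq, k)) for name, seq in references.items()}
    return max(scores, key=scores.get), scores

# Toy references; real use would index whole genomes.
references = {"refA": "ACGTACGGTTCA", "refB": "TTTTGGGGCCCCAAAA"}
name, scores = best_match("ACGTACGGTT", references, k=4)
print(name, scores)
```

A sample whose k-mers score highly against two unrelated references at once is a candidate for contamination, which is how the same step can double as a QC check.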

MinION will be used more in the future

great trees, including the “Death star of Salmonella” - used Bionumerics Yep - my colleague Satheesh used Bionumerics to make that, partly because last time I used phyloviz, the bubble size didn’t scale with number of isolates.
Slides here - http://www.slideshare.net/PhilipAshton1/whole-genome-microbiology-for-salmonella-public-health-microbiology

Greg Armstrong; CDC (incoming head of AMD division)

The Application of Genomics to Public Health—an Epidemiologist’s Point of View

AMD Focus
Polio - used seq longer than any other PH area
Late 2013 → seq every isolate available → world eradication program in full effect
Polio thought to be endemic in Afghanistan → seq showed isolates from Pakistan with sustained transmission
Seq in early 2014 showed isolates are all same in South Asia → intensified surveillance and immunization in southern Afghanistan

Ebola - little asymptomatic infection so transmission chains are more obvious
Guinea → consensus seq uploaded to Nextflu.org → married to metadata (on MicroReact) → useful for epi’s

Listeria outbreak analysis: Normally the trees match PFGE but showed a case where the PFGE didn’t match. Described how key to genomic epi is both having the genomic data AND the good epi (which in this case revealed that the outbreak was associated with caramel apples)

Showed amusing Mycobacterium tuberculosis “tree” with no branches resulting from conventional genotyping with MIRU-VNTR (i.e. identical isolates): ….then showed tree illustrating how isolates could be differentiated by WGS

HIV transmission

contact tracing based on epi data
attribution table -25% social contact, integrating WGS and epi = >80% injection drug users

inferred HepC-V transmission

Pertussis incidence increasing for 30yrs (acellular vaccine since ‘90’s)
Refer to posters for pertussis outbreak analysis. Huge increase in pertussis lately, primarily in California (California outbreak 2010)

Influenza pipeline w/ NGS → faster, cheaper, more samples, more data, better data
impacts vaccine dev (informs what strains to build vaccine against based on typing from previous season)

Pneumococcal pipeline w/ NGS → more PH data, more easily exportable, less prone to human error

Mentions the need for bioinformaticians/bioinformaticists at CDC. FB: Good to encourage students to try work terms, scholarships/fellowships, at such public health agencies if they are interested in such positions. Many full time positions acquired after working in a public health agencies temporarily as part of a work term/trainee position.

Lessons:

data integration is an issue, usually diff data streams, need to integrate with external partners
culture-independent diagnostic tests impacting ability to get isolates

Which is best pipeline like asking “how big is a piece of string?”

Stefan Niemann; Research Center Borstel, Borstel, Germany

Tracing Evolution and Spread of Mycobacterium tuberculosis Strains in Times of Antibiotic Treatment

90% of MDR-TB patients are not treated successfully
Former Soviet Union is a hot-bed of MDR-TB
Initially felt MDR-TB not easily transmitted due to decreased fitness associated with rpoB mutations
Referring to http://www.nature.com/ng/journal/v45/n10/full/ng.2744.html
Conflicting information associated Beijing sub-lineage with MDR - did 24-locus MIRU with 4987 isolates, from 99 countries, WGS on subset of 110 isolates → the associated publication: http://www.nature.com/ng/journal/v47/n3/full/ng.3195.html
First streptomycin mutations ~1970, when first treatment given - resistance mutations appeared way before DOTS initiated
MDR outbreak clones in Eastern Europe due to antibiotic Tx and bottleneck selection
Implementation of DOTS and DOTS-Plus actually increased presence of the clones in Central Asia
Compensatory mutations increasing fitness, i.e. transmissibility of drug-R clones
One of Stefan’s older papers on this: http://jcm.asm.org/content/48/10/3544.full

Resistance prediction: http://www.ncbi.nlm.nih.gov/pubmed/26116186
74% of resistance strains had a single mutation, i.e. likely causal

Whole genome MLST: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4097744/

Ruth Timme (Hugh Rand); FDA

Benchmark Datasets for Validating Foodborne Outbreak Investigations: Integrating WGS and Phylogenomic Analyses

FDA/CDC/NCBI/FSIS(?) got together to develop a uniform approach for analysis comparison and standardisation of results.
Lots of components in the benchmark - isolate, dna, raw data, meta data, output of analyses. how to compare these analyses?
I think this is really valuable, would be great to see the details as to the broader reasons of how and why to use these in the github (PA) - https://github.com/WGS-standards-and-analysis
The fact that outbreak/epi related isolates are so close makes the evolutionary genetics of it much simpler. it is harder to do broader evo studies.
If interested: https://github.com/CFSAN-Biostatistics/snp-pipeline

Madeline Galac; Univ. of North Carolina at Charlotte

Integrating Core Genome Phylogenetic Relationships and Isolate Geographic Data to Trace the 2012 Neisseria meningitidis Outbreak in New York City

Neisseria meningitidis, outbreak associated with MSM. 102 isolates, 79 serogroup C, 2003-2013. 19 outbreak isolates.
Were all the outbreak isolates related?
Assembled Illumina reads with Velvet then used xBase to annotate (not Prokka or RAST, maybe an old study)
Found core genome w/ OrthoMCL for single copy ortholog groups, aligned each gene with MAFFT and concatenated ~500 genes, then RAxML.
Outbreak group was monophyletic, also had isolates from 2008. 2012 outbreak formed a single clone within that clade.
Did a betweenness centrality analysis - i.e. the more a location is connecting other locations to each other, the higher its score
Map the home location onto the tree, use PAUP to infer the ancestral states/changes. count the changes, make into network visualisation. Brooklyn had highest betweenness. vast majority coming out of Brooklyn. this was over 10 year period.
Did same thing for just the 2012 outbreak. there were specific neighbourhoods that played a more important role in this one. Aided study of transmission events.
I would be interested to see whether there was an international aspect to this MSM outbreak as for Shigella http://www.ncbi.nlm.nih.gov/pubmed/25936611 (PA)
Also, multiple SNPs between cases could also be missed steps in transmission chain? FB: Great point. Note that Neisseria are naturally competent for DNA uptake (at 10^-3 rate which is really high for bacteria - just spread DNA on a plate containing the 13 bp Neisseria uptake seq in it, spread the bacteria on top, and presto you get colonies the next day transformed with the DNA you wanted to add to them!). So these multiple SNPs between cases should really be studied further to see how they evolved…
http://www.sciencemag.org/content/341/6144/328
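
The betweenness centrality analysis in this talk can be illustrated with a small network of locations. The sketch below implements Brandes' algorithm for an unweighted, undirected graph in plain Python. The edges are hypothetical (the talk derived them from inferred ancestral-state changes on the tree), and only Brooklyn is a location named in the notes.

```python
from collections import deque

def betweenness_centrality(graph):
    """Brandes' algorithm for an unweighted, undirected graph given as
    {node: set(neighbours)}. Returns unnormalised scores."""
    scores = {v: 0.0 for v in graph}
    for source in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}   # number of shortest paths from source
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                scores[w] += delta[w]
    # Each undirected path was counted once from each endpoint.
    return {v: s / 2 for v, s in scores.items()}

# Hypothetical location network: every other location links via Brooklyn.
network = {
    "Brooklyn": {"Manhattan", "Queens", "Bronx"},
    "Manhattan": {"Brooklyn"},
    "Queens": {"Brooklyn"},
    "Bronx": {"Brooklyn"},
}
print(betweenness_centrality(network))
```

In this toy star-shaped network every shortest path between two outer locations runs through the centre, so the centre gets the whole score and the others get zero.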

Maria Hoffmann; FDA

Whole Genome Sequencing Provides Rapid Traceback of Clinical to Food Sources During a Foodborne Outbreak of Salmonellosis

Salmonella Bareilly associated with wide host range, first isolated in India, 1928.
Retrospective study, 100 isolates, 41 outbreak, 57 from background, going back to 1960s. finished a genome from the outbreak with pacbio.
Bareilly is paraphyletic, one of the clades associated only with the east coast, one with the west coast.
Found an arsenic resistance island in Salmonella Heidelberg as a side effect of investigating outbreak.
Paper here http://jid.oxfordjournals.org/content/early/2015/06/25/infdis.jiv297.full.pdf

Eija Trees; CDC, Atlanta, GA

Transforming Public Health Microbiology in the United States with Whole Genome Sequencing (WGS) - PulseNet and Beyond

WGS to decrease turn-around-time to 2-4 days from (as much as) months
$125 with MiSeq to sequence E. coli
did she miss some info in that cost total -RL? didn’t mention dna extraction for one (or the cost of the bionumerics license)....good point -RL.
MLST < rMLST < cgMLST < wgMLST but all require manual curation
TS: I feel that comparison table (provided by BioNumerics) is very misleading!

David Lipman, NCBI, Bethesda, MD

Pathogen Genomics at NCBI

Ultimately want to deal with 1000, 2000 isolates a day.
Masks the repetitive/phage/mobile parts of the genome before SNP tree-ing (est 4% of genome)
FB: the % of genome masked varies greatly between species though.
TS: Density filtering of SNPs - a proxy for recombination detection? ala ClonalFrameML, Gubbins, BratNextGen
Use “maximum compatibility” trees - does not allow homoplastic sites - all sites must agree with tree.
TS: I found this reference: http://www.ncbi.nlm.nih.gov/pubmed/25634097
Database of AMR genes mentioned but don’t note the collection of sources used.
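
The density filtering idea raised above can be sketched as a sliding window over SNP positions: clusters of SNPs that are too close together are dropped as likely recombination or mobile-element artefacts. The window size and threshold here are hypothetical; the talk did not give NCBI's actual parameters.

```python
def density_filter(positions, window=1000, max_snps=3):
    """Drop SNPs lying in any window of `window` bp that contains more
    than `max_snps` SNPs; dense runs are treated as suspect."""
    pos = sorted(positions)
    flagged = set()
    start = 0
    for end in range(len(pos)):
        # Shrink the window from the left until it spans < `window` bp.
        while pos[end] - pos[start] >= window:
            start += 1
        if end - start + 1 > max_snps:
            flagged.update(pos[start:end + 1])
    return [p for p in pos if p not in flagged]

# Hypothetical SNP positions: two isolated SNPs and a dense run near 5 kb.
snps = [100, 5000, 5010, 5025, 5040, 5100, 90000]
print(density_filter(snps))
```

This is only a proxy: tools such as ClonalFrameML or Gubbins model recombination explicitly, whereas a density filter simply assumes vertically inherited SNPs are sparse.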

Dag Harmsen, University of Münster, Münster, Germany

Overview of Tools for Microbial NGS Data Analysis

SURPI was the pipeline used for diagnosis of neuroleptospirosis in NEJM 2014 (TS: which cited our Leptospira genome paper, yay!)
Tablet - next gen sequence assembly visualization
Mapathon - used simulated bacterial data -> BWA + GATK diploid as best combination of tools for SNPs and indels (interesting, is this Unified Genotyper or HaplotypeCaller by GATK? UG has been decommissioned by Broad in favour of HC)

David Aanensen, Imperial College, London, UK

Community and Social Data / Applications for Pathogen Genomic Surveillance

Demoing http://www.spatialepidemiology.net/ the super awesome http://microreact.org/ and also seems awesome http://www.wgsa.net/ (the latter slated for full release in December)

Johnathan Jacobs raised the excellent point via twitter: Is microreact etc open source? Tweet back noted https://github.com/ImperialCollegeLondon and indicated those not there are on the way...

Shorter (15 minute) talks

Xiangyu Deng, University of Georgia, Griffin, GA

Salmonella Serotype Determination Utilizing High-Throughput Genome Sequencing Data

30K isolates serotyped/year by US PH depts
Retrofitting WGS to phenotyping
Serotyping - including backwards compatibility
46 O-antigens, 114 H-antigens => 2500+ serotypes
SeqSero pipeline http://www.denglab.info/SeqSero
http://www.ncbi.nlm.nih.gov/pubmed/25762776
Identify correct allele by multiple rounds of mapping and BLAST
TS: would de novo assembly and BLAST be simpler? the two flagellar (H antigen, fliC and fljB) loci are sometimes 60% similar to each other (DNA or AA? DNA), which might screw with assembly + BLAST. I think in the right hands assembly + BLAST would work.

one phenotype can have multiple underlying genotypes in the H antigen determining genes

98.7% accuracy for reads (what is it for assembly?), takes a few minutes (!) on 4 cores. (that’s pretty good)
Did he say 99% accurate for assembly + BLAST? in follow up with him afterwards, he seemed to backtrack on this (PA)

FB: I hate to say it but it depends how you define “accuracy” - sometimes actually mean precision or recall. Can ask, since having great recall/sensitivity is great, but not at expense of crappy precision/specificity.
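
FB's point about "accuracy" can be made concrete with a confusion matrix. The counts below are hypothetical (they are not SeqSero results) and describe calling a single rare serotype among 1000 isolates.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else nan,
        "recall": tp / (tp + fn) if tp + fn else nan,       # sensitivity
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }

# Hypothetical: 10 true cases of a rare serotype among 1000 isolates.
metrics = classification_metrics(tp=9, fp=10, fn=1, tn=980)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here accuracy is about 99% and recall is 90%, yet fewer than half of the positive calls are correct. For rare serotypes a headline accuracy figure says very little on its own.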

TS: Is there a command line version of this?

PA: Author claims “Yes” according to Kat, he said it is available on request - he emailed it to me btw.

Errol Strain - CFSAN, FDA

CFSAN SNP pipeline: a whole genome sequencing data analysis pipeline for food-borne pathogens

good practical advice for software validation http://www.fda.gov/RegulatoryInformation/Guidances/ucm085281.htm

Generate reads with known variants - for testing pipelines: https://github.com/lskatz/lyve-SET

Only use reference < 5000 SNPs away (0.1% divergent)
Some post-facto filtering of phage, manually filtered
Salmonella Newport quite diverse, 15 SNPs might be linked
How to share this kind of information, just in publications, or in some other way?
That scares me (TS) that SNP thresholds are so different for different serovars. Need domain experts.
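
A quick check of the reference-distance rule of thumb above: 5000 SNPs expressed as a percentage of genome length. The 4.7 Mbp genome size is the Salmonella figure quoted earlier in these notes; applying it here is an assumption.

```python
def percent_divergence(snps, genome_length):
    """SNP count as a percentage of genome length."""
    return 100 * snps / genome_length

# 5000 SNPs against a 4.7 Mbp genome.
print(f"{percent_divergence(5000, 4_700_000):.2f}%")  # about 0.11%
```

So the "0.1% divergent" figure is consistent with a 5000 SNP cut-off for a genome of roughly this size. For a smaller genome the same SNP count would mean higher divergence.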

Ivan Liachko, University of Washington, Seattle, WA

Assembling whole genomes from mixed microbial communities using Hi-C

taking advantage of the innovation of reconstructing chromosome conformation in human genetics
paper on this http://www.g3journal.org/content/early/2014/05/22/g3.114.011825.abstract
Hi-C=chromosome conformation capture
cross linking occurs in cell before cell disruption. this allows you to bin contigs from the same original organism. also, within organism you get long range scaffolding information. problem can be chimeras
For the paper: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4455782/
note: some bacteria have multiple copy number chromosomes eg. Neisseria ~ 5
See also Dovetail technology: http://dovetailgenomics.com/
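
The binning idea described above (contigs from the same cell are cross-linked together, so they share Hi-C read pairs) can be sketched as finding connected components in a contig contact graph. This is a deliberate simplification: the contig names, link counts and threshold are hypothetical, and published methods normalise contacts and use more robust clustering.

```python
def bin_contigs(contigs, link_counts, min_links=5):
    """Group contigs into bins: contigs connected by at least `min_links`
    Hi-C read pairs (directly or via other contigs) share a bin."""
    neighbours = {c: set() for c in contigs}
    for (a, b), n in link_counts.items():
        if n >= min_links:
            neighbours[a].add(b)
            neighbours[b].add(a)
    bins, seen = [], set()
    for contig in contigs:
        if contig in seen:
            continue
        stack, members = [contig], set()
        while stack:  # depth-first walk of one connected component
            c = stack.pop()
            if c in members:
                continue
            members.add(c)
            stack.extend(neighbours[c] - members)
        seen |= members
        bins.append(sorted(members))
    return bins

# Hypothetical counts of Hi-C read pairs linking pairs of contigs.
contigs = ["c1", "c2", "c3", "c4", "c5"]
links = {("c1", "c2"): 40, ("c2", "c3"): 25, ("c4", "c5"): 30, ("c3", "c4"): 1}
print(bin_contigs(contigs, links))
```

The single read pair linking c3 and c4 falls below the threshold and is ignored, which is a crude guard against the chimeras mentioned above.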

Fangfang Xia, University of Chicago

PATRIC pipeline

Speaker was unable to attend and present.

Rima Khabbaz, CDC, Atlanta GA

Integrating Molecular Technologies in Public Health

Office of Advanced Molecular Detection pioneering integrating WGS into PH
Goals: IT and lab infrastructure expansion, PH workforce (training and career paths for bioinformaticians), develop programs and projects for AMD innovation
AMD in Action
Foodborne diseases (centrepiece)
-culture independent diagnostics
PulseNet (won innovation award), changed how we identify food outbreaks → centralized national DB, creation of PulseNet has resulted in largest recalls ever for PH improvement
eg Listeria surveillance with WGS

#’s more manageable, well characterized human and food/enviro samples
greatly successful, created infrastructure to do WGS in PulseNet (1700 patient, ~2500 food/enviro samples seq)
currently comparing clusters generated by PFGE and WGS
WGS results in more clusters, with fewer cases/cluster

moving to Campy and E. coli surveillance and eventually Salmonella

eg Influenza

CDC Influenza Division important player in surveillance (1 of 5 centres)
monitor virus variation throughout year to inform viral strain selection for vaccine production in Sept (eg 2014 Southern Hemisphere Vaccine)
also monitor antiviral resistance
WGS has changed viral profiling pipeline → genetically profile FIRST then select subset to propagate/isolate followed by phenotypic characterization → faster, cheaper

eg HIV

million people in US living with disease (50% living in 4 states including Cali and Florida)
WGS improves transmission dynamics studies, allows faster PH response (needle exchange, better drug treatment)

eg MERS (Middle East Respiratory Syndrome)

automated microfluidics in barcoding pipeline
several genomes already submitted to Genbank
human seqs track with camels

eg Bourbon virus (emerging in Kansas), tick-borne

WGS for pathogen discovery

eg AMR

WGS adds level of precision, improving knowledge of transmission (endemic in Long term care facilities/nursing homes), highlighted need for regional approach → Centres of Excellence planned

Challenges:

Innovation
Lack of standardization
Automation of analyses for high volumes of data

Charles Chiu, Univ of California

SURPI: Deep Sequencing of Infectious Disease

Omni-omics for infectious disease diagnosis
Focus on metagenomics for clinical infectious disease diagnosis
Agnostic approach-->nearly all microbes can be uniquely identified by NGS
Factors for choosing a platform: cost, speed, volume of data, turn-around-time
Target clinical unmet need: pneumonia (15-25% unknown cause), meningitis/encephalitis (40-60% unknown cause), fever/sepsis (20% unknown cause)
Key:SPEED (mins to hours), epi studies take too long “time is of the essence”
Require sensitivity and accuracy, HIPAA-compliant! (EMR integration), reference databases, user-friendly (for PH workers with no bioinformatics expertise)
Chiu and Miller 2015 for metagenomic pipeline, wet lab part fairly standard but bioinformatics analysis NOT (req “host subtraction” and essentially throw that info away, align remaining reads to pathogen databases)
Computational bottleneck (days to weeks to run this analysis)
Kraken (fast taxonomic classifier), first “unbiased, comprehensive benchmark”
Many other tools NOT benchmarked
SNAP/Bowtie2/STAR (fast nucleotide aligners), 100s-1000s of times faster than BLAST (now within clinically meaningful timeframes)
DIAMOND (fast translated nucleotide/protein aligner)
EDGE Bioinformatics (see Patrick Chain, Los Alamos), Chris Detter
ONE Codex, best-in-class accuracy, minutes for turn-around-time, HIPAA
PathoScope, modular
GENIUS (COSMOSID)
Pathosphere, suite of tools
SURPI: Seq based ultra rapid pathogen ID, for bioinformatics newbies, uses entirety of NCBI NT ref DB, clinical version of SURPI under dev
Cloud version (Google cloud) and laptop version able to run on resource poor settings
NT alignment with SNAP, fast and scalable
Research vs clinical versions→ clinical mods include automated filtering, metadata tagging (background vs contamination vs pathogen), taxonomic classification, pipeline optimization, visualization, server and cloud implementation
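
The "host subtraction, then align remaining reads to pathogen databases" step noted above can be illustrated with a toy sketch. This is not SURPI, SNAP or Kraken: it uses exact k-mer sharing instead of alignment, and the sequences, names and k-mer size are all made up for illustration.

```python
from collections import Counter

K = 5  # toy k-mer size (assumption); real tools use much longer k-mers


def kmers(seq, k=K):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify_reads(reads, host_seq, pathogen_db, k=K):
    """Drop host-like reads, then count the best pathogen hit for the rest."""
    host_kmers = kmers(host_seq, k)
    db_kmers = {name: kmers(seq, k) for name, seq in pathogen_db.items()}
    hits = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        if read_kmers & host_kmers:
            continue  # host subtraction: discard reads sharing any host k-mer
        scores = {name: len(read_kmers & ks) for name, ks in db_kmers.items()}
        best = max(scores, key=scores.get)
        hits[best if scores[best] > 0 else "unclassified"] += 1
    return hits


# Hypothetical sequences, for illustration only.
host = "ACGTACGTACGTTTGACCA"
db = {"pathogen_A": "GGGCCCTTTAAAGGGCAT", "pathogen_B": "TTTTTCCCCCAAAAAGGGGG"}
reads = ["ACGTACGTAC", "GGGCCCTTTA", "CCCCCAAAAA", "GATTACAGAT"]
result = classify_reads(reads, host, db)
print(result)
```

The "unclassified" bucket is the toy analogue of the reference database problem in the case studies: a read from an organism missing from the database cannot be assigned, however good the classifier.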

eg neuroleptospirosis (Josh Osborne) diagnostics, 2yrs ago misclassified because actual pathogen not in the database

eg male with deafness and behavioural change, plethora of diagnostic tests in hospital, under 5hrs WGS identifies astrovirus encephalitis

eg hemorrhagic encephalitis, extensive diagnostics were negative, NGS Dx IDs amoebic infection (Balamuthia, in under a week), couldn’t have made diagnosis earlier b/c Balamuthia poorly represented in ref DB but could have if DB more comprehensive

eg eosinophilic meningitis, tests for viruses, fungi and parasites negative, 2014 DB gives Malassezia (dandruff) top hit, 2015 NGS Dx Angiostrongylus (correct Dx, positive PCR from CDC), Dx had clinical impact!!

going forward, want everything to be in CLIA framework
SURPI → CLIA-certified pipelines with 24hr TAT, HIPAA compliant, data integration to get NGS Dx’s into patient EMRs
precision medicine consult team will access data for decision making
has capacity for genotyping but not automated in clinical version
CNS “sterile”, easier to validate sterile sites
Docker container available to disseminate SURPI

Randall Olsen, Houston Methodist, Tx

Genomics and Transcriptomics in Clinical Microbiology

PCR based tests, MALDI-TOF, WGS to inform and improve patient care
20yrs ago H. influenzae genome seq cost >$1 million and took >1yr
“the $10 microorganism genome will soon be a reality”
“day in the life of a microbio lab” → 130 samples from 116 patients, can WGS ID unknown organisms for these?
88.5% concordance with ref method, ID’d Mycobacterium 10 days before conventional culture based diagnosis
10 organisms unable to ID by WGS because of deficiency in ref DB
Lack of fungi in ref DBs!
400 genomes in validation study (bacteria, fungi and viruses) → 600 clinically ordered WGS tests now! used to supplement routine tests (particularly for fungi and Mycobacterium, also Salmonella and Influenza A to get rapid serotype)
Invoke WGS to improve patient care eg AMR, unusual disease presentation (B. cereus → anthrax-like, acquired anthrax toxins on plasmid, informed institutional response)
“Fire drill” for outbreak detection rehearsal, “mock rapid response scenario”, is mock outbreak clonal? what actionable info in clinically relevant timeframe can be generated by WGS → seq analysis in 3 days, select subset and conduct follow up studies
WGS showed 5 clusters, informed transmission not initially appreciated in epi studies
Also examined gene expression profiles (RNA-Seq), transcriptomics show diffs in strains, has ABC capsule virulence factors overexpressed (unexpected based on genomics), non-coding regions in WGS pipeline previously not analyzed, overexpression of yesMN genes (virulence) led to discovery that there was mixed population (resulting in mods to genomics pipeline)
Combining Omics and animal models together enables testing of hypotheses and therapies, integrated as “disaster preparedness plan”

George Garrity, Michigan State Univ, Lansing Mi

A New Genomics-Driven Taxonomy: Are We There Yet?

International Code of Nomenclature of Prokaryotes (2008 ed) → anchor points, provide ref organisms
2 culture collections in 2 parts of world → provides refs for changes in platforms, methods etc
regulates nomenclature but not taxonomic methodology!!
field is dynamic, in 1980 2 200 names, currently 15-16 000 (moving target)
35 000 “nomenclatural acts” since 1980
1980 purged 1000’s of names! only 5% names survive from 80yrs ago
taxon calling (“OTUs”) vs identification → need for standards (rigorously validated)
proposal for open experiment setting forth series of test cases to test methodologies (are questions asked correct?)
Analysis and Validation Methods → “Name for Life” Commercial services

objective: create infrastructure to support validation system for ID Bacteria/Archaea to incorp genomics data

Peter Sneath, “father of numerical taxonomy” (does calculations by hand, doesn’t trust computers), max likelihood 16S tree no longer calculable → Garrity to arrange info in Bergey’s Manual
Currently, thresholds for classification overlap
Principal components analysis of data (nucleotide identity, aa identity, kmer)
Latent semantic analysis against 16S data, ANI, AAI
Size of genome is problem
PCA analyses - ancillary plot should contain 85% of data
Need to develop distortion free data viz tool
Pairwise combination anywhere in the heat map
Nearest neighbours → move out and find boundaries of taxa
Classifier goes through matrices of heat maps, >2SD → flag for reclassification (sp level rearrangement)
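
A minimal sketch of the ">2SD → flag" idea, assuming a pairwise similarity matrix (e.g. ANI) for genomes currently placed in one taxon. The notes do not give Garrity's actual statistic, so the per-genome mean used here is an assumption, and the strain names and values are hypothetical.

```python
from statistics import mean, stdev


def flag_outliers(names, sim, n_sd=2.0):
    """Flag genomes whose mean similarity to the rest is > n_sd SDs below average."""
    per_genome = {
        a: mean(sim[frozenset((a, b))] for b in names if b != a) for a in names
    }
    mu = mean(per_genome.values())
    sd = stdev(per_genome.values())
    return [g for g, v in per_genome.items() if sd > 0 and (mu - v) / sd > n_sd]


# Hypothetical taxon: seven similar strains plus one that does not belong.
names = [f"strain_{i}" for i in range(1, 8)] + ["odd_one"]
sim = {}
for i, a in enumerate(names):
    for b in names[i + 1:]:
        sim[frozenset((a, b))] = 80.0 if "odd_one" in (a, b) else 97.0

flagged = flag_outliers(names, sim)
print(flagged)
```

With very few genomes in a taxon a single outlier can never exceed 2 SD, so a real classifier would need a minimum group size or a different test.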

eg Streptomyces → novel microbial products, nomenclature got “cleaner”

eg Eubacterium should be phylum

eg Mycoplasma could be more genera

take home: statistical use of genomics data to develop better taxonomy

Martin Maiden; Univ. of Oxford, Oxford, United Kingdom

Beyond Typing and Phylogeny: the Population and Functional Genomics of the Neisseria

19yrs ago everyone developing own gel-based methods → gradual adoption of PCR and nt-based detection
MLST based on housekeeping genes
7 loci used for 100 Neisseria, 7 loci ST summarizes 3 284 bp = 0.15% of 2.18 Mb genome (compresses 3200 bp in 7 digits =ST)
11 525 STs, 35K isolates, 507-780 alleles/loci → can use “bursts” to cluster STs (stable complex)
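
A small sketch of how a 7-locus allele profile collapses to a single ST, and of the "0.15% of the genome" arithmetic. The locus names are the Neisseria MLST loci as I recall them; the profile table and ST numbers are hypothetical, not taken from PubMLST.

```python
# Neisseria MLST loci (from memory, treat as an assumption).
LOCI = ("abcZ", "adk", "aroE", "fumC", "gdh", "pdhC", "pgm")

# Hypothetical profile table: allele numbers per locus -> sequence type.
PROFILES = {
    (1, 1, 2, 1, 3, 2, 3): 1,
    (2, 3, 4, 3, 8, 4, 6): 2,
}


def sequence_type(alleles):
    """Look up the ST for a dict of locus -> allele number (None if novel)."""
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key)


def percent_sampled(typed_bp, genome_bp):
    """Percentage of the genome covered by the typed loci."""
    return 100.0 * typed_bp / genome_bp


isolate = dict(zip(LOCI, (2, 3, 4, 3, 8, 4, 6)))
print(sequence_type(isolate))
print(round(percent_sampled(3284, 2_180_000), 2))  # figures quoted in the talk
```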

Reviewing BIGSdb http://pubmlst.org/software/database/bigsdb/

PubMLST

1300 submitters, data curated → 90 MLST schemes used for molecular typing, species ID
Autotagger to annotate genomes (can feed into NeighbourNet)

Maiden 2013, Nat Rev Microbiol (Hierarchical genome analysis)

16S-->MLST-->rMLST-->wgMLST

Mentioning Alexander von Humboldt’s Three Stages of Scientific Discovery:

first they deny it’s true
then they deny it’s important
then they credit the wrong person

Neisseria spp. - studying diverse phenotypes
Mening carriage across the meningitis belt - study published this year:
“The Diversity of Meningococcal Carriage Across the African Meningitis Belt and the Impact of Vaccination With a Group A Meningococcal Conjugate Vaccine.”
http://jid.oxfordjournals.org/content/212/8/1298.long

More data re vaccinated vs unvaccinated districts - showed herd immunity occurring in the vaccinated region vs non-vaccinated regions. FB: There is a pub associated with herd immunity that Martin mentioned to me at lunch: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3988355/

Also mentioned paper “Implications of Differential Age Distribution of Disease-Associated Meningococcal Lineages for Vaccine Development” http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4054250/

Napoleon: “History is the version of events that people decided to agree upon.”

Abu Mustafa; Kuwait Univ., Jabriya, KUWAIT

Next Generation Sequencing of Brucella melitensis Isolates from Kuwait and Comparative Genome Analyses

Brucellosis - reservoirs include camels, dogs, goats, swine, sheep
found in milk, cheese, dairy
highly infectious, aerosol transmission
potential biological agent, painful illness
top 10 impactful diseases to poverty ridden humans
difficult to diagnose relapse vs re-infection
culture, biochemical characterization, serotyping used traditionally for ID of spp/biovars
new methods needed for surveillance (in Kuwait, all B. melitensis)
identified 15 B. melitensis by PCR and standard methods (16S)
reads trimmed and filtered with FASTX-Toolkit
QUAST used for assembly quality
2 chromosomes, 1.2 & 2.1 Mb

All unpublished:

going through all methods, parameters in detail at start.
Genomes seq’d immediately reveals one of the B. melitensis isolates was an outlier but no other information was provided
10 variants/kb and 14 variants/kb in chrom 1 and 2, respectively
Two major variant groups, plus the one outlier seen clearly in trees.
Mentions isolate-specific variations identified, aiding epi studies as possible markers

Scott Federhen; NCBI, Bethesda, MD

Microbial Genomic Taxonomy at GenBank

“Taxonomy in the trenches”

Type vs genome from type
ProxyType scores vs ANI to type (ANI cutoffs change between spp)
Curation to correct misidentification (NCBI will just change name and add comment block instead of asking permission of author)

Planning to change names in entries that seem to be incorrectly taxonomically predicted - with a comment showing the “ANI” percentage for old name versus (higher ANI) new name as evidence. Going to notify authors of this change, but this is the first time they won’t require author acceptance to change a Genbank entry. FB: This is really notable since it is the first time Genbank is making blanket changes to original Genbank entries (rather than their curated RefSeq) in this way without author agreement. However, they had a workshop with taxonomists to consult with them on this, and got agreement on making a blanket change. Federhen says they are trying to be really careful with this one. I’m sure the authors would appreciate the notice, and these fixes are necessary, but author input may also be key to note any errors in the automated approach, and make potential improvements to taxa correction that may be even more accurate.
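
The ANI-based name check could be sketched roughly as follows, assuming ANI values to each type genome are already computed. This is not NCBI's code: the function name, the species labels, the cutoffs and the comment wording are all hypothetical.

```python
def check_name(submitted, ani_to_type, cutoffs):
    """Return (better_name, comment) if ANI supports a different name, else None."""
    best = max(ani_to_type, key=ani_to_type.get)
    if best == submitted or ani_to_type[best] < cutoffs[best]:
        return None  # submitted name is best, or evidence is below the cutoff
    old = ani_to_type.get(submitted, 0.0)
    comment = (
        f"ANI to type: {submitted} {old:.1f}% vs {best} {ani_to_type[best]:.1f}%"
    )
    return best, comment


# Hypothetical values; cutoffs differ between species, as noted in the talk.
ani = {"Species A": 82.1, "Species B": 98.7}
cutoffs = {"Species A": 96.0, "Species B": 95.0}
print(check_name("Species A", ani, cutoffs))
```

Returning the comment alongside the new name mirrors the plan to leave the evidence in the entry.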

Kat Holt, Univ of Melbourne, Australia

What do we need from microbial genomics surveillance software?

What are the considerations for using a genomics pipeline in a PH setting?

what are we looking at (bits of genome, SNPs, MLST, Kmer, core genomes, outputs, confidence values)?
how do we know it’s right?
who is doing the analysis? what do I need? what are the inputs? will it all fit in with what I do right now?
reproducibility? robust outcomes? how will the system/pipeline change with future updates, contamination or need for troubleshooting?
are results interpretable? how is metadata integrated?
will the results allow us to make good PH decisions and how will we know?

Errol Strain- CFSAN, FDA

Datasets for the challenge:

Multistate Listeria outbreak (18 isolates)-need to do matching to NCBI enviro/food isolates -> “elementary”
Enteritidis (50 isolates), matching to known clusters -> “more difficult”

Yan Luo - CFSAN, FDA

Bowtie2-->SAMtools-->variants-->custom script for SNP list-->SNP matrix
Docs: http://snp-pipeline.readthedocs.org/en/latest/
Listeria: 1300 SNPs b/w facility 1 and 2, 6 clinical matches to facility 1
METADATA IS IMPORTANT TO INTERPRET YOUR TREE!
Salmonella: more diverse than Listeria, more clusters
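
The last two steps of the workflow (SNP list → SNP matrix) can be sketched in plain Python. This is not the CFSAN SNP Pipeline code. It assumes variant calls are already available per sample as position → alternate base, and the reference and calls are made up.

```python
def snp_matrix(reference, calls):
    """Build a SNP matrix from per-sample variant calls.

    calls maps sample -> {0-based position: alternate base}. A sample with
    no call at a position is given the reference base there.
    """
    positions = sorted({p for sample_calls in calls.values() for p in sample_calls})
    matrix = {
        sample: "".join(sample_calls.get(p, reference[p]) for p in positions)
        for sample, sample_calls in calls.items()
    }
    return positions, matrix


def snp_distance(a, b):
    """Number of matrix columns at which two samples differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical reference and calls.
reference = "ACGTACGTAC"
calls = {"s1": {2: "T"}, "s2": {2: "T", 7: "A"}, "s3": {}}
positions, matrix = snp_matrix(reference, calls)
print(positions, matrix)
print(snp_distance(matrix["s1"], matrix["s2"]))
```

A real pipeline would also distinguish "no call" from "reference base", which this sketch deliberately ignores.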

Hannes Pouseele - Applied Maths

BioNumerics 7.5 - used wgMLST and wgSNP point and click GUI modules
ie. assembly based + assembly-free ; want both to agree for confidence (within caveats)
rough and fine cluster detection, resolving clusters req’s exposure etc (more metadata)
calculation engine→ “warm shoebox”
Option to have the engine in the cloud rather than physical machine purchase
26min to run one Listeria sample (Velvet assembly took 16min alone)
included some QC highlights, possible contamination detected

Katja Einer-Jensen - Qiagen (CLC)

pipeline details at poster 6
dashboard includes running analyses side-by-side with metadata

David Aanensen - Imperial College / Sanger

population tree: ref genomes-->FastQ-->draft genomes-->core gene families
new metadata can be added as req’d to csv file
population tree looks nice (visualization), pretty slick, can select “source” metadata to overlay on tree
http://wgsa.net/

Jörg Rothgänger - Ridom

SeqSphere
cgMLST, cluster threshold <10
SRA FASTQ, epi download → assembly, allele calling → QC and EWS → Tree
nice metadata visualization (isolation source, collection date, geo_loc)
state info missing for clinical cases
ad hoc cgMLST for Enteritidis, more complex tree
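
A sketch of a cgMLST allele distance and the cluster threshold, assuming profiles are tuples of allele numbers with None for a missing locus. This is not SeqSphere's implementation. Reading "<10" as "fewer than 10 differing alleles" is my interpretation of the note, and the profiles are hypothetical.

```python
def allele_distance(p, q):
    """Count loci with different allele numbers, ignoring loci missing in either."""
    return sum(
        1 for a, b in zip(p, q) if a is not None and b is not None and a != b
    )


def same_cluster(p, q, threshold=10):
    """True if two profiles differ at fewer than `threshold` loci."""
    return allele_distance(p, q) < threshold


# Hypothetical 5-locus profiles (real cgMLST schemes have well over 1000 loci).
iso1 = (1, 4, 7, None, 2)
iso2 = (1, 5, 7, 3, 9)
print(allele_distance(iso1, iso2), same_cluster(iso1, iso2))
```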

SNP typing cluster criteria from FDA

can get SeqSphere (and solve outbreaks) from the comfort of home!

Torsten Seemann - Uni Melbourne

URL: https://github.com/tseemann/nullarbor
Nullarbor pipeline (unix command line): Job name/ID→ csv file input → MLST → phylogeny
sequencing QC→ identified possible contamination/mixed population? (species ID with Kraken), assembly with Megahit to “good enough quality”
resistome report using Abricate software based on assembled contigs
core genome based on alignment to ref using Snippy
ML tree using FastTree
SNP distance matrix that epi’s get to see
Fripan uses Roary to determine pan genome
one Listeria sample seems to have 2 genomes
ref for Salmonella should be within 1000 SNPs
Message: use pan as well as core genome, combine multiple lines of evidence
TS: won prize for being first (only?) pipeline to detect L. innocua contaminant

Nabil-Fareed Alikhan - Uni Warwick

Enterobase: analyses, curation, AMR, pan-genome, core SNPs → goal: make it useful to all people
simple web interface
Enterobase updates from SRA hourly
detected some QC issues
cgMLST
2-3 clusters identified, need more metadata to resolve cluster 3
BioNumerics 7.5
1% diff=36 alleles
http://enterobase.warwick.ac.uk

Philip Ashton - Public Health England

SnapperDB
GitHub and CLIMB image (cloud infrastructure for bioinformatics, Birmingham, Warwick, Swansea)
install often hardest part of any pipeline
FASTQs→ SNPdb (PostgreSQL) --> SNP alignments→ tree
lists variants and ignored positions
generates SNP address kind of like IP address
connect isolates within 100 SNPs of each other
nice SNP address tree
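
The "SNP address" idea can be sketched as single-linkage clustering repeated at several SNP thresholds, with the cluster numbers joined like the octets of an IP address. This is a toy illustration, not SnapperDB's code. The thresholds (100, 10, 0) and the distances are assumptions chosen for the example, and the real tool uses more levels.

```python
def single_linkage(names, dist, threshold):
    """Label isolates with cluster ids; isolates within `threshold` SNPs are linked."""
    label = {}
    next_id = 1
    for start in names:
        if start in label:
            continue
        label[start] = next_id
        stack = [start]
        while stack:
            cur = stack.pop()
            for other in names:
                if other not in label and dist[frozenset((cur, other))] <= threshold:
                    label[other] = next_id
                    stack.append(other)
        next_id += 1
    return label


def snp_addresses(names, dist, thresholds=(100, 10, 0)):
    """Join the cluster id at each threshold, coarsest first."""
    levels = [single_linkage(names, dist, t) for t in thresholds]
    return {n: ".".join(str(level[n]) for level in levels) for n in names}


# Hypothetical pairwise SNP distances.
names = ["a", "b", "c", "d"]
dist = {
    frozenset(("a", "b")): 3, frozenset(("a", "c")): 60, frozenset(("b", "c")): 58,
    frozenset(("a", "d")): 400, frozenset(("b", "d")): 410, frozenset(("c", "d")): 390,
}
addresses = snp_addresses(names, dist)
print(addresses)
```

Isolates sharing a prefix are linked at that threshold: here a, b and c share the first field (within 100 SNPs by single linkage), while only a and b share the second.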

Aaron Petkau - PHAC, Canada

SNVPhyl, part of Canada’s IRIDA gen epi platform
integrates genomics, epi, lab, clinical metadata
ref mapping → variant ID and filtering → wg phylogeny
implemented in Galaxy (web interface, API, provenance), QA/QC reporting, re-labeling of tree
Listeria: ref produced de novo with SPAdes, remove phage and repeats
matches defined as isolate within 0-4 SNVs
~100 SNVs between facility 1 and 2
removed ASM20, too little data, didn’t meet min coverage of 10x after filtering
3 clusters matching with clinical test dataset
One of the few methods mentioning the use of Galaxy for workflow
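
The two decision rules mentioned above (drop isolates below 10x coverage, call a match at 0-4 SNVs) can be sketched as below. This is not SNVPhyl code. The isolate names, coverages and distances are hypothetical.

```python
def find_matches(distances, coverage, max_snvs=4, min_coverage=10):
    """Return isolate pairs within max_snvs, skipping isolates below min_coverage."""
    passed = {s for s, c in coverage.items() if c >= min_coverage}
    matches = [
        tuple(sorted(pair))
        for pair, d in distances.items()
        if d <= max_snvs and pair <= passed  # both isolates passed the filter
    ]
    return sorted(matches)


# Hypothetical data: pairwise SNV distances and mean coverage per isolate.
distances = {
    frozenset(("clin1", "fac1")): 2,
    frozenset(("clin2", "fac1")): 0,
    frozenset(("clin3", "fac2")): 97,
    frozenset(("lowcov", "fac1")): 1,
}
coverage = {"clin1": 45, "clin2": 60, "clin3": 38, "fac1": 52, "fac2": 41, "lowcov": 6}
matches = find_matches(distances, coverage)
print(matches)
```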

Martin Thompson, Centre for Genomic Epidemiology, DTU

KmerFinder and assembly→ ResFinder → MLST based on results from KmerFinder → other “Finders”
batch upload
can download results in excel file
Pipeline is available as a Docker image here: (i know it exists somewhere)
CSIphylogeny (SNP tree), BWA mapping, quality >30, depth >10, distance to nearest SNP >10
Ndtree (Kmer based)
a lot of Salmonella linkage
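
The SNP filters quoted for CSIphylogeny (quality >30, depth >10, distance to nearest SNP >10) can be sketched as follows. This is not the CGE code. The order of operations (pruning neighbours after the quality and depth filters) and the strict ">" comparisons follow my reading of the note and are assumptions, as are the example calls.

```python
def filter_snps(snps, min_qual=30, min_depth=10, min_gap=10):
    """Keep SNPs above the quality/depth thresholds, then prune close neighbours.

    snps is a list of dicts with keys pos, qual and depth. A SNP is pruned
    if another retained SNP lies within min_gap bases of it.
    """
    good = sorted(
        (s for s in snps if s["qual"] > min_qual and s["depth"] > min_depth),
        key=lambda s: s["pos"],
    )
    kept = []
    for i, s in enumerate(good):
        left_ok = i == 0 or s["pos"] - good[i - 1]["pos"] > min_gap
        right_ok = i == len(good) - 1 or good[i + 1]["pos"] - s["pos"] > min_gap
        if left_ok and right_ok:
            kept.append(s)
    return kept


# Hypothetical calls.
snps = [
    {"pos": 100, "qual": 50, "depth": 40},
    {"pos": 105, "qual": 60, "depth": 30},  # too close to position 100
    {"pos": 300, "qual": 20, "depth": 50},  # low quality
    {"pos": 500, "qual": 45, "depth": 8},   # low depth
    {"pos": 800, "qual": 90, "depth": 25},
]
kept_positions = [s["pos"] for s in filter_snps(snps)]
print(kept_positions)
```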

Zamin Iqbal - Uni Oxford

Cortex software: http://cortexassembler.sourceforge.net/
Reference free de Bruijn Graph (DBG) - sits between de novo assembly and read alignment
4000 samples took 2 days using ~16 cores
But can save these calculations and re-use for future analysis ie. background samples
FastTree for tree
looking only for segregating variation, matter of minutes
Map back coordinates to “close reference” (unclear)
Awesome phage sharing matrix heatmap with hierarchical clustering
Using phage to distinguish close samples on tree
AMR identification module
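
A toy de Bruijn graph shows why segregating variation is quick to find without a reference: where samples differ, the graph branches. This is not Cortex (which uses coloured graphs and much larger k). The reads and k are made up.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in the reads."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def branch_nodes(graph):
    """Nodes with more than one outgoing edge, i.e. where sequences diverge."""
    return sorted(node for node, nxt in graph.items() if len(nxt) > 1)


# Two hypothetical samples differing by one SNP (T vs A at the fifth base).
reads = ["ACGTTGCA", "ACGTAGCA"]
graph = de_bruijn(reads, k=4)
branches = branch_nodes(graph)
print(branches)
```

The two paths leave the branch node and rejoin at a shared node, forming the "bubble" that variant callers on such graphs look for.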

Nick Greenfield - One Codex

assembly free
focused on improving ref DB (>40 000 distinct genomes, reduce false positives)
no Listeria typing DB
FASTQ→ add metadata→ metagenomic classification
found 2 clusters

Bill Klimke - NCBI

quality issues and standards for NGS
need to draw attention to all points where errors could arise (wet lab/computational analyses) so they can be addressed
samples dependent on metadata and contextual data
sample mixups, contamination, digital data mixups
need better standardized ways of integrating data
QA/QC is moot if upstream errors not reduced/solved

Bruce Budowle, Univ of Texas, Austin, Tx

Microbial forensics and its needs for standards and standardization

Excluding culprits is as important as identifying culprits
Info can be limited but still useful. microbial forensics is multidisciplinary
Bioterrorism investigations complicated by background noise of sporadic and accidental foodborne pathogens at large
Food and agriculture targets eg US vs Canadian BSE in cows, who has madder cows?
Wide number of forensic outcome scenarios, possibly retaliation such as invasion
Who, what, when, where to assess plausibility of bioterrorism acts
Need to supply standards of proof with measures of certainty
Quality assurance guidelines to advise community (valid, rigorous)
Need to define validation (spans collection, shipping and storage, extraction, analysis, interpretation), criteria and outcomes that qualify (and exceptions or alternatives during extreme circumstances)
Need some stability to create gold standard, if technology always changing, not good as benchmark
Practitioners can’t afford dynamic change
Standards: references (DBs and panels), quality metrics and levels; Standard Performance Methods Requirements (SPMRs)
“protect the country”
Clonality, unknown histories, abandon concept of individualization?
Attribution decisions require more info than just genomics (+law, policy, intelligence etc)
Correcting bacterial genome metadata with AutoCurE!!
Marker selection criteria (eg gene scoring)

Goal is attribution - who committed the crime, as well as who did not commit the crime

Science does not have to say something beyond reasonable doubt, that is the requirement of the whole case. microbial forensics, have to deal with plethora of potential culprits.

high background, need epidemiology to distinguish deliberate release. Peanut guy who got arrested - microbial evidence is part of the case, not the whole shebang. was that a joke about native americans and smallpox?

validation - define limitations of technique so don’t go beyond the boundaries of your method. in exigent circumstances, can accept non-’validated’ results.

Gold standard just means more people using it than something else
Don’t want to become a prisoner of QA.

When thinking about adoption of new techniques, need to take a new look at old techniques to make a proper comparison of pros/cons

Paul Keim, Northern Arizona Univ, Flagstaff, Az

Anthrax - Molecular epidemiology and forensics from WGS and metagenomic sampling of complex specimens

The anthrax FBI investigation
B. anthracis strictly clonal, no evidence of LGT
Canonical SNPs (landmarks for naming)
Mutations causing phenotypic diffs all within markers
Outbreak of anthrax could take years to develop, and perhaps decades to detect
nASP (“pipelines are like elbows, everybody’s gotta have at least 2”), open source, ref-dependent (single or pan-genome), supports reads or assemblies, fast, scales linearly
Monsoon
~12 000 SNPs, use for inclusion vs exclusion
“A clade”, out of Africa (10 000yrs ago)
eg anthrax and heroin users in Scotland

200 suspected cases, 100 confirmed
14 deaths
Scotland, England, Germany
using canonical SNPs, 2 Turkish isolates closest to Scottish drug user isolate
concluded that heroin contaminated during smuggling process → feds ran with idea, which turned out to be too strong
expanded European screening → PCR-based assay + bigger ref populations → 2 outbreaks!
but injectional anthrax groups overlapped in time and space
req’d bilateral agreements → model collaborative project to get Germany to work with US (contracts in place)

Soviet Union weaponized spores in industrial complex (Sverdlovsk)

1979 left most filters off production facility, rupturing remaining filters, sending out plume of spores
in violation of international treaties
US obtained fixed pathology samples from victims, PCR confirmed B. anthracis
how low can you go and still ID strains? normally 50-100x coverage, 20x, 10x, 1x (would result in 10 000 miscalled SNPs at 1x - only 12 000 known SNPs in species)?
WG-FAST (whole genome focused array SNP typing)
turns out 1x can be done! only genotype SNPs you already know!

Placement confidence landscape (with E. coli, 270 genomes x 255 000 SNPS, phylogenetic position matters, only 500 SNPs req’d to place!)
can examine AMR to see if Russians using AMR strains → absolutely WT
Based on Monte Carlo resampling, need 500 SNPs to ID/place in tree with 95% accuracy
No culture? no problem with GOOD REF!!
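
The Monte Carlo resampling idea (how many known SNPs are needed to place a sample?) can be sketched as below. This is not the method used in the talk. The reference genotypes are artificial, and placement is by fewest mismatches at the sampled sites, with ties counted as failures.

```python
import random


def place(query, refs, sites):
    """Return the reference with fewest mismatches at the sampled sites (None on ties)."""
    scores = {name: sum(query[i] != g[i] for i in sites) for name, g in refs.items()}
    best = min(scores.values())
    winners = [name for name, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None


def placement_accuracy(refs, n_sites, trials=200, seed=0):
    """Fraction of trials where a random subset of n_sites SNPs places correctly."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    names = sorted(refs)
    total = len(refs[names[0]])
    correct = 0
    for _ in range(trials):
        truth = rng.choice(names)
        sites = rng.sample(range(total), n_sites)
        if place(refs[truth], refs, sites) == truth:
            correct += 1
    return correct / trials


# Artificial genotypes at 60 known SNP sites: each lineage carries its own block.
refs = {
    "ref0": "A" * 60,
    "ref1": "C" * 15 + "A" * 45,
    "ref2": "A" * 15 + "C" * 15 + "A" * 30,
    "ref3": "A" * 30 + "C" * 15 + "A" * 15,
}
for n in (2, 10, 30, 60):
    print(n, placement_accuracy(refs, n))
```

With all sites sampled, placement is always correct in this toy. With few sites, ties dominate, which is the same trade-off the 500-SNP figure summarises for real data.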

END

The Genome Factory

Sunday, 4 October 2015

ASM NGS 2015 - Meeting Notes