Estimating Metagenome Diversity.
University of California, Los Angeles, 2018.
Abstract. Although reduced microbiome diversity has been linked to various diseases, estimating the diversity of bacterial communities (the number and the total length of distinct genomes within a metagenome) remains an open problem in microbial ecology. I describe the first analysis of microbial diversity using long reads that makes no assumptions about the frequencies of genomes within a metagenome and does not require a large database covering the total diversity.
Fundamental Algorithms in Deep Sequencing: Assembly
Simons Institute for the Theory of Computing, University of California, Berkeley, January 2016
Abstract. Deep sequencing has become ubiquitous in genomics research due to plummeting costs and massive data volumes. However, it raises formidable algorithmic challenges. In this first mini-course, a flipped class, we will learn how graph theory can be used to assemble genomes and will review recent advances in DNA sequencing. To prepare, students will have an opportunity to enroll in the short Genome Sequencing MOOC on Coursera before the class starts. Instead of lecturing in class, the instructor will interact with students to answer their questions about the material and recent trends in genome assembly.
Genome Assembly from Single Cells. Distinguished Lecture at Johns Hopkins University, Baltimore, Maryland, April 2015
Abstract. The lion's share of bacteria in various environments cannot be cloned in the laboratory and thus cannot be sequenced using existing technologies. A goal of single-cell genomics (SCG) is to complement gene-centric metagenomic data with whole-genome assemblies of uncultivated organisms. Assembly of SCG data is challenging because of highly non-uniform read coverage and highly elevated levels of chimeric reads and read-pairs. We describe SPAdes, an assembler for both SCG and standard (multicell) assembly that incorporates a number of new algorithmic ideas. We demonstrate that recently developed single-cell assemblers not only enable single-cell sequencing, but also improve on conventional assemblers on their own turf. We further describe (i) truSPAdes, which assembles accurate and long (10 kb) reads generated by the recently released Illumina TruSeq technology, (ii) transSPAdes for transcriptome assembly, and (iii) dipSPAdes for assembling highly polymorphic diploid genomes. Finally, we show that the de Bruijn graph assembly approach is well suited to assembling long and highly inaccurate SMRT reads generated by Pacific Biosciences.
Technology-Enhanced Education at UC San Diego, La Jolla, California, May 2014
Birth and Death of Fragile Chromosomal Regions in Mammalian Evolution
Simons Institute for the Theory of Computing, University of California, Berkeley, March 2014
Abstract. A fundamental question in chromosome evolution is whether there exist fragile regions (rearrangement hotspots) where chromosomal rearrangements occur over and over again. We demonstrate that fragile regions do exist and further show that they are subject to a "birth and death" process, implying that fragility has a limited evolutionary lifespan. To establish this biological result, we will prove some theorems about breakpoint graphs, the workhorse of genome rearrangement studies. We further illustrate that both breakpoint graphs and de Bruijn graphs are special cases of a more general notion of A-Bruijn graphs, which have found many applications in computational biology.
Commencement Speech and the Honorary Degree Award Ceremony
Simon Fraser University, Vancouver, Canada, 2013
Genome Assembly and Seven Bridges of Konigsberg (Part 1, Part 2). Konigsberg (now Kaliningrad, Russia), 2013
Personalized Genomics: From Experimental to Computational Problems.
Polit.ru Lectorium, Moscow, Russia, September 2012 (in Russian)
Genome Rearrangements: from Biological Problems to Combinatorial Algorithms (and back).
Steklov Mathematical Institute, Russian Academy of Sciences, Saint Petersburg, Russia, May 2011 (in Russian)
Genome Rearrangements: from Biological Problems to Combinatorial Algorithms (and back).
Saint Petersburg Computer Science Club, Saint Petersburg, Russia, December 2010 (in Russian)
De Novo Sequencing with Short Reads: Does the Read Length Matter? Joint Genome Institute, Walnut Creek, California, 2009
UCSD Faculty Excellence Award, La Jolla, California, 2007
Panel Discussion on Information Theory. Information Theory and Applications Conference, La Jolla, California, April 2006
Gene Hunting Without Genome Sequencing: the Twenty Questions Game with Genes.
Mathematical Sciences Research Institute, Berkeley, California, May 1998.