Project

Bioinformatic Meta-Analyses

Last updated on Sep 4, 2019

DNA and RNA sequence data are being deposited in public repositories at an astonishing rate. The quantity of sequence data in GenBank has doubled approximately every 18 months since 1982. This is an extraordinarily rich source of information about microbial life.
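To give a sense of scale, the short sketch below computes the fold increase implied by a constant doubling time. It is only an illustration: it assumes the 18-month doubling holds exactly over the whole period, it is not fitted to actual GenBank release statistics, and the function name `fold_increase` is made up for this example.

```python
def fold_increase(years_elapsed: float, doubling_time_years: float = 1.5) -> float:
    """Fold increase implied by a constant doubling time (idealized model)."""
    return 2.0 ** (years_elapsed / doubling_time_years)


if __name__ == "__main__":
    # 37 years is the span from 1982 to 2019, used here purely as an example.
    for years in (1.5, 15, 37):
        print(f"{years:>5} years: about {fold_increase(years):.3g}-fold")
```

Under this idealized assumption, 15 years corresponds to ten doublings (a 1,024-fold increase), and 37 years corresponds to an increase of more than ten-million-fold.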

I argue that it is impossible to use all of the information in a sequence data set in a single study, and that we can learn a great deal by combining data from disparate studies that were not initially intended to be compared.

What We’ve Done

Through bioinformatic meta-analyses, the Steen Lab has:

  • Quantified uncultured microbes: Determined that high proportions of bacteria and archaea across most biomes remain uncultured
  • Explored trait conservation: Studied the degree to which microbial traits are conserved as a function of taxonomic rank
  • Optimized metagenomics: Investigated the relationship between DNA sequencing effort and the quantity of metagenome-assembled genomes (MAGs)
  • Developed ML tools: Built a deep learning-based alignment-free sequence similarity search tool

Our approach treats public databases as an untapped scientific resource, one that can reveal fundamental truths about the structure and function of life on Earth.


Drew Steen

Assistant Professor of Microbiology and Earth and Planetary Sciences

We in the Steen Lab want to understand how microbes interact with organic matter in aquatic systems. To do that, I use the tools of organic geochemistry as well as microbial ecology. These questions have led us to work on new approaches for analyzing DNA sequences from environmental microbiomes and to study the distribution of taxa and functions across all of microbial life.