Biology

Juannan Zhou

Assistant Professor

Ph.D., University of Maryland, 2017

122 Bartram Hall

juannanzhou@ufl.edu

Research Overview

Research in the Zhou lab centers on a fundamental biological question: how does genotype determine phenotype? For example, the amino acid sequence of a protein determines its 3D structure, which in turn affects its function, so any mutation in the sequence can change the function of the protein. This mapping from sequence to phenotype and to fitness is critical to evolutionary biology because it determines the tempo and mode of evolution. It also has many real-world applications, including predicting the evolution of antimicrobial resistance, predicting disease phenotypes from genomic data, and engineering highly optimized biomolecules. The study of genotype-phenotype maps was pioneered by Sewall Wright’s introduction of “fitness landscapes”. Recent advances in high-throughput phenotyping assays, such as deep mutational scanning, have provided important tools for measuring the structure of real fitness landscapes of proteins and regulatory elements.

The overarching goal of our lab is to integrate machine learning and high-throughput experiments to reveal both local and global structures of the fitness landscapes of proteins and complex traits and extract general principles of adaptive evolution. We are currently developing both supervised and unsupervised methods for inferring the structure of fitness landscapes.

The supervised methods utilize labeled data, for example, those generated by high-throughput phenotyping assays. We use the supervised methods for two main purposes: (a) accurately predicting the phenotype for novel variants; (b) extracting high-level qualitative features of the fitness landscape. Currently, we are developing a kernel-based method applicable to complex traits in diploid organisms. The major advantage of this method is its capability to model any complex patterns of genetic interaction potentially present in diploid genotype-phenotype maps. Thus, it will allow us to learn the genetic architecture of complex traits such as disease risks and agricultural phenotypes, and further utilize this information for making accurate genomic predictions.
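As a rough illustration of the general idea of kernel-based prediction on diploid genotypes, the sketch below fits a generic kernel ridge regression with a Gaussian kernel. It is not the lab's method: the 0/1/2 allele-count encoding, the kernel choice, the toy two-locus phenotype, and the names `rbf_kernel` and `fit_predict` are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    """Gaussian kernel between every row of A and every row of B."""
    # Squared Euclidean distance between each pair of genotype vectors.
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def fit_predict(X_train, y_train, X_test, noise=1e-3, length_scale=1.0):
    """Kernel ridge regression: fit on training genotypes, predict test genotypes."""
    K = rbf_kernel(X_train, X_train, length_scale)
    # Solve (K + noise * I) alpha = y for the dual weights.
    alpha = np.linalg.solve(K + noise * np.eye(len(X_train)), y_train)
    return rbf_kernel(X_test, X_train, length_scale) @ alpha

# Hypothetical toy data: two loci, each coded as the number of alternate
# alleles (0, 1, or 2). The phenotype is 1 only when both loci carry at
# least one alternate allele, which is a simple genetic interaction.
X = np.array([[a, b] for a in range(3) for b in range(3)], dtype=float)
y = np.array([float(a > 0 and b > 0) for a, b in X])

# Hold out the double heterozygote [1, 1] and predict it from the rest.
held_out = np.arange(len(X)) == 4
pred = fit_predict(X[~held_out], y[~held_out], X[held_out])
print("Predicted phenotype for genotype [1, 1]:", pred[0])
```

Because the kernel compares whole genotypes rather than summing per-locus effects, a model of this kind can in principle represent interactions among loci without listing them explicitly.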

In the meantime, the lab is also using unsupervised models to learn the structure of fitness landscapes. Recent unsupervised models trained on large corpora of protein sequences have been shown to learn rich biological information about those sequences. We are exploring ways to exploit the ability of unsupervised methods to detect broad biological patterns, both for studying the structure of protein fitness landscapes and for extracting general principles of adaptive evolution. Current research includes (a) training protein language models to predict adaptive substitutions; (b) using language models to predict the effect of combinations of mutations in novel variants; (c) exploring novel network architectures for protein language models and generative models.

In addition to the machine learning projects, our lab will also start to use wet-lab experiments to empirically measure the genotype-phenotype map for proteins as well as complex traits.

Our experimental approach is built on an engineered genetic system that measures the function of a genotype (for example, a gene variant) by (a) expressing it in yeast cells and (b) linking its molecular phenotype (for example, protein activity) to a selectable trait in the yeast host (e.g., growth rate). To multiplex this procedure, we combine phenotypic selection with high-throughput sequencing, which allows us to measure the function of thousands to millions of sequence variants in a single experiment.
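One common way to turn sequencing read counts from a selection experiment of this kind into a functional score is a log enrichment ratio relative to the wild type. The sketch below shows that calculation under stated assumptions: the variant names, the read counts, the pseudocount, and the function name `log_enrichment` are hypothetical and are not taken from the lab's pipeline.

```python
import math

def log_enrichment(pre_counts, post_counts, wild_type, pseudocount=0.5):
    """Log2 enrichment of each variant after selection, relative to wild type."""
    def log_ratio(variant):
        # The pseudocount avoids division by zero for variants lost in selection.
        pre = pre_counts.get(variant, 0) + pseudocount
        post = post_counts.get(variant, 0) + pseudocount
        return math.log2(post / pre)

    wt_ratio = log_ratio(wild_type)
    return {variant: log_ratio(variant) - wt_ratio for variant in pre_counts}

# Hypothetical read counts before and after growth-based selection.
pre = {"WT": 1000, "A5G": 1000, "L9P": 1000}
post = {"WT": 2000, "A5G": 4000, "L9P": 250}

scores = log_enrichment(pre, post, wild_type="WT")
for variant, score in scores.items():
    print(f"{variant}: {score:+.2f}")
```

Scores above zero indicate variants that were enriched relative to the wild type during selection, and scores below zero indicate variants that were depleted.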

Our lab will use this approach to map the fitness landscapes of biomedically and environmentally important proteins. In the near future, we will also use this high-throughput technology to study the fitness landscapes of complex traits using diploid yeast.

Open positions

We are currently seeking highly motivated postdocs, graduate students, and lab technicians to join the lab. If you are interested in evolutionary biology and machine learning, please contact me at juannanzhou@ufl.edu.

Representative publications

  • Chen, W. C., Zhou, J., Sheltzer, J. M., Kinney, J. B., & McCandlish, D. M. (2021). Non-parametric Bayesian density estimation for biological sequence space with applications to pre-mRNA splicing and the karyotypic diversity of human cancer. PNAS, in press. https://doi.org/10.1101/2020.11.25.399253
  • Zhou, J., & McCandlish, D. M. (2020). Minimum epistasis interpolation for sequence-function relationships. Nature Communications, 11(1), 1-14. https://doi.org/10.1038/s41467-020-15512-5
  • Zhou, J., Wong, M. S., Chen, W. C., Krainer, A. R., Kinney, J. B., & McCandlish, D. M. (2020). Empirical variance component regression for sequence-function relationships. https://doi.org/10.1101/2020.10.14.339804
  • Posfai, A., Zhou, J., Plotkin, J. B., Kinney, J. B., & McCandlish, D. M. (2018). Selection for protein stability enriches for epistatic interactions. Genes, 9(9), 423. https://doi.org/10.3390/genes9090423
  • Zhou, J., Reynolds, R. J., Zimmer, E. A., Dudash, M. R., & Fenster, C. B. (2020). Variable and sexually conflicting selection on Silene stellata floral traits by a putative moth pollinator selective agent. Evolution, 74(7), 1321-1334. https://doi.org/10.1111/evo.13965