Yeast “adopt a proto-gene” project

What is the adopt a proto-gene project?

As part of an educational effort to promote literacy in evolutionary biology, the “adopt a proto-gene” initiative provides resources for students, and for educators working with undergraduates, to characterize individual proto-genes at their home institutions. The project provides:

  • Modules for undergraduate students to explore proto-genes in the model eukaryote Saccharomyces cerevisiae (below on this page)
  • Virtual workshops to assist faculty in using the modules with undergraduates at their home institutions
  • Summer undergraduate research experiences at the University of Pittsburgh

What is a proto-gene?

It has become increasingly clear that eukaryotic genomes are pervasively transcribed and translated.
Thousands of small, evolutionarily novel polypeptides expand the coding potential of fungal, plant and
animal genomes beyond established protein-coding genes. Genomic scientists have proposed that pervasive
translation generates a reservoir of “proto-genes” that promote de novo gene birth by exposing genetic
variation to natural selection in the form of novel polypeptides. Some proto-genes are occasionally
retained by selection and become de novo genes, but most eventually return to a non-genic state. Aside
from their evolutionary potential, how do proto-genes impact cell biology? The physiological significance
of proto-genes has not yet been systematically explored in any species. As a result of this gap in
knowledge, current models of cellular systems are missing thousands of genetic elements that are
potentially critical for understanding genotype-phenotype relationships. This missing biology is likely to
explain key molecular differences between species, to unveil novel mechanisms of evolutionary
adaptation, and to shed light on the first steps of de novo gene emergence.

The “adopt a proto-gene” initiative is supported by an NSF-CAREER award to Anne-Ruxandra Carvunis, Associate Professor in the Department of Computational and Systems Biology at the University of Pittsburgh School of Medicine.

Lab Modules

These modules allow students to use cutting-edge bioinformatics algorithms to explore proto-genes via web-based tools, without requiring any coding experience. Linked here are proto-genes we pre-curated to serve as good illustrations for each module.

Module 1. Genome Browser
This module provides an introduction to the Saccharomyces Genome Database (SGD) site and the JBrowse genome browser. In this module, participants will learn how to use a genome browser to view the position of genes relative to one another and how to integrate different data types at the genome scale. A video walk through of this module can be viewed at

Genome Browser Guide (google doc) (PDF)

Genome Browser Worksheet (google doc) (PDF)

Module 2. Cellular Localization
This module provides instructions to predict protein localization from amino acid sequence and to identify sequence or structure motifs important for predicting localization. A video walk through of this module can be viewed at

Cellular Localization Guide (google doc) (PDF)

Cellular Localization Worksheet (google doc) (PDF)

Module 3. Structure Prediction
This module provides instructions to predict protein structure from amino acid sequence and search for proteins with similar structures using cutting-edge machine learning algorithms such as ESMFold and Foldseek. Coming soon: a video walk through of this module.

Structure Prediction Guide (google doc) (PDF)

Structure Prediction Worksheet (google doc) (PDF)

Module 4. Coexpression
This module provides instructions on how to query a large coexpression network to identify genes and other proto-genes that have similar transcriptional patterns. Coming soon: a video walk through of this module.

Coexpression Guide (google doc) (PDF)

Coexpression Worksheet (google doc) (PDF)
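
For readers curious about what a coexpression query computes, here is a minimal sketch in Python. It is not part of the module, which uses web-based tools and requires no coding. It assumes Pearson correlation as the similarity measure, which is one common choice; the network used in the module may rely on a different measure. The element names and expression values below are hypothetical toy data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / norm


def rank_coexpressed(query, profiles):
    """Return (name, r) pairs for every other element, most correlated first."""
    q = profiles[query]
    scores = [(name, pearson(q, p)) for name, p in profiles.items() if name != query]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy data: expression of each element across four conditions.
profiles = {
    "proto_gene_X": [1.0, 2.0, 3.0, 4.0],
    "gene_A": [2.0, 4.0, 6.0, 8.0],
    "gene_B": [4.0, 3.0, 2.0, 1.0],
    "gene_C": [1.0, 3.0, 2.0, 4.0],
}

ranking = rank_coexpressed("proto_gene_X", profiles)
for name, r in ranking:
    print(f"{name}\t{r:.2f}")
```

Elements at the top of the ranking have transcriptional patterns most similar to the query, which is the kind of relationship the coexpression network lets students browse.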

Module 5. Ancestral Reconstruction
This module provides instructions for reconstructing the ancestral sequences that gave rise to a given extant sequence and for using alignment tools to compare sequence similarity across yeast species. Coming soon: a video walk through of this module.

Ancestral Reconstruction Guide (google doc) (PDF)

Ancestral Reconstruction Worksheet (google doc) (PDF)
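
As an optional illustration of what comparing sequence similarity means, the sketch below computes percent identity between two sequences that have already been aligned. It is not part of the module and does not perform alignment or ancestral reconstruction. The function name, the choice to skip gapped columns, and the example sequences are all assumptions made for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap (an illustrative convention)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical aligned amino acid sequences from two species.
seq_species_1 = "MKT-LLV"
seq_species_2 = "MRTALLI"

identity = percent_identity(seq_species_1, seq_species_2)
print(f"{identity:.1f}% identity")
```

Other conventions exist, such as counting gapped columns as mismatches, so identity values from different tools are not always directly comparable.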

Other Links

list of proto-genes (google sheet) (excel file)

2023 workshop welcome & introduction presentation (PDF)

student presentation template (google slides)

faculty presentation template (google slides)


How to Use a Genome Browser: JBROWSE: GUIDE

How to Use a Genome Browser: JBROWSE: WORKSHEET

NOTE: If the worksheet (docx) links don’t work, right-click the link, copy the link address, and paste it into a new browser tab to download the file.