New & Noteworthy

Yeast Biochemical Pathways incorporated into Gene Ontology annotations

April 09, 2025

YeastPathways, the database of metabolic pathways and enzymes in the budding yeast Saccharomyces cerevisiae, is manually curated and maintained by the biocuration team at SGD.

This resource is jam-packed with information, but somewhat hidden from view. We have recently been working on ways to make the pathways more readily accessible. Some time ago, we added a section with pathway links to the relevant gene pages (e.g., DFR1).

We also made the pathways available in SGD Search.

Now we have transformed the metabolic pathways and their associated genes/enzymes into Gene Ontology (GO) annotations (e.g., DFR1).

Because many fundamental molecular processes and pathways are evolutionarily conserved between yeast and higher eukaryotes, including humans, the curated metabolic pathway information is highly valuable for transferring knowledge to other organisms. For this reason, the YeastPathways data were exported in BioPAX format (Demir et al. 2010) for import into Noctua, a tool for collaborative curation of biological pathways and gene annotations developed by the GO Consortium (Thomas et al. 2019). BioPAX provides a standardized format for representing biological pathways, allowing researchers to integrate pathway information from different sources and databases. Noctua can import pathway data encoded in BioPAX to populate its pathway editor with molecular interactions, biological processes, and regulatory relationships, and can use BioPAX files to combine pathway data from multiple datasets for curation and analysis.

Pathways curated and edited in Noctua can be exported either as GO annotations for yeast genes and orthologous genes in other species, or as pathway annotations in BioPAX. The BioPAX export makes it easy to share curated pathways with other researchers, databases, and pathway analysis tools in a standard format, promoting data exchange and collaboration within the scientific community.

Categories: Data updates

Reference Genome Annotation Update R64.5

June 19, 2024

The S. cerevisiae strain S288C reference genome annotation was updated. The new genome annotation is release R64.5.1, dated 2024-05-29. Note that the underlying genome sequence itself was not altered; the chromosome sequences remain unchanged.

R64.5 Annotation update summary

The changes included in this annotation update are detailed in the table below.

R64.5 Annotation update details

Chr | Feature | Description of change | Reference
II | ATG12/YBR217W | New uORF chrII:657824..657835, partially overlaps CDS | Yang Y, et al. (2023) PMID:35363116
IV | YDL204W-A | New ORF chrIV:94133..94285 | Wacholder A, et al. (2023) PMID:37164009
VI | YFR035W-A | New ORF chrVI:226260..226550 | Wacholder A and Carvunis AR (2023) PMID:38048358
VII | YGR016C-A | New ORF chrVII:523353..523246 | Wacholder A, et al. (2023) PMID:37164009, Chang S, et al. (2023) PMID:37927910
IX | EFM4/YIL064W | Move start 84 nucleotides downstream, new coordinates chrIX:242027..242716 | Hamey JJ, et al. (2024) PMID:38199565
IX | YIL059C | Change ORF qualifier from Dubious to Verified because stable translation product detected | Wacholder A and Carvunis AR (2023) PMID:38048358
XIII | YMR106W-A | New ORF chrXIII:480924..481187 | Wacholder A and Carvunis AR (2023) PMID:38048358
XIV | YNL040C-A | New ORF chrXIV:552558..552478 | Wacholder A, et al. (2023) PMID:37164009
XIV | YNL155C-A | New ORF chrXIV:342135..341911 | Wacholder A and Carvunis AR (2023) PMID:38048358
XV | ATG19/YOL082W | New uORF chrXV:168632..168679 | Yang Y, et al. (2023) PMID:35363116
XVI | ATG5/YPL149W | 4 new uORFs: chrXVI:271236..271277, chrXVI:271252..271302, chrXVI:271299..271307, chrXVI:271302..271307 | Yang Y, et al. (2023) PMID:35363116
XVI | ATG13/YPR185W | New uORF chrXVI:907211..907351, partially overlaps CDS | Yang Y, et al. (2023) PMID:35363116
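In the table above, features on the Crick strand (systematic names ending in C) appear to be written with the higher coordinate first, for example chrVII:523353..523246 for YGR016C-A. The short Python sketch below parses coordinate strings of that form, infers the strand from the coordinate order, and reports the length in nucleotides and codons. The strand rule is an assumption read off this table, not a documented SGD convention, and the names `parse_coordinates` and `Interval` are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches table coordinates such as "chrVII:523353..523246".
# Chromosome names are assumed to be Roman numerals (I to XVI).
COORD_RE = re.compile(r"chr([IVX]+):(\d+)\.\.(\d+)")


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int   # lower coordinate, 1-based, inclusive
    end: int     # higher coordinate, 1-based, inclusive
    strand: str  # "+" (Watson) or "-" (Crick)

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def parse_coordinates(text: str) -> Interval:
    """Parse a coordinate string, assuming first > second marks the Crick (-) strand."""
    match = COORD_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized coordinate string: {text!r}")
    chrom = match.group(1)
    first, second = int(match.group(2)), int(match.group(3))
    strand = "-" if first > second else "+"
    return Interval(chrom, min(first, second), max(first, second), strand)


if __name__ == "__main__":
    for coords in ["chrIV:94133..94285", "chrVII:523353..523246", "chrXIV:552558..552478"]:
        iv = parse_coordinates(coords)
        codons, leftover = divmod(iv.length, 3)
        print(f"{coords}: {iv.strand} strand, {iv.length} nt, {codons} codons, {leftover} nt left over")
```

For these three new ORFs, the sketch reports lengths of 153, 108, and 81 nt, each a whole number of codons. This is a quick sanity check when copying coordinates out of the table.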

Categories: Data updates

Changes to Saccharomyces cerevisiae GFF3 file

March 01, 2024

The saccharomyces_cerevisiae.gff file contains sequence features of Saccharomyces cerevisiae along with related information such as locus descriptions and GO annotations. It is fully compatible with Generic Feature Format Version 3 (GFF3) and is updated weekly.

After November 2020, SGD updated the transcripts in the GFF file to reflect experimentally determined transcripts (Pelechano et al. 2013, Ng et al. 2020) when possible. The longest transcripts were determined for two different growth media: galactose and dextrose. When available, experimentally determined transcripts for one or both conditions were added for a gene. When these data were absent, transcripts matching the start and stop coordinates of the open reading frame (ORF) were used.

Old version: BDH2/YAL061W with longest transcripts expressed in GAL and in YPD.

Beginning in February 2024, SGD extended the start and stop coordinates of genes to encompass the start and stop coordinates of the longest experimentally determined transcripts, regardless of condition. This change was made for compatibility with JBrowse 2, a newer and more extensible genome browser, which requires that parent features in GFF files (genes) span the full extent of their child features (mRNA, CDS, etc.) (Diesh et al., 2023).

After February 2024: BDH2/YAL061W with extended start/stop coordinates.

GFF3 is a standard format used by many groups. SGD uses the GFF file to load the reference tracks in SGD's genome browser.
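The parent/child rule that motivated the February 2024 change can be checked mechanically. Below is a minimal Python sketch, assuming plain GFF3 lines with `ID=` and `Parent=` attributes: it lists children that extend beyond their parent and widens parents until they span all of their children, which mirrors the kind of adjustment described above. The sample records (GENE1 on chrI) are made up for illustration and are not real SGD coordinates; the function names are hypothetical, and this is not SGD's production pipeline.

```python
from dataclasses import dataclass, field

# Made-up example records (not real SGD coordinates): the mRNA extends past its gene.
SAMPLE_GFF3 = (
    "##gff-version 3\n"
    "chrI\tSGD\tgene\t1000\t2000\t.\t+\t.\tID=GENE1\n"
    "chrI\tSGD\tmRNA\t950\t2100\t.\t+\t.\tID=GENE1_mRNA;Parent=GENE1\n"
    "chrI\tSGD\tCDS\t1000\t2000\t.\t+\t0\tParent=GENE1_mRNA\n"
)


@dataclass
class Feature:
    seqid: str
    ftype: str
    start: int
    end: int
    attributes: dict = field(default_factory=dict)

    def parents(self):
        # In GFF3, Parent may hold several comma-separated IDs.
        return [p for p in self.attributes.get("Parent", "").split(",") if p]


def parse_gff3(lines):
    """Parse GFF3 feature lines, skipping comments and stopping at a ##FASTA section.
    Attribute values are not percent-decoded in this sketch."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-separated columns: {line!r}")
        attrs = dict(item.split("=", 1) for item in cols[8].split(";") if item)
        features.append(Feature(cols[0], cols[2], int(cols[3]), int(cols[4]), attrs))
    return features


def index_by_id(features):
    return {f.attributes["ID"]: f for f in features if "ID" in f.attributes}


def containment_violations(features):
    """Return (child ID, parent ID) pairs where a child extends beyond its parent."""
    by_id = index_by_id(features)
    problems = []
    for f in features:
        for pid in f.parents():
            parent = by_id.get(pid)
            if parent is not None and (f.start < parent.start or f.end > parent.end):
                problems.append((f.attributes.get("ID", "?"), pid))
    return problems


def extend_parents(features):
    """Widen parents in place, repeating until every parent spans all of its children."""
    by_id = index_by_id(features)
    changed = True
    while changed:
        changed = False
        for f in features:
            for pid in f.parents():
                parent = by_id.get(pid)
                if parent is None:
                    continue
                new_span = (min(parent.start, f.start), max(parent.end, f.end))
                if new_span != (parent.start, parent.end):
                    parent.start, parent.end = new_span
                    changed = True


if __name__ == "__main__":
    feats = parse_gff3(SAMPLE_GFF3.splitlines())
    print("before:", containment_violations(feats))
    extend_parents(feats)
    print("after:", containment_violations(feats))
```

The loop in `extend_parents` repeats until nothing changes, so a CDS that pushes out its mRNA, which in turn pushes out its gene, is handled regardless of the order in which features appear in the file.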

Categories: Announcements, Data updates

Tags: biology, blog, genetics, news, Saccharomyces cerevisiae

Allele SGDIDs added to YeastMine

September 28, 2023

YeastMine is SGD’s data warehouse, powered by InterMine. We have so many templates (i.e., pre-defined queries) that provide access to so many different kinds of data!

A big area of focus for SGD and the yeast community is alleles. Alleles are different versions of genes that vary in DNA and sometimes protein sequence. Did you know that you can easily and quickly get all curated yeast allele data directly from YeastMine?

From the YeastMine home page, click ‘Templates’ at top left. From there, filter for ‘allele’.

The Genes -> Alleles template returns data for a single gene, a list of genes, or the entire genome! Data include standard and systematic names for genes, gene name descriptions, allele names and descriptions, allele types, aliases, and references. The template has always included SGDIDs for genes. Based on user feedback, it now also includes SGDIDs for the alleles, so that they can be used to identify and distinguish different alleles. Enjoy!

There are thousands of alleles in SGD! Give the YeastMine Genes -> Alleles template a whirl! Get all the alleles for your favorite gene or list of genes.
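YeastMine is built on InterMine, whose web services can also run templates by URL. The Python sketch below only builds such a URL; it does not contact the server. It assumes the InterMine `template/results` endpoint with numbered `constraintN`/`opN`/`valueN` parameters and the YeastMine service root shown. The template name `Gene_Alleles`, the `Gene` path, and the `LOOKUP` operator are hypothetical placeholders. Copy the real template name and constraints from the template's page in YeastMine before relying on this.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumed YeastMine web-service root; confirm it on the YeastMine site before use.
YEASTMINE_SERVICE = "https://yeastmine.yeastgenome.org/yeastmine/service"


def template_results_url(template_name, constraints, fmt="tab"):
    """Build an InterMine template-results URL.

    `constraints` is a list of (path, op, value) tuples, numbered in order as
    constraint1/op1/value1, constraint2/op2/value2, and so on.
    """
    params = [("name", template_name)]
    for i, (path, op, value) in enumerate(constraints, start=1):
        params += [(f"constraint{i}", path), (f"op{i}", op), (f"value{i}", value)]
    params.append(("format", fmt))
    return f"{YEASTMINE_SERVICE}/template/results?{urlencode(params)}"


def query_params(url):
    """Decode a URL's query string back into a dict of lists, for inspection."""
    return parse_qs(urlparse(url).query)


if __name__ == "__main__":
    # "Gene_Alleles", the "Gene" path, and LOOKUP are hypothetical placeholders.
    url = template_results_url("Gene_Alleles", [("Gene", "LOOKUP", "ACT1")])
    print(url)
```

Building the URL separately from fetching it makes it easy to inspect the request, or to paste the URL into a browser, before downloading results.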

For help using YeastMine, please see the SGD Help Pages and YouTube Channel.

Categories: Data updates, Website changes

Downloads files added to YeastMine

September 20, 2023

Back in the day, SGD maintained an FTP site to distribute data in various files. More recently, you have found these files on the SGD Downloads site. We have now moved these files to YeastMine:

From the YeastMine homepage, click Templates at top left. In the Filter, select ‘Downloads’ to constrain the list of templates.

The following templates are listed under Downloads:

Deleted Merged Features: Retrieve all deleted and merged features.

Retrieve Functional Complementation for genes: For gene(s), retrieve information about cross-species functional complementation between yeast and another species.

Retrieve GO Terms: Retrieve GO Terms, including name, ID, namespace, and definition.

Retrieve SGD chromosomal Features: Retrieve genes and other chromosomal features, including IDs, coordinates, and descriptions.

Retrieve all cross-references for all genes: Retrieve IDs for yeast gene and gene products in other databases.

Retrieve all domains of all genes: Retrieve Proteins/Genes that have a given domain.

Retrieve all interactions for all genes: Retrieve physical and genetic interactions for all genes.

Retrieve all pathways for all genes: Retrieve all metabolic pathways for all genes.

Retrieve protein properties of all proteins of ORFs: Retrieve protein properties, including pI, molecular weight, N-terminal and C-terminal sequences, codon bias, etc. of all proteins.

For help using YeastMine, please see the SGD Help Pages and YouTube Channel.

Categories: Data updates, Tutorial, Website changes

Reference Genome Annotation Update R64.4

September 08, 2023

The S. cerevisiae strain S288C reference genome annotation was updated. The new genome annotation is release R64.4.1, dated 2023-08-23. Note that the underlying genome sequence itself was not altered in any way.

The changes included in this annotation update are listed in the table below.

R64.4 Annotation update details

Chr | Feature | Description of change | Reference
III | SUT035/YNCC0015W | New ncRNA, chrIII:205766..205942 (+ strand) | Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158
IV | YDR278C | Change ORF qualifier from Uncharacterized to Dubious | Requested by NCBI
IV | SUT053/YNCD0033W | New ncRNA, chrIV:506334..507774 (+ strand) | Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158
IV | SUT468/YNCD0034C | New ncRNA, chrIV:506546..507450 (- strand) | Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158
VII | SUT532/YNCG0047C | New ncRNA, chrVII:17213..17709 (- strand) | Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158
VII | SUT125/YNCG0048W | New ncRNA, chrVII:650855..651159 (+ strand) | Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158, Feng MW, et al. (2022) PMID:36712349
VII | SUT126/YNCG0049W | New ncRNA, chrVII:660087..661399 (+ strand) | Xu Z, et al. (2009) PMID:19169243, Balarezo-Cisneros LN, et al. (2021) PMID:33493158
XII | FPS1/YLL043W | New uORF: uORF2, 3 codons, chrXII:49924..49932 (+ strand), ATGCATTAA | Cartwright SP, et al. (2017) PMID:28279185
XIV | ACC1/YNR016C | New uORF: 4 codons, chrXIV:661704..661715 (- strand), ATGTGTTTATAA | Blank HM, et al. (2017) PMID:28057705
XIV | HOL1/YNR055C | New uORF: 7 codons, chrXIV:730381..730401 (- strand), ATGCTATTACTACCAAGTTGA | Vindu A, et al. (2021) PMID:34375581
XV | YOL013W-A | Change ORF qualifier from Uncharacterized to Dubious | Requested by NCBI
XVI | SUT390/YNCP0025W | New ncRNA, chrXVI:52977..53465 (+ strand) | Xu Z, et al. (2009) PMID:19169243, Feng MW, et al. (2022) PMID:36712349
XVI | SUT418/YNCP0026W | New ncRNA, chrXVI:588998..589830 (+ strand) | Xu Z, et al. (2009) PMID:19169243, Feng MW, et al. (2022) PMID:36712349
XVI | YPR108W-A | Change ORF qualifier from Uncharacterized to Dubious | Requested by NCBI
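The three uORF entries above each give coordinates, a codon count, and the sequence. That is enough for a quick consistency check: the length should match the coordinates and be a multiple of 3, the sequence should start with ATG, and it should end with a single stop codon. A minimal Python sketch follows. It assumes each sequence is the coding-strand sequence read 5' to 3' (which is why the minus-strand entries still begin with ATG), and that the codon count includes the stop codon, as it does for all three rows. The function name `check_uorf` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_uorf(sequence, start, end, codons):
    """Return a list of problems with an annotated uORF (an empty list means it looks consistent).

    Assumes `sequence` is the coding-strand sequence read 5' to 3' and that
    `codons` counts the stop codon, as in the three uORF entries in the table.
    """
    seq = sequence.upper()
    problems = []
    if len(seq) != end - start + 1:
        problems.append("sequence length does not match coordinates")
    if len(seq) % 3:
        problems.append("length is not a multiple of 3")
    # Split into triplets, ignoring any incomplete trailing codon.
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if len(triplets) != codons:
        problems.append(f"expected {codons} codons, found {len(triplets)}")
    if not triplets or triplets[0] != "ATG":
        problems.append("does not start with ATG")
    if not triplets or triplets[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in triplets[1:-1]):
        problems.append("contains an in-frame internal stop codon")
    return problems


# uORF entries copied from the R64.4 table: (feature, sequence, start, end, codons)
R64_4_UORFS = [
    ("FPS1/YLL043W uORF2", "ATGCATTAA", 49924, 49932, 3),
    ("ACC1/YNR016C", "ATGTGTTTATAA", 661704, 661715, 4),
    ("HOL1/YNR055C", "ATGCTATTACTACCAAGTTGA", 730381, 730401, 7),
]

if __name__ == "__main__":
    for name, seq, start, end, n in R64_4_UORFS:
        print(name, check_uorf(seq, start, end, n) or "OK")
```

All three table entries pass every check. A sequence such as ATGTAATAA would be flagged for its internal stop codon.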

Various sequence and annotation files are available on SGD’s Downloads site.

Categories: Data updates

Tags: genome annotation update, Saccharomyces cerevisiae

Predicted 3D Structures of Yeast Complexes

January 20, 2022

In an exciting new paper, Humphreys et al. describe the use of deep-learning-based algorithms to predict the structures not only of single proteins but also of protein assemblies. The team combined the fast RoseTTAFold method with the more accurate AlphaFold to build structural models for 106 previously unidentified protein assemblies and 806 complexes that had not been structurally characterized. The complexes have up to five subunits and play critical roles in many areas of cell biology.

Examples of predicted complexes from Humphreys et al.

Look for your own proteins of interest by searching from the ModelArchive home page. A link is also available in the Resources section of the SGD Interaction and Protein pages.

Categories: Announcements, Data updates, Paper of the Week

Tags: protein complex, Saccharomyces cerevisiae, yeast protein assembly

Protein Complex Page Updates

December 01, 2021

SGD has updated our protein complex pages to have the same format as gene pages, with tabs across the top for each category of information, including a Summary page, a new Gene Ontology page, and a new Literature page for each complex. Just as we do for all of your favorite genes, Gene Ontology and Literature curation for complexes will be ongoing.

Summary page and new Literature page

If you have any questions or feedback about the updates to our complex pages, please do not hesitate to contact us at any time.

Categories: Announcements, Data updates, Website changes

Tags: protein complex, Saccharomyces cerevisiae

New links to AlphaFold 3D Predicted Protein Structure Database

November 09, 2021

  • The links through SGD give quick access to the AlphaFold Protein Structure Database, created by DeepMind and EMBL's European Bioinformatics Institute (EMBL-EBI), which provides predictions from AlphaFold, a new, highly accurate tool for predicting protein structure.
  • Given the amino acid sequence of an uncharacterized protein, AlphaFold models the predicted domains and provides relative confidence levels for each portion of the prediction.
  • The predicted domains can then be compared to known protein structures (using a tool such as PDBeFold to seek matches to characterized protein families).
  • Whether or not a family is identified, the comparison will yield clues to protein function to help design the next experiments.
Structure of Hog1p
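AlphaFold reports its per-residue confidence as a pLDDT score from 0 to 100. In the AlphaFold Protein Structure Database, this score is stored in the B-factor field of the model files and is shown in four bands: very high (above 90), confident (70 to 90), low (50 to 70), and very low (below 50). The Python sketch below maps a list of pLDDT values to those bands and groups consecutive residues into segments, which is one simple way to see which portions of a model to trust. The example scores are hypothetical, and placing scores that fall exactly on a boundary into the lower band is an assumption of this sketch.

```python
from itertools import groupby


def plddt_band(score):
    """Map a pLDDT score (0-100) to the AlphaFold DB confidence bands.
    Scores exactly on a boundary (90, 70, 50) go to the lower band; that choice is an assumption."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def confidence_segments(scores, first_residue=1):
    """Group consecutive residues in the same band into (first, last, band) segments."""
    segments = []
    position = first_residue
    for band, run in groupby(scores, key=plddt_band):
        length = len(list(run))
        segments.append((position, position + length - 1, band))
        position += length
    return segments


if __name__ == "__main__":
    # Hypothetical per-residue pLDDT values, not taken from any real model.
    example = [95.2, 91.0, 84.5, 62.3, 41.8, 38.0]
    for first, last, band in confidence_segments(example):
        print(f"residues {first}-{last}: {band}")
```

Long very-low-confidence segments often correspond to disordered regions. They are usually best left out when comparing a predicted structure with known folds using a tool like PDBeFold.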

Categories: Data updates

Tags: AlphaFold, new tools

Updates to legacy gene names

November 05, 2021

SGD has long been the keeper of the official Saccharomyces cerevisiae gene nomenclature. Robert Mortimer handed over this responsibility to SGD in 1993 after maintaining the yeast genetic map and gene nomenclature for 30 years. 

The accepted format for gene names in S. cerevisiae is three uppercase letters followed by a number. The letters typically stand for a phrase (referred to as the “Name Description” in SGD) that provides information about a function, mutant phenotype, or process related to that gene, for example “ADE” for “ADEnine biosynthesis” or “CDC” for “Cell Division Cycle”. Names for many types of chromosomal features follow this basic format, whether the feature is an ORF, a tRNA, another type of non-coding RNA, an ARS, or a genetic locus. Some S. cerevisiae gene names that predate the current nomenclature standards do not conform to this format, such as MRLP38, RPL1A, and OM45.

A few historical gene names predate both the nomenclature standards and the database, and were less computer-friendly than more recent gene names because they contained punctuation. SGD recently updated these gene names to be consistent with current standards and more software-friendly by removing the punctuation. The old names for these four genes have been retained as aliases.

ORF | Old gene name | New gene name
YGL234W | ADE5,7 | ADE57
YER069W | ARG5,6 | ARG56
YBR208C | DUR1,2 | DUR12
YIL154C | IMP2′ | IMP21
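The basic rule described above, three uppercase letters followed by a number, is easy to express as a regular expression. This is one reason punctuation-free names are friendlier to software. The Python sketch below encodes only that rule. It is not SGD's own validation code, the name `is_standard_name` is hypothetical, and it deliberately rejects the legacy names mentioned above that do not follow the basic format.

```python
import re

# The rule stated above: three uppercase letters followed by a number.
STANDARD_GENE_NAME = re.compile(r"[A-Z]{3}[0-9]+")


def is_standard_name(name: str) -> bool:
    """True if the whole name matches the three-letters-plus-number pattern."""
    return STANDARD_GENE_NAME.fullmatch(name) is not None


# Old and new names from the table above.
RENAMED = {"ADE5,7": "ADE57", "ARG5,6": "ARG56", "DUR1,2": "DUR12", "IMP2′": "IMP21"}

if __name__ == "__main__":
    for old, new in RENAMED.items():
        print(f"{old}: {is_standard_name(old)} -> {new}: {is_standard_name(new)}")
    for legacy in ["MRLP38", "RPL1A", "OM45"]:
        print(legacy, is_standard_name(legacy))
```

Every old name in the table fails the check and every new name passes. MRLP38, RPL1A, and OM45 also fail, consistent with their description above as names that do not conform to the basic format.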

Categories: Announcements, Data updates

Tags: gene nomenclature
