Current genome and transcriptome annotations depend on several assumptions, some of which have prevented the detection of new proteins. Common examples of these rules include:
- mature mRNAs contain a single protein-coding open reading frame (ORF), also termed the coding sequence (CDS)
- genes annotated as pseudogenes do not code for proteins
- RNAs annotated as non-coding do not encode proteins
- mRNA regions annotated as 5'UTRs and 3'UTRs do not contain any CDS
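The first rule is easy to test on any sequence. As a toy illustration, the Python sketch below scans the three forward reading frames of an mRNA for ATG-initiated ORFs and reports every ORF above a minimum length. The transcript is invented, and the function name `find_orfs` and the `min_codons` cut-off are hypothetical choices for this example, not parameters taken from OpenProt.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(mrna, min_codons=4):
    """List ATG-initiated ORFs in the three forward frames of an mRNA.

    Returns (start, end, frame) tuples with 0-based, end-exclusive
    coordinates; the span includes the stop codon, and min_codons
    (an arbitrary cut-off for this sketch) counts the stop codon too.
    Within one frame, only the most upstream ATG before each stop is
    reported. ORFs that never reach a stop codon are ignored.
    """
    seq = mrna.upper().replace("U", "T")  # accept RNA or cDNA letters
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return sorted(orfs)

# Invented toy transcript: 5'UTR + "annotated" CDS + 3'UTR.
MRNA = "GGCACC" + "ATGGCTGCTGCTGCTTAA" + "CATGAAACCCTAGG"

print(find_orfs(MRNA))  # [(6, 24, 0), (25, 37, 1)]
```

The first ORF plays the role of the annotated CDS. The second lies entirely downstream of its stop codon, in what annotation would call the 3'UTR, and in a different reading frame, so a one-ORF-per-mRNA rule would never consider it. Real ORF-calling pipelines may also need to handle non-ATG start codons and choose biologically motivated length cut-offs, which this sketch ignores.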
These annotations have been used to generate and/or supplement protein databases (e.g. UniProt) that are cornerstones of modern biology. Because mass spectrometry (MS)-based proteomics identifies proteins by matching measured peptide spectra against the sequences in such databases, it can detect only the annotated proteins they contain; proteins encoded by unannotated (or alternative) ORFs go undetected. There is now irrefutable evidence for the existence of proteins encoded by alternative ORFs (altORFs) located in mRNAs or in "non-coding" RNAs, and these alternative proteins (altProts) cannot be detected with existing protein databases (PMID: 28629911, 29083303).
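The dependence on the database can be sketched with a deliberately simplified model. Real search engines compare measured MS/MS spectra with theoretical spectra of database peptides; here that step is abstracted away, and "observed" peptides are given directly as sequences. The sketch digests proteins in silico with a simplified trypsin rule (cleave after K or R unless the next residue is P), indexes the resulting peptides, and looks up each observed peptide. All protein names, sequences, helper names and the minimum peptide length are invented for illustration.

```python
import re

def tryptic_peptides(protein, min_len=6):
    """Digest a protein in silico with a simplified trypsin rule.

    Cleaves after K or R unless the next residue is P; peptides shorter
    than min_len (an arbitrary cut-off for this sketch) are discarded.
    """
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}

def build_index(proteins, min_len=6):
    """Map every tryptic peptide to the names of the proteins containing it."""
    index = {}
    for name, seq in proteins.items():
        for pep in tryptic_peptides(seq, min_len):
            index.setdefault(pep, set()).add(name)
    return index

def assign(observed, index):
    """Return, for each observed peptide, the sorted list of matching proteins."""
    return {pep: sorted(index.get(pep, ())) for pep in observed}

# Hypothetical sequences, invented for this example.
REFERENCE_DB = {"RefProt1": "MASTEVLKGHWQEPLLRNDFSAIYGK"}
ALT_DB = {"AltProt1": "MPLSQWFNRTEHVAGLDK"}
OBSERVED = ["GHWQEPLLR", "TEHVAGLDK"]

print(assign(OBSERVED, build_index(REFERENCE_DB)))
# {'GHWQEPLLR': ['RefProt1'], 'TEHVAGLDK': []}
print(assign(OBSERVED, build_index({**REFERENCE_DB, **ALT_DB})))
# {'GHWQEPLLR': ['RefProt1'], 'TEHVAGLDK': ['AltProt1']}
```

Against the reference-only database, the altProt peptide `TEHVAGLDK` matches nothing and would be reported as unidentified; adding the altProt sequence to the database is enough for it to be assigned.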
OpenProt is a new, freely accessible platform that functionally annotates altProts across multiple organisms and helps uncover proteins that have so far gone unnoticed.
OBJECTIVES OF THE SYMPOSIUM
We hope that OpenProt will allow MS-based proteomics to provide a more comprehensive characterization of biological systems.