Institute of Biological Chemistry, Academia Sinica

The human genome contains thousands of translated short open reading frames (ORFs) that encode previously unannotated microproteins, revealing an unexpected layer of regulatory complexity beyond canonical genes. This “dark proteome” can arise from untranslated regions of mRNAs – such as upstream ORFs (uORFs) located in the 5′ untranslated region – or from transcripts previously annotated as long non-coding RNAs (lncRNAs). However, the prevalence, functions, and regulatory logic of these non-canonical ORFs remain poorly understood.
Here, we combine ribosome profiling (Ribo-Seq), CRISPR-based screens, and high-throughput functional assays to systematically interrogate the functional landscape of translated non-canonical ORFs in human cells. These studies identify hundreds of previously uncharacterized microproteins that regulate diverse and fundamental cellular processes. In particular, we uncover multiple examples in which microproteins encoded by uORFs regulate the function of the canonical protein encoded by the same mRNA, suggesting that mammalian transcripts can operate as coordinated regulatory modules analogous to bicistronic operons.
More recently, our work has revealed an additional layer of complexity in which certain transcripts function as bifunctional or “moonlighting” RNAs, acting both as regulatory lncRNAs in the nucleus and as mRNAs that encode functional microproteins in the cytoplasm. These dual-function transcripts coordinate distinct nuclear and cytoplasmic regulatory processes, linking chromatin regulation, signaling pathways, and cellular differentiation programs.
Together, these findings expand the functional landscape of the human genome, revealing a widespread class of microproteins and bifunctional RNAs that regulate cell identity, signaling, and developmental programs. Understanding this hidden layer of gene regulation opens new avenues for exploring cellular biology and identifying potential therapeutic targets.