Human cells generate a vast repertoire of noncoding RNAs, often referred to as “RNA dark matter,” including a highly complex small RNA (sRNA) transcriptome. The origin and functional relevance of most sRNAs remain unclear and are frequently presumed to represent degradation byproducts. Here, we systematically interrogated the human sRNA transcriptome to determine whether a substantial fraction of unannotated sRNAs correspond to previously unrecognized members of established functional classes, particularly microRNAs (miRNAs).
We developed and rigorously validated an integrative miRNA discovery pipeline that combines deep small RNA sequencing data with stringent structural and biogenesis criteria. Using this framework, we identified at least 2,726 novel canonical miRNA loci in a single human cell line—exceeding the 1,914 previously annotated miRNA loci. Notably, the majority of these newly identified miRNAs define novel miRNA families, suggesting a dramatic expansion of the known miRNA repertoire. Extrapolation of our results indicates that tens of thousands of human miRNAs may remain undiscovered.
Strikingly, a substantial proportion of novel miRNAs originate from exonic regions of protein-coding genes, highlighting an unexpectedly interleaved and multilayered genomic architecture. These findings demonstrate that the human sRNA transcriptome harbors a vast reservoir of functional regulatory molecules and suggest that our current understanding of miRNA-mediated gene regulation represents only a small fraction of its true complexity.