The 3D genomic structure is essential for maintaining and regulating the cellular functions of microorganisms in response to environmental changes. Currently, our knowledge of the 3D genomic structures of microorganisms relies largely on Hi-C sequencing analyses of pure strains isolated in the laboratory. However, the population-level variation of these structures in the environment remains unclear due to technical limitations. The goal of this study is to capture the microdiversity of 3D genomic structures in natural microbial populations.
Methods
We used a prevalent E. coli isolate as a reference and reconstructed its genomic spatial structure from Hi-C sequencing data. We trained a deep-learning model for this strain to infer Hi-C contact maps from DNA sequence. Next, we aligned metagenomic reads from host-associated and environmental samples to the E. coli strain’s reference genome to identify single-nucleotide polymorphisms (SNPs) in natural populations. Finally, we used the trained deep-learning model to infer the corresponding variations in the genomic 3D structure.
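The last two steps (substituting population SNPs into the reference and comparing predicted contact maps) can be sketched as follows. This is a minimal illustration, not the study's implementation: `apply_snps`, `structure_shift`, and the mean-absolute-difference score are hypothetical names and choices, and `toy_predict` is a stand-in for the trained deep-learning model, whose architecture is not described here.

```python
from typing import Callable, Dict, List

ContactMap = List[List[float]]


def apply_snps(reference: str, snps: Dict[int, str]) -> str:
    """Return the reference with each SNP substituted (0-based positions)."""
    seq = list(reference)
    for pos, alt in snps.items():
        if not 0 <= pos < len(seq):
            raise ValueError(f"SNP position {pos} is outside the reference")
        seq[pos] = alt
    return "".join(seq)


def structure_shift(
    reference: str,
    snps: Dict[int, str],
    predict: Callable[[str], ContactMap],
) -> float:
    """Mean absolute difference between reference and variant contact maps.

    The scoring metric is an assumption made for illustration.
    """
    ref_map = predict(reference)
    var_map = predict(apply_snps(reference, snps))
    diffs = [
        abs(a - b)
        for ref_row, var_row in zip(ref_map, var_map)
        for a, b in zip(ref_row, var_row)
    ]
    return sum(diffs) / len(diffs)


def toy_predict(seq: str, bin_size: int = 4) -> ContactMap:
    """Hypothetical placeholder for the trained sequence-to-contact-map model.

    Scores each bin pair by mean GC fraction with a simple distance decay.
    """
    bins = [seq[i:i + bin_size] for i in range(0, len(seq), bin_size)]
    gc = [sum(base in "GC" for base in b) / len(b) for b in bins]
    return [
        [(gc[i] + gc[j]) / 2 / (1 + abs(i - j)) for j in range(len(bins))]
        for i in range(len(bins))
    ]


if __name__ == "__main__":
    # One A>G substitution at position 0 of a toy 8-bp reference.
    print(structure_shift("AAAAAAAA", {0: "G"}, toy_predict))
```

In practice, `predict` would wrap the trained model, and the SNP dictionary would come from variant calls on the aligned metagenomic reads.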
Results
Our model effectively learned genomic regions with unstable structures caused by transcriptional activity, as well as regions with stable structures related to horizontal genomic islands and nucleoid-associated protein binding sites. A large-scale comparison of samples revealed extensive microdiversity in the 3D structure of the genome in the natural population of the E. coli strain. Approximately one-sixth of the SNPs were found to appreciably affect the genomic 3D structure. Structure Variation Units (SVUs, 100 bp bins) were enriched in regions that regulate gene expression, suggesting potential adaptation to selective pressures. Association studies of SVUs and clinical traits in human gut samples revealed novel targets for diagnosing and treating E. coli-related enteric disorders.
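As an illustration of how SNP-level effects might be grouped into 100 bp SVUs, consider the sketch below. Only the bin size comes from the text. The function name `call_svus`, the per-SNP effect scores, and the threshold are hypothetical, because the criterion used to decide that a SNP affects structure is not stated here.

```python
from collections import defaultdict
from typing import Dict

BIN_SIZE = 100  # bp, matching the SVU definition in the text


def call_svus(snp_effects: Dict[int, float], threshold: float) -> Dict[int, float]:
    """Map structure-affecting SNPs to 100 bp bins.

    snp_effects: 0-based SNP position -> effect score (hypothetical metric).
    threshold:   assumed cutoff above which a SNP counts as structure-affecting.
    Returns bin index -> largest effect score observed in that bin.
    """
    bins: Dict[int, float] = defaultdict(float)
    for pos, effect in snp_effects.items():
        if effect >= threshold:
            index = pos // BIN_SIZE
            bins[index] = max(bins[index], effect)
    return dict(sorted(bins.items()))


if __name__ == "__main__":
    effects = {5: 0.9, 50: 0.2, 150: 0.1, 250: 0.7}
    print(call_svus(effects, threshold=0.5))
```

Keeping the largest score per bin is one possible summary; a sum or a count of qualifying SNPs would serve equally well for a downstream association test.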
Conclusion
This study offers the first glimpse of variation in the 3D genomic structure of natural microbial populations. It also demonstrates a new approach to metagenome-wide association studies, which has great potential for therapeutic development.