Scientists at MIT have created a new mathematical approach that could help unlock the mysteries of how genes work together to influence cell behaviour. The method, which relies solely on observational data, offers a more efficient way to study the complex interactions among the approximately 20,000 genes in the human genome.
Led by Caroline Uhler, professor in MIT's Department of Electrical Engineering and Computer Science and director of the Eric and Wendy Schmidt Center at the Broad Institute, the research team developed a technique using machine learning to identify and group related genes without performing interventional experiments.
"In genomics, it is very important to understand the mechanism underlying cell states," Jiaqi Zhang, an Eric and Wendy Schmidt Center Fellow and co-lead author of the study, tells MIT News. "If you figure out the right way to aggregate the observed data, the information you learn about the system should be more interpretable and useful."
The new approach uses statistical techniques to compute variances that reveal how genes influence one another. By analysing these relationships layer by layer, researchers can identify groups of genes that function together in regulatory programmes.
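To give a feel for how variances computed from observational data alone can expose causal structure one layer at a time, here is a toy sketch in Python. It is not the team's algorithm. It swaps in a much simpler, well-known heuristic that applies only under strong assumptions: every variable is a linear function of its causes plus noise, and all noise terms share the same variance. Under those assumptions, the variable with the smallest remaining variance is a "source" with no unexplained causes, so it can be peeled off and its linear effect removed from the rest. The function names, the three-variable chain and the coefficient 0.8 are all hypothetical, chosen purely for illustration.

```python
import random


def variance(xs):
    """Population variance of a list of numbers."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def residualize(ys, xs):
    """Return ys with its least-squares linear fit on xs (and the mean) removed."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def order_by_variance(columns):
    """Peel off one variable per layer, smallest remaining variance first.

    Assumes a linear model with equal noise variances (an assumption of
    this sketch, not of the MIT method).
    """
    remaining = {j: list(col) for j, col in enumerate(columns)}
    order = []
    while remaining:
        nxt = min(remaining, key=lambda j: variance(remaining[j]))
        chosen = remaining.pop(nxt)
        order.append(nxt)
        # Remove the chosen variable's linear effect from everything left.
        remaining = {j: residualize(col, chosen) for j, col in remaining.items()}
    return order


def simulate_chain(n, seed=0):
    """Hypothetical chain x0 -> x1 -> x2 with unit-variance Gaussian noise."""
    rng = random.Random(seed)
    x0 = [rng.gauss(0, 1) for _ in range(n)]
    x1 = [0.8 * a + rng.gauss(0, 1) for a in x0]
    x2 = [0.8 * b + rng.gauss(0, 1) for b in x1]
    return [x0, x1, x2]


x0, x1, x2 = simulate_chain(5000)
# Present the columns in scrambled order: position 0 is x2, 1 is x0, 2 is x1.
order = order_by_variance([x2, x0, x1])
print(order)
```

On this simulated chain the recovered order lists the position of x0 first, then x1, then x2, even though no variable was ever manipulated. Real gene-expression data would not satisfy the equal-variance linear assumptions, which is part of why a more general theory is needed.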
This breakthrough is particularly significant because traditional methods often require expensive and sometimes impossible interventional experiments, where scientists must actively manipulate genes to study their effects. The new technique bypasses this requirement by extracting meaningful insights from existing observational data.
The team validated the method through simulations, demonstrating that the algorithm can efficiently disentangle meaningful causal representations using only observational data. The theoretical foundation they developed clarifies what can and cannot be learned from observational data.
"While this research was motivated by the problem of elucidating cellular programmes, we first had to develop novel causal theory to understand what could and could not be learned from observational data. With this theory in hand, in future work we can apply our understanding to genetic data and identify gene modules as well as their regulatory relationships," Uhler told MIT News.
The research, which will be presented at the Conference on Neural Information Processing Systems, was funded in part by the MIT-IBM Watson AI Lab and the U.S. Office of Naval Research.