Proteins are essential to cells, performing complex tasks and catalyzing chemical reactions. Scientists and engineers have long tried to harness this power by designing artificial proteins that can perform new tasks, such as treating illnesses, capturing carbon or harvesting energy, but many of the methods used to produce such proteins are slow, complex and prone to failure.
A team led by researchers at the University of Chicago’s Pritzker School of Molecular Engineering has developed an artificial intelligence-driven method that uses big data to design new proteins, a development that could have implications for health care, agriculture and the energy sector.
By developing machine-learning models that can review protein information drawn from genome databases, the researchers found relatively simple design rules for constructing artificial proteins.
When the team built these artificial proteins in the laboratory, they found that the proteins carried out chemical reactions so well that they rivaled those found in nature.
“We all wondered how a simple process like evolution could lead to something as powerful as a protein,” said Rama Ranganathan, the Joseph Regenstein Professor of Biochemistry and Molecular Biology and of Molecular Engineering. “We found that genome data contain enormous amounts of information about the basic rules of protein structure and function, and now we’ve been able to bottle up nature’s rules to create proteins ourselves.”
The results were published in the journal Science.
Using artificial intelligence to learn design rules
Proteins are made up of hundreds or thousands of amino acids, and these amino acid sequences determine the structure and function of the protein.
But learning how to construct these sequences to produce new proteins has been a challenge. Previous work has led to methods that can specify structure, but function has been more elusive.
Over the past 15 years, Ranganathan and his collaborators have realized that genome databases, which are growing exponentially, contain vast quantities of information about the fundamental rules of protein structure and function. Building on that insight, his group developed mathematical models and then began using machine learning methods to uncover new information about the basic rules of protein design.
For this research, the team studied the chorismate mutase family of metabolic enzymes, a type of protein essential for life in many bacteria, fungi and plants. Using machine learning models, the researchers were able to uncover the basic design rules behind these proteins.
The model shows that just two features, the conservation of amino acids at individual positions and correlations in the evolution of pairs of amino acids, are sufficient to predict new artificial sequences that would have the properties of the protein family.
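To make those two kinds of statistics concrete, here is a minimal sketch in Python. It is not the researchers' model: the alignment is invented toy data, the function names are hypothetical, and mutual information is used here only as one simple, assumed way to measure how strongly two positions vary together.

```python
from collections import Counter
import math

# Toy alignment (made-up data, not from the study): each string is one
# sequence, and each character index is one aligned position.
ALIGNMENT = [
    "MKAL",
    "MKAL",
    "MRAV",
    "MRAV",
]


def column(alignment, i):
    """Return the residues found at position i across all sequences."""
    return [seq[i] for seq in alignment]


def conservation(alignment, i):
    """Fraction of sequences sharing the most common residue at position i."""
    counts = Counter(column(alignment, i))
    return counts.most_common(1)[0][1] / len(alignment)


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between positions i and j.

    Zero means the two positions vary independently in this alignment;
    larger values mean knowing one residue helps predict the other.
    """
    n = len(alignment)
    freq_i = Counter(column(alignment, i))
    freq_j = Counter(column(alignment, j))
    freq_ij = Counter(zip(column(alignment, i), column(alignment, j)))
    total = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        total += p_ab * math.log2(p_ab / (p_a * p_b))
    return total


if __name__ == "__main__":
    for pos in range(len(ALIGNMENT[0])):
        print("position", pos, "conservation", conservation(ALIGNMENT, pos))
    print("coupling between positions 1 and 3:",
          mutual_information(ALIGNMENT, 1, 3))
```

In this toy data, positions 0 and 2 never change, while positions 1 and 3 always change together, which is the kind of pairwise signal the article describes.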
“We generally assume that to build something, you first have to deeply understand how it works,” Ranganathan said. “But if you have enough data examples, you can use deep-learning methods to learn the rules of the design even if you don’t yet understand how it works or why it’s built that way.”
He and his collaborators then produced synthetic genes to code for the proteins, cloned them into bacteria, and watched the bacteria use their natural cellular machinery to produce the synthetic proteins.
They found that the artificial proteins had the same catalytic function as natural chorismate mutase proteins.
A platform for understanding other complex systems
Because the design rules are relatively simple, the number of artificial proteins that researchers could theoretically generate with them is extremely high.
“The constraints are much smaller than we ever imagined,” Ranganathan said. “There’s a simplicity to nature’s design rules, and we think similar approaches could help us find models for design in other complex systems in biology, like ecosystems or the brain.”
Although the design rules have been discovered by artificial intelligence, Ranganathan and his collaborators do not yet completely understand why the models work. Next, they are going to work to understand how the mode is