Scientists at Rice University have unveiled a breakthrough technique that could fundamentally change how DNA is designed for medical and biotechnology applications, using artificial intelligence to solve one of synthetic biology’s most stubborn problems: scale.
For years, scientists have known how to program cells to perform specific tasks, from detecting disease to attacking cancer. The hard part has never been the idea. It has been finding the exact DNA sequence that makes a cell behave the way scientists want. The number of possible designs for even a single function can quickly become overwhelming.
“There are many possible designs for any given function, and finding the right one can be like looking for a needle in a haystack,” said Rice University scientist Caleb Bashor.
Now, researchers say they have a way to massively expand that search without slowing it down. The team has developed a method that combines machine learning with enormous libraries of DNA designs, allowing computers to predict which genetic circuits will work before they are ever physically tested.
The approach, called CLASSIC, short for combining long- and short-range sequencing to investigate genetic complexity, allows scientists to generate and analyze DNA designs at a scale that was previously out of reach.
“We created a new technique that makes hundreds of thousands to millions of DNA designs all at once, more than ever before,” Bashor said.
Bashor, an assistant professor of bioengineering and biosciences at Rice and deputy director of the Rice Synthetic Biology Institute, explained that the goal is to map DNA sequences, also known as genetic circuits, directly to the behaviors they produce in living cells. To do this, the researchers built massive libraries of genetic circuits and inserted them into human embryonic kidney cells engineered to glow when specific genes were activated. Brighter cells signaled stronger genetic activity.
The real breakthrough came from combining two sequencing methods that are usually used separately. Long-read sequencing captures entire genetic circuits in one pass, while short-read sequencing offers high precision across smaller segments.
“Most people do one or the other, but we found using both together unlocked our ability to build and test the libraries,” said co-first author Ronan O’Connell.
Using short-read sequencing, the team identified unique DNA barcodes attached to each circuit, allowing them to connect individual sequences to how they performed inside cells. This produced massive datasets that were then fed into machine learning models.
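For readers who want a concrete picture of that barcode bookkeeping, here is a minimal Python sketch. It is not the team's actual pipeline. It assumes a lookup table from barcode to circuit design already exists (presumably built from the long reads, though the article does not spell this out) and that each short-read observation carries a brightness score. Every name, barcode, and number below is hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical lookup from barcode to circuit design.
# Assumption: in practice a table like this would come from long reads.
barcode_to_circuit = {
    "ACGTAC": "promoterA|geneX",
    "TTGCAA": "promoterB|geneX",
    "GGATCC": "promoterA|geneX",  # two barcodes can tag the same design
}

# Hypothetical short-read observations: (barcode, brightness score).
short_read_observations = [
    ("ACGTAC", 0.875),
    ("TTGCAA", 0.25),
    ("GGATCC", 0.625),
    ("NNNNNN", 0.5),  # barcode missing from the lookup, so it is skipped
]


def link_circuits_to_activity(lookup, observations):
    """Average the brightness scores observed for each circuit design."""
    scores = defaultdict(list)
    for barcode, score in observations:
        circuit = lookup.get(barcode)
        if circuit is None:
            continue  # cannot attribute this read to any known design
        scores[circuit].append(score)
    return {circuit: mean(values) for circuit, values in scores.items()}


activity = link_circuits_to_activity(barcode_to_circuit, short_read_observations)
print(activity)
```

The resulting table of designs and average scores is the kind of paired data a machine learning model could then be trained on.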
With enough data, the AI was able to learn the underlying rules linking DNA structure to cellular behavior, and even predict how new, untested designs would perform.
“We use the data to train a model that can understand this landscape and predict things we were not able to generate data on,” O’Connell said.
When the researchers tested the system’s predictions against manually engineered DNA sequences, the results were striking. All 40 predictions matched real-world outcomes perfectly.
According to the team, this level of accuracy was only possible because of the sheer volume of data generated.
“This was the first time AI/ML could be used to analyze circuits and make accurate predictions for untested ones because up to this point nobody could build libraries as large as ours,” said co-first author Kshitij Rai.
Beyond speed and accuracy, the researchers also discovered something unexpected. Instead of one perfect DNA solution for a given task, many different genetic designs often worked equally well. That flexibility could help engineers build biological systems that are more robust and resilient.
The team believes the combination of high-throughput DNA design and AI modeling could dramatically accelerate the development of cell-based therapies, where engineered cells act as living medicines, as well as other synthetic biology applications. The research was published in the journal Nature, marking a significant step toward a future where designing DNA becomes faster, smarter, and far more predictable.