AI trained on bacterial genomes produces never-before-seen proteins
Briefly

"But biology doesn't generate new proteins at that level. Instead, changes have to take place at the nucleic acid level before eventually making their presence felt at the protein level. And the DNA level is fairly removed from proteins, with lots of critical non-coding sequences, redundancy, and a fair degree of flexibility. It's not necessarily obvious that learning the organization of a genome would help an AI system figure out how to make functional proteins."
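One concrete form of redundancy at the DNA level is the genetic code itself. Its 64 codons map to only 20 amino acids plus stop signals, so different DNA sequences can encode the same protein. The minimal sketch below uses the standard genetic code. It illustrates that general fact only and is not part of the Stanford work. The names `CODON_TABLE` and `translate` and the example sequences are hypothetical.

```python
# Standard genetic code, with codons ordered T, C, A, G (first base varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding sequence codon by codon, stopping at the first stop codon ('*')."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


seq_a = "ATGGCTGAAAAATAA"
seq_b = "ATGGCCGAGAAGTGA"  # differs from seq_a at four positions
print(seq_a == seq_b)  # False
print(translate(seq_a), translate(seq_b))  # MAEK MAEK
```

The two DNA sequences differ, but every change is synonymous, so the encoded protein is identical.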
"The new work was done by a small team at Stanford University. It relies on a feature that's common in bacterial genomes: the clustering of genes with related functions. Often, bacteria have all the genes needed for a given function (importing and digesting a sugar, synthesizing an amino acid, etc.) right next to each other in the genome. In many cases, all the genes are transcribed into a single, large messenger RNA. This gives the bacteria a simple way to control the activity of entire biochemical pathways at once, boosting the efficiency of bacterial metabolisms."
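When related genes share one messenger RNA, that transcript contains several open reading frames (ORFs), each running from a start codon to a stop codon. The toy sketch below scans one strand for ATG-initiated ORFs to show several genes sitting on a single transcript. It is a deliberate simplification: real bacterial gene finders also consider alternative start codons such as GTG, ribosome-binding sites, and the reverse strand. The function `find_orfs` and the example transcript are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return sorted (start, end) pairs of ATG-initiated ORFs on the forward strand.

    `end` is exclusive and includes the stop codon. Within a frame, an ATG that
    appears inside an already-open ORF is ignored (the first ATG wins).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a reading frame
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))  # keep the stop codon in the span
                start = None
    return sorted(orfs)


# Toy "operon" transcript: leader, gene 1, short spacer, gene 2, trailer.
transcript = "GG" + "ATGAAACCCTAA" + "AGGAGG" + "ATGTTTGGGTGA" + "CC"
for start, end in find_orfs(transcript):
    print(start, end, transcript[start:end])
```

Both genes are recovered from one sequence, mirroring how a single bacterial messenger RNA can carry a whole pathway's worth of coding sequences.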
A genomic language model called Evo was trained on an extensive collection of bacterial genomes to predict each next base from the sequence that precedes it, a generative objective that also lets it produce new DNA. Because bacteria frequently cluster genes with related functions into contiguous regions, often transcribed as a single messenger RNA to coordinate control of a biochemical pathway, the genomic context around a gene carries information about what that gene does. Training on whole genomic sequences exposes the model to this organization, along with the redundancy, non-coding elements, and flexibility that separate DNA from the proteins it encodes. The model can then generate DNA sequences encoding proteins, including ones that do not resemble any known protein, by drawing on genome-level information rather than being trained directly on amino-acid sequences.
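The next-base prediction objective can be illustrated at toy scale. Evo itself is a large neural network. The sketch below substitutes a much simpler order-k Markov (k-mer count) model that learns from the same kind of signal: given the preceding bases, predict the next one, then sample base by base to generate new sequence. All function names and training sequences are hypothetical, and nothing here reflects Evo's actual architecture.

```python
import random
from collections import Counter, defaultdict


def train_kmer_model(sequences, k=3):
    """Count which base follows each length-k context across the training sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return counts


def predict_next(counts, context, k=3):
    """Most frequent next base after the last k bases of context, or None if unseen.

    k must match the value used in train_kmer_model.
    """
    followers = counts.get(context[-k:])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, prompt, n_bases, k=3, seed=0):
    """Extend prompt by sampling each next base in proportion to its training count."""
    rng = random.Random(seed)
    seq = prompt
    for _ in range(n_bases):
        followers = counts.get(seq[-k:])
        if not followers:
            break  # context never seen in training; stop early
        bases = list(followers)
        weights = [followers[b] for b in bases]
        seq += rng.choices(bases, weights=weights)[0]
    return seq


training = ["ATGAAACCCTAA", "ATGAAAGGGTAA"]
model = train_kmer_model(training, k=3)
print(predict_next(model, "GGATG"))  # A
print(generate(model, "ATG", 6))
```

A k-mer model only sees a few bases of context and has no notion of genes, whereas neural genomic models such as Evo learn to generalize across much longer stretches of sequence, which is what allows neighboring, functionally related genes to inform generation.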
Read at Ars Technica