I've got a few other Monday links up at This Genomic Life: sex-differential selection, neuronal resiliency omics, and a philosophy of biology trilogy. thisgenomiclife.substack.com/p/this-weeks...
Quick hits to start your week: Designing regulatory DNA, battle of the sexes, genetic resiliency, and the conclusion to a philosophy of biology trilogy.
Cool new work from Genentech on designing regulatory DNA with an autoregressive language model, achieving impressive cell-type specificity: t.co/DnyzGfGQuP
Enhancers - they're important, we can count them, study them, design them, but we can't define what they are. Some musings on enhancers with a side of amateur philosophy of science: open.substack.com/pub/thisgeno...
Don't believe the lie that precise definitions are important in science. We say much about things that we can't define precisely - like species and enhancers. Enhancers take up more genomic real estate than genes, but we can't say exactly what they are: www.nature.com/articles/s41...
It seems like a great task for an AI model, and something that might lend itself to high-throughput screening.
Our group's deep mutational scan of a transcription factor is up on Genome Research today. Fantastic work by James, an excellent MD/PhD student. See how AlphaMissense performs on predicting activation domain mutations: genome.cshlp.org/content/earl...
There is a huge unexplored sequence space out there - maybe most of it is useless, and codon optimization is the best we can do. But maybe not. With self-amplifying mRNA vaccines in the pipeline, it's worth exploring LLM-guided design.
How much better can you do than codon optimization? Are there optimal sequences that the model finds, which human reasoning never would have picked out? The paper doesn't say, but it raises the possibility.
A new paper from a Sanofi team presents CodonBERT, which does reasonably well predicting expression from different flu mRNA vaccine sequences: pubmed.ncbi.nlm.nih.gov/38951026/ Worth a read, although there isn't much interpretation of the model. Here's what I'd love to know:
mRNA-based vaccines and therapeutics are gaining popularity and usage across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even small prote...
likely isn't optimal either, because the gene was probably not selected for max translation rate and RNA stability. For mRNA vaccines, can we do better than mere codon optimization? It's a great problem for an LLM.