Large Language Models in Molecular Biology

Serafim Batzoglou at Towards Data Science:

Will we ever decipher the language of molecular biology? Here, I argue that we are just a few years away from having accurate in silico models of the primary biomolecular information highway — from DNA to gene expression to proteins — that rival experimental accuracy and can be used in medicine and pharmaceutical discovery.

Since I started my PhD in 1996, the computational biology community has embraced the mantra, “biology is becoming a computational science.” Our ultimate ambition has been to predict the activity of biomolecules within cells, and cells within our bodies, with precision and reproducibility akin to those of engineering disciplines. We have aimed to create computational models of biological systems, enabling accurate biomolecular experimentation in silico. Recent strides in deep learning, particularly in large language models (LLMs), in conjunction with affordable, large-scale data generation, are propelling this aspiration closer to reality.

