Viruses are ubiquitous infectious agents that infect animals, plants, and even bacteria. They are found in the air, sewage, lakes, oceans, grasslands, and decaying wood; in short, they are everywhere. They thrive in hydrothermal vents and Antarctic ice, and possibly even in outer space. Yet only a fraction of them have been identified and described.
Although we live alongside viruses, the viral universe remains largely a mystery. For decades, scientists have painstakingly gathered samples from around the globe and sequenced their genetic material. But viruses mutate rapidly, and these efforts have only scratched the surface of the virosphere.
Most viral genetic material is biological "dark matter," Shi Mang of Sun Yat-sen University and colleagues wrote in a recent paper published in Cell.
Dark matter refers to unknown or uncharacterized genetic material in samples containing viruses and microbes. The term is borrowed from astronomy, where dark matter describes mysterious, unseen parts of the universe. By analogy, viral dark matter refers to viruses and genetic sequences in metagenomic data that cannot be classified or matched to known organisms.
Metagenomics is a field of genomics that studies genetic material extracted directly from environmental samples rather than from individual organisms. It helps researchers analyze communities of microbes, such as bacteria, viruses, and fungi, within complex ecosystems like soil, oceans, and even the human gut. Traditional genomics typically requires isolating and culturing individual species, whereas metagenomics sequences all the genetic material in a sample at once. This provides insight into diverse microbial communities, many of whose members cannot be grown in the lab.
The researchers' AI approach relies on ESMFold, a protein structure prediction tool developed by researchers at Meta (formerly Facebook). A similar AI system, AlphaFold, was developed by Google DeepMind in London, and two of its developers shared this year's Nobel Prize in Chemistry.
ESMFold
ESMFold is a tool that uses artificial intelligence (AI) to predict the 3D shape of a protein from its amino acid sequence, which is encoded in the organism's genes. Proteins are like biological "machines" that do everything from building tissues to fighting diseases, and their shape is crucial to how they work.
Because it works directly from a single protein sequence, ESMFold can predict these 3D shapes quickly. This allows researchers to uncover the likely structures of many uncharacterized proteins, including those in viruses and bacteria, which helps advance our understanding of diseases and potential treatments.
In 2022, researchers led by Babaian searched 5.7 million genomic samples in a public database and identified 132,000 previously unknown RNA viruses. RNA viruses evolve quickly, which makes them hard to detect using traditional methods. The usual approach involves looking for a specific protein, RNA-dependent RNA polymerase (RdRp), which these viruses use to replicate. However, if a virus's RdRp sequence differs too much from known sequences, such searches can miss it.
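The detection gap described above can be illustrated with a toy example. This Python sketch is not the method used by Babaian's team or by Shi Mang's team; real searches rely on gapped alignments and statistical profile models. It simply assumes a hypothetical rule: a query fragment is "recognized" only if it is at least 50% identical to some known reference fragment. The function names, threshold, and sequences are all made up for illustration.

```python
# Toy illustration only (not a real virus-detection pipeline).
# A query protein fragment counts as "recognized" if its ungapped percent
# identity to any known reference fragment meets a hypothetical threshold.
# Real tools use gapped alignments and profile models; all data is made up.


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions where two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("toy example compares equal-length, ungapped sequences only")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


def is_recognized(query: str, references: list[str], threshold: float = 50.0) -> bool:
    """Return True if the query meets the identity threshold against any reference."""
    return any(percent_identity(query, ref) >= threshold for ref in references)


# Hypothetical 10-residue fragments (not real RdRp sequences).
known = ["SGDDSLIVCA", "TGDDNVIAVS"]
close_variant = "SGDDSLIACA"  # differs from the first reference at one position
divergent = "KGDDALMLYE"      # shares only a short core with the references

print(is_recognized(close_variant, known))  # True (90% identical to the first reference)
print(is_recognized(divergent, known))      # False (at most 40% identical)
```

The close variant clears the threshold, while the divergent fragment falls below it and would go unnoticed, even though both might come from genuine viral polymerases. This is the kind of gap that newer approaches such as LucaProt were designed to address.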
To improve virus detection, evolutionary biologist Shi Mang and his team developed a new model, LucaProt, inspired by AI techniques like those used in ChatGPT. By combining LucaProt with protein structure prediction tools, they identified some 160,000 RNA viruses in samples from around the world, including extreme environments such as hot springs and salt lakes. About 70,000 of these RNA viruses had never been described before.
The project opens the door to a better understanding of viral diversity and evolutionary history. Although the researchers could not determine which organisms these viruses infect, they hope to explore how the viruses influence health and ecosystems.
Reference:
1. Using artificial intelligence to document the hidden RNA ... Accessed November 12, 2024. https://www.cell.com/cell/fulltext/S0092-8674(24)01085-7.
(Input from various sources)
(Rehash/Swati Sharma/MSM)