The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling. From friends at Tatta Bio. GitHub: github.com/TattaBio/OMG www.biorxiv.org/content/10.1...
Biological language model performance depends heavily on pretraining data quality, diversity, and size. While metagenomic datasets feature enormous biological diversity, their utilization as pretraini...
Terrabacteria: redefining bacterial envelope diversity, biogenesis and evolution #NatureRevMicro www.nature.com/articles/s41...
AntiDefenseFinder! It is also available as an option in DefenseFinder: defensefinder.mdmlab.fr www.biorxiv.org/content/10.1...
bioRxiv - the preprint server for biology, operated by Cold Spring Harbor Laboratory, a research and educational institution
Methanogenesis outside the Euryarchaeota experimentally demonstrated by three cultivation-driven studies (two from my lab)! A long 🧵. 🐻 with me: tinyurl.com/4v4fkda6 tinyurl.com/yr4p7js6 tinyurl.com/mtsrj6b9
NCBI Datasets is definitely an improvement over E-utilities, but I wish NCBI had a REST API like ENA's. I rely on ENA every time I need metadata for sequences, genomes, samples, etc.
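ENA's Portal API is the kind of REST interface meant here: a metadata search is a single HTTP GET request. Below is a minimal Python sketch that only builds such a request URL, assuming the `https://www.ebi.ac.uk/ena/portal/api/search` endpoint with its `result`, `query`, `fields`, `format` and `limit` parameters. The helper name `ena_search_url` and the example query and field names are hypothetical illustrations; check them against ENA's own documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed ENA Portal API search endpoint; verify against ENA's current docs.
ENA_PORTAL_SEARCH = "https://www.ebi.ac.uk/ena/portal/api/search"


def ena_search_url(result, query, fields, fmt="tsv", limit=0):
    """Build a GET URL for an ENA Portal API metadata search (hypothetical helper).

    result: ENA result type, e.g. "sample", "read_run" or "assembly".
    query:  ENA query expression, e.g. "tax_tree(2157)".
    fields: iterable of field names to return as columns.
    fmt:    "tsv" or "json".
    limit:  maximum number of records; 0 is assumed to mean "no limit".
    """
    params = {
        "result": result,
        "query": query,
        "fields": ",".join(fields),  # ENA expects a comma-separated field list
        "format": fmt,
        "limit": limit,
    }
    # urlencode escapes parentheses, commas, spaces, etc. in parameter values
    return f"{ENA_PORTAL_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    # Example: sample metadata for everything under an assumed archaeal taxon.
    print(ena_search_url("sample", "tax_tree(2157)", ["sample_accession", "country"]))
```

The resulting URL can be fetched with `urllib.request.urlopen(url)` or any HTTP client. Building it with `urlencode` rather than string concatenation keeps query expressions with parentheses, quotes or spaces correctly escaped.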
If you are interested in prophages, we have a new database: Prophage-DB. Check it out; all feedback is welcome: biorxiv.org/cgi/content/...
Phylogenetic reconciliation: making the most of genomes to understand microbial ecology and evolution #ISMEJournal academic.oup.com/ismej/advanc...
A global atlas of soil viruses reveals unexplored biodiversity and potential biogeochemical impacts - Nature Microbiology www.nature.com/articles/s41... @apcamargo.bsky.social Are these new genomes already in the current version of IMG/VR, or should I download them separately and combine them for now?
This study presents an extensive global compendium of metagenomically derived sequences that will serve as a foundation for understanding the role of viruses in soil ecosystems.
It's a separate dataset for now. Even though this project started a while ago, the metagenomes were not yet in IMG when I processed the data for the latest IMG/VR release.
I'd love to see that :)