OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 03.05.2026, 12:13

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Decoding the language of microbiomes using word-embedding techniques, and applications in inflammatory bowel disease

2020·23 Zitationen·PLoS Computational BiologyOpen Access
Volltext beim Verlag öffnen

23

Zitationen

2

Autoren

2020

Jahr

Abstract

Microbiomes are complex ecological systems that play crucial roles in understanding natural phenomena from human disease to climate change. Especially in human gut microbiome studies, where collecting clinical samples can be arduous, the number of taxa considered in any one study often exceeds the number of samples ten to one hundred-fold. This discrepancy decreases the power of studies to identify meaningful differences between samples, increases the likelihood of false positive results, and subsequently limits reproducibility. Despite the vast collections of microbiome data already available, biome-specific patterns of microbial structure are not currently leveraged to inform studies. Here, we derive microbiome-level properties by applying an embedding algorithm to quantify taxon co-occurrence patterns in over 18,000 samples from the American Gut Project (AGP) microbiome crowdsourcing effort. We then compare the predictive power of models trained using properties, normalized taxonomic count data, and another commonly used dimensionality reduction method, Principal Component Analysis in categorizing samples from individuals with inflammatory bowel disease (IBD) and healthy controls. We show that predictive models trained using property data are the most accurate, robust, and generalizable, and that property-based models can be trained on one dataset and deployed on another with positive results. Furthermore, we find that properties correlate significantly with known metabolic pathways. Using these properties, we are able to extract known and new bacterial metabolic pathways associated with inflammatory bowel disease across two completely independent studies. By providing a set of pre-trained embeddings, we allow any V4 16S amplicon study to apply the publicly informed properties to increase the statistical power, reproducibility, and generalizability of analysis.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Gut microbiota and healthMachine Learning in HealthcareBioinformatics and Genomic Networks
Volltext beim Verlag öffnen