This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Sentiment Analysis of Reviews of Top Large Language Models
0
Citations
3
Authors
2026
Year
Abstract
The rapid advancement of Large Language Models (LLMs) has led to a surge in AI-powered mobile applications, generating vast amounts of user feedback essential for understanding public perception and guiding development. This study conducts a large-scale sentiment analysis of user reviews, sourced from the Google Play Store, of seven prominent LLM-based mobile apps: ChatGPT (OpenAI), Claude (Anthropic), Meta AI (Meta), DeepSeek (DeepSeek), Grok (xAI), Gemini (Google), and Perplexity (Perplexity AI). To handle the data volume, we designed a scalable big data pipeline integrating web scraping for data collection, Apache Kafka for real-time streaming, PostgreSQL for storage, Apache Spark for distributed processing, and a pre-trained BERT-based Transformer model for sentiment classification. The pipeline processed approximately 8 million reviews totaling 1.75 GB. Results show an overall sentiment distribution of 55.4% positive and 44.6% negative, with variations across apps: Claude, Grok, and Meta AI exhibited the highest positive sentiment (above 60%), while Gemini and ChatGPT showed relatively higher negative feedback (around 50%). Strong alignment was observed between sentiment classifications and star ratings, though a weak Spearman correlation (ρ = 0.124) indicates nuanced user opinions. Temporal trends reveal general stability with short-term fluctuations, and lexical analyses highlight themes of usability, efficiency, and performance issues. The pipeline demonstrated high throughput and accuracy, validated through proxy metrics and statistical tests, proving effective for real-world, large-scale sentiment analysis of LLM apps.
These insights can inform LLM improvements, such as addressing common complaints about reliability and interface design.
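The abstract reports a Spearman correlation between sentiment classifications and star ratings. As a minimal sketch of how such a coefficient can be computed, assuming binary sentiment labels (1 = positive, 0 = negative) and 1 to 5 star ratings, the example below ranks both variables with average ranks for ties and takes the Pearson correlation of the ranks. The function names and the toy data are hypothetical illustrations, not the authors' code or data.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend the window over all entries tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of the 1-based positions
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to tie-aware ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data, for illustration only (not taken from the study).
stars = [5, 4, 1, 2, 5, 3, 1, 4]
sentiment = [1, 1, 0, 0, 1, 0, 1, 0]
print(round(spearman_rho(stars, sentiment), 3))
```

Because a binary label produces many tied ranks, the tie-aware ranking step matters here; the simplified formula based on squared rank differences is only exact when there are no ties.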
Similar works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,553 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,444 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,943 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,792 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations