OpenAlex · Updated hourly · Last updated: 02.05.2026, 15:08

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Sentiment Analysis of Reviews of Top Large Language Models

2026 · 0 citations

Citations: 0 · Authors: 3 · Year: 2026

Abstract

The rapid advancement of Large Language Models (LLMs) has led to a surge in AI-powered mobile applications, generating vast amounts of user feedback essential for understanding public perception and guiding development. This study conducts a large-scale sentiment analysis of user reviews of seven prominent LLM-based mobile apps sourced from the Google Play Store: ChatGPT (OpenAI), Claude (Anthropic), Meta AI (Meta), DeepSeek (DeepSeek), Grok (xAI), Gemini (Google), and Perplexity (Perplexity AI). To handle the data volume, we designed a scalable big data pipeline integrating web scraping for data collection, Apache Kafka for real-time streaming, PostgreSQL for storage, Apache Spark for distributed processing, and a pre-trained BERT-based Transformer model for sentiment classification. The pipeline processed approximately 8 million reviews totaling 1.75 GB. Results show an overall sentiment distribution of 55.4% positive and 44.6% negative, with variations across apps: Claude, Grok, and Meta AI exhibited the highest positive sentiment (above 60%), while Gemini and ChatGPT showed relatively higher negative feedback (around 50%). Strong alignment was observed between sentiment classifications and star ratings, though a weak Spearman correlation (ρ = 0.124) indicates nuanced user opinions. Temporal trends reveal general stability with short-term fluctuations, and lexical analyses highlight themes of usability, efficiency, and performance issues. The pipeline demonstrated high throughput and accuracy, validated through proxy metrics and statistical tests, proving effective for real-world, large-scale sentiment analysis of LLM apps. These insights can inform LLM improvements, such as addressing common complaints about reliability and interface design.
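
For readers unfamiliar with the statistic quoted in the abstract, the following is a minimal sketch of how a Spearman rank correlation between star ratings and binary sentiment labels could be computed. It is not the authors' code: the function names `average_ranks` and `spearman_rho` and the toy data are hypothetical, and the abstract does not say which implementation the study used.

```python
from statistics import mean


def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data, NOT taken from the study:
# star ratings (1-5) and model sentiment labels (1 = positive, 0 = negative).
stars = [5, 4, 1, 2, 5, 3, 1, 4]
sentiment = [1, 1, 0, 0, 1, 0, 1, 0]
rho = spearman_rho(stars, sentiment)
print(f"Spearman rho = {rho:.3f}")
```

Both variables here are heavily tied (five star levels, two sentiment classes), which is why the sketch uses tie-averaged ranks rather than the shortcut formula that assumes distinct ranks.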


Topics

Artificial Intelligence in Healthcare and Education · Sentiment Analysis and Opinion Mining · Mobile Health and mHealth Applications