This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

A Privacy-Preserving Multimodal Voice Assistant with Offline Retrieval-Augmented Generation

2025 · 0 citations

Abstract

This paper presents a privacy-preserving AI-based voice assistant designed to operate seamlessly in both online and offline modes, with the ability to switch dynamically between them. In the online mode, the system employs OpenAI’s GPT models for language understanding and Google’s speech APIs for speech processing. In offline mode, it integrates Whisper for speech-to-text (STT), Coqui for text-to-speech (TTS), and a locally hosted large language model (LLM) using Ollama, ensuring that all processing occurs locally to safeguard user data. To enhance knowledge retrieval in offline mode, Retrieval-Augmented Generation (RAG) is implemented using locally stored document embeddings. A graphical user interface (GUI) provides clear visual feedback and allows users to switch modes effortlessly. The modular, dual-mode architecture offers a balance between usability, accessibility, and privacy, making it suitable for applications in education, research, and professional domains. Experimental evaluation demonstrates that the system delivers accurate, contextually relevant responses with low latency while maintaining strong privacy guarantees.
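The dual-mode routing and offline retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `embed`, `LocalIndex`, `build_prompt`, and `answer` are hypothetical, the bag-of-words vector is a toy stand-in for a real embedding model, and the two language models are passed in as plain callables rather than actual Ollama or OpenAI clients. Skipping retrieval in online mode is also an assumption, based only on the abstract tying RAG to offline mode.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class LocalIndex:
    """Hypothetical in-memory store of locally computed document embeddings."""

    def __init__(self, documents):
        self.entries = [(doc, embed(doc)) for doc in documents]

    def retrieve(self, query: str, k: int = 1):
        """Return the k documents most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]


def build_prompt(query: str, index: LocalIndex, k: int = 1) -> str:
    """Prepend retrieved passages to the question (the 'augmented' part of RAG)."""
    context = "\n".join(index.retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


def answer(query, index, offline, local_llm, online_llm):
    """Route the query: offline uses local retrieval plus a local model.

    Assumption: the online path sends the raw query without retrieval.
    """
    if offline:
        return local_llm(build_prompt(query, index))
    return online_llm(query)


if __name__ == "__main__":
    docs = [
        "whisper converts speech to text",
        "coqui converts text to speech audio",
        "ollama hosts a local language model",
    ]
    index = LocalIndex(docs)
    # Stub callables stand in for the real model back ends.
    print(answer("which tool hosts the local model", index, True,
                 lambda p: "LOCAL:\n" + p, lambda q: "ONLINE: " + q))
```

In a real deployment the `embed` function would be replaced by a locally run embedding model and the stored vectors would live on disk, so that no query or document text leaves the machine in offline mode.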

Institutions

Visvesvaraya Technological University (IN)

Topics

AI in Service Interactions, Topic Modeling, Artificial Intelligence in Healthcare and Education
