This is an overview page with metadata for this paper. The full article is available from the publisher.
A Privacy-Preserving Multimodal Voice Assistant with Offline Retrieval-Augmented Generation
Citations: 0
Authors: 5
Year: 2025
Abstract
This paper presents a privacy-preserving AI-based voice assistant designed to operate seamlessly in both online and offline modes, with the ability to switch dynamically between them. In the online mode, the system employs OpenAI’s GPT models for language understanding and Google’s speech APIs for speech processing. In offline mode, it integrates Whisper for speech-to-text (STT), Coqui for text-to-speech (TTS), and a locally hosted large language model (LLM) using Ollama, ensuring that all processing occurs locally to safeguard user data. To enhance knowledge retrieval in offline mode, Retrieval-Augmented Generation (RAG) is implemented using locally stored document embeddings. A graphical user interface (GUI) provides clear visual feedback and allows users to switch modes effortlessly. The modular, dual-mode architecture offers a balance between usability, accessibility, and privacy, making it suitable for applications in education, research, and professional domains. Experimental evaluation demonstrates that the system delivers accurate, contextually relevant responses with low latency while maintaining strong privacy guarantees.
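The offline retrieval step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which embedding model or vector store is used, so the `embed` function below is a toy bag-of-words stand-in, and the names `retrieve`, `build_prompt`, and `index` are hypothetical. The sketch only shows the general RAG pattern of ranking locally stored document embeddings by similarity to the query and prepending the best match to the prompt that a local LLM would receive.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model (assumption): word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, index, k=1):
    """Return the k documents whose stored embeddings best match the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query, index, k=1):
    """Prepend retrieved context to the question for a locally hosted LLM."""
    context = "\n".join(retrieve(query, index, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical local knowledge base; embeddings are computed once and kept
# in memory here, whereas a real system would persist them on disk.
documents = [
    "Whisper converts speech to text on the local machine.",
    "Coqui synthesizes speech from text without a network connection.",
    "Ollama serves a large language model from local hardware.",
]
index = [(doc, embed(doc)) for doc in documents]

print(build_prompt("which component converts speech to text", index))
```

Because both the index and the similarity search live in the same process, no query text or document content leaves the machine, which is the privacy property the abstract attributes to offline mode.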