This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Enriching Conversations: Empowering ChatGPT with Image Caption Generation
Citations: 3
Authors: 5
Year: 2024
Abstract
Image captioning stands as a pivotal technique for providing contextual descriptions of visual content, promising substantial enhancement in the capabilities of conversational AI systems. This work delves into the integration of image captioning methodologies into ChatGPT, aiming to fortify its capacity for understanding and responding to visual information. The study extensively explores the application of deep learning models, encompassing ResNet50, LSTM, DenseNet121, MobileNet, and MobileNetV2, in the domain of image captioning. Specifically, a comprehensive investigation is conducted into a Recurrent Neural Network employing LSTM as a decoder and a Convolutional Neural Network utilizing ResNet as an encoder. This fusion harnesses vocabulary and image features to craft precise and meaningful descriptions of visual content. Furthermore, this study pioneers an approach to identify and relate at least two salient features within any given image, forming a coherent caption that expresses the relationship between these identified features. This novel capability not only refines image captioning techniques but also empowers ChatGPT to comprehend complex visual contexts within conversational settings. The outcomes of this work offer profound insights into augmenting AI capabilities, facilitating a deeper understanding of and more effective interaction with visual information across various domains, thereby advancing the integration of conversational AI with visual context.
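The encoder-decoder arrangement described in the abstract (a CNN encoder feeding an LSTM decoder that emits one word at a time) can be illustrated with a minimal sketch. This is not the authors' implementation: the class `TinyLSTMDecoder`, the function `greedy_caption`, the toy vocabulary, the 4-dimensional feature vector standing in for ResNet features, and the greedy decoding strategy are all assumptions made for illustration. The weights are random and untrained, so the resulting words are meaningless; only the data flow is the point.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyLSTMDecoder:
    """Hypothetical, untrained LSTM decoder (biases omitted for brevity)."""

    def __init__(self, vocab, feat_dim, hidden=8, embed=6, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.hidden = hidden
        self.embed = rand_matrix(len(vocab), embed, rng)   # word embeddings
        self.W_init = rand_matrix(hidden, feat_dim, rng)   # image features -> h0
        # One weight matrix per gate, applied to [embedding ; previous hidden].
        self.W = {g: rand_matrix(hidden, embed + hidden, rng) for g in "ifog"}
        self.W_out = rand_matrix(len(vocab), hidden, rng)  # hidden -> word scores

    def init_state(self, features):
        # Assumption: the image features initialise the hidden state.
        h = [math.tanh(x) for x in matvec(self.W_init, features)]
        c = [0.0] * self.hidden
        return h, c

    def step(self, token_id, h, c):
        x = self.embed[token_id] + h  # list concatenation
        i = [sigmoid(v) for v in matvec(self.W["i"], x)]    # input gate
        f = [sigmoid(v) for v in matvec(self.W["f"], x)]    # forget gate
        o = [sigmoid(v) for v in matvec(self.W["o"], x)]    # output gate
        g = [math.tanh(v) for v in matvec(self.W["g"], x)]  # candidate cell
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return matvec(self.W_out, h), h, c


def greedy_caption(decoder, features, max_len=10):
    # Feed back the highest-scoring word until <end> or max_len is reached.
    start = decoder.vocab.index("<start>")
    end = decoder.vocab.index("<end>")
    h, c = decoder.init_state(features)
    token, words = start, []
    for _ in range(max_len):
        logits, h, c = decoder.step(token, h, c)
        token = max(range(len(logits)), key=logits.__getitem__)
        if token == end:
            break
        words.append(decoder.vocab[token])
    return words


# Toy inputs: a made-up vocabulary and a stand-in for CNN encoder output.
VOCAB = ["<start>", "<end>", "a", "dog", "chases", "ball"]
features = [0.1, 0.2, 0.3, 0.4]
caption = greedy_caption(TinyLSTMDecoder(VOCAB, feat_dim=4), features)
print(" ".join(caption))
```

In a trained system the feature vector would come from a pretrained CNN such as ResNet50, and the decoder weights would be learned from image-caption pairs; beam search is a common alternative to the greedy loop shown here.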
Similar works
MizAR 60 for Mizar 50
2023 · 74,099 citations
ImageNet: A large-scale hierarchical image database
2009 · 60,446 citations
Microsoft COCO: Common Objects in Context
2014 · 41,095 citations
Fully convolutional networks for semantic segmentation
2015 · 36,279 citations
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,299 citations