This is an overview page with metadata for this scientific work. The full article is available from the publisher.
QA Generator Framework for Chatbot Dataset Using NLP Methods with T5 and BERT Models
0
Citations
5
Authors
2025
Year
Abstract
The development of artificial intelligence (AI) applications, such as chatbots in the academic field, relies on large, high-quality datasets. Manually creating these datasets is time-consuming and resource-intensive, and it delays AI deployment. To address this, we propose an automated dataset generation framework using Natural Language Processing (NLP) and transformer-based models. This research uses pretrained large language models (LLMs), specifically T5 and BERT, to extract meaningful information from academic PDF documents, particularly two-column papers, and to generate question-answer (QA) pairs as chatbot datasets. The proposed framework comprises text extraction, preprocessing, and a QA generator; the generated pairs are then evaluated using three metrics: confidence score, BERTScore, and F1-Token. Unlike prior works that primarily target reading-comprehension benchmarks and task-based datasets such as SQuAD or UnifiedQA, our framework directly processes unstructured academic PDFs with a novel dual-model strategy: T5 generates the questions, and BERT extracts the answers to the generated questions. A built-in filtering mechanism and an adaptive parameter adjustment strategy dynamically tune parameters to maintain semantic diversity and reduce redundancy while ensuring semantic quality. In this way, the framework accelerates the creation of datasets for chatbots, reduces the need for human effort, and maintains high-quality, relevant datasets. Experimental results show that the framework consistently produces QA pairs with high semantic value, with high BERTScore and confidence scores indicating strong performance, demonstrating the viability of the framework as a scalable tool for automated dataset creation for chatbots.
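The abstract names a token-level F1 metric and a filtering step without defining either. As a rough illustration only, the sketch below assumes the F1-Token metric follows the common SQuAD-style token-overlap F1, and that filtering means a confidence threshold plus removal of repeated questions. The function names, the normalization, and the `min_confidence` value are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (an assumed normalization)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(predicted, reference):
    """SQuAD-style token-overlap F1 between two answer strings."""
    pred_tokens = tokenize(predicted)
    ref_tokens = tokenize(reference)
    if not pred_tokens or not ref_tokens:
        # Two empty strings count as a match; one empty string counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def filter_pairs(pairs, min_confidence=0.5):
    """Keep pairs at or above a hypothetical confidence threshold, dropping repeated questions."""
    seen = set()
    kept = []
    for pair in pairs:
        key = " ".join(tokenize(pair["question"]))
        if pair["confidence"] >= min_confidence and key not in seen:
            seen.add(key)
            kept.append(pair)
    return kept
```

Here each pair is assumed to be a dictionary with `question`, `answer`, and `confidence` keys, where `confidence` would come from the answer-extraction model. Which reference string the paper compares against when computing F1-Token is not stated in the abstract.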
Similar works
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
An Experiment in Linguistic Synthesis with a Fuzzy Logic Controller
1999 · 5,632 citations
An experiment in linguistic synthesis with a fuzzy logic controller
1975 · 5,549 citations
A FRAMEWORK FOR REPRESENTING KNOWLEDGE
1988 · 4,548 citations
Opinion Paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy
2023 · 3,306 citations