OpenAlex · Updated hourly · Last updated: 15.03.2026, 03:23

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

QA Generator Framework for Chatbot Dataset Using NLP Methods with T5 and BERT Models

2025 · 0 citations

Citations: 0 · Authors: 5 · Year: 2025

Abstract

The development of artificial intelligence (AI) applications, such as chatbots in the academic field, relies on large, high-quality datasets. Creating these datasets manually is time-consuming and resource-intensive, and it delays AI deployment. To address this, we propose an automated dataset generation framework using Natural Language Processing (NLP) and transformer-based models. This research uses pretrained large language models (LLMs), specifically T5 and BERT, to extract meaningful information from academic PDF documents, particularly papers with a two-column layout, and to generate question-answer (QA) pairs as chatbot datasets. The proposed framework comprises text extraction, preprocessing, and QA generation; each generated pair is then evaluated using three metrics: confidence score, BERTScore, and F1-Token. Unlike prior works that primarily target comprehension benchmark datasets and task-based settings such as SQuAD or UnifiedQA, our framework processes unstructured academic PDFs directly with a novel dual-model strategy: T5 generates the questions, and BERT extracts the answers to the generated questions. A built-in filtering mechanism and an adaptive parameter adjustment strategy dynamically tune parameters to maintain semantic diversity and reduce redundancy while ensuring semantic quality. In this way, the framework accelerates the creation of datasets for chatbots, reduces the need for human effort, and helps keep the datasets relevant and of high quality. Experimental results show that the framework consistently produces QA pairs with high semantic value, with high BERTScore and confidence scores indicating strong performance. This demonstrates the viability of the framework as a scalable tool for automated dataset creation for chatbots.
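The abstract names F1-Token as one of the three evaluation metrics but does not define it. As a rough illustration only, the sketch below assumes it resembles the token-overlap F1 commonly used for SQuAD-style answer evaluation. The function names `normalize` and `token_f1` are hypothetical and are not taken from the paper.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and English articles, split on whitespace.

    Assumption: SQuAD-style normalization; the paper may normalize differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall (hypothetical helper)."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under this assumed definition, an extracted answer that contains every reference token plus one extra token is penalized on precision but not on recall.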

Topics

AI in Service Interactions · Topic Modeling · Artificial Intelligence in Healthcare and Education