OpenAlex · Updated hourly · Last updated: May 11, 2026, 03:36

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Utilizing Pre-Trained Language Models and Large Language Models for 10-K Items Segmentation

2026 · 0 citations · Journal of Information Systems
Open full text at publisher

Citations: 0 · Authors: 4 · Year: 2026

Abstract

Extracting specific items from 10-K filings is challenging because of variations in document formats and item presentation. This study aims to improve traditional rule-based approaches by introducing and comparing two advanced item segmentation methods: (1) GPT4ItemSeg, employing a novel line-ID-based prompting mechanism to utilize a large language model, ChatGPT-4o, for item segmentation, and (2) BERT4ItemSeg, combining a pre-trained language model, BERT, with a Bi-LSTM model in a hierarchical structure to overcome context window constraints. Trained and evaluated on 3,737 annotated 10-K reports, BERT4ItemSeg achieves a macro-F1 of 0.9825, surpassing GPT4ItemSeg (0.9567), conditional random field (0.9818), and rule-based methods (0.9048) for core items (1, 1A, 3, and 7). These approaches enhance item segmentation performance, improving text analytics in accounting and finance. BERT4ItemSeg offers satisfactory item segmentation performance, whereas GPT4ItemSeg can easily adapt to regulatory changes. Together, they provide an extensible framework for 10-K item segmentation that supports reliable and reproducible results.

Data Availability: Data are available for download at https://www.im.ntu.edu.tw/~lu/data/itemseg/itemseg10kdata.7z. Code is available from the authors upon request.

JEL Classifications: M41; C81.
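The line-ID-based prompting mechanism the abstract describes can be sketched roughly as follows: number each filing line with an ID, ask the model to return the starting line ID of each item, and expand that reply into per-line labels. The prompt wording, the JSON reply format, and the function names here are illustrative assumptions, not the paper's actual implementation.

```python
import json

def build_lineid_prompt(lines):
    """Prefix each line with a line ID so the model can reference spans.
    (Prompt wording is illustrative, not the paper's actual prompt.)"""
    numbered = "\n".join(f"[{i}] {t}" for i, t in enumerate(lines))
    return (
        "Identify where each 10-K item begins. Reply with JSON mapping "
        "item names to the ID of their first line.\n\n" + numbered
    )

def labels_from_response(response_json, n_lines):
    """Expand a hypothetical reply like {'Item 1': 0, 'Item 1A': 3}
    into one label per line; lines before the first item stay 'O'."""
    starts = sorted(json.loads(response_json).items(), key=lambda kv: kv[1])
    labels = ["O"] * n_lines
    for (item, begin), nxt in zip(starts, starts[1:] + [(None, n_lines)]):
        for i in range(begin, nxt[1]):
            labels[i] = item
    return labels

filing = ["Item 1. Business", "We make widgets.", "Risks follow.",
          "Item 1A. Risk Factors", "Demand may fall."]
prompt = build_lineid_prompt(filing)      # sent to the LLM
reply = '{"Item 1": 0, "Item 1A": 3}'     # a hypothetical model reply
print(labels_from_response(reply, len(filing)))
# -> ['Item 1', 'Item 1', 'Item 1', 'Item 1A', 'Item 1A']
```

Returning compact line IDs rather than the segmented text itself keeps the model's output short and verifiable, which is one plausible reason such a scheme can fit long filings into a limited context window.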


Topics

Topic Modeling · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)