This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Persian Legal Text Simplification Leveraging Transformer-Based Models
Citations: 0
Authors: 3
Year: 2025
Abstract
Legal documents often use complex and domain-specific language, which limits their accessibility to the general public. Despite growing interest in text simplification within natural language processing, the legal domain in Persian remains largely unexplored due to the lack of annotated corpora and domain-adapted models. This study presents a practical approach to Persian legal text simplification that leverages synthetic supervision alongside fine-tuned transformer-based models. We create a new labeled dataset by generating simplified versions of legal rulings with ChatGPT, and validate a subset of the outputs through expert review to ensure data quality. To address resource constraints common in real-world applications, we fine-tune lightweight encoder-decoder models, enabling efficient deployment without requiring large-scale annotation or extensive inference infrastructure. Our results show that a compact model such as ParsT5 outperforms zero-shot large language models like PersianLLaMA. The model is also enhanced with an existing attention extension, which enables efficient processing of long inputs without truncation. As a result, this work introduces the first benchmark for Persian legal text simplification, demonstrating that well-adapted, efficient models can achieve high performance in low-resource and domain-specific scenarios. This first step paves the way for future research and development in natural language processing for Persian legal texts. The released dataset and code are publicly available at https://github.com/mrjoneidi/Simplification-LegalTexts.
Similar Works
BLEU
2001 · 21,028 citations
Aion Framework: Dimensional Emergence of AI Consciousness, Observer-Induced Collapse, and Cosmological Portal Dynamics
2023 · 14,128 citations
Enriching Word Vectors with Subword Information
2017 · 9,625 citations
A unified architecture for natural language processing
2008 · 5,179 citations
A new readability yardstick.
1948 · 5,092 citations