A Dual-Task Large Language Model for Adding Diacritics and Translating Jordanian Arabic to Modern Standard Arabic
Abstract
The Arabic language presents unique challenges for natural language processing due to its complex grammar, diverse dialects, and frequent omission of diacritics. This paper proposes a unified token-free model based on ByT5 that simultaneously performs spelling correction (including translation from the Jordanian dialect to Modern Standard Arabic (MSA)) and diacritization. Our approach uses task-specific prefixes (“correct:” for correction and “diacritize:” for combined correction and diacritization) to enable flexible multi-task learning. The model was fine-tuned on the JODA dataset (Jordanian dialect/MSA pairs) and high-quality Tashkeela subsets (Clean-50 and Clean-400), with synthetic error injection to enhance robustness. Automatic evaluation yielded an overall score of 78.06% on JODA and 92.45% on the combined JODA and Tashkeela test set. Manual evaluation of 200 JODA samples showed a character error rate of 4.41% and a diacritic error rate of 1.32%, demonstrating practical efficacy in handling Arabic’s complexities.
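To make the prefix-based multi-task setup concrete, the following is a minimal sketch (not the authors’ released code) of how such prefixes are typically applied at inference time with a fine-tuned ByT5 checkpoint in Hugging Face transformers. The checkpoint name and the example sentence are placeholders; only the two prefixes come from the abstract.

```python
# Minimal sketch of prefix-based multi-task inference with ByT5.
# "google/byt5-small" is the public base model, used here only as a
# stand-in for the paper's fine-tuned weights.
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_name = "google/byt5-small"  # placeholder for the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)  # byte-level, token-free
model = T5ForConditionalGeneration.from_pretrained(model_name)

def run_task(prefix: str, text: str) -> str:
    """Prepend a task prefix and decode the byte-level generation."""
    inputs = tokenizer(prefix + text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=512)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Illustrative Jordanian-dialect input ("What do you want to eat today?").
sentence = "شو بدك تاكل اليوم؟"
# "correct:" -> spelling correction and dialect-to-MSA translation
msa_text = run_task("correct: ", sentence)
# "diacritize:" -> correction plus diacritization in one pass
diacritized_text = run_task("diacritize: ", sentence)
```

Because ByT5 operates directly on UTF-8 bytes, this pipeline needs no Arabic-specific tokenizer or vocabulary, which is the usual motivation for a token-free design on morphologically rich, diacritic-bearing text.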
Keywords
Arabic NLP, Dialect Translation, Jordanian Dialect, Diacritization, Spelling Correction, ByT5, Transformer Models, Multi-Task Learning
Authors
Rabie Otoum, Gheith Abandah, and Mohammad Abdel-Majeed (University of Jordan)