Event Overview
The World Wide Web has become increasingly popular as a source of linguistic data, not only within the NLP communities but also among theoretical linguists facing problems of data sparseness or data diversity. Accordingly, web corpora continue to gain importance, given their size and their diversity of genres and text types. However, the field is still new, and a number of issues in web corpus construction still need much research, both fundamental and applied. These range from questions of corpus design (e.g., corpus composition assessment, sampling strategies and their relation to crawling algorithms, handling of duplicated material) to more technical aspects (e.g., efficient implementation of individual post-processing steps in document cleansing and linguistic annotation, or large-scale parallelization to achieve web-scale corpus construction). Similarly, the systematic evaluation of web corpora, for example in the form of task-based comparisons to traditional corpora, has only lately shifted into focus.

For almost a decade, the ACL SIGWAC, and especially the highly successful Web as Corpus (WaC) workshops, have served as a platform for researchers interested in building and working with web-derived corpora. Past workshops have been co-located with major conferences on computational linguistics and/or corpus linguistics (such as EACL, LREC, WWW, Corpus Linguistics).

As part of the workshop, we will have a panel discussion dedicated to the planning of a shared task for WaC-10 (2015), including the nomination of organizers of the shared task. The tracks of the shared task will focus on the quality of web corpus creation tools, tools for linguistic annotation (at least lemmatization, possibly also POS tagging, etc.), and the quality of web corpora themselves.
Call for Papers

Important Dates

2014-01-30
Abstract submission deadline
Important Dates
  • 2014-04-26: Conference date
  • 2014-01-30: Abstract submission deadline
  • 2014-04-26: Registration deadline

Organizer
European Chapter of the Association for Computational Linguistics (EACL)