Event Overview

Textual data has so far been under-represented in big data and data science research, even though large amounts of data are stored as unstructured text. This workshop aims to address that shortcoming by bringing together academic and industrial researchers to exchange cutting-edge research in the emerging area of extremely large-scale natural language processing (NLP). The topic has emerged in parallel in several areas in recent years: information retrieval and search engines, text mining, machine learning, web-derived corpus/computational linguistics, digital libraries, and high-performance and parallel computing. Common to all these areas are some or all of the main stages of the NLP pipeline: collection, cleaning, annotation, indexing, storage, retrieval and analysis of voluminous quantities of naturally occurring language data from the web or from large-scale national and international digitisation initiatives. By hosting this event at IEEE Big Data 2016, we hope to encourage these communities to come together to consider synergies between NLP and data science.

In this context, numerous issues should be considered, including those linked to the five Vs of big data: (a) Volume: is having more data for training and testing NLP techniques always better? (b) Variety: are all types of data available on a sufficiently large scale? (c) Velocity: how are parallel methods best applied to carry out NLP on a large scale? (d) Variability: how does inconsistent data affect the accuracy of NLP techniques? (e) Veracity: how does the accuracy of data affect the inferences that can be drawn from it?

Call for Papers

Important Dates

2016-10-07
Initial paper submission deadline

Topics of Interest

Topics covered by the workshop include, but are not restricted to, the following:

  • Application-focused papers, e.g. security informatics

  • Crowdsourcing approaches to large-scale language analysis

  • Use of big data to train/test methods for low-resource languages for which no NLP approaches yet exist

  • Efficient NLP for analysing large data sets

  • Challenges of scaling the NLP pipeline

  • Big data management for NLP

  • Storage and access for large linguistic data sets

  • Language processing via GPGPUs

  • Parallel and distributed computing techniques for language analysis, e.g. HPC, MapReduce, Hadoop, Spark and cloud-based machine learning

  • Visualisation methods for the analysis of large corpora

Important Dates
  • Conference dates

    5 to 8 December 2016

  • 7 October 2016

    Initial paper submission deadline

  • 8 December 2016

    Registration deadline
