2014 Workshop on Statistically Sound Data Mining

About the Workshop

Although data mining has its roots in statistics, data miners and statisticians long followed separate paths. Data miners concentrated on developing efficient algorithms that addressed the practical issues associated with huge data sets, but in doing so may sometimes have paid less attention to the reliability of patterns or even their utility. Statisticians, on the other hand, continued along their traditional line, offering well-founded and sound methods for validating statistically meaningful patterns, but they could not offer computational means to find them.
Fortunately, the situation is now changing, and both data miners and statisticians are recognizing the need for cooperation. The main impetus for this new trend comes from a third party, the application fields. In a computerized world it is easy to collect large data sets, but analyzing them is more difficult. Knowing the traditional statistical tests is no longer sufficient for scientists, because one must first find the most promising hidden patterns and models to be tested. There is therefore an urgent need for efficient data mining algorithms that can find the desired patterns without missing significant discoveries or producing too many spurious ones. A related problem is to find a statistically justified compromise between underfitted patterns (too generic to capture all important aspects) and overfitted patterns (too specific, holding only by chance).
However, before any algorithms can be designed, many fundamental problems must first be solved, such as how to define the statistical significance of the desired patterns, how to evaluate overfitting, and how to interpret p-values when multiple patterns are tested. In addition, existing data mining methods, alternative algorithms and goodness measures should be evaluated to see which of them produce statistically valid results.
As we can see, there are many important problems that should be worked on together by people from data mining, machine learning and statistics as well as the application fields. The goal of this workshop is to offer a meeting point for this discussion. We want to bring together people from different backgrounds and schools of science, both theoretically and practically oriented, to specify problems, share solutions and brainstorm new ideas. To encourage real workshopping of actual problems, the workshop is arranged in a novel way, with an invited lecture and inspiring group work in addition to traditional presentations. This means that non-author participants can also contribute to the workshop results and submit a paper to the final proceedings afterwards. If you have relevant problems that you would like to have worked on together at the workshop, please send them before the workshop.

Call for Papers

Important Dates

Initial submission deadline: 2014-07-20

Topics of Interest

Topics of interest include but are not limited to:
- Useful and relevant theoretical results
- Search methods for statistically valid patterns and models
- Statistical validation of discovered patterns
- Evaluating the statistical significance of clustering
- Statistical techniques for avoiding overfitted patterns
- Scaling statistical techniques to high dimensionality and large data volumes, covering both theoretical problems (such as the multiple testing problem) and computational problems (calculating the required test measures efficiently)
- Interesting applications with real-world data demonstrating statistically sound data mining
- Empirical comparisons between different statistical validation methods and possibly other goodness measures
- Insightful position papers

Important Dates

Conference date: 2014-09-15
Initial submission deadline: 2014-07-20
Registration deadline: 2014-09-15

Organizers

University of Eastern Finland
Monash University, Australia.

Contact

Geoff Webb
Ge******@monash.edu
