征稿已开启

查看我的稿件

注册已开启

查看我的门票

已截止
活动简介

Experimental reproducibility is a cornerstone of the scientific method. As computing has grown into a powerful tool for scientific inquiry, computational reproducibility has been one of the core assumptions underlying scientific computing. With "traditional" single-core CPUs, documenting a numerical result was relatively straightforward. However, hardware developments over the past several decades have made it almost impossible to ensure computational reproducibility or to even fully document a computation without incurring a severe loss of performance. This loss of reproducibility started with CPUs that used out-of-order execution to improve performance. It has accelerated with recent architectural trends towards platforms with increasingly large numbers of processing elements, namely multicore CPUs and compute accelerators (GPUs, Intel Xeon Phi, FPGAs). Programmers targeting these platforms rely on tools and libraries to produce codes or execute them efficiently. As a result, codes can run efficiently, but have execution details that can be impossible to predict and are often very difficult to understand after execution. Furthermore, parallel implementations often result in code with varying execution orders between runs, leading to nonreproducible computations. The underlying reasons are that (1) the hardware and system software allocate parallel work in ways that are not always specifiable at compile time and (2) the execution often proceeds in an opportunistic manner with the execution order changing between runs. As such, floating-point computations, which are non-commutative, can have different execution orders and execute on different processing elements between runs, leading to runs with varying results as a matter of fact. The predictability of systems is further complicated by two issues that are becoming more critical as systems grow in scale: (1) interconnect systems with latencies that are often outside the control of programmers and (2) reliability as the mean time between failure (MTBF) is now measured in hours on large systems. This situation seriously affects the ability to rely on scientific computations as a metrological substitute for experimentation!

征稿信息
留言
验证码 看不清楚,更换一张
全部留言
重要日期
  • 11月20日

    2015

    会议日期

  • 11月20日 2015

    注册截止日期

主办单位
IEEE Computer Society
美国计算机学会
历届会议
移动端
在手机上打开
小程序
打开微信小程序
客服
扫码或点此咨询