2018 IEEE/ACM 8th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS)

Event Overview

The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC18)

Organizing Committee

Keita Teranishi – Sandia National Laboratories

John Daly – Laboratory for Physical Sciences

Call for Papers

Important Dates

Paper submission deadline: 2018-09-10

Authors are invited to submit original papers on the research and practice of fault tolerance in extreme-scale distributed systems (primarily HPC systems, but including grid and cloud systems). Resilience and fault tolerance remain a major concern for supercomputing, and advances in this area are needed to allow applications to compute accurate answers (or answers within an acceptable error tolerance) in a timely and efficient manner in the presence of degradations or failures of platform components, both hardware and software.

Topics of Interest

Topics include, but are not limited to:

Failure data analysis and field studies
Power, performance, resilience (PPR) assessments / tradeoffs
Novel fault-tolerance techniques and implementations
Emerging hardware and software technology for resilience
Silent data corruption (SDC) detection / correction techniques
Advances in reliability monitoring, analysis, and control of highly complex systems
Failure prediction, error preemption, and recovery techniques
Fault-tolerant programming models
Models for software and hardware reliability
Metrics and standards for measuring, improving, and enforcing effective fault-tolerance
Scalable Byzantine fault-tolerance and security from single-fault and fail-silent violations
Atmospheric evaluations relevant to HPC systems (terrestrial neutrons, temperature, voltage, etc.)
Near-threshold-voltage implications and evaluations for reliability
Benchmarks and experimental environments including fault injection
Frameworks and APIs for fault-tolerance and fault management

Author Guidelines

Submissions are solicited in the following categories:

Regular papers presenting innovative ideas improving the state of the art or discussing the issues seen on existing extreme-scale systems, including some form of analysis and evaluation.
Extended abstracts proposing disruptive ideas and challenging assumptions in the field, including some form of preliminary results.

Extended abstracts will be evaluated separately and given shorter oral presentations.

Submissions shall be sent electronically and must conform to SC18 proceedings style. Regular papers should not exceed ten (10) pages including all text, appendices, figures, and references. Extended abstract papers should not exceed six (6) pages. Please note that we have only placed a limit on the maximum number of pages that a submission may contain. Papers that are clear, coherent, and complete (with the understanding that the submission may represent a work-in-progress) but are shorter than this maximum are encouraged.

Papers should be submitted to: https://submissions.supercomputing.org. A sample submission form is available here.

The workshop proceedings have been accepted for publication by IEEE TCHPC and will be included in IEEE Xplore.

Authors are encouraged to include reproducibility artifacts as described on the conference website:
https://sc18.supercomputing.org/submit/sc-reproducibility-initiative
Inclusion of reproducibility artifacts is optional.

Submission of papers: September 10, 2018 (anywhere-on-earth; extended from August 30, 2018)

Author notification: October 4, 2018 (anywhere-on-earth; moved from September 27, 2018)

Camera ready papers: October 11, 2018

Workshop: Friday, November 16, 2018

Registration deadline: November 16, 2018

Contact

Scott Levy
sl******@sandia.gov
