This workshop is concerned with identifying and applying appropriate software engineering (SE) tools and practices (e.g., code generators, static analyzers, verification and validation (V&V) practices, testing, design approaches, and maintenance practices) to support and ease the development of reproducible Computational and Data-enabled Science & Engineering (CoDeSE) software for High Performance Computing (HPC). Specifically:
CoDeSE applications that include large parallel models/simulations of the physical world running on HPC systems.
CoDeSE applications that utilize HPC systems (e.g., GPU computing, compute clusters, or supercomputers) to manage and/or manipulate large amounts of data.
Despite the increasing demand for utilizing HPC for CoDeSE applications, software development for HPC has historically attracted little attention from the SE community. Paradoxically, the HPC CoDeSE community has increasingly been adopting SE techniques and tools. Yet the development of CoDeSE software for HPC differs significantly from the development of more traditional business information systems, from which many SE best practices and tools have been drawn.
These differences appear at various phases of the software lifecycle as described below:
Requirements
Risks due to the exploration of relatively unknown scientific/engineering phenomena
Supporting reproducible science, particularly on non-deterministic systems
Constant change as new information is gathered
Design
Data dependencies within the software
The need to identify the most appropriate parallelization strategy for CoDeSE algorithms
The presence of complex communication among HPC nodes that could degrade performance
Challenges in designing unit and system tests at appropriate scales
The need for fault tolerance and task migration mechanisms to mitigate the need to restart time-consuming computations due to software or hardware errors
V&V
Results are often unknown when exploring novel science or engineering areas, algorithms, and datasets
Challenges in applying unit and system tests at appropriate scales
Challenges in retrospectively designing and implementing tests for legacy code
Popular tools often do not work on the latest HPC architectures; they need to be tuned to handle many threads executing at the same time
Deployment
Failure of components within running systems is expected due to system size
Continuous integration on platforms with high availability and infrequent downtime
Long system lifespans necessitate porting across multiple platforms
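As an illustration of the reproducibility concern listed under Requirements, the following minimal sketch shows one common source of run-to-run variation: floating-point addition is not associative, so a reduction whose operands arrive in a different order can produce a different result. The input values and the helper name naive_sum are hypothetical and chosen only to make the effect visible; the sketch does not represent any particular HPC code or library. Python's math.fsum is used as one example of an order-independent summation.

```python
import math


def naive_sum(values):
    """Accumulate left to right in double precision (hypothetical helper)."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same four contributions in two different arrival orders, as might
# happen when partial results are combined in whatever order they arrive.
# These values are illustrative assumptions, not data from a real run.
arrival_a = [1e16, 1.0, -1e16, 1.0]
arrival_b = [1e16, -1e16, 1.0, 1.0]

result_a = naive_sum(arrival_a)
result_b = naive_sum(arrival_b)
order_sensitive = result_a != result_b

# math.fsum tracks exact partial sums internally, so its correctly rounded
# result does not depend on the order of the inputs.
stable_a = math.fsum(arrival_a)
stable_b = math.fsum(arrival_b)

print("naive:", result_a, result_b, "differs:", order_sensitive)
print("fsum: ", stable_a, stable_b)
```

An explicit loop is used instead of the built-in sum because recent Python versions apply compensated summation to floats inside sum, which would hide the effect being illustrated.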
Conference date: November 12, 2017