Numerical Algorithms and Fault Tolerance of Hyperexascale Computer Systems

Academician of the RAS B. N. Chetverushkin* and Corresponding Member of the RAS M. V. Yakobovskiy
Translated by I. Ruzanova

Federal Research Center Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, Moscow, 125047 Russia

Correspondence to: *e-mail: chetver@imamod.ru

Received 15 September, 2016

Abstract—A new method is discussed that enables long-term continuous computations on computing systems consisting of millions of processing devices, some of which may fail in the course of the computation. The method relies on the properties of hyperbolized systems of partial differential equations, for which the domain of influence on the solution is localized in space. As a result, the part of the solution lost in a failure can be rapidly recalculated without restarting the entire computation. The number of additional processors required to carry out this recalculation is estimated.
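The localization argument in the abstract can be illustrated with a minimal sketch. It is not the authors' method. It assumes, purely for illustration, the one-dimensional linear advection equation u_t + u_x = 0 discretized by the explicit Lax-Friedrichs scheme. This scheme has a three-point stencil, so each time step widens the domain of dependence by one cell on each side. A block [a, b) lost k steps after a checkpoint can therefore be rebuilt from the checkpointed cells [a - k, b + k) alone, without touching the rest of the grid. The helpers `lf_step`, `advance`, and `recover_block` are hypothetical names introduced here.

```python
import numpy as np


def lf_step(u: np.ndarray, c: float = 0.5) -> np.ndarray:
    """One explicit Lax-Friedrichs step for u_t + u_x = 0 (illustrative model).

    Each new interior value depends only on its two neighbours (stencil
    radius 1); the two end values are held fixed as a simple boundary condition.
    """
    v = u.copy()
    v[1:-1] = 0.5 * (u[:-2] + u[2:]) - 0.5 * c * (u[2:] - u[:-2])
    return v


def advance(u: np.ndarray, k: int, c: float = 0.5) -> np.ndarray:
    """Advance the state u by k time steps."""
    for _ in range(k):
        u = lf_step(u, c)
    return u


def recover_block(checkpoint: np.ndarray, a: int, b: int, k: int,
                  c: float = 0.5) -> np.ndarray:
    """Recompute cells [a, b) k steps after `checkpoint` (hypothetical helper).

    Only the domain of dependence [a - k, b + k), clipped to the grid, is used.
    At an artificial cut the edge value is wrong after every step, and the
    error creeps inward one cell per step, so after k steps exactly the
    cells [a, b) in the middle of the local window are still correct.
    """
    lo, hi = max(a - k, 0), min(b + k, len(checkpoint))
    local = advance(checkpoint[lo:hi].copy(), k, c)
    return local[a - lo: b - lo]


N, k = 200, 20
x = np.linspace(0.0, 1.0, N)
u0 = np.exp(-200.0 * (x - 0.3) ** 2)   # checkpointed state: a Gaussian pulse
reference = advance(u0, k)             # state of the full, failure-free run
a, b = 90, 110                         # block lost together with a failed device
recovered = recover_block(u0, a, b, k)
print(np.allclose(recovered, reference[a:b]))      # True
print((b + k) - (a - k), "of", N, "cells needed")  # 60 of 200 cells needed
```

The size of the recomputation window, (b - a) + 2k cells, grows only with the number of steps since the checkpoint, not with the size of the whole grid. In this toy setting, that is what keeps the extra processors needed for recovery small compared with a full restart.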

DOI: 10.1134/S1064562417010021