Numerical Algorithms and Fault Tolerance
of Hyperexascale Computer Systems
Academician of the RAS B. N. Chetverushkin* and Corresponding Member of the RAS M. V. Yakobovskiy
Translated by I. Ruzanova
Federal Research Center Keldysh Institute of Applied Mathematics, Russian Academy of Sciences,
Moscow, 125047 Russia
Correspondence to: *e-mail: chetver@imamod.ru
Received September 15, 2016
Abstract—A new method is discussed that enables long, uninterrupted computations on computing systems consisting of millions of operating devices, some of which may fail during the computation. The method relies on a property of hyperbolized systems of partial differential equations: the domain of influence on the solution is localized in space. As a result, the part of the solution lost in a failure can be rapidly recalculated without restarting the whole computation. The number of additional processors required for the recalculation is estimated.
DOI: 10.1134/S1064562417010021
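
The localization property described in the abstract can be illustrated with a minimal sketch. It is not the authors' scheme: the one-dimensional advection equation, the explicit Lax-Friedrichs update, the periodic grid, and all function names and parameters below are assumptions chosen only for illustration. With a three-point stencil, the values on a block of cells after k steps depend only on the checkpointed values within k cells of that block, so a lost block can be rebuilt from a small window of saved data instead of rerunning the whole grid.

```python
import math


def step_periodic(u, c):
    """One Lax-Friedrichs step for u_t + a*u_x = 0 on a periodic grid.

    c = a*dt/dx is the Courant number (|c| <= 1 for stability).
    """
    n = len(u)
    return [
        0.5 * (u[i - 1] + u[(i + 1) % n]) - 0.5 * c * (u[(i + 1) % n] - u[i - 1])
        for i in range(n)
    ]


def step_window(w, c):
    """Same update on a non-periodic window; the result loses one cell per side."""
    return [
        0.5 * (w[j - 1] + w[j + 1]) - 0.5 * c * (w[j + 1] - w[j - 1])
        for j in range(1, len(w) - 1)
    ]


def recompute_block(checkpoint, a, b, k, c):
    """Rebuild cells [a, b) at k steps after the checkpoint.

    Only the checkpointed cells [a - k, b + k) are read (indices wrap
    around the periodic grid); the rest of the grid is never touched.
    """
    n = len(checkpoint)
    w = [checkpoint[i % n] for i in range(a - k, b + k)]
    for _ in range(k):
        w = step_window(w, c)  # window shrinks by 2 cells per step
    return w


if __name__ == "__main__":
    n, k, c = 200, 10, 0.8          # hypothetical grid size, step count, Courant number
    a, b = 80, 100                  # hypothetical block held by the failed device
    checkpoint = [math.exp(-0.01 * (i - 60) ** 2) for i in range(n)]

    # Reference: the full computation that would have run without a failure.
    full = checkpoint
    for _ in range(k):
        full = step_periodic(full, c)

    recovered = recompute_block(checkpoint, a, b, k, c)
    err = max(abs(x - y) for x, y in zip(recovered, full[a:b]))
    print("cells read:", b - a + 2 * k, "of", n, "max error:", err)
```

In this sketch the cost of recovery grows with the block size and the number of steps since the last checkpoint, not with the size of the whole grid, which is the reason a localized domain of influence makes partial recalculation feasible.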