Design and analysis of an efficient algorithm for coordinated checkpointing in distributed systems

Jiannong Cao, Weijia Jia, Xiaohua Jia, To yat Cheung

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

2 Citations (Scopus)

Abstract

A synchronous checkpointing algorithm coordinates a set of processes in taking checkpoints in such a way that the set of local checkpoints always forms part of a consistent global system state. Whenever a process p requests to take a checkpoint, a set of processes, called the cohorts set of p, must be checked, and some of them may also have to take checkpoints in order to preserve system consistency. Although several synchronous checkpointing algorithms have been proposed in the literature, most of them do not address performance. In this paper we propose an efficient distributed algorithm for synchronous checkpointing. A proof of correctness and an analysis of the algorithm's efficiency are presented. It is shown that the algorithm has better message and time complexity than the existing algorithms. The method proposed in this paper can also be applied to enhance the performance of rollback operations, which always require synchronization of the interdependent processes.
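
To make the idea of checking cohorts concrete, the following is a minimal sketch of a generic dependency closure and not the algorithm proposed in the paper, whose details are not given in this abstract. It assumes a hypothetical bookkeeping map `received_from`, where `received_from[p]` holds the processes whose messages p has received since p's last checkpoint, and it conservatively treats every such dependency as forcing a checkpoint (the abstract only says that some cohorts may have to take one).

```python
from collections import deque


def processes_to_checkpoint(initiator, received_from):
    """Return the processes that checkpoint together with `initiator`.

    `received_from` is a hypothetical map: process -> set of processes whose
    messages it has received since its own last checkpoint. Following these
    edges transitively avoids recording a receive whose send is not recorded.
    """
    must_checkpoint = {initiator}
    queue = deque([initiator])
    while queue:
        p = queue.popleft()
        for q in received_from.get(p, ()):
            if q not in must_checkpoint:
                must_checkpoint.add(q)
                queue.append(q)
    return must_checkpoint


# Illustrative data: p received from q, q received from r, s received from p.
deps = {"p": {"q"}, "q": {"r"}, "r": set(), "s": {"p"}}
print(sorted(processes_to_checkpoint("p", deps)))  # ['p', 'q', 'r']
```

In this example s is left out because p has not received anything from s. A real protocol would compute the same closure with messages exchanged between processes rather than from a global map, which is where message and time complexity come into play.
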
Original language: English
Title of host publication: Proceedings of the Conference on Advances in Parallel and Distributed Computing
Publisher: IEEE
Pages: 261-268
Number of pages: 8
Publication status: Published - 1 Jan 1997
Externally published: Yes
Event: Proceedings of the 1997 Conference on Advances in Parallel and Distributed Computing - Shanghai, China
Duration: 19 Mar 1997 to 21 Mar 1997

Conference

Conference: Proceedings of the 1997 Conference on Advances in Parallel and Distributed Computing
Country/Territory: China
City: Shanghai
Period: 19/03/97 to 21/03/97

ASJC Scopus subject areas

  • General Computer Science
