On distributed object checkpointing and recovery

Manhoi Choy, Hong Va Leong, Man Hon Wong

Research output: Chapter in book / Conference proceeding; Conference article published in proceeding or book; Academic research; peer-review

2 Citations (Scopus)

Abstract

Recovery by checkpointing on distributed shared memory systems is investigated in this paper. The notion of consistent global states on a sequentially consistent shared memory system is defined. We investigate how consistent checkpoints can be obtained in these systems. In addition, a novel lazy checkpointing approach is proposed. It allows a controlled degree of concurrency and, at the same time, limits the amount of rollback propagation during recovery. Correctness requirements for efficient checkpointing are explored first and algorithms satisfying the requirements are developed subsequently. Several interesting properties of checkpointing on distributed shared memory systems are discovered. In particular, we show that for low levels of laziness, one can achieve better concurrency with more stable storage.
Original language: English
Title of host publication: Proceedings of the Annual ACM Symposium on Principles of Distributed Computing
Publisher: ACM
Pages: 64-73
Number of pages: 10
Publication status: Published - 1 Jan 1995
Externally published: Yes
Event: Proceedings of the 14th Annual ACM Symposium on Principles of Distributed Computing - Ottawa, Canada
Duration: 20 Aug 1995 – 23 Aug 1995

Conference

Conference: Proceedings of the 14th Annual ACM Symposium on Principles of Distributed Computing
Country/Territory: Canada
City: Ottawa
Period: 20/08/95 – 23/08/95

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications
