Abstract
This paper investigates recovery by checkpointing on distributed shared memory systems. We define the notion of consistent global states on a sequentially consistent shared memory system and investigate how consistent checkpoints can be obtained in such systems. In addition, we propose a novel lazy checkpointing approach that allows a controlled degree of concurrency while limiting the amount of rollback propagation during recovery. We first explore the correctness requirements for efficient checkpointing and then develop algorithms that satisfy them. Several interesting properties of checkpointing on distributed shared memory systems are identified. In particular, we show that for low levels of laziness, one can achieve better concurrency with more stable storage.
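The abstract does not give the paper's formal definition of a consistent global state. The Python below is a minimal sketch that assumes the usual shared-memory analogue of an orphan message: a checkpointed read that returned a value from a write the writer's checkpoint does not include. Under that assumption, it checks whether a set of per-process checkpoints (a "cut") is consistent. The event model and the names `Event`, `is_consistent`, and `cut` are hypothetical illustrations, not the paper's algorithm. The check also ignores any further ordering constraints that sequential consistency may impose.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One shared-memory access in a process's history (hypothetical model)."""
    kind: str  # "read" or "write"
    var: str   # shared variable name
    # For reads only: (writer process id, event index) of the write whose
    # value the read returned; None means the variable's initial value.
    source: Optional[Tuple[int, int]] = None


def is_consistent(histories: Dict[int, List[Event]], cut: Dict[int, int]) -> bool:
    """Return True if no read inside the cut observes a write outside it.

    cut[p] is the number of events of process p recorded by p's checkpoint,
    so events[:cut[p]] are the events that p's checkpoint includes.
    """
    for q, events in histories.items():
        for e in events[:cut[q]]:
            if e.kind == "read" and e.source is not None:
                p, i = e.source
                # Orphan read: the write at index i is not in p's checkpoint.
                if i >= cut[p]:
                    return False
    return True


if __name__ == "__main__":
    # P0 writes x then y; P1 reads x and then y, each from P0's writes.
    h = {
        0: [Event("write", "x"), Event("write", "y")],
        1: [Event("read", "x", (0, 0)), Event("read", "y", (0, 1))],
    }
    print(is_consistent(h, {0: 1, 1: 1}))  # True
    print(is_consistent(h, {0: 1, 1: 2}))  # False: P1's read of y is orphaned
    print(is_consistent(h, {0: 2, 1: 2}))  # True
```

In this model, an inconsistent cut such as `{0: 1, 1: 2}` means that rolling P0 back to its checkpoint would force P1 to roll back as well. This is the kind of rollback propagation the lazy checkpointing approach aims to limit.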
Original language | English |
---|---|
Title of host publication | Proceedings of the Annual ACM Symposium on Principles of Distributed Computing |
Publisher | ACM |
Pages | 64-73 |
Number of pages | 10 |
Publication status | Published - 1 Jan 1995 |
Externally published | Yes |
Event | Proceedings of the 14th Annual ACM Symposium on Principles of Distributed Computing, Ottawa, Canada; duration: 20 Aug 1995 → 23 Aug 1995 |
Conference
Conference | Proceedings of the 14th Annual ACM Symposium on Principles of Distributed Computing |
---|---|
Country/Territory | Canada |
City | Ottawa |
Period | 20/08/95 → 23/08/95 |
ASJC Scopus subject areas
- Software
- Hardware and Architecture
- Computer Networks and Communications