Recovery in Main Memory Databases - Le Gruenwald, Jing Huang, Margaret H. Dunham et al. - Engineering Intelligent Systems, Vol. 4, No. 3, September 1996


이 인선 97/08/21

Introduction

General MMDB Architecture
– Main Memory (MM): in RAM
– Stable Memory (SM)
  optional nonvolatile memory used to hold the log buffers (log tail)
  avoids I/O actions when transactions are committed
  essential to performance
– Archive Memory (AM): holds a backup of the entire database

focus on logging, checkpointing, reloading

MMDB Logging(1)

– physical logging
  the state of the database modified by an operation is logged
  recommended for MMDB systems
– logical logging
  contains descriptions of higher-level operations and records the state transitions of the database
  the idempotent property does not hold
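The idempotence point can be illustrated with a minimal sketch. The record shapes and the names `redo_physical` and `redo_logical` are hypothetical, chosen only for this example: replaying a physical record twice leaves the same state, while replaying a logical record twice does not.

```python
# Hypothetical log record shapes, for illustration only.
def redo_physical(db, record):
    """A physical record stores the after image: reinstall the value."""
    db[record["item"]] = record["after"]


def redo_logical(db, record):
    """A logical record stores the operation: re-execute it."""
    if record["op"] == "add":
        db[record["item"]] += record["amount"]


db_physical = {"x": 10}
db_logical = {"x": 10}
physical = {"item": "x", "after": 15}
logical = {"item": "x", "op": "add", "amount": 5}

# Apply each record twice, as could happen if recovery replays a record
# whose effect is already reflected in the database.
for _ in range(2):
    redo_physical(db_physical, physical)
    redo_logical(db_logical, logical)

print(db_physical["x"])  # 15
print(db_logical["x"])   # 20
```

With physical records, recovery does not need to know whether an update already reached the backup, which is one reason they pair well with fuzzy checkpoints.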

MMDB Logging(2)

Logging rules
– Write Ahead Rule
  undo-log data must be written to nonvolatile memory before the update is applied to the database
– Commit Rule
  if a DBMS allows a transaction to commit, the redo-log data of that transaction must already be safe in nonvolatile storage
– Logging After Writing (LAW)
  the after image of an updated item should be written to the log after its corresponding update is propagated to the database
  simplifies log processing in a fuzzy-checkpointing MMDB
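The ordering imposed by the Write Ahead Rule and the Commit Rule can be sketched as follows. This is a toy model, not any system's actual recovery manager: `StableLog` and `TinyMMDB` are hypothetical names, and the list inside `StableLog` merely stands in for a nonvolatile log buffer.

```python
class StableLog:
    """Stands in for the nonvolatile log buffer (assumed to survive a crash)."""

    def __init__(self):
        self.records = []

    def append(self, record):
        self.records.append(record)


class TinyMMDB:
    def __init__(self):
        self.data = {}          # volatile main-memory database
        self.log = StableLog()  # nonvolatile by assumption

    def update(self, txn, key, value):
        # Write Ahead Rule: the undo (before) image reaches stable memory
        # before the database itself is changed.
        self.log.append(("undo", txn, key, self.data.get(key)))
        self.data[key] = value
        # The redo (after) image is logged after the update is applied.
        self.log.append(("redo", txn, key, value))

    def commit(self, txn):
        # Commit Rule: every redo record of txn is already in stable memory,
        # so committing only appends a commit record and needs no disk I/O.
        self.log.append(("commit", txn))


db = TinyMMDB()
db.update("T1", "x", 1)
db.commit("T1")
kinds = [record[0] for record in db.log.records]
print(kinds)  # ['undo', 'redo', 'commit']
```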

MMDB Logging(3)

MMDB logging differs from DRDB (disk-resident database) logging in three ways
– a nonvolatile log buffer should be used to satisfy WAL without requiring I/O prior to transaction commit

– physical logging is recommended as it is easier to use with fuzzy checkpointing

– to reduce the amount of the log needed to redo transactions after a system failure, the LAW policy should be followed

Checkpointing DRDB

Commit-consistent checkpointing
– periodically stop processing transactions
– flush all dirty cache slots and mark the log

Cache-consistent checkpointing

Fuzzy checkpointing
– only flushes those dirty slots that have not been flushed since before the previous checkpoint
– normal replacement activity will flush most cache slots that were dirty since before the previous checkpoint
– the checkpoint won’t have much flushing to do and won’t delay active transactions for very long
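The flushing rule of fuzzy checkpointing can be sketched as below. The `Cache` class and its bookkeeping (`dirty_epoch`, `checkpoints_done`) are assumptions made for illustration; a real buffer manager would track this differently.

```python
class Cache:
    def __init__(self):
        self.checkpoints_done = 0
        self.dirty_epoch = {}  # slot -> checkpoints_done when it became dirty
        self.flushed = []      # (checkpoint number, slot), kept for inspection

    def write(self, slot):
        # Remember only the first time the slot became dirty.
        self.dirty_epoch.setdefault(slot, self.checkpoints_done)

    def replace(self, slot):
        # Normal cache replacement also flushes a dirty slot.
        self.dirty_epoch.pop(slot, None)

    def fuzzy_checkpoint(self):
        number = self.checkpoints_done + 1
        # Flush only slots that were already dirty at the previous checkpoint
        # and have not been flushed since.
        old = [s for s, e in self.dirty_epoch.items()
               if e < self.checkpoints_done]
        for slot in old:
            del self.dirty_epoch[slot]
            self.flushed.append((number, slot))
        self.checkpoints_done = number


cache = Cache()
cache.write("A")
cache.fuzzy_checkpoint()   # checkpoint 1: A became dirty too recently
cache.write("B")
cache.fuzzy_checkpoint()   # checkpoint 2: flushes A, leaves B
cache.fuzzy_checkpoint()   # checkpoint 3: flushes B
print(cache.flushed)  # [(2, 'A'), (3, 'B')]
```

A slot evicted through `replace` between two checkpoints never reaches the checkpointer, which is why the checkpoint usually has little left to flush.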

Checkpointing MMDBs(1)

Focuses on low interference with normal transactions and on supporting efficient recovery

Fuzzy checkpointing
– Hagmann first suggested using fuzzy checkpointing for MMDBs
  “a crash recovery scheme for a memory-resident database system”, IEEE Transactions on Computers, Vol. C-35, No. 9, September 1986
  the checkpointer does not need to obtain locks on the data items to be checkpointed
  the database is dumped in sections
  after dumping a section, the checkpointer writes a log record to the log
  a section must not overwrite its previous image (sliding monoplexed backups)

LAW with fuzzy checkpointing

Checkpointing MMDBs(2)

– Salem and Garcia-Molina, “checkpointing memory-resident databases” (‘89)
  compared the fuzzy checkpointing scheme with two non-fuzzy checkpointing schemes
  fuzzy checkpointing is the most efficient one
  ping-pong scheme
  – each dirty page is flushed twice
– Lin and Dunham, “segmented fuzzy checkpointing for main memory databases” (‘94)
  checkpoints one segment at a time in a round-robin fashion
  automatically changes the segment boundaries based on the distribution of update operations


[Figure: redo log size in segmented fuzzy checkpointing]

– Li et al., “checkpointing and recovery in partitioned main memory databases” (‘95)
  the database is divided into partitions, each of which has its own log disks
  the time to recover from a system failure is reduced


Non-Fuzzy Checkpointing
– overhead comes from locking the checkpointed objects to ensure transaction-consistency or action-consistency
– Lehman and Carey, “a recovery algorithm for a high-performance memory-resident database system” (‘87)
  transaction-consistent (at relation level) scheme
  no need to maintain undo-log records in nonvolatile storage
  checkpointing increases data contention with normal transactions

Checkpointing MMDBs(5)
– Salem and Garcia-Molina, “checkpointing memory-resident databases” (‘89)
  discuss two non-fuzzy checkpointing approaches
  – the first (black and white) aborts some update transactions
  – the second (Copy-On-Update) requires some update transactions to store the original values of the data items they update
  – both have a severe impact on system performance
– Jagadish et al., “recovering from main-memory lapses” (‘93)
  propose an action-consistent checkpointing scheme
  the undo-logs of active transactions are first written to the log, and then dirty pages are flushed to disk
  during normal processing, the redo-logs of the committed transactions are written to the log
  ping-pong update
  this approach was originally used in Dali
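The Copy-On-Update idea mentioned above can be sketched at the level of single items. This is a simplification under stated assumptions, not the scheme as published: the class `CopyOnUpdateDB` and its fields are hypothetical, and concurrency control is ignored. An update that arrives while a checkpoint is in progress first saves the old value of any item the checkpointer has not dumped yet, so the backup reflects the state at the start of the checkpoint.

```python
class CopyOnUpdateDB:
    def __init__(self, data):
        self.data = dict(data)
        self.checkpointing = False
        self.saved = {}    # original values of items updated mid-checkpoint
        self.pending = []  # items the checkpointer has not dumped yet
        self.backup = {}

    def begin_checkpoint(self):
        self.checkpointing = True
        self.saved = {}
        self.pending = sorted(self.data)
        self.backup = {}

    def update(self, key, value):
        # Copy-on-update: the first overwrite of a not-yet-dumped item
        # preserves its pre-checkpoint value.
        if self.checkpointing and key in self.pending and key not in self.saved:
            self.saved[key] = self.data[key]
        self.data[key] = value

    def dump_next(self):
        key = self.pending.pop(0)
        # Prefer the saved original; fall back to the current value.
        self.backup[key] = self.saved.pop(key, self.data[key])
        if not self.pending:
            self.checkpointing = False


db = CopyOnUpdateDB({"a": 1, "b": 2})
db.begin_checkpoint()
db.dump_next()        # dumps a = 1
db.update("b", 20)    # b not dumped yet: its old value 2 is saved
db.update("a", 10)    # a already dumped: nothing to save
db.dump_next()        # dumps the saved b = 2
print(db.backup)  # {'a': 1, 'b': 2}
print(db.data)    # {'a': 10, 'b': 20}
```

The extra copy made inside `update` is the cost that the slide attributes to update transactions.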


Log-driven checkpointing
– applies the log to a previous dump to generate a new dump
– originally used to generate a remote backup of the database
– adopted in “incremental recovery in main memory database systems” (‘92)
– with the high transaction processing rate of MMDBs, the size of the log can increase rapidly
– it is quite inefficient compared to fuzzy checkpointing
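A minimal sketch of the log-driven idea, assuming a hypothetical log of `(key, after_image)` redo records in commit order and a dump represented as a dictionary:

```python
def log_driven_checkpoint(previous_dump, log):
    """Build a new dump by replaying redo records over a copy of the old dump."""
    new_dump = dict(previous_dump)
    for key, after_image in log:  # log is assumed to be in commit order
        new_dump[key] = after_image
    return new_dump


old_dump = {"x": 1, "y": 2}
log = [("x", 5), ("z", 9), ("x", 7)]
new_dump = log_driven_checkpoint(old_dump, log)
print(new_dump)  # {'x': 7, 'y': 2, 'z': 9}
```

The work is proportional to the length of the log rather than to the number of changed items (here `x` is rewritten twice), which is consistent with the remark that a fast-growing log makes this approach expensive.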

MMDB Reloading(1)

Issues
– occurrence frequency of the reload process
  on average, a system failure occurs once every few weeks
  media failure, MM page faults
– when the system should resume its execution after a failure
  28.43 minutes are needed to recover a 1 GB database [?]
  if the system is not available at all during recovery, many transactions will be backlogged
– reload prioritization
  reload priority can be determined based on access frequency, transaction deadline (“MMDB reload algorithms”), or temporal data interval from real-time applications [?]

MMDB Reloading(2)

Existing reload schemes
– simple reloading
  the system cannot be brought online until the entire database is memory-resident
– concurrent reloading
  Gruenwald
  – “MMDB reload algorithms” (‘91)
  – two processors (RP & DP), nonvolatile shadow memory (SM), and a dual address translation mechanism in the MARS system
  – ordered reload with prioritization / smart reload / frequency reload
  – the differences lie in the structure of AM, utilization of data access frequency, reload prioritization, and reload granularity
  – the frequency reload yields the best transaction response time and system throughput
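Reload prioritization by access frequency can be sketched loosely as follows. This is not the published frequency reload algorithm: the function `reload_order`, the per-page frequency table, and the rule that pages demanded by waiting transactions go first are all assumptions made for illustration.

```python
import heapq


def reload_order(access_frequency, demanded=()):
    """Return the order in which pages would be reloaded from archive memory.

    Pages demanded by waiting transactions go first (an assumption of this
    sketch); the rest follow in decreasing access frequency.
    """
    demanded = list(dict.fromkeys(demanded))  # keep order, drop repeats
    rest = [(-freq, page) for page, freq in access_frequency.items()
            if page not in demanded]
    heapq.heapify(rest)
    order = demanded[:]
    while rest:
        _, page = heapq.heappop(rest)
        order.append(page)
    return order


freq = {"p1": 3, "p2": 50, "p3": 12, "p4": 1}
order = reload_order(freq, demanded=["p4"])
print(order)  # ['p4', 'p2', 'p3', 'p1']
```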

MMDB Reloading(3)

Lehman
– “a recovery algorithm for a high-performance memory-resident database system”
– regular transaction processing is allowed to resume once the system catalogs and their indices are reloaded

Levy and Silberschatz
– “incremental recovery in main memory database systems” (‘92)
– resumes transaction processing immediately after a system failure and recovers pages individually according to the demand of post-crash transactions
– stale/fresh marking technique
– to implement page-based recovery, log records must be grouped together on a page basis during normal operation
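The demand-driven, page-based recovery described above can be sketched as below. The class `IncrementalRecovery` and its data layout are hypothetical; the sketch assumes that the log is already grouped per page and that each log entry is a `(key, after_image)` redo record.

```python
class IncrementalRecovery:
    def __init__(self, backup_pages, page_logs):
        self.backup = backup_pages  # page id -> page image on archive memory
        self.page_logs = page_logs  # page id -> list of (key, after_image)
        self.memory = {}            # pages recovered so far
        self.fresh = set()          # recovered pages; all others are stale

    def access(self, page_id):
        # A post-crash transaction touching a stale page triggers recovery
        # of that page only.
        if page_id not in self.fresh:
            page = dict(self.backup.get(page_id, {}))
            for key, after_image in self.page_logs.get(page_id, []):
                page[key] = after_image
            self.memory[page_id] = page
            self.fresh.add(page_id)
        return self.memory[page_id]


rec = IncrementalRecovery(
    backup_pages={"P1": {"x": 1}, "P2": {"y": 2}},
    page_logs={"P1": [("x", 5)]},
)
print(rec.access("P1"))  # {'x': 5}
print(sorted(rec.fresh))  # ['P1']
```

Page `P2` stays stale until some transaction asks for it, so the system can accept work before the whole database is back in memory.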

Recovery with Existing MMDB Systems(1)

Dali from AT&T
– the original recovery manager was implemented according to “recovering from main-memory lapses” (‘93)
  logs only redo records during normal execution
  segment-level action-consistent checkpoints
  the checkpointer writes the relevant parts of the undo log to disk
  recovery makes only a single pass over the log
  requires no special hardware to preserve the data
– tests led to a restructuring of its recovery manager
  “multi-level recovery in the Dali storage manager” (‘95)
  multi-level logging, post-commit actions, dirty page detection, and fuzzy checkpoints


Fast Path
– supports both memory-resident data and disk-resident data
– performs updates to memory-resident data at commit time
– no undo operations are required when a failure occurs
– group commit is adopted
– a transaction-consistent backup copy of the database is refreshed during system shutdown or at infrequent checkpoints
– two backup databases with ping-pong backups
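Why applying updates at commit time removes the need for undo can be shown with a small sketch. It is a generic deferred-update model, not Fast Path's actual implementation; `DeferredUpdateDB` and its fields are hypothetical names.

```python
class DeferredUpdateDB:
    def __init__(self):
        self.data = {}      # memory-resident data
        self.redo_log = []  # redo records only
        self.pending = {}   # txn -> private write set

    def write(self, txn, key, value):
        # Updates are buffered privately; the database is not touched yet.
        self.pending.setdefault(txn, {})[key] = value

    def commit(self, txn):
        writes = self.pending.pop(txn, {})
        self.redo_log.append((txn, dict(writes)))  # redo record first
        self.data.update(writes)                   # then apply to memory

    def abort(self, txn):
        # Nothing of txn ever reached self.data, so there is nothing to undo.
        self.pending.pop(txn, None)


db = DeferredUpdateDB()
db.write("T1", "x", 1)
db.write("T2", "x", 99)
db.abort("T2")
db.commit("T1")
print(db.data)      # {'x': 1}
print(db.redo_log)  # [('T1', {'x': 1})]
```

A crash before `commit` behaves like `abort` in this model: the uncommitted write set is simply lost, and the database never contained it.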


Two real-time system examples: NEC Real-Time DBMS, Stone RTDB
– NEC RTDBMS has several features to ensure high throughput and accurate predictability
  no page faults
  the in-memory log buffer is nonvolatile
  physical logging using deferred update
  fuzzy checkpointing
  no real-time characteristics, such as transaction deadline and criticalness, are utilized in the recovery components

Summary and Conclusion
– discussed 3 logging rules
  a nonvolatile log buffer should be used to satisfy WAL without requiring I/O prior to transaction commit
  LAW should be followed to reduce the amount of log needed to redo transactions after a system failure
– described three groups of checkpointing
– identified 3 issues about reloading
  data should be prioritized for reload purposes
– future research
  investigate how real-time requirements such as transaction deadlines and temporal data intervals can be incorporated into MMDB recovery

A crash recovery scheme for a memory-resident database system

Robert B. Hagmann

IEEE Transactions on Computers, Vol. C-35, No. 9, September 1986

overview

Presents a method of doing recovery that uses the existing techniques of fuzzy dumps and log compression

design requirement
– small system example
  2 pages/transaction * 100 transactions/s * 3600 s/h * 8 h = 5,760,000 pages written to the log
– transactions must be short
– checkpointed periodically every five minutes
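The log volume in the small system example can be checked with a few lines of arithmetic. The per-checkpoint figure is derived here, not taken from the slides, and assumes that "every five minutes" means a fixed 300 s interval at the same steady rate.

```python
pages_per_txn = 2
txns_per_second = 100
seconds_per_hour = 3600
hours = 8

log_pages_per_second = pages_per_txn * txns_per_second
log_pages_per_day = log_pages_per_second * seconds_per_hour * hours
print(log_pages_per_day)  # 5760000

# Assumption: "every five minutes" means a 300 s checkpoint interval.
checkpoint_interval_s = 5 * 60
log_pages_per_interval = log_pages_per_second * checkpoint_interval_s
print(log_pages_per_interval)  # 60000
```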

Overview(2)

– the principal requirement of the system is “fast” recovery from a system crash
  critical factor: transfer rate of the disk
  can be improved by using several parallel processors

design overview
– fuzzy dump
  simply a copy of the database taken without any synchronization
– if a DBMS uses nonvolatile storage, some log compression can occur
– otherwise, precommitting and group commits can be used to increase performance

overview

Design details
