6.830 2009 Lecture 13: Logging, ARIES

where we were:
  how to recover from crash while updating disk?
    e.g. halfway through a transaction
  transaction appends each action to a log on disk,
    and only then is allowed to modify the DB on disk
    appends commit record to log on disk when done
  if crash, recovery s/w looks at log, knows what was happening
    at time of crash, whether transaction committed, can finish or undo

there are many logging designs possible
  first I'll discuss some high-level design stuff, loosely based on ARIES
  then we'll dive into ARIES details

what kinds of records appear in a log?
  SOT: LSN, TID
  EOT: LSN, TID, commit/abort
  UP : LSN, TID, redo info, undo info
       physical: the bytes, an after-image for redo, before-image for undo
                 says exactly what changes to make on disk
       logical: description of op ("insert row into table t")
                may imply many writes, e.g. index updates
       logical usually more compact, but as we'll see not always preferable
  CP : LSN, info about where recovery must start in the log
  CLR: LSN, TID, info about undo

Logging and locking
  correctness of logging depends on recoverable locking, e.g. strict 2PL
  log: T1:WA, T2:WA, T2:commit, crash
  recovery would have to undo T1:WA
    not generally possible; cannot simply restore old value
    (that would wipe out T2's committed write to A)
  recoverable locking makes old value correct for undo

Logging and buffer cache manager
  when can the BM write a dirty page from cache to disk?
  (BM always obeys WAL, but still has freedom)

STEAL: BM can write any dirty page to disk
NO-STEAL: BM cannot write dirty page to disk until xaction commits
FORCE: BM must write xaction's dirty pages to disk at commit
NO-FORCE: commit w/o write to disk

your SimpleDB lab3 probably uses FORCE + NO-STEAL
  FORCE => no REDO? since if a transaction
    committed, its updates are already on disk
  NO-STEAL => no UNDO? since never write an uncommitted transaction's
    modifications to disk.
  this combination can't obviate both REDO and UNDO
    (either order leaves a window for a crash between the two steps)
    if commit then FORCE, recovery must redo
    if FORCE then commit, recovery must undo

Real DBs use NO-FORCE + STEAL for performance
  NO-FORCE: coalesce many xactions' updates into single disk write
  STEAL: increase concurrency, since if T1's modifications couldn't
         be written, some other T2 that needs a buffer page might have
         to wait

Example of logging during normal operation:

T1-------WA----------commit
  T2-----------WB----------abort
         T3---WC-----------------crash!

log on disk:
  S1 S2 U1A S3 U3C U2B C1 A2

after the crash:
  nothing in memory
  log from disk
  DB state on disk? w.r.t. the writes to A, B, C?
    A: NO-FORCE says maybe no, but BM was free to write
    B: STEAL says maybe yes, but BM was free not to write
    C: STEAL says maybe yes, but BM free not to write

at a high level, what does recovery need to do about three updates?
  ensure A updated
  ensure B NOT updated (i.e. has value it had before start of T2)
  what about C? ensure NOT updated

recovery outline
  there are lots of schemes that work
  1. analyze log: what committed, what didn't, how far back we have to go
  2. REDO some or all
  3. UNDO some or all
  (or, maybe, UNDO then REDO)

what's tricky?
  what if we crash during REDO or UNDO?
    will repeating recovery still work?
  DB internal state may be inconsistent, e.g. crash while rearranging b+tree
    will applying REDOs and UNDOs work correctly?
  does replaying a partial log (just operations from committed xactions) make sense?
    suppose T1 reorganizes a b+tree
    then T2 modifies the b+tree and commits
    then crash
    if we UNDO (or don't REDO) T1's operations, will T2's make sense?
  how to avoid replaying the entire log?

ARIES paper
  best detailed description we have of a good DB recovery system
  much of it was long since known in DB community, but not written down
  paper has detailed answers to all of the above questions
  but it is very hard to read!
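
Going back to the T1/T2/T3 example log above, here is a minimal Python sketch of the first step of the recovery outline ("what committed, what didn't"). It is only an illustration: the token format (one character each for record kind, TID, and object) and the names `classify` and `recovery_goals` are assumptions made for this sketch, not anything from ARIES or SimpleDB.

```python
# Toy classification of the example log "S1 S2 U1A S3 U3C U2B C1 A2".
# Assumption: each token is <kind><tid>[<object>], one character per field.

def classify(log_text):
    """Return (committed TIDs, loser TIDs, {object: writer TID})."""
    started, committed = set(), set()
    writes = {}
    for tok in log_text.split():
        kind, tid = tok[0], tok[1]
        if kind == "S":        # start of transaction
            started.add(tid)
        elif kind == "U":      # update record; third char names the object
            writes[tok[2]] = tid
        elif kind == "C":      # commit
            committed.add(tid)
        # "A" (abort) needs no case: an aborted xaction is never committed
    losers = started - committed   # aborted, or still running at the crash
    return committed, losers, writes


def recovery_goals(log_text):
    """Map each written object to what recovery must guarantee about it."""
    committed, _losers, writes = classify(log_text)
    return {obj: "ensure updated" if tid in committed else "ensure NOT updated"
            for obj, tid in writes.items()}


committed, losers, writes = classify("S1 S2 U1A S3 U3C U2B C1 A2")
goals = recovery_goals("S1 S2 U1A S3 U3C U2B C1 A2")
print(sorted(losers), goals)
```

The sketch treats the aborted T2 and the unfinished T3 the same way (both are "losers"), which matches the goals listed above: only A, written by the committed T1, must end up updated.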

ARIES update record format
  LSN
  Type
  TID
  prevLSN
  redo page #
  redo after-image (just bytes that changed, not whole page)
  (in fact UP record can have multiple page #s and after-images,
   if e.g. need to change heap file and index atomically)
  undo command, e.g. "DELETE FROM TABLE xxx WHERE yyy"

ARIES uses "physiological" logging
  physical redo
  logical undo
  we'll talk about why this "physiological" combination makes sense later

ARIES in-memory data structures
  xactionTable
    TID, lastLSN
  dirtyPgTable
    pgNo, recLSN

ARIES checkpoint log record
  helps recovery decide how far back it must look in the log
  ARIES writes a checkpoint periodically
  for each open transaction:
    TID, lastLSN (where to start undo)
  for each dirty cached page:
    pgNo, recLSN (recoveryLSN, *first* log record that dirtied it)
  checkpoint just records this modest amount of state
    doesn't force any pages to disk
  "master record" at fixed sector on disk contains the block #
    of the most recent completed checkpoint

ARIES page format (on disk and in cache)
  pageLSN: every page contains the LSN of the last log record that modified it
  recovery can skip REDO write if pageLSN >= record's LSN

ARIES periodically writes dirty pages to disk
  STEAL, so can write page of uncommitted xaction
  NO-FORCE, so might not write pages of committed xaction

====> slides

ARIES Example
  talk through why the timeline generates the log we see

ARIES Data Structures
  at time of crash, what is the state of:
    in-mem data structures,
    on-disk pages
    checkpoint in log

Analysis
  master record tells us where to look for latest checkpoint
  checkpoint tells us initial xactionTable and dirtyPgTable
  scan from CP forward to get table contents at crash
  why do we need the table contents?
    xactionTable tells us set of losers
      lastLSN tells us where to start undo-ing each
    dirtyPgTable tells us which DB pages might need redo-ing
      recLSN tells us first log entry to redo

What does UNDO do?
  undo operations are logical, since pre-image often not correct
  this means they depend on the DB being "action consistent"
  i.e. DB reflects logged updates
    all updates completely applied

====> after slides

Why physical redo?
  Logical operations are only safe on an action consistent disk.
  But DB is not action consistent after a crash.
  Physical after-images are always safe to replay IN ORDER.
    Since they repeat exactly the original writes.
    Must redo for loser transactions too.
    E.g. one of them rearranged an index.

Why logical undo?
  (already explained this)
  Safe, because DB is action consistent after redo phase.
  Necessary -- cannot use physical undo!
    Maybe original operation rearranged an index.
    Cannot undo just that operation, invalidates subsequent writes
      of non-loser transactions.
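
The analysis and redo passes described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the ARIES implementation: log records are plain dicts, the checkpoint is passed in directly instead of being found through a master record, an EOT record is assumed to appear only once a transaction is completely finished, and the undo pass (logical undo plus CLRs) is left out. The function names and the sample log are made up for illustration.

```python
def analysis(log, checkpoint):
    """Rebuild xactionTable and dirtyPgTable by scanning forward from the CP."""
    xaction_table = dict(checkpoint["xactionTable"])    # TID -> lastLSN
    dirty_pg_table = dict(checkpoint["dirtyPgTable"])   # pgNo -> recLSN
    for rec in log:
        if rec["lsn"] <= checkpoint["lsn"]:
            continue
        if rec["type"] == "SOT":
            xaction_table[rec["tid"]] = rec["lsn"]
        elif rec["type"] == "UP":
            xaction_table[rec["tid"]] = rec["lsn"]
            # recLSN is the *first* record that dirtied the page, so never overwrite
            dirty_pg_table.setdefault(rec["page"], rec["lsn"])
        elif rec["type"] == "EOT":
            xaction_table.pop(rec["tid"], None)         # finished: not a loser
    return xaction_table, dirty_pg_table


def redo(log, dirty_pg_table, disk):
    """Replay physical after-images in log order; return the LSNs applied."""
    applied = []
    if not dirty_pg_table:
        return applied
    start = min(dirty_pg_table.values())                # earliest recLSN
    for rec in log:
        if rec["type"] != "UP" or rec["lsn"] < start:
            continue
        page = rec["page"]
        if page not in dirty_pg_table or rec["lsn"] < dirty_pg_table[page]:
            continue                                    # page was clean at this LSN
        pg = disk[page]
        if pg["pageLSN"] >= rec["lsn"]:
            continue                                    # update already on disk
        pg["data"].update(rec["after"])                 # apply the after-image
        pg["pageLSN"] = rec["lsn"]
        applied.append(rec["lsn"])
    return applied


# Hypothetical log: T1 commits, T3 is still running at the crash.
log = [
    {"lsn": 1, "type": "SOT", "tid": "T1"},
    {"lsn": 2, "type": "UP", "tid": "T1", "page": "A", "after": {"x": 1}},
    {"lsn": 3, "type": "SOT", "tid": "T3"},
    {"lsn": 4, "type": "UP", "tid": "T3", "page": "C", "after": {"y": 2}},
    {"lsn": 5, "type": "UP", "tid": "T1", "page": "A", "after": {"x": 5}},
    {"lsn": 6, "type": "EOT", "tid": "T1", "outcome": "commit"},
]
checkpoint = {"lsn": 0, "xactionTable": {}, "dirtyPgTable": {}}
# Disk at crash: BM had written page A after LSN 2 (STEAL); page C never written.
disk = {
    "A": {"pageLSN": 2, "data": {"x": 1}},
    "C": {"pageLSN": 0, "data": {}},
}

losers, dirty = analysis(log, checkpoint)
first_pass = redo(log, dirty, disk)
second_pass = redo(log, dirty, disk)    # as if recovery crashed and restarted
print(losers, dirty, first_pass, second_pass)
```

Two things worth noticing in the sketch: redo replays the loser T3's update as well (it repeats history, leaving the cleanup to undo), and running redo a second time applies nothing because every pageLSN is already up to date, which is what makes a crash during recovery harmless for this pass.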