6.830 2009 Lecture 12: Locking, Logging lecture overview  two-phase locking  crash recovery  logging remember two-phase locking:  goal: produce conflict serializable schedules  rule: xaction can't acquire any lock after first release  growing phase, shrinking phase Is 2pl cascadeless?  remember cascadeless schedule:    if T2 sees T1's output, T1 commits before T2 does the read    i.e. one abort never forces another  no:  T1      T2  RA SA    WA XA  WB XB     RelA          RA SA  abort  T2's RA sees value from an aborted transaction    DB would have to force T2 to abort also -- cascade  we want to force T2 to wait for T1 to commit or abort before it reads strong strict 2pl (ss2pl):  don't release locks until after transaction commits or aborts  (strict: hold write locks until after commit)  (strong: hold read locks until after commit)  (also called rigorous. book calls it just strict.)  hides its modifications from other readers until committed  thus:    1. hold shared lock before each read    2. hold exclusive lock before each write    3. keep locks until after commit  if you already have the lock, you can use the data w/o locking again  if you have shared lock, can upgrade to excl if no other xaction    has any lock on that item most DB systems use ss2pl are all ss2pl lock grants compatible with conflict-serializability?  i.e. does ss2pl always hand out locks in a way that leads to c-s?  suppose we have two transactions.    T1: RA, RB, WB    T2: RA, RB, WA  any schedule that starts with  RA  RB      RA      RB  cannot be serializable. if T1 now does WB, T2 must come first in serial  order, but that means its WA would have had to come first also.  if T2 now does WA, it must come second since T1 read old value of A,  but then it would have had to see T1's WB. what would ss2pl do for concurrent T1 and T2?  ss2pl always grants a lock if no conflicting locks held    it does NOT e.g examine the transactions and work out a schedule  so it would grant shared locks for A and B to both transactions.  then what?  2pl would force both transactions to wait forever to upgrade.  "deadlock": a cycle in the lock waiting graph    easy to construct these cases    can be common for some workloads how to cope with deadlocks?  1. order all the locks, require xactions to acquire in order     for example above:     T1: RA,RB,WB -> SA,SB,XB     T2: RA,RB,WA -> SA,XA,SB     this avoids deadlocks altogether!     problem: now holding locks for longer, less concurrency     may also require applications to pre-declare what locks they need       would be nice if locking was invisible to apps  2. timeout lock waits, abort one of the transactions, restart it     problem: how long to wait?       too short: needless restarts       too long: idle time  3. look at lock waits-for graph, detect cycles, abort+restart one     good: can catch right away     bad: examining graph takes time     this is how real DBs work there are schemes other than locking that provide serializability Optimistic Concurrency Control (last week's paper)  at a high level, let xactions r/w whatever they want,    never block waiting for someone to release a lock,    but check schedule before commit, and abort+restart if not c-s  xaction has three phases:    read: read anything, write to private memory    validate: look for conflicts, maybe abort/restart    write: copy private writes to real DB  how to validate that it was a c-s schedule?    give each xaction a timestamp (TS) at time of validation    then check equivalance to serial exec in TS order  when Tj commits, check against every Ti w/ TS(Ti) < TS(Tj)    (that is, only check against "earlier" transactions)  always OK:    |---Ti---|              |---Tj---|    i.e. Tj really did serially follow Ti  OK if Ti didn't write anything read by Tj:    |---Ti---|       |------W--|    R-R -- always OK    W-R -- forbidden, since Tj might not see    R-W -- OK, since Tj writes after Ti reads    W-W -- OK, since Tj writes second  OK if Ti doesn't write anything that Ti reads or writes:    |----V----|      |-----V----|    R-R -- always OK    W-R -- forbidden, since Tj might not see    R-W -- OK, since Tj writes after Ti reads    W-W -- forbidden, since Ti might write second  when might OCC be faster than locking?    OCC may allow more concurrency      T1   T2           RA      WA           RA           v           w      v      w    esp if many read-only xactions    or if spare CPUs, may as well use them speculatively  why is locking more popular than OCC?    OCC book keeping not hugely cheaper than lock overhead    OCC can roll back and re-do pretty frequently      where locking would block instead      rollback is more expensive than blocking Crash Recovery general topic  something goes wrong  you want DB to recover and continue, hopefully automatically  you do NOT want to lose data  you do NOT want corrupt data or internal DB data-structures Kinds of faults  client application failure (not our problem)  xaction aborts (voluntary, or deadlock): can be common  o/s bug or panic: once a month? once a year?  DB bug or assertion failure (rare)  power failure  operator error (e.g. unplug the wrong cable)  permanent disk failure  fire in machine room single disk failure is pretty easy: write all data to two disks what about fire in the machine room?  you lose ALL disks and your servers  need a second site with servers, network, and DATA  a plan:    rent "hot standby" server somewhere far away    main server sends every transaction over Internet to standby      it executes it to keep data on disk up to date    on failure:      rearrange communication to make standby the real server  you lose the last few transactions    since typically can't afford to wait for standby to acknowledge    before main server commits rest of lecture: crash recovery  power fails, eventually restored, want DB to keep going failure model:  crash is fail-stop    s/w or h/w didn't write random garbage onto the disk  contents of memory are lost [diagram: processes, buffers]    dirty buffers    state of running processes  contents of disk are preserved [diagram: tables, indices]    though very last write may be partial Recovery goal: "transaction consistent" state  restore disk state as it would have been after some serial execution  serial order must include all transactions that committed    so must finish their disk updates  must not include any transaction that didn't commit    certainly not any aborted xaction    and not any that didn't commit -- cannot resume a transaction  DB data structures on disk must be internally consistent why might this be hard?    1. transactions that hadn't yet committed, but might       have modified disk    2. xaction aborted, was in the middle of un-doing its changes    3. transactions recently committed but not yet on disk    4. DB halfway through e.g. rearranging a B+Tree on disk basic challenge: atomic multi-step updates  xaction has many updates, after crash want all visible, or none  not a good idea to just update "real" data on disk  idea:    prepare a copy of the data in a temporary location on disk    somehow do one final disk write that indicates the new location    crash happens either before or after that final write shadow pages  all DB disk pages part of a single big tree  transaction prepares a new tree with its updates    shares unchanged part with old tree  final write of tree root makes new tree "real"  problem: how to handle concurrent xactions?    DBs don't use shadow pages any more Log-based Recovery  [diagram: buffer cache, tables on disk, single log on disk]  transaction's updates appended to log on disk  only "take effect" when xaction writes EOT record to disk    again, single write makes multiple updates atomic  updates then applied in buffer cache    written to disk when convenient    so disk may be missing updates, and may have updates      of non-committed transactions  crash recovery reads the log:    redo updates of commited xactions    undo updates to non-committed xactions Write-Ahead Logging  DB must write log record on disk before applying update to on-disk DB  Why?  Example: T1: WA, updated A on disk before logging, crash,    now recovery does not know that it should un-do the WA