6.830 2009 Lecture 11: Serializability, Locking Today's topics:  serializability  locking reminder: transaction contains operations, then commit/abort  begin transaction    RA    WA    RB    WB  commit reminder: a schedule is the way DB chooses to interleave transactions  we want interleaving to increase concurrency and throughput Schedule 1:  T1  T2  RA  WA      RA      WA  RB  WB      RB      WB reminder: "serializable schedule"  some results (DB state, output) after schedule executes  the schedule is serializable if the results are identical    to result from some serial (one-at-a-time) execution    of the transactions  why? helps programmer reason about transaction code;    programmer need not consider interleaving reminder: View Serializability  defn: schedule S is view serializable if there's a serial order S' s.t.  1. Ti's reads have same sources in S and S'  2. final write of every object done by same xaction in S and S'  intuition:    if T1 in S views same inputs as T1 in S', will write same values    and if order of writes same in S and S', final values will be the same  unlike serializability, *view* serializability is defined in terms of    order of reads and writes, *not* on the computations and results  so it's more tractable to write an algorithm to check it Is Schedule 1 view serializable?  yes, equivalent to S' = T1 T2    T1 in S sees same values as if it ran first    T2 in S sees same values as if it ran second    T2 write A and B second in both S and S' Schedule 2:  T1  T2  RA      RA      WA  WA  RB  WB      RB      WB Is Schedule 2 view serializable?  not equivalent to S' = T1 T2    b/c T1's write to A comes second  not equivalent to S' = T2 T1    b/c T2's write to B comes second v-s says "no" for some serializable schedules, e.g.  T3   T4       WA=1  RA  WA=2       WA=2  this is serializable: same as T3 T4 (A=2 either way)  not v-s:    not T3,T4: RA diff src - rule 1    not T4,T3: diff final writer - rule 2 sadly, no cheap algorithm for deciding view serializability  n! orderings of the transactions, must check against each  what about a defn that is cheap to check? conflict serializability  S is c-s if exists serial S' s.t. you can transform    S to S' by exchanging non-conflicting operations  O1 and O2 conflict if same data and one or both is a write  here are all the possible situations:        T1  T2        RA                 RA no conflict (same values read either way)        WA                 RA conflict (RA would get different value)        RA                 WA conflict (RA would get different value)        WA                 WA conflict (final value would be different) thus a schedule S for T1/T2 is conflict-serializable if for all pairs of  conflicting ops {O1 in T1, O2 in T2}, either  O1 always precedes O2, or  O2 always precedes O1 So for our original pair of transactions    T1    T2  1 RA  5 RA  2 WA  6 WA  3 RB  7 RB  4 WB  8 WB the conflicts are: 1-6, 2-5, 2-6, 3-8, 4-7, 4-8 is Schedule 1 conflict serializable?  1 RA  2 WA       5 RA       6 WA  3 RB  4 WB       7 RB       8 WB let's check each conflict: they all put T1 before T2, so "yes" is Schedule 2 conflict-serializable?  1 RA       5 RA       6 WA  2 WA already in trouble: 1->6 puts T1 before T2, but 2-6 puts T2 before T1 algorithm to check conflict-serialiazability is easy (compared to view-s):  make a graph with transactions as nodes  edge from Ti to Tj if:    Ti r/w some A before Tj writes A, or    Ti writes some A before Tj reads A  if no cycles, schedule is conflict-serializable most DB systems use conflict serializable schedules conflict-serializability gives up some serializable schedules:  T1  T2  T3  RA      WA  WA          WA it is serializable (final value of A same as T1 T2 T3)  and view-serializable  but not conflict serializable (can't move T2 either before or after T1)  note the "blind write" by T2:    view and conflict only differ if there are blind writes now let's extend serializability to deal with aborts what's wrong with this schedule?  T1  T2  RA  WA      RA      commit  RB  abort this schedule is not "recoverable".  T1 abort -> will un-do WA  now T2 committed based on uncommitted data!  that is not serializable -- no order could result in T2 seeing    that value of A  we want "recoverable":    if T2 sees T1's output, T1 commits before T2 commits what's wrong with this schedule?  T1  T2  T3  RA  WA      RA      WA          RA  abort we'd need to abort T2 and T3 too: "cascading rollback" cascadeless schedule: if T2 sees T1's output, T1 commits before  T2 does the read  i.e. one abort never forces another cascadeless schedules are also recoverable so: we want our concurrency control scheme to be  * equivalent to some serial execution  * cascadeless (and thus also recoverable) there are many ways to achieve these two properties  we'll talk about locking locking, the basic idea:  a lock associated with each data item (row, page, table, db)  transaction acquires locks as it goes along  acquire a shared lock before reading  acquire an exclusive lock before writing what if another xaction T2 has a lock that T1 needs?  lock compatibility table says if it's OK for T1 to have it too      T1      S  X T2 S y  n    X n  n i.e. reads do not conflict, but writes conflict with everything if T1 needs a lock, T2 has one already, not compatible:  T1 must *wait* for T2 to release the lock Example:  T1   T2   T1L  T2L  WA        XA      (RA)       wait for SA  RA            Rel A                 SA       RA       WA        XA (upgrade)                 Rel A why does locking help achieve conflict serializability?  T1 has a lock, T2 wants a conflicting lock    e.g. T1 write, T2 wants to read    WILL BE a T1->T2 conflict, whether we wait or not!  locking reduces chance of any future T2->T1 conflict, i.e. a cycle  Example: forcing T2 to wait prevents this non-serializable schedule,    by preventing the T2->T1 conflict:  WA      RA      WA  RA    When should a transaction release locks?  in above example, have to hold lock on A until last use  but that's not enough; consider this example:  WA     RA     WB  RB    it is not OK for T1 to release its lock on A right after the WA.  we cannot allow this schedule -- it has a conflict cycle.  in general, we want to avoid T2 seeing T1's results, and    then influencing T1's operation  i.e. we never want T1 to release a lock to T2, then have T2    release a lock to T1 two-phase locking  rule: xaction can't acquire any lock after first release  growing phase, shrinking phase  formalization of previous discussion, avoids T1->T2->T1 lock handoff Example:  T1  T2  T1L      T2L  RA      SA  WA      XA (upg)  WB      XB          (lock pt)          Rel A      RA           SA      WA           XA (upg)  RB  WB          Rel B      RB           SB      WB           XB                   (lock pt)                   Rel A, B End of growing phase called "lock point"  Order of lock points == serial order  suppose T1's lock point is before T2's lock point    their reads and writes and acquires are still interleaved    during growing phase  but we hope that all conflicting operation pairs have T1 then T2  can T1 see any T2 write? no    would require T1 to get a lock after T2 got exclusive lock    i.e. requires T1's locking point to be after T2's  can T2 *not* see any T1 write? no    would require T2 to read before T1's write    i.e. T1 acquire after T2 release  can some T2 write be overwritten by a T1 write? no    T1 write after T2 write requires T1 acquire after T2 release