[ Announcements | Schedule and Readings | Assignments and Quizzes | Syllabus/Policies ]

6.893 Schedule and Readings

Schedule

Numbered references (e.g., [1]) refer to the
complete list of readings.

Fundamentals

9/8 Intro
PS 1 Assigned [PDF (modified 9/9/04)]
9/13The Relational Model and SQL
Readings: What Goes Around Comes Around [1], Codd [2]
9/15Logical Design and Physical Database Fundamentals
Readings: External memory algorithms and data structures [5], An Introduction to Database Normalization [4], The Entity-Relationship Model [3]
PS 1 Due
Slides (ppt)
9/20Introduction to Modern Relational Database Systems
Readings: Anatomy of a Database System [6], System R [7]
PS 2 Assigned
Slides (ppt)
 

Query Processing

9/22Optimization Fundamentals
Readings: Selinger Optimizer [8], Optional: Survey of Selectivity Estimation Methods [9].
9/27Join Algorithms and Memory Management
Readings: Join Strategies [10], Buffer Management [11]
Slides (ppt)
9/29 Indexing
Readings: GiST [12] , R*-trees [13]
Reading Questions
10/4Distributed databases
Readings: Dewitt and Gray [14], R* [15]
PS 2 Due
Reading Questions
10/6Data warehouses
Readings: Overview [16], DataCube [17], (Optional) Variant Indices [18]
Reading Questions
Project Proposals Due
10/11 Columbus Day Holiday, No Class.
10/13 Extensibility and Object Databases
Readings: Extensible types [19], ObjectStore [20]
Reading Questions
PS 3 Assigned
 

Transactions

10/18 Introduction to Transactions
Readings: Concurrency Control and Recovery [21], Review 6.033 Ch 8.
No reading questions.
10/20 Transactions Part 2
Readings: Haerder and Reuter [22]
Reading Questions
10/25 Locking and Consistency
Readings: Granularity of Locking and Degrees of Consistency [23]
Reading Questions
Slides from today's lecture
10/27 Optimistic Concurrency Control
Readings: Optimistic Concurrency Control [24], Concurrency Control Performance Modeling [25]
Reading Questions
PS 3 Due
11/1 Exam 1
11/3 Distributed Transactions and Replication
Readings: Transaction Management in R* [26], Dangers of Replication and a Solution [27]
Reading Questions
 

Networked Data Management

11/8 Web Services
Readings: Combining Systems and Databases [28], Data Management in Application Servers [29]
Reading Questions
11/10 Semistructured Data
Readings: Introduction to XML [30], Query Optimization for XML [31], X is for XQuery [32]
Reading Questions
11/15 Continuous Queries
Readings: NiagaraCQ [33], Ariel [34]
Reading Questions
11/17 Adaptive Query Processing
Readings: Eddies [35], Eddies Overhead [36]
Reading Questions
11/22 Stream Databases
Readings: Aurora [37]
Reading Questions
 

Advanced Topics

11/24 Data Mining and Sequence Queries
Readings: AQuery [38]
Reading Questions
11/29 Approximate Querying
Readings: Informix under CONTROL [39]
Reading Questions
12/1 Exam 2
12/6 Consistency and Availability in Data Streams
Readings: High Availability Algorithms for Stream Processing [40]
Reading Questions
12/8 Last day of class
Presentations
Project Reports Due

Complete List of Readings

Most readings are from the "Red Book", otherwise known as "Readings in Database Systems, 4th Edition", edited by Michael Stonebraker and Joseph Hellerstein. This book is available from the MIT Press and copies have been ordered and should be available in the Coop. Note that this version of the book is substantially different than the 3rd edition and includes a number of new readings (and articles that are not otherwise available.)

  1. Michael Stonebraker and Joseph Hellerstein. What Goes Around Comes Around . Red Book. Copies will be made available on the first day of class. Read Sections 1-4, 9, 10, and 11.

  2. E.F. Codd. A relational model of data for large shared data banks . Communications of the ACM, 1970. [PDF]. This is linked to the ACM website and is only accessible with an ACM account or from an MIT IP. Focus on Sections 1.3 and all of Section 2.

  3. The Entity-Relationship Model -- toward a unified view of data . [PDF]. We will not discuss this paper in class -- I have included it because ER modeling is a technqiue the is widely used in database design that may be useful as a personal reference.

  4. Mike Hillyer. Introduction to Normalization. [HTML] Note that this is an extremely informal discussion of the topic of database normalization, and does not cover all of the issues in depth. It is useful as a reference to the different types of normalization we discussed in class.

  5. J. Vitter. External memory algorithms and data structures: dealing with massive data. ACM Computing Surveys 33(2), 2001. Pages 209-271. [PDF]. Read sections 1-3, 5, 9, and 10 -- we will revisit R-Trees (Section 11) later in the course.

  6. Joseph Hellerstein and Michael Stonebraker. The Anatomy of a Database System. Red Book. Read Sections 1-4; we will return to Section 5 when we discuss concurrency control and recovery.

  7. M.M. Astrahan et al. System R: Relational Approach to Database Management. ACM TODS 1(2), 1976. Pages 97-137. [PDF]. Requires an MIT IP for access.

  8. Patricia Selinger, M. Astrahan, D. Chamberlin, Raymond Lorie, and T. Price. Access Path Selection in a Relational Database Management System. Proceedings of ACM SIGMOD, 1979. Pages 22-34. Red Book. We will discuss most of this paper next time in class -- read the whole thing, paying careful attention to the cost estimation methods and dynamic progamming based optimization algorithm.

  9. Michael Mannino, Paichen Chu, and Thomas Sager. Statistical Profile Estimation in Database Systems. ACM Computing Surveys 20(3), 1988. Pages 191-221. [PDF] Requires an MIT IP for access. This paper is optional -- it discusses many of the techniques that used to make query optimization and cost estimation practical in modern database systems. Will will cover some of the ideas at a high level in class.

  10. L.D. Shapiro. Join Processing in Database Systems with Large Main Memories. Red Book. This paper presents a thorough discussion of different strategies for computing a join, including methods that take advantage of hash and btree indices. We will discuss the tradeoffs between join strategies in class, so think carefully about the pros and cons of each approach.

  11. Buffer Management in Relational Database Systems . TODS, 11(4), 1986. Pages 473-489. [PDF]. Link only accessible from MIT IPs. We will not discuss this paper in detail in class and you do not need to read it front-to-back, but you should be familiar with the basic idea behind hot-set based buffer management. For the extra-curious, the classic paper in this area is from Chou and Dewitt in 1978, called "An Evaluation of Buffer Management Strategies for Relational Database Systems" -- it proposes a hot-set model and a evaluates a buffer management strategy called DBMIN that is similar to the strategy proposed in this paper (as far as I can tell, the DBMIN paper is not available in digital format from ACM -- it can be found elsewhere on the web.)

  12. Joseph M. Hellerstein, Jeffrey F. Naughton, and Avi Pfeffer. Generalized Search Trees for Database Systems. VLDB 1995. Red Book. Rather than focusing on the details of the different tree types discussed in this paper, try to understand the basic GiST abstraction.

  13. Norbert Beckman, Hans-Peter Kriegel, Ralf Schneider, and Bernhard Seeger. The R*-tree: An efficient and robust access method for points and rectangles". SIGMOD, 1990. Red Book. R/R* trees are an important generalization of B-trees. Make sure you understand what types of queries they are useful for, and how their properties differ from those of B-trees.

  14. David Dewitt and Jim Gray. Parallel Database Systems: The Future of High Performance Database Processing. Communications of the ACM. 1992. Red Book. This is a good general introduction to parallel database systems and the issues involved in their design.

  15. Lothar F. Mackert and Guy M. Lohman. R* Optimizer Validation and Performance Evaluation for Distributed Queries. VLDB 1986. Red Book.

  16. Surajit Chaudhuri and Umeshwar Dayal. An overview of data warehousing and OLAP technology. SIGMOD Record, 26(1), 1997. Red Book.

  17. Jim Gray, Surajit Chaudhuri, Adam Bosworth, Andrew Layman, Don Reichart, Murali Venkatrao. DataCube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Data Mining and Knowledge Discovery, 1997. Red Book.

  18. Patrick O'Neil and Dallan Quass. Improved Query Performance with Variant Indices. ACM SIGMOD 1997. Red Book. This paper is optional, but it's worth reading to at least get a sense of the unusual types of indices that are often used in warehouses. The overview paper mentions this type of index briefly.

  19. Michael Stonebraker. Inclusion of New Types in Relational Database Systems. Proceedings of ICDE, 1986. Red Book.

  20. Charles Lamb, Gordon Landis, Jack Orenstein, Dan Weinreb. The ObjectStore Database System. Communications of the ACM, 34(10), 1991. [PDF] Requires an MIT IP for access.

  21. Michael Franklin. Concurrency Control and Recovery. The Handbook of Computer Science and Engineering, A. Tucker, ed., CRC Press, Boca Raton, 1997 [PDF]. There are no reading questions for this paper -- it's a farily accessible overview, which you should read and understand.

  22. Theo Haerder and Andreas Reuter. Principles of Transaction Oriented Database Recovery. ACM Computing Surverys 15(4), 1983. [PDF]. Requires an MIT IP for access.

  23. Jim Gray et al. Granularity of Locking and Degrees of Consistency in a Shared Data Base. From "Modeling in Data Base Management Systems", G.M. Nijssen, (ed.), 1976. In Red Book.

  24. H.T. Kung and John T. Robinson. "On Optimistic Methods for Concurrency Control." TODS, June, 1981. In Red Book.

  25. Rakesh Agrawal, Mike Carey, and Miron Livny. Concurrency Control Performance Modeling: Alternatives and Implications. ACM Transactions On Database Systems 12(4), 1987. In Red Book. (Note that the paper that appears in the Red Book does not contain pages 613-622 -- this is intentional and you shouldn't read them if you get the paper from elsewhere.) You also do not need to read Section 6.

  26. C.Mohan, Bruce Lindsay, and R. Obermarck. Transaction Management in the R* Distributed Database Management Systems. ACM Transactions On Database Systems 11(4), 1986. In Red Book.

  27. Jim Gray et al. The Dangers of Replication and a Soloution. ACM SIGMOD 1996. In Red Book.

  28. Eric Brewer. Combining Systems and Databases: A Search Engine Retrospective. In Red Book.

  29. Dean Jacobs. Data Management in Application Servers. In Red Book.

  30. Andre Bergholz. Extending your Markup: An XML Tutorial. IEEE Internet Computing, August, 2003. [PDF]. This is a basic introduction to XML, DTDs, and XML schemas. If you are already familiar with XML, you do not need to read it.

  31. Ray Goldman and Jennifer Widom. Query Optimization for XML. VLDB, 1999. [PDF].

  32. Jason Hunter. X is For XQuery. Oracle Technology Network. [HTML]. This provides a very lightweight introduction to the XQuery language, an emerging standard for XML query processing.

  33. Jianjun Chen, David DeWitt, Fend Tian, and Yuan Wang. NiagaraCQ: A Scalable Continuous Query System for the Internet Databases. ACM SIGMOD, 2000. In Red Book.

  34. Eric N. Hanson, Chris Carnes, Lan Huang, Mohan Konyala, Lloyd Noronha, Sashi Parthasarathy, J. B. Park, and Albert Vernon. Scalable Trigger Processing. ICDE 1999. In Red Book.

  35. Ron Avnur and Joseph M. Hellerstein. Eddies: Continuously Adaptive Query Processing. SIGMOD Conference 2000. In Red Book.

  36. Amol Deshpande. An Initial Study of Overheads of Eddies. SIGMOD Record, 33(1), March 2004. [PDF]

  37. Daniel J. Abadi, Don Carney, Ugur Cetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. Aurora: a new model and architecture for data stream management.. VLDB Journal 12(2): 120-139, August 2003. [PDF]. Read Sections 1-6. Note that this IS NOT the Aurora paper in the Red Book.

  38. Alberto Lerner and Dennis Shasha. AQuery: A Query Language for Ordered Data, Optimization Techniques, and Experiments.. VLDB, 2003. [PDF].

  39. Joseph Hellerstein, Ron Avnur, and Vijayshankar Raman. Informix under CONTROL: Online Query Processing. Data Mining and Knowledge Discovery, 12, 281-314, 2000. In Red Book.

  40. Jeong-Hyon Hwang, Magdalena Balazinska, Alexander Rasin, Ugur Cetinemel, Mike Stonebraker, and Stan Zdonik. High-Availability Algorithms for Distributed Stream Processing. ICDE 2005. [PDF]
Complete references will be added as the class progresses.

Last modified: Thu Dec 2 15:59:49 EST 2004