The database group at MIT conducts research on all areas of database systems and information management. Projects range from the design of new user interfaces and query languages to low-level query execution issues, and from new systems for database analytics and main-memory databases to query processing in next-generation pervasive and ubiquitous environments, such as sensor networks, wide-area information systems, personal databases, and the Web.

Professor Madden offers a class in Database Systems (6.830).

Intel Science and Technology Center in Big Data

In the Big Data ISTC, our mission is to produce new data management systems and compute architectures for Big Data. Together, these systems will help people process data that exceeds the scale, rate, or sophistication of current data processing systems. We are working to demonstrate the effectiveness of these solutions on real applications in science, engineering, and medicine, making our results broadly available through open sourcing.


In the DataHub project, we are building an experimental hosted platform (GitHub-like) for organizing, managing, sharing, collaborating on, and making sense of data. The hosted platform provides easy-to-use tools and interfaces for:

  • Managing your data (ingestion, curation, sharing, collaboration)
  • Using others' data (discovering, linking)
  • Making sense of data (query, analytics, visualization)

In CarTel, we are building a system for managing data in the face of intermittent and variable connectivity. We are focusing, in particular, on automotive applications that involve high-rate sensing of road, traffic, and infrastructure conditions. The two key technologies we are developing are CafNet, a carry-and-forward network stack, and a distributed, signal-oriented, priority-driven query processor.
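
As a rough illustration of the carry-and-forward idea, the Python sketch below buffers readings while a vehicle is disconnected and delivers the most urgent ones first during a short connectivity window. The class and method names (`CarryForwardBuffer`, `enqueue`, `flush`) and the numeric priority scheme are assumptions made for this example; they are not CafNet's actual interface.

```python
import heapq
import itertools


class CarryForwardBuffer:
    """Hypothetical buffer that holds readings until a link is available."""

    def __init__(self):
        self._heap = []
        # Monotonic counter used as a tie-breaker, so readings with the
        # same priority are delivered in arrival order.
        self._order = itertools.count()

    def enqueue(self, priority, reading):
        # Lower number means more urgent; heapq pops the smallest tuple first.
        heapq.heappush(self._heap, (priority, next(self._order), reading))

    def flush(self, budget):
        """Deliver up to `budget` readings during a brief connectivity window."""
        sent = []
        while self._heap and len(sent) < budget:
            _, _, reading = heapq.heappop(self._heap)
            sent.append(reading)
        return sent

    def __len__(self):
        return len(self._heap)


buf = CarryForwardBuffer()
buf.enqueue(2, "gps trace")
buf.enqueue(0, "pothole detected")
buf.enqueue(1, "traffic delay")

# Only two readings fit in this window; the rest are carried to the next one.
first_window = buf.flush(budget=2)
```

Ordering the buffer by priority means that a short connectivity window is spent on the data an application cares about most, while lower-priority readings wait for the next opportunity.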


In RelationalCloud, we are investigating research challenges to enable Database-as-a-Service (DaaS) within the Cloud Computing paradigm. In particular, we are focusing on the problems of (i) characterizing workloads and assigning them to different data management solutions (ranging from multi-tenant databases to high-profile clustered main-memory solutions) and (ii) highly dynamic allocation of resources to accommodate evolving and bursty workloads in a transparent manner. Our long-term vision aims at combining multiple dedicated data management solutions behind a unifying DaaS interface: "One Data Service to manage them all".


The goal of the H-Store project is to investigate how recent architectural and application trends affect the performance of online transaction processing databases (such as those that back many e-commerce sites, banks, and reservation systems), and to study what performance benefits would be possible with a complete redesign of OLTP systems in light of these trends. Our idea is to build a main-memory system with a dramatically simplified concurrency control and recovery model, with the goal of executing many times as many transactions per second as existing databases that rely on logging, expensive locking-based concurrency control, and disk-based recovery. Our early results show that a simple prototype built from scratch using modern assumptions can outperform current commercial DBMS offerings by around a factor of 80 on OLTP workloads. We are currently working to build a full-featured system that demonstrates these performance wins in a more robust prototype.


StatusQuo is a new programming system for developing database applications. Programmers often go to great lengths to make their applications perform well, for example by using stored procedures or by rewriting their applications as map/reduce tasks or in custom query languages. StatusQuo frees programmers from doing any of that. By leveraging program analysis techniques, the system optimizes applications automatically. You can now write code as inefficient as you like, and StatusQuo will handle the rest for you.

Past Projects

Qurk is a database that answers queries using people.

Crowdsourcing platforms such as Amazon's Mechanical Turk make it possible to organize crowd workers to perform tasks like translation or image labelling on demand. Building these workflows is challenging: How much should you pay crowd workers? Can you trust the output of each worker? How can you coordinate workers to perform complicated high-level tasks? Qurk helps you build crowd-powered data processing workflows using a PIG-like language while tackling these challenges on your behalf.


C-Store is a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. Among the many differences in its design are: storage of data by column rather than by row; careful coding and packing of objects into storage, including main memory during query processing; storage of an overlapping collection of column-oriented projections rather than the current fare of tables and indexes; a non-traditional implementation of transactions that includes high availability and snapshot isolation for read-only transactions; and the extensive use of bitmap indexes to complement B-tree structures.
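
To make the column-versus-row distinction concrete, here is a minimal Python sketch of a column layout paired with a bitmap index. It is an illustration only, not C-Store's implementation: the sample table, the helper names (`build_bitmap_index`, `sum_where`), and the use of Python integers as bitmaps are all assumptions made for this example.

```python
from collections import defaultdict

# A small table in the familiar row layout.
rows = [
    {"id": 1, "region": "east", "sales": 10},
    {"id": 2, "region": "west", "sales": 25},
    {"id": 3, "region": "east", "sales": 5},
    {"id": 4, "region": "west", "sales": 40},
]

# Column layout: one list per attribute, so a query reads only the
# attributes it needs instead of every field of every row.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when values[i] matches."""
    index = defaultdict(int)
    for i, v in enumerate(values):
        index[v] |= 1 << i
    return dict(index)


def sum_where(columns, bitmaps, value, measure):
    """Sum `measure` over the positions selected by the bitmap for `value`."""
    bits = bitmaps.get(value, 0)
    return sum(x for i, x in enumerate(columns[measure]) if (bits >> i) & 1)


region_bitmaps = build_bitmap_index(columns["region"])
east_sales = sum_where(columns, region_bitmaps, "east", "sales")
```

The query touches only the `region` bitmap and the `sales` column; the `id` column is never read, which is the main benefit of a column layout for read-mostly workloads.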


WaveScope is a software platform that makes it easy to develop, deploy, and operate wireless sensor networks that exhibit high data rates. In contrast to the "first generation" of wireless sensor networks, which are characterized by relatively low sensor sampling rates, there are several important emerging applications in which high rates of hundreds to tens of thousands of sensor samples per second are common. These include civil and structural engineering applications, including continuous monitoring of physical structures, industrial equipment, and fluid pipelines; "smart space" applications that continuously monitor sensors in a space to support ubiquitous computing or security applications; and scientific data gathering applications, such as outdoor acoustic monitoring systems for continuous habitat monitoring.


This is an NSF-funded project to investigate the management of uncertainty in database systems. We are looking at probabilistic models and approximate query processing techniques in a variety of real-world settings.

Query Processing In Sensor Networks (QPSN)

The goal of the QPSN project is to provide a declarative query interface for collecting data from sensor networks. This approach greatly simplifies sensor network programming while still providing a power-efficient framework that is expressive enough for a wide variety of data collection tasks. See TinyDB for information on our prototype sensor network query processor implementation, as well as our recent papers on Model-based data acquisition (VLDB '04), Event-detection in sensor networks (VLDB '05), Time-series modeling (EWSN '06), and Model-based views for databases (SIGMOD '06).
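
The sketch below illustrates what "declarative" means in this setting: the user states which fields to collect, under what condition, and how often, and leaves the execution strategy to the system. It is a toy model in Python rather than TinyDB's query language or implementation; the `Query` class, the `run` function, and the sample readings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical declarative query: what to collect, not how to collect it."""
    select: Tuple[str, ...]                  # fields to report
    where: Callable[[Dict[str, float]], bool]  # predicate on a reading
    sample_period: int                       # sample once every N epochs


def run(query, readings):
    """Evaluate `query` over (epoch, reading) pairs."""
    results = []
    for epoch, reading in readings:
        if epoch % query.sample_period != 0:
            continue  # off-period epoch: a real node could sleep here to save power
        if query.where(reading):
            results.append(tuple(reading[f] for f in query.select))
    return results


readings = [
    (0, {"nodeid": 1, "light": 120, "temp": 21.5}),
    (1, {"nodeid": 1, "light": 480, "temp": 21.7}),
    (2, {"nodeid": 1, "light": 510, "temp": 22.0}),
    (3, {"nodeid": 1, "light": 530, "temp": 22.1}),
    (4, {"nodeid": 1, "light": 90, "temp": 21.9}),
]

# "Report node id and light level every 2 epochs when light exceeds 400."
bright = Query(select=("nodeid", "light"),
               where=lambda r: r["light"] > 400,
               sample_period=2)
result = run(bright, readings)
```

Because the query says nothing about how sampling or filtering is carried out, the system is free to choose a power-efficient plan, such as skipping off-period epochs entirely.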

Haystack: The universal information client

Haystack is a tool designed to let every individual manage all of their information in the way that makes the most sense to them. By removing the arbitrary barriers created by applications only handling certain information "types", and recording only a fixed set of relationships defined by the developer, it aims to let users define whichever arrangements of, connections between, and views of information they find most effective.



Administrative Assistant
  • Sheila Marian


  • Firas Abuzaid
  • Leilani Battle
  • Anant Bhardwaj
  • Rachel Harding
  • Albert Kim
  • Yi Lu
  • Oscar Moll
  • Anil Shanbhag
  • Rebecca Taft
  • Manasi Vartak

Research Staff

  • Albert Carter (Staff Programmer, Big Data @ MIT)
  • Stavros Papadapolous (ISTC Researcher, and Visiting Researcher)
  • Nesime Tatbul (ISTC Researcher, and Visiting Researcher)



  • Evangelos Taratoris

Recent and Selected Publications
  • Manasi Vartak, Sajjadur Rahman, Samuel Madden, Aditya Parameswaran, Neoklis Polyzotis. SEEDB: Efficient Data-Driven Visualization Recommendations to Support Visual Analytics. In Proceedings of PVLDB, 2015. [PDF]
  • Anant Bhardwaj, Amol Deshpande, Aaron J. Elmore, David Karger, Samuel Madden, Aditya Parameswaran, Harihar Subramanyan, Eugene Wu, Rebecca Zhang. Collaborative Data Analytics with DataHub. In Proceedings of PVLDB, 2015. [PDF]
  • Eric Blais, Albert Kim, Aditya Parameswaran, Piotr Indyk, Samuel Madden, Ronitt Rubinfeld. Rapid Sampling for Visualizations with Ordering Guarantees. In Proceedings of PVLDB, 2015. [PDF]
  • Anant Bhardwaj, Souvik Bhattacherjee, Amit Chavan, Amol Deshpande, Aaron J. Elmore, Samuel Madden, Aditya Parameswaran. DataHub: Collaborative Data Science & Dataset Version Management at Scale. In Proceedings of CIDR, 2015. [PDF]
  • Barzan Mozafari, Purna Sarkar, Michael Franklin, Michael Jordan, Samuel Madden. Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning. In Proceedings of PVLDB, 2014. [PDF]
  • Ugur Cetintemel, Jiang Du, Timothy Kraska, Samuel Madden, David Maier, John Meehan, Andrew Pavlo, Michael Stonebraker, Erik Sutherland, Nesime Tatbul, Kristin Tufte, Hao Wang, Stan Zdonik. S-Store: A Streaming NewSQL System for Big Velocity Applications. Demo. In Proceedings of PVLDB, 2014. [PDF]
  • Manasi Vartak, Samuel Madden, Aditya Parameswaran, Neoklis Polyzotis. SEEDB: Automatically Generating Query Visualizations. Demo. In Proceedings of PVLDB, 2014. [PDF]
  • Eugene Wu, Leilani Battle, Samuel Madden. The Case for Data Visualization Management Systems (Vision Paper). In Proceedings of PVLDB, 2014. [PDF]
  • Alvin Cheung, Samuel Madden, Armando Solar-Lezama, Owen Arden, Andrew Myers. Using Program Analysis to Improve Database Applications. In IEEE Data Engineering Bulletin, 2014. [PDF]
  • Stephen Tu, Wenting Zheng, Eddie Kohler, Barbara Liskov, Samuel Madden. Speedy Transactions in Multicore In-Memory Databases. In Proceedings of SOSP, 2013. [PDF]
  • Daniel Abadi, Peter Boncz, Stavros Harizopoulos, Stratos Idreos, Samuel Madden. The Design and Implementation of Modern Column-Oriented Database Systems. In Foundations and Trends in Databases, 2013. [PDF]
  • Barzan Mozafari, Carlo Curino, Samuel Madden. Resource and Performance Prediction for Building a Next Generation Database Cloud. In Proceedings of CIDR, 2013. [PDF]
  • Eugene Wu, Samuel Madden. Explanatory Lineage. In Proceedings of CIDR, 2013. [PDF]
  • Alvin Cheung, Owen Arden, Samuel Madden, Andrew Myers, Armando Solar-Lezama. StatusQuo: Making Familiar Abstractions Perform Using Program Analysis. In Proceedings of CIDR, 2013. [PDF]
  • Stephen Tu, Frans Kaashoek, Nickolai Zeldovich, Samuel Madden. Processing Analytical Queries over Encrypted Data. In Proceedings of VLDB, 2013. [PDF]
  • Sameer Agarwal, Barzan Mozafari, Aurojit Panda, Henry Milner, Samuel Madden, Ion Stoica. BlinkDB: Queries with Bounded Errors and Bounded Response Times on Very Large Data. In Proceedings of EuroSys, 2013. [PDF]
  • Eugene Wu, Samuel Madden, Michael Stonebraker. SubZero: a Fine-Grained Lineage System for Scientific Databases. In Proceedings of ICDE, 2013. [PDF]
  • Alvin Cheung, Armando Solar-Lezama, Samuel Madden. Using Program Synthesis for Social Recommendations. In Proceedings of CIKM, 2012. [PDF]
  • Byung-Gon Chun, Carlo Curino, Russell Sears, Alexander Shraer, Samuel Madden, Raghu Ramakrishnan. Mobius: Unified Messaging and Data Serving for Mobile Apps. In Proceedings of MobiSys, 2012. [PDF]
  • Alvin Cheung, Owen Arden, Samuel Madden, Andrew Myers. Automatic Partitioning of Database Applications. In Proceedings of PVLDB, 2012. [PDF]
  • Aubrey L. Tatarowicz, Carlo Curino, Evan Jones, Samuel Madden. Lookup Tables: Fine-Grained Partitioning for Distributed Databases. In Proceedings of ICDE, 2012. [PDF]
  • Adam Seering, Philippe Cudre-Mauroux, Samuel Madden, Michael Stonebraker. Efficient Versioning for Scientific Array Databases. In Proceedings of ICDE, 2012. [PDF]
  • Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert C. Miller. Human-powered Sorts and Joins. In Proceedings of PVLDB, 2012. [PDF]
  • Priya Gupta, Nickolai Zeldovich, Samuel Madden. A Trigger-Based Middleware Cache for ORMs. In Proceedings of ACM Middleware, 2011. [PDF]
  • Adam Marcus, Michael S. Bernstein, Osama Badar, David Karger, Samuel Madden, Robert C. Miller. Processing and Visualizing the Data in Tweets. In SIGMOD Record, 2011. [PDF]
  • Alvin Cheung, Armando Solar-Lezama, Samuel Madden. Partial Replay of Long-Running Applications. In Proceedings of Symposium on the Foundations of Software Engineering (FSE), 2011. [PDF]
  • Carlo Curino, Evan Jones, Samuel Madden, Hari Balakrishnan. Workload-Aware Database Monitoring and Consolidation. In Proceedings of SIGMOD, 2011. [PDF]
  • Lenin Ravindranath, Calvin Newport, Hari Balakrishnan, Samuel Madden. Improving Wireless Network Performance Using Sensor Hints. In Proceedings of NSDI, 2011. [PDF]
  • Arvind Thiagarajan, Lenin Ravindranath, Hari Balakrishnan, Samuel Madden, Lewis Girod. Accurate, Low-Energy Trajectory Mapping for Mobile Devices. In Proceedings of NSDI, 2011. [PDF]
  • Adam Marcus, Eugene Wu, David Karger, Samuel Madden, Robert C. Miller. Crowdsourced Databases: Query Processing with People. In Proceedings of CIDR (Outrageous Ideas Track), 2011. [PDF]
  • Eugene Wu, Carlo Curino, Samuel Madden. No Bits Left Behind. In Proceedings of CIDR (Outrageous Ideas Track), 2011. [PDF]