MIT Database Group

Overview

The database group at MIT conducts research on all areas of database systems and information management. Projects range from the design of new user interfaces and query languages to low-level query execution issues, and from new systems for database analytics and main-memory databases to query processing in next-generation pervasive and ubiquitous environments, such as sensor networks, wide-area information systems, personal databases, and the Web.

Professor Madden offers a class in Database Systems (6.830).

Projects

Intel Science and Technology Center in Big Data

In the Big Data ISTC, our mission is to produce new data management systems and compute architectures for Big Data. Together, these systems will help people process data that exceeds the scale, rate, or sophistication of current data processing systems. We are working to demonstrate the effectiveness of these solutions on real applications in science, engineering, and medicine, making our results broadly available through open sourcing.

DataHub

In the DataHub project, we are building an experimental hosted platform (GitHub-like) for organizing, managing, sharing, collaborating on, and making sense of data. The hosted platform provides easy-to-use tools and interfaces for:

Managing your data (ingestion, curation, sharing, collaboration)
Using others' data (discovering, linking)
Making sense of data (query, analytics, visualization)

CarTel

In CarTel, we are building a system for managing data in the face of intermittent and variable connectivity. We are focusing, in particular, on automotive applications that involve high-rate sensing of road, traffic, and infrastructure conditions. The two key technologies we are developing are CafNet, a carry-and-forward network stack, and a distributed, signal-oriented, priority-driven query processor.

RelationalCloud

In RelationalCloud, we are investigating research challenges to enable Database-as-a-Service (DaaS) within the Cloud Computing paradigm. In particular, we are focusing on the problems of (i) characterizing workloads and assigning them to different data management solutions (ranging from multi-tenant databases to high-profile clustered main-memory solutions) and (ii) highly dynamic allocation of resources to accommodate evolving and bursty workloads in a transparent manner. Our long-term vision aims at combining multiple dedicated data management solutions behind a unifying DaaS interface: "One Data Service to manage them all".

H-Store

The goal of the H-Store project is to investigate how recent architectural and application trends affect the performance of online transaction processing databases (such as those that back many e-commerce sites, banks and reservation systems), and to study what performance benefits would be possible with a complete redesign of OLTP systems in light of these trends. Our idea is to build a main memory system with a dramatically simplified concurrency control and recovery model, with the goal of executing many times as many transactions per second as existing databases that rely on logging, expensive locking-based concurrency control, and disk-based recovery. Our early results show that a simple prototype built from scratch using modern assumptions can outperform current commercial DBMS offerings by around a factor of 80 on OLTP workloads. We are currently working to build a full-featured system that demonstrates these performance wins in a more robust prototype.

StatusQuo

StatusQuo is a new programming system for developing database applications. Programmers often go to great lengths to make their applications perform well, such as using stored procedures or rewriting their applications into map/reduce tasks or custom query languages. StatusQuo frees programmers from doing any of that. By leveraging program analysis techniques, the system optimizes applications automatically. You can now write code as inefficient as you like, and StatusQuo will handle the rest for you.

Past Projects

Qurk

Qurk is a database that answers queries using people.

Crowdsourcing platforms such as Amazon's Mechanical Turk make it possible to organize crowd workers to perform tasks like translation or image labelling on demand. Building these workflows is challenging: How much should you pay crowd workers? Can you trust the output of each worker? How can you coordinate workers to perform complicated high-level tasks? Qurk helps you build crowd-powered data processing workflows using a PIG-like language while tackling these challenges on your behalf.

C-Store

C-Store is a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. Among the many differences in its design are: storage of data by column rather than by row, careful coding and packing of objects into storage including main memory during query processing, storing an overlapping collection of column-oriented projections, rather than the current fare of tables and indexes, a non-traditional implementation of transactions which includes high availability and snapshot isolation for read-only transactions, and the extensive use of bitmap indexes to complement B-tree structures.

WaveScope

WaveScope is a software platform to make it easy to develop, deploy, and operate wireless sensor networks that exhibit high data rates. In contrast to the "first generation" of wireless sensor networks that are characterized by relatively low sensor sampling rates, there are several important emerging applications in which high rates of hundreds to tens of thousands of sensor samples per second are common. These include civil and structural engineering applications, including continuous monitoring of physical structures, industrial equipment, and fluid pipelines; "smart space" applications that continuously monitor sensors in a space to support ubiquitous computing or security applications; and scientific data gathering applications, such as outdoor acoustic monitoring systems for continuous habitat monitoring.

MACAQUE

This is an NSF-funded project to investigate the management of uncertainty in database systems. We are looking at probabilistic models and approximate query processing techniques in a variety of real world settings.

Query Processing In Sensor Networks (QPSN)

The goal of the QPSN project is to provide a declarative-query interface for collecting data from sensor networks. This approach greatly simplifies sensor network programming while still providing a power-efficient framework that is expressive enough for a wide variety of data collection tasks. See TinyDB for information on our prototype sensor network query processor implementation, as well as our recent papers on Model-based data acquisition (VLDB '04), Event-detection in sensor networks (VLDB '05), Time-series modeling (EWSN '06), and Model-based views for databases (SIGMOD '06).

Haystack: The universal information client

Haystack is a tool designed to let every individual manage all of their information in the way that makes the most sense to them. By removing the arbitrary barriers created by applications only handling certain information "types", and recording only a fixed set of relationships defined by the developer, it aims to let users define whichever arrangements of, connections between, and views of information they find most effective.

People

Faculty

Administrative Assistant

Sheila Marian

Ph.D.

Firas Abuzaid
Leilani Battle
Anant Bhardwaj
Rachel Harding
Albert Kim
Yi Lu
Oscar Moll
Anil Shanbhag
Rebecca Taft
Manasi Vartak

Research Staff

Albert Carter (Staff Programmer, Big Data @ MIT)
Stavros Papadapolous (ISTC Researcher and Visiting Researcher)
Nesime Tatbul (ISTC Researcher and Visiting Researcher)

Postdoc

M.Eng

Evangelos Taratoris

Alumni

Daniel Abadi (Yale)
Ziawasch Abedjan
Peter Bailis (Stanford University)
Magdalena Balazinska (U. Wash.)
Joshua Blum
Daniel Bruckner (PhD Student, UC Berkeley)
Alvin Cheung (U. Washington)
Philippe Cudre-Mauroux (U. of Fribourg, Switzerland)
Carlo Curino (Microsoft Research)
Jennie Duggan (Northwestern University)
Aaron Elmore (University of Chicago)
Jakob Eriksson (U. Illinois at Chicago)
Stavros Harizopoulos
Alekh Jindal (Microsoft Research)
Evan Jones
Barzan Mozafari (U. Michigan, Ann Arbor)
Ryan Newton (U. of Indiana)
Arvind Thiagarajan (Twitter, Inc.)
Michael Farry (Charles River Analytics)
Miguel Ferraria
Thomer Gil
David Goehring (UC Berkeley)
Lewis Girod
Michael Gubanov
George Huo (Google)
Edmond Lau (Quora)
Umberto Malesci
Adam Marcus
Nirmesh Malviya
Yuan Mei (Facebook)
Todd Mostak (MapD, Inc)
Daniel Myers
Nizameddin Ordulu (MemSQL)
Alex Pagan
Aditya Parameswaran (UIUC)
Lev Popov (Facebook)
Elizabeth Reid (Apple)
Adam Seering (Vertica)
Aubrey Tatarowicz (Google)
Timur Tokmouline
Stephen Tu (PhD Student, UC Berkeley)
Aizana Turmukhametova (Tamr, Inc.)
Eugene Wu (Columbia)
Yang Zhang

Recent and Selected Publications

Comments? Corrections? Contact webmaster at db.csail.mit.edu