Overview
The database group at MIT conducts research on all areas of database
systems and information management. Projects range from the design of new
user interfaces and query languages to low-level query execution, and
from new systems for database analytics and main-memory databases to
query processing in next-generation pervasive and ubiquitous
environments, such as sensor networks, wide-area information systems,
personal databases, and the Web.
Professor Madden offers a class in Database Systems (6.830).
Projects
Intel Science and Technology Center for Big Data
In the Big Data ISTC, our mission is to produce new data management
systems and compute architectures for Big Data. Together, these
systems will help people process data that exceeds the scale,
rate, or sophistication of current data processing systems. We are
working to demonstrate the effectiveness of these solutions on
real applications in science, engineering, and medicine, making
our results broadly available through open sourcing.
DataHub
In the DataHub project, we are building an experimental hosted
platform (GitHub-like) for organizing, managing, sharing,
collaborating on, and making sense of data. The hosted platform
provides easy-to-use tools and interfaces for:
- Managing your data (ingestion, curation, sharing, collaboration)
- Using others' data (discovering, linking)
- Making sense of data (query, analytics, visualization)
CarTel
In CarTel,
we are building a system for managing
data in the face of intermittent and
variable connectivity. We are
focusing, in particular, on automotive
applications that involve high-rate
sensing of road, traffic, and
infrastructure conditions. The two
key technologies we are developing are
CafNet, a carry-and-forward network
stack, and a distributed,
signal-oriented, priority-driven
query processor.
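The carry-and-forward idea can be sketched as a small buffer that holds data while a vehicle is disconnected and drains it highest-priority-first when connectivity returns. This is an illustrative sketch, not CafNet itself; the class and payload names are hypothetical.

```python
import heapq

# Hypothetical sketch of carry-and-forward delivery with priority-driven
# draining (not CafNet code): readings are buffered locally while the
# vehicle is disconnected and sent highest-priority-first on reconnect.
class CarryForwardBuffer:
    def __init__(self):
        self._heap = []   # entries: (negated priority, sequence number, payload)
        self._seq = 0

    def enqueue(self, priority, payload):
        # The sequence number breaks ties so equal-priority items
        # are delivered in arrival order.
        heapq.heappush(self._heap, (-priority, self._seq, payload))
        self._seq += 1

    def drain(self, connected):
        # Deliver nothing while disconnected; otherwise flush the
        # buffer in priority order.
        delivered = []
        while connected and self._heap:
            _, _, payload = heapq.heappop(self._heap)
            delivered.append(payload)
        return delivered

buf = CarryForwardBuffer()
buf.enqueue(1, "road-surface sample")
buf.enqueue(5, "accident alert")
offline = buf.drain(connected=False)  # nothing sent while disconnected
online = buf.drain(connected=True)    # flushed highest priority first
```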
RelationalCloud
In RelationalCloud,
we are investigating research
challenges to enable
Database-as-a-Service (DaaS)
within the Cloud Computing
paradigm. In particular, we are
focusing on the problems of (i)
characterizing workloads and
assigning them to different data
management solutions (ranging
from multi-tenant databases to
high-profile clustered
main-memory solutions) and (ii)
highly dynamic allocation of
resources to accommodate evolving
and bursty workloads in a
transparent manner. Our
long-term vision aims at
combining multiple dedicated
data management solutions behind
a unifying DaaS interface: "One
Data Service to manage them
all".
H-Store
The goal of the H-Store project is to investigate how recent
architectural and application trends affect the performance of online
transaction processing databases (such as those that back many
e-commerce sites, banks and reservation systems), and to study what
performance benefits would be possible with a complete redesign of
OLTP systems in light of these trends. Our idea is to build a main-memory
system with a dramatically simplified concurrency control and
recovery model, with the goal of executing many times as many
transactions per second as existing databases that rely on logging,
expensive locking-based concurrency control, and disk-based recovery.
Our early results show that a simple prototype built from scratch
using modern assumptions can outperform current commercial DBMS
offerings by around a factor of 80 on OLTP workloads. We are currently
working to build a full-featured system that demonstrates these
performance wins in a more robust prototype.
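The simplified execution model described above can be illustrated with a toy main-memory store that runs each transaction serially to completion, so it needs no locks and no disk-based recovery log during execution. This is a sketch of the idea, not H-Store's actual design or code; all names here are hypothetical.

```python
# Illustrative sketch (not H-Store code): a main-memory store that runs
# transactions one at a time on a single thread, eliminating locking and
# latching overhead entirely.
class SerialMainMemoryDB:
    def __init__(self):
        self.tables = {}  # all data lives in main memory

    def execute(self, txn, *args):
        # Transactions run serially to completion; with no concurrent
        # access there is nothing to lock.
        return txn(self.tables, *args)

def create_account(tables, name, balance):
    tables.setdefault("accounts", {})[name] = balance

def transfer(tables, src, dst, amount):
    accounts = tables["accounts"]
    if accounts[src] < amount:
        return False  # abort before mutating, so there is nothing to undo
    accounts[src] -= amount
    accounts[dst] += amount
    return True

db = SerialMainMemoryDB()
db.execute(create_account, "alice", 100)
db.execute(create_account, "bob", 0)
ok = db.execute(transfer, "alice", "bob", 30)
```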
StatusQuo
StatusQuo is a new programming system for developing database
applications. Programmers often go to great lengths to make their
applications perform well, such as using stored procedures or rewriting
their applications as map/reduce tasks or in custom query languages.
StatusQuo frees programmers from doing any of that. By leveraging
program analysis techniques, the system optimizes applications
automatically. You can write code as inefficiently as you like, and
StatusQuo will handle the rest for you.
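The flavor of rewrite StatusQuo automates can be shown by hand: imperative application code that pulls every row into the program and filters there versus the equivalent work pushed into the database as a single declarative query. This is a hand-written illustration, not output from StatusQuo.

```python
import sqlite3

# A tiny in-memory table to filter, standing in for an application's
# backing database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 50.0), (2, 120.0), (3, 300.0)])

# Naive application code: fetch every row, then filter in the program.
big_orders_app = [row for row in conn.execute("SELECT id, total FROM orders")
                  if row[1] > 100]

# The optimized form: let the query processor do the filtering.
big_orders_sql = list(
    conn.execute("SELECT id, total FROM orders WHERE total > 100"))
```

Both produce the same rows, but the second form ships far less data to the application and lets the database use indexes.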
Past Projects
Qurk
Qurk is a database that answers queries using people. Crowdsourcing
platforms such as Amazon's Mechanical Turk make it possible to organize
crowd workers to perform tasks like translation or image labeling on
demand. Building these workflows is challenging: how much should you
pay crowd workers? Can you trust the output of each worker? How can
you coordinate workers to perform complicated high-level tasks? Qurk
helps you build crowd-powered data processing workflows using a
Pig-like language while tackling these challenges on your behalf.
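One of the quality-control problems above (can you trust each worker?) is commonly handled by asking several workers the same question and taking a majority vote. The sketch below is a hypothetical illustration of that policy, not Qurk's language or API; `ask_crowd` stands in for a real Mechanical Turk call.

```python
from collections import Counter

# Hypothetical sketch of crowd quality control: redundantly ask several
# workers and reconcile disagreement by majority vote.
def ask_crowd(item, num_workers, worker_pool):
    # A real system would post num_workers HITs to a crowdsourcing
    # platform; here each "worker" is just a labeling function.
    return [worker(item) for worker in worker_pool[:num_workers]]

def majority_label(answers):
    # most_common(1) returns the single most frequent (label, count) pair.
    label, _ = Counter(answers).most_common(1)[0]
    return label

# Three simulated workers, one of whom answers incorrectly.
workers = [lambda x: "cat", lambda x: "cat", lambda x: "dog"]
votes = ask_crowd("photo_17.jpg", 3, workers)
label = majority_label(votes)
```

Paying for more redundant answers raises confidence in the result, which is exactly the cost/quality trade-off such systems must manage.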
C-Store
C-Store is a read-optimized relational DBMS that contrasts sharply with
most current systems, which are write-optimized. Among the many
differences in its design are:
- storage of data by column rather than by row
- careful coding and packing of objects into storage, including main memory during query processing
- storing an overlapping collection of column-oriented projections, rather than the current fare of tables and indexes
- a non-traditional implementation of transactions that includes high availability and snapshot isolation for read-only transactions
- extensive use of bitmap indexes to complement B-tree structures
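The column-storage idea above can be sketched in a few lines: each column is stored (and encoded) separately, so a query reads only the columns it references. Here simple run-length encoding of a sorted column stands in for C-Store's careful coding and packing; the table and names are made up for illustration.

```python
# Illustrative sketch of column-oriented storage (not C-Store code).
def rle_encode(values):
    """Compress a column as (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

# A tiny table stored by column rather than by row.
columns = {
    "state": ["CA", "CA", "CA", "MA", "MA"],  # sorted, so it compresses well
    "sales": [10, 20, 15, 30, 25],
}

# An aggregate over one column never touches the other columns' storage.
total_sales = sum(columns["sales"])

# A filtered query reads exactly the two columns it references.
ma_sales = sum(s for st, s in zip(columns["state"], columns["sales"])
               if st == "MA")

encoded_state = rle_encode(columns["state"])
```

A row store would have to read every attribute of every row to answer either query, which is why read-mostly analytic workloads favor this layout.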
WaveScope
WaveScope is a software platform that makes it easy to develop, deploy,
and operate wireless sensor networks with high data rates. In contrast
to the "first generation" of wireless sensor networks, which were
characterized by relatively low sensor sampling rates, several important
emerging applications involve high rates of hundreds to tens of
thousands of sensor samples per second. These include civil and
structural engineering applications, such as continuous monitoring of
physical structures, industrial equipment, and fluid pipelines; "smart
space" applications that continuously monitor sensors in a space to
support ubiquitous computing or security applications; and scientific
data-gathering applications, such as outdoor acoustic monitoring
systems for continuous habitat monitoring.
MACAQUE
This is an NSF-funded project to
investigate the management of
uncertainty in database systems. We
are looking at probabilistic models
and approximate query processing
techniques in a variety of real-world
settings.
Haystack: The universal information client
Haystack is a tool designed to let
every individual manage all of their
information in the way that makes the
most sense to them. By removing the
arbitrary barriers created by
applications that handle only
certain information "types" and
record only a fixed set of
developer-defined relationships, it aims to
let users define whichever
arrangements of, connections between,
and views of information they find
most effective.
People
Faculty
Administrative Assistant
Ph.D.
- Firas Abuzaid
- Leilani Battle
- Anant Bhardwaj
- Rachel Harding
- Albert Kim
- Yi Lu
- Oscar Moll
- Anil Shanbhag
- Rebecca Taft
- Manasi Vartak
Research Staff
- Albert Carter (Staff Programmer, Big Data @ MIT)
- Stavros Papadopoulos (ISTC Researcher and Visiting Researcher)
- Nesime Tatbul (ISTC Researcher and Visiting Researcher)
Postdoc
M.Eng
Alumni
Recent and Selected Publications