| | Overview 
			| The database group at MIT conducts
			      research on all areas of database
			      systems and information management.
			      Projects range from the design of new
			      user interfaces and query languages to
			      low-level query execution issues, and
			      from the design of new systems for database analytics and
			      main memory databases to query processing in
			      next-generation pervasive and ubiquitous
			      environments, such as sensor networks,
			      wide area information systems, personal
			      databases, and the Web. 
				Professor Madden offers a class in Database Systems (6.830).
				
			      
			     |  |  | Projects |  | 
			      
				|   | Intel Science and Technology Center
				  in Big Data  In the Big
					Data ISTC, our mission is to produce new data management
				      systems and compute architectures for Big Data. Together, these
				      systems will help people process data that exceeds the scale,
				      rate, or sophistication of current data processing systems. We are
				      working to demonstrate the effectiveness of these solutions on
				      real applications in science, engineering, and medicine, making
				      our results broadly available through open sourcing.
				   |  |  | 
			      
				|   | DataHub  In
				      the DataHub
				      project, we are building an experimental hosted platform
				      (GitHub-like) for organizing, managing, sharing, collaborating,
				      and making sense of data.  The hosted platform provides
				      easy-to-use tools and interfaces for:
				       
					Managing your data (ingestion, curation, sharing, collaboration)
					Using others' data (discovering, linking)
					Making sense of data (query, analytics, visualization)
				       |  |  | 
			      
				|   | CarTel 
				      In CarTel,
				      we are building a system for managing
				      data in the face of intermittent and
				      variable connectivity.  We are
				      focusing, in particular, on automotive
				      applications that involve high-rate
				      sensing of road, traffic, and
				      infrastructure conditions.  The two
				      key technologies we are developing are
				      CafNet, a carry-and-forward network
				      stack, and a distributed,
				      signal-oriented, priority-driven
				      query processor.
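
As a rough illustration of the carry-and-forward idea with priority-driven delivery, here is a minimal Python sketch. It is not CafNet's actual API: the class `CarryAndForwardBuffer`, the `flush` method, the `budget` parameter, and the sample payloads are all hypothetical names chosen for this example. Data is buffered while the vehicle is disconnected, and the most urgent items are sent first whenever a short connectivity window opens.

```python
import heapq
import itertools


class CarryAndForwardBuffer:
    """Hypothetical buffer: holds data while offline, sends by priority when online."""

    def __init__(self):
        self._heap = []
        # Counter breaks ties so equal-priority items leave in arrival order.
        self._counter = itertools.count()

    def enqueue(self, priority, payload):
        # Assumption for this sketch: a lower number means more urgent.
        heapq.heappush(self._heap, (priority, next(self._counter), payload))

    def flush(self, budget):
        """Send at most `budget` items during one connectivity window."""
        sent = []
        while self._heap and len(sent) < budget:
            _, _, payload = heapq.heappop(self._heap)
            sent.append(payload)
        return sent

    def __len__(self):
        return len(self._heap)


buf = CarryAndForwardBuffer()
buf.enqueue(2, "gps-trace")
buf.enqueue(0, "pothole-alert")
buf.enqueue(1, "traffic-summary")

# A brief window only has room for two items; the rest is carried onward.
first_window = buf.flush(budget=2)
second_window = buf.flush(budget=2)
```

In this sketch the low-priority trace stays in the buffer after the first window and goes out at the next opportunity, which is the behavior the description above suggests for intermittent links.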
				      
				   |  |  | 
       	       	       	      
       	       	       	       	|   | RelationalCloud 
       	       	       	       	      In RelationalCloud,
       	       	       	       	      we are investigating research
       	       	       	       	      challenges to enable
       	       	       	       	      Database-as-a-Service (DaaS)
       	       	       	       	      within the Cloud Computing
       	       	       	       	      paradigm. In particular, we are
       	       	       	       	      focusing on the problems of (i)
       	       	       	       	      characterizing workloads and
       	       	       	       	      assigning them to different data
       	       	       	       	      management solutions (ranging
       	       	       	       	      from multi-tenant databases to
       	       	       	       	      high-profile clustered
       	       	       	       	      main-memory solutions) and (ii)
       	       	       	       	      highly dynamic allocation of
       	       	       	       	      resources to accommodate evolving
       	       	       	       	      and bursty workloads in a
       	       	       	       	      transparent manner. Our
       	       	       	       	      long-term vision aims at
       	       	       	       	      combining multiple dedicated
       	       	       	       	      data management solutions behind
       	       	       	       	      a unifying DaaS interface: "One
       	       	       	       	      Data Service to manage them
       	       	       	       	      all".
       	       	       	       	   |  |  | 
			      
				|   | H-Store 
				      The goal of the H-Store project is to investigate how recent
				      architectural and application trends affect the performance of online
				      transaction processing databases (such as those that back many
				      e-commerce sites, banks and reservation systems), and to study what
				      performance benefits would be possible with a complete redesign of
				      OLTP systems in light of these trends. Our idea is to build a main
				      memory system with a dramatically simplified concurrency control and
				      recovery model, with the goal of executing many times as many
				      transactions per second as existing databases that rely on logging,
				      expensive locking-based concurrency control, and disk-based recovery.
				      Our early results show that a simple prototype built from scratch
				      using modern assumptions can outperform current commercial DBMS
				      offerings by around a factor of 80 on OLTP workloads. We are currently
				      working to build a full-featured system that demonstrates these
				      performance wins in a more robust prototype.
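
One way to picture a simplified concurrency model for a main memory system is to split the data into partitions and run each partition's transactions strictly one at a time, so no locks are needed within a partition. The Python sketch below illustrates that general idea only. It is an assumption made for this example, not a description of the H-Store implementation, and `Partition`, `deposit`, and `route` are hypothetical names.

```python
from collections import deque


class Partition:
    """Hypothetical in-memory partition that runs transactions serially."""

    def __init__(self):
        self.data = {}
        self.pending = deque()

    def submit(self, txn):
        self.pending.append(txn)

    def run_all(self):
        # Transactions run one after another, so they never interleave
        # and need no locking inside this partition.
        results = []
        while self.pending:
            txn = self.pending.popleft()
            results.append(txn(self.data))
        return results


def deposit(account, amount):
    """Build a transaction that adjusts one account balance."""
    def txn(data):
        data[account] = data.get(account, 0) + amount
        return data[account]
    return txn


partitions = [Partition() for _ in range(2)]


def route(account):
    # Illustrative deterministic routing: each account maps to one partition.
    return partitions[sum(map(ord, account)) % len(partitions)]


for account, amount in [("alice", 50), ("bob", 20), ("alice", -10)]:
    route(account).submit(deposit(account, amount))

balances = {}
for partition in partitions:
    partition.run_all()
    balances.update(partition.data)
```

The sketch leaves out recovery and transactions that span partitions, both of which a real system has to handle.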
				   |  |  | 
			      
				|   | StatusQuo  StatusQuo is a new programming system for developing database 
				      applications. Programmers often go to great lengths to make their applications perform well, such as using
				      stored procedures, rewriting their applications into map / reduce tasks or custom query languages, etc. 
				      StatusQuo frees the programmers from doing any of that.
				      By leveraging program analysis techniques, the system optimizes applications and 
				      makes them perform. You can now write as inefficient code as you like and 
				      StatusQuo will automatically handle the rest for you.
				   |  |  |  |  | Past Projects |  | 
			      
				|   | Qurk  Qurk  is a database that answers queries using people.  Crowdsourcing platforms
				      such as Amazon's Mechanical Turk make it possible to organize crowd
				      workers to perform tasks like translation or image labelling on
				      demand. Building these workflows is challenging: how much should you
				      pay crowd workers? Can you trust the output of each worker? How can
				      you coordinate workers to perform complicated high-level tasks? Qurk
				      helps you build crowd-powered data processing workflows using a
				      PIG-like language while tackling these challenges on your behalf.
				   |  |  | 
			      
				|   | C-Store  C-Store is a
				      read-optimized relational DBMS that contrasts sharply with most
				      current systems, which are write-optimized. Among the many differences
				      in its design are: storage of data by column rather than by row,
				      careful coding and packing of objects into storage including main
				      memory during query processing, storing an overlapping collection of
				      column-oriented projections, rather than the current fare of tables
				      and indexes, a non-traditional implementation of transactions which
				      includes high availability and snapshot isolation for read-only
				      transactions, and the extensive use of bitmap indexes to complement
				      B-tree structures.
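
To make the difference between row storage and column storage concrete, here is a small Python sketch. It shows only the general column-layout idea, not C-Store's storage format, and the table contents and function names (`to_columns`, `total_sales_by_region`) are hypothetical. An aggregate that needs two attributes reads just those two columns instead of every field of every row.

```python
# Hypothetical sample table stored row by row.
rows = [
    {"id": 1, "region": "east", "sales": 10},
    {"id": 2, "region": "west", "sales": 25},
    {"id": 3, "region": "east", "sales": 7},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_sales_by_region(columns, region):
    """Sum sales for one region, touching only the two columns needed."""
    return sum(
        sales
        for reg, sales in zip(columns["region"], columns["sales"])
        if reg == region
    )


columns = to_columns(rows)
east_total = total_sales_by_region(columns, "east")
```

The `id` column is never read by the query, which is the kind of saving a read-optimized column layout aims for on wide tables.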
				   |  |  | 
			      
				|   | WaveScope  WaveScope is a software
				      platform to make it easy to develop, deploy, and operate wireless
				      sensor networks that exhibit high data rates. In contrast to the
				      "first generation" of wireless sensor networks that are characterized
				      by relatively low sensor sampling rates, there are several important
				      emerging applications in which high rates of hundreds to tens of
				      thousands of sensor samples per second are common. These include civil
				      and structural engineering applications, including continuous
				      monitoring of physical structures, industrial equipment, and fluid
				      pipelines; "smart space" applications that continuously monitor
				      sensors in a space to support ubiquitous computing or security
				      applications; and scientific data gathering applications, such as
				      outdoor acoustic monitoring systems for continuous habitat monitoring.
				      
				   |  |  | 
			      
				|   | MACAQUE 
				      This is an NSF-funded project to
				      investigate the management of
				      uncertainty in database systems.  We
				      are looking at probabilistic models
				      and approximate query processing
				      techniques in a variety of real world
				      settings.
				   |  |  |  |  | 
			      
				|   | Haystack: The universal information client  
				Haystack is a tool designed to let
				every individual manage all of their
				information in the way that makes the
				most sense to them. By removing the
				arbitrary barriers created by
				applications only handling certain
				information "types", and recording
				only a fixed set of relationships
				defined by the developer, it aims to
				let users define whichever
				arrangements of, connections between,
				and views of information they find
				most effective.
				   |  | 
 | | People 
		    | Faculty
			    Administrative Assistant Ph.D.
			       
		               Firas Abuzaid
			       Leilani Battle
			       Anant Bhardwaj
		               Rachel Harding
                               Albert Kim
		               Yi Lu
		               Oscar Moll
			       Anil Shanbhag
                               Rebecca Taft
			       Manasi Vartak
			       Research Staff
			     
                               Albert Carter (Staff Programmer, Big Data @ MIT)
                               Stavros Papadapolous (ISTC Researcher and Visiting Researcher)
			       Nesime Tatbul (ISTC Researcher and Visiting Researcher)
              
			     Postdoc
			     M.Eng
				 | Alumni |  |  | Recent and Selected Publications | 
 |