MACAQUE - Managing Ambiguity and Complexity in Acquisitional Query Environments

The vision of ubiquitous computing promises to spread information technology throughout our lives. Though this vision can be compelling, it also threatens to overwhelm us with a flood of information, much of which is spurious, irrelevant, or misleading. The challenge in realizing this vision is therefore to separate the relevant, timely, and useful information from this flood of data. The data management community has made significant progress towards this goal by providing tools that load and clean the data, languages and systems that query the data, and algorithms that mine the data for patterns and relationships of interest.

These efforts have largely focused on mitigating data complexity once the data has been captured and stored inside a traditional computing infrastructure. In contrast, we propose a set of techniques that take an active role in managing this wealth of data by controlling when, where, and how frequently data is acquired from distributed information systems. In many modern systems, the capability of local nodes to generate data far outstrips the resources available to transmit or store that data. Nodes in a sensor network, for example, typically have processors that run at several megahertz and data collection hardware capable of collecting many kilosamples per second, but radios that can transmit only kilobytes per second in aggregate across all of the nodes in the network. Worse yet, these nodes are battery powered and, when sampling at maximum rates, have only enough energy to last a few days. In addition to these resource limits, data from real-world environments is often noisy, lossy, and hard to interpret. This noise and uncertainty can be misleading, particularly when the user is summarizing and aggregating data using a high-level language like SQL.

In the MACAQUE (for "Management of Ambiguity and Complexity in an Acquisitional QUery Environment") project, we are developing several systems designed to focus the resources of the computer system (e.g., network bandwidth or battery capacity) and the attention of the user on capturing, refining, and interpreting the portions of the data that are most relevant, while de-emphasizing and decreasing the captured resolution of less relevant data. At the same time, we use statistical and probabilistic techniques to identify data that is spurious, incorrect, or unreliable, and to infer missing data values.


Our effort on MACAQUE is divided into several related sub-projects, including:






This project is funded by NSF award IIS-0448124.

