Overview

Uncertain data streams, in which data is incomplete, imprecise, or even misleading, arise in a variety of environments. In many cases, the raw data collected is not directly queryable and must undergo sophisticated query processing to derive useful high-level information. Moreover, feeding uncertain data streams directly into existing stream systems can produce results of unknown quality. The goal of this project is to design and develop a stream processing system that captures data uncertainty from data collection through query processing to final result generation. The project takes a principled approach, grounded in probability and statistical theory, that supports uncertainty as a first-class citizen and integrates efficiently into high-volume stream processing. Specifically, the project makes the following two main contributions:

  1. To capture uncertainty of raw data streams emanating from sensing devices.
  2. To capture uncertainty as data propagates through various query processing operators.
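
As a rough illustration of the second point, the sketch below shows one way uncertainty could be carried through a query operator. It is not the project's implementation: the Gaussian model of each reading, the independence assumption, and the names `UncertainValue`, `uncertain_sum`, and `prob_greater_than` are all hypothetical choices made only for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainValue:
    """A reading modeled as a Gaussian (an assumption made for this sketch)."""
    mean: float
    variance: float


def uncertain_sum(values):
    """Sum aggregate over independent Gaussians: means add, variances add."""
    return UncertainValue(
        mean=sum(v.mean for v in values),
        variance=sum(v.variance for v in values),
    )


def prob_greater_than(value, threshold):
    """Probability that a Gaussian value exceeds the threshold."""
    if value.variance == 0:
        # Degenerate case: the value is known exactly.
        return 1.0 if value.mean > threshold else 0.0
    z = (threshold - value.mean) / math.sqrt(value.variance)
    # 1 - Phi(z), with the standard normal CDF written via erf.
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


# Two hypothetical sensor readings, each with its own uncertainty.
readings = [UncertainValue(20.0, 1.0), UncertainValue(22.0, 4.0)]
total = uncertain_sum(readings)
confidence = prob_greater_than(total, 40.0)
```

Here the aggregate result is itself a distribution rather than a single number, so a downstream selection such as "total above 40" yields a probability instead of a plain yes or no. Real streams would generally call for richer distributions and for handling correlated inputs, which this sketch ignores.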