The overall research goal of our proposal is to design, develop, and evaluate a data management system that provides fundamental support for query processing on large scientific datasets, which often involve considerable uncertainty both in the data content and in the query evaluation process. Our work differs from existing work in two key respects: (1) It supports both relational algebra and array algebra for query processing, where the uncertainty in query processing arises from the uncertainty of the data content. Supporting both algebras makes the system applicable to a broader range of scientific domains. (2) In real-world applications, uncertainty in query processing arises not only from the data itself but also from uncertainty about which data the user is interested in, especially for novice users of large scientific databases. To address this second source of uncertainty, we propose interactive data exploration, in which the database management system guides the user through a large data space and learns the user's interest from the user's feedback on a small number of database samples.

Funding Sources

We gratefully acknowledge the funding provided by the following agency:

National Science Foundation.

  1. CAREER: Efficient, Robust RFID Stream Processing for Tracking and Monitoring. Yanlei Diao (PI). National Science Foundation IIS-0746939.
  2. III-COR-small: Capturing Data Uncertainty in High-Volume Stream Processing. Yanlei Diao (PI) and Anna Liu (co-PI). National Science Foundation IIS-0812347.
  3. III-COR-small: High-Performance Complex Processing of Continuous Uncertain Data. Yanlei Diao (PI) and Anna Liu (co-PI). National Science Foundation IIS-1218524.

Last Update: August 2016