Detached Provenance Analysis

DSpace Repository


Dokumentart: PhDThesis
Date: 2020-03-31
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät
Department: Informatik
Advisor: Grust, Torsten (Prof. Dr.)
Day of Oral Examination: 2020-03-05
DDC Classifikation: 004 - Data processing and computer science
Keywords: Datenbank , SQL , Datenherkunft
Other Keywords:
Data Provenance
Order a printed copy: Print-on-Demand
Show full item record


Data provenance is the research field of the algorithmic derivation of the source and processing history of data. In this work, the derivation of Where- and Why-provenance in sub-cell-level granularity is pursued for a rich SQL dialect. For example, we support the provenance analysis for individual elements of nested rows and/or arrays. The SQL dialect incorporates window functions and correlated subqueries. We accomplish this goal using a novel method called detached provenance analysis. This method carries out a SQL-level rewrite of any user query Q, yielding (Q1, Q2). Employing two queries facilitates a low-invasive provenance analysis, i.e. both queries can be evaluated using an unmodified DBMS as backend. The queries implement a split of responsibilities: Q1 carries out a runtime analysis and Q2 derives the actual data provenance. One drawback of this method is that a synchronization overhead between Q1 and Q2 is induced. Experiments quantify the overheads based on the TPC-H benchmark and the PostgreSQL DBMS. A second set of experiments carried out in row–level granularity compares our approach with the PERM approach (as described by B. Glavic et al.). The aggregated results show that basic queries (typically, a single SFW expression with aggregations) perform slightly better in the PERM approach while complex queries (nested SFW expressions and correlated subqueries) perform considerably better in our approach.

This item appears in the following Collection(s)