Detached Provenance Analysis

Citable link (URI): http://hdl.handle.net/10900/99433
http://nbn-resolving.de/urn:nbn:de:bsz:21-dspace-994336
http://dx.doi.org/10.15496/publikation-40814
Document type: Dissertation
Publication date: 2020-03-31
Language: English
Faculty: 7 Mathematisch-Naturwissenschaftliche Fakultät (Faculty of Science)
Department: Computer Science
Referee: Grust, Torsten (Prof. Dr.)
Date of oral examination: 2020-03-05
DDC classification: 004 - Computer science
Subject headings: Database, SQL, Data provenance
Free keywords: Data Provenance
License: http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=de http://tobias-lib.uni-tuebingen.de/doku/lic_mit_pod.php?la=en
Order a printed copy: Print-on-Demand

Abstract:

Data provenance is the research field concerned with algorithmically deriving the origin and processing history of data. This work pursues the derivation of Where- and Why-provenance at sub-cell-level granularity for a rich SQL dialect: for example, provenance can be analyzed for individual elements of nested rows and/or arrays. The SQL dialect includes window functions and correlated subqueries. We achieve this goal with a novel method called detached provenance analysis. This method performs a SQL-level rewrite of any user query Q, yielding a pair of queries (Q1, Q2). Employing two queries keeps the provenance analysis minimally invasive, i.e., both queries can be evaluated on an unmodified DBMS as the backend. The two queries split the responsibilities: Q1 carries out a runtime analysis and Q2 derives the actual data provenance. One drawback of this method is the synchronization overhead it induces between Q1 and Q2. Experiments based on the TPC-H benchmark and the PostgreSQL DBMS quantify these overheads. A second set of experiments, carried out at row-level granularity, compares our approach with the PERM approach (as described by B. Glavic et al.). The aggregated results show that basic queries (typically, a single SFW expression with aggregations) perform slightly better under PERM, while complex queries (nested SFW expressions and correlated subqueries) perform considerably better under our approach.
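To illustrate the split of responsibilities between Q1 and Q2, here is a minimal toy sketch in plain Python rather than SQL. It is not the SQL-level rewrite developed in this work: the table `ROWS`, the functions `q1` and `q2`, and the log format (one `(row id, filter outcome, group key)` entry per input row) are all hypothetical. The first phase evaluates a filter-and-group query while logging the decisions it takes; the second phase consults only that log to derive Why-provenance (which input rows each output group depends on) and the Where-provenance of the group key (which input cells the key value was copied from).

```python
from collections import defaultdict

# Hypothetical input table; "id" serves as the row identifier used in provenance.
ROWS = [
    {"id": 1, "category": "a", "price": 5},
    {"id": 2, "category": "a", "price": 20},
    {"id": 3, "category": "b", "price": 15},
    {"id": 4, "category": "b", "price": 30},
]


def q1(rows):
    """Phase 1 (runtime analysis): evaluate the toy query
    SELECT category, SUM(price) FROM rows WHERE price > 10 GROUP BY category
    and log, for every input row, the filter outcome and the chosen group key."""
    sums = defaultdict(int)
    log = []
    for row in rows:
        passed = row["price"] > 10
        key = row["category"] if passed else None
        log.append((row["id"], passed, key))
        if passed:
            sums[key] += row["price"]
    return dict(sums), log


def q2(log):
    """Phase 2 (provenance derivation): read only the log, never the input
    rows or the predicate, and derive per output group its Why-provenance
    (contributing row ids) and the Where-provenance of its key (source cells)."""
    why = defaultdict(set)
    where = defaultdict(set)
    for row_id, passed, key in log:
        if passed:
            why[key].add(row_id)
            where[key].add((row_id, "category"))
    return dict(why), dict(where)


result, log = q1(ROWS)
why, where = q2(log)
print(result)  # {'a': 20, 'b': 45}
print(why)     # {'a': {2}, 'b': {3, 4}}
```

Because `q2` never re-evaluates the predicate `price > 10`, it can run separately from `q1` once the log exists; writing and reading that log is only a rough analogue of the synchronization overhead between Q1 and Q2 described in the abstract.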
